User lookup via the Twitter API from R fails with error (403)

Using the Twitter API and the twitteR package, I am trying to retrieve user objects for a long list of screen names (between 50,000 and 100,000).
I keep getting the following error:
Error in twInterfaceObj$doAPICall(paste("users", "lookup", sep = "/"),  : 
  client error: (403) Forbidden

The error code seems to hint at an update limit. But the rate limit for user lookups is 180 calls per 15-minute window, with up to 100 screen names per call, so up to 18,000 users per window should not be a problem. Yet even reducing the number to 6,000 per window (to respect the lower limit for application-authenticated requests) still produces the same error.
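One way to check which limit actually applies to your session is twitteR's rate-limit query; a minimal sketch (run after the setup_twitter_oauth() call in the MWE below):

rateInfo <- getCurRateLimitInfo("users")
# Show limit, remaining calls, and reset time for the lookup endpoint
subset(rateInfo, resource == "/users/lookup")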
Here is an MWE (you will need your own API keys):
library(plyr)
# install the latest versions from github:
# devtools::install_github("twitteR", username="geoffjentry")
# devtools::install_github("hadley/httr")
library(twitteR)
library(httr)    

source("TwitterKeys.R") # Your own API-Keys
setup_twitter_oauth(consumerKey, consumerSecret, accessToken, accessSecret)

# The following is just to generate a large enough list of user names:
searchTerms <- c("worldcup", "economy", "climate", "wimbledon", 
                 "apple", "android", "news", "politics")

# This might take a while
samples <- llply(searchTerms, function(term) {   # "samples", so base R's sample() isn't masked
  tweets <- twListToDF(searchTwitter(term, n = 3200))
  users <- unique(tweets$screenName)
  return(users)
})

userNames <- unique(unlist(samples))

# This function is supposed to perform the lookups in batches 
# and mind the rate limit:
getUserObjects <- function(users) {
  groups <- split(users, ceiling(seq_along(users)/6000))
  userObjects <- ldply(groups, function(group) {
    objects <- lookupUsers(group)
    out <- twListToDF(objects)
    print("Waiting for 15 Minutes...")
    Sys.sleep(900)
    return(out)
  })
  return(userObjects)
}

# Putting it into action:
userObjects <- getUserObjects(userNames)

Looking up smaller subsets manually, e.g. via lookupUsers(userNames[1:3000]), sometimes works; but as soon as I try to automate the process, the error appears. Does anyone know a possible cause?
2 Answers


I know this question is old, but I ran into this problem recently and could not find any answer that adequately solved it.

Bottom line:

Adding tryCatch() error handling and splitting each failing call into two smaller calls of 50 IDs each solved the problem.

The long story:

In my case, I noticed that the API seemed to fail at the same point (around the 4,100th ID). After adding some error handling, I was able to determine that 8 of the 100-ID sections of my list did not work, even though those same IDs worked fine in the Twitter API Console. I looked through the code on GitHub but could not find a reason why they should fail. Experimenting showed that splitting a failing call into two halves solved the problem perfectly. Below is a working code example.

N     <- NROW(Data)  # Keeps track of how many more IDs we have
count <- 1           # Keeps track of which ID we are at
Len   <- N           # Total number of IDs, so we don't index out of range
Stop  <- 0           # Index that each batch should stop at
j     <- 0           # Keeps track of how many calls we have made

while (N > 0 && j <= 180) {

  tryCatch({

    # Set the Stop value so that hitting the end of the list doesn't
    # give an index that is out of range
    Stop <<- min(count + 99, Len)

    # Keep track of how many calls we have made
    j <<- j + 1
    User_Data <- lookupUsers(Data$user_id_str[count:Stop], includeNA = TRUE)

    # ... CODE THAT STORES DATA AS NEEDED

    # Update for the next iteration
    N     <<- N - 100
    count <<- count + 100
    message(paste("Users Searched: ", (count - 1), "/", Len))

  },

  error = function(e) {

    message("Twitter sent back a 403 error; trying again with half as many IDs")

    # FIRST SECOND TRY: the first half-batch of 50 IDs
    Stop <<- min(count + 49, Len)
    j    <<- j + 1
    User_Data <- lookupUsers(Data$user_id_str[count:Stop], includeNA = TRUE)

    # ... CODE THAT STORES DATA AS NEEDED
    N     <<- N - 50
    count <<- count + 50
    message(paste("Users Searched: ", Stop, "/", Len))

    # SECOND SECOND TRY: the second half-batch of 50 IDs
    Stop <<- min(count + 49, Len)
    j    <<- j + 1
    User_Data <- lookupUsers(Data$user_id_str[count:Stop], includeNA = TRUE)

    # ... CODE THAT STORES DATA AS NEEDED
    N     <<- N - 50
    count <<- count + 50
    message(paste("Users Searched: ", Stop, "/", Len))
  })

}
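For reference, the same halve-on-failure idea can also be written as a small recursive helper instead of the unrolled retries above. This is only a sketch: lookup_with_retry is a hypothetical name, it assumes an authenticated twitteR session, and it does not pace calls against the rate limit.

# Sketch of the halve-on-failure idea as a recursive helper (hypothetical name).
lookup_with_retry <- function(ids) {
  tryCatch(
    lookupUsers(ids, includeNA = TRUE),
    error = function(e) {
      if (length(ids) == 1) {
        # A single ID that still fails is skipped rather than retried forever
        message("Skipping ID ", ids, ": ", conditionMessage(e))
        return(list())
      }
      # Split the failing batch in half and retry each half
      mid <- ceiling(length(ids) / 2)
      c(lookup_with_retry(ids[1:mid]),
        lookup_with_retry(ids[(mid + 1):length(ids)]))
    }
  )
}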

According to the post "Blocked by Twitter's rate limit at my very first request", Twitter limits not only the total number of users you can look up, but also the number of calls per 15-minute window. With 100 users per call, looking up 6,000 users takes 60 calls, while you are only allowed 15. Try letting the program sleep and issuing the remaining calls after 15 minutes.
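A minimal sketch of that pacing, reusing the userNames vector from the question and taking the 15-calls-per-window figure above at face value (so at most 15 × 100 = 1,500 users per window):

# Look up at most 1500 users per window, then sleep until the window resets
batches <- split(userNames, ceiling(seq_along(userNames) / 1500))
userObjects <- ldply(batches, function(batch) {
  out <- twListToDF(lookupUsers(batch))
  message("Sleeping 15 minutes to let the rate-limit window reset...")
  Sys.sleep(15 * 60)
  out
})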
