User lookup via the Twitter API from R fails with error (403)

Using the Twitter API and the twitteR package, I am trying to retrieve user objects for a long list of screen names (between 50,000 and 100,000).
I keep getting the following error:
Error in twInterfaceObj$doAPICall(paste("users", "lookup", sep = "/"),  : 
  client error: (403) Forbidden

The error code seems to hint at an update limit. But the rate limit for user lookups is 180 calls per 15-minute window, with up to 100 screen names per call, so up to 18,000 users per window should not be a problem. Yet even reducing the number to 6,000 per window (to respect the lower limit for application-authenticated requests) still produces the same error.
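One way to check which limit actually applies to your session is twitteR's rate-limit query; a minimal sketch (run after the setup_twitter_oauth() call in the MWE below):

rateInfo <- getCurRateLimitInfo("users")
# Show limit, remaining calls, and reset time for the lookup endpoint
subset(rateInfo, resource == "/users/lookup")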
Here is an MWE (you will need your own API keys):
library(plyr)
# install the latest versions from github:
# devtools::install_github("twitteR", username="geoffjentry")
# devtools::install_github("hadley/httr")
library(twitteR)
library(httr)    

source("TwitterKeys.R") # Your own API-Keys
setup_twitter_oauth(consumerKey, consumerSecret, accessToken, accessSecret)

# The following is just to generate a large enough list of user names:
searchTerms <- c("worldcup", "economy", "climate", "wimbledon", 
                 "apple", "android", "news", "politics")

# This might take a while
samples <- llply(searchTerms, function(term) {   # "samples", so base R's sample() isn't masked
  tweets <- twListToDF(searchTwitter(term, n = 3200))
  users <- unique(tweets$screenName)
  return(users)
})

userNames <- unique(unlist(samples))

# This function is supposed to perform the lookups in batches 
# and mind the rate limit:
getUserObjects <- function(users) {
  groups <- split(users, ceiling(seq_along(users)/6000))
  userObjects <- ldply(groups, function(group) {
    objects <- lookupUsers(group)
    out <- twListToDF(objects)
    print("Waiting for 15 Minutes...")
    Sys.sleep(900)
    return(out)
  })
  return(userObjects)
}

# Putting it into action:
userObjects <- getUserObjects(userNames)

Looking up smaller subsets manually, e.g. via lookupUsers(userNames[1:3000]), sometimes works; but as soon as I try to automate the process, the error appears. Does anyone know a possible cause?
2 Answers


I know this question is old, but I ran into this problem recently and could not find any answer that adequately solved it.

Bottom line:

Adding tryCatch() error handling and splitting each failing call into two smaller calls of 50 IDs each solved the problem.

The long story:

In my case, I noticed that the API seemed to fail at the same point (around the 4,100th ID). After adding some error handling, I was able to determine that 8 of the 100-ID sections of my list did not work, even though those same IDs worked fine in the Twitter API Console. I looked through the code on GitHub but could not find a reason why they should fail. Experimenting showed that splitting a failing call into two halves solved the problem perfectly. Below is a working code example.

N     <- NROW(Data)  # Keeps track of how many more IDs we have
count <- 1           # Keeps track of which ID we are at
Len   <- N           # Total number of IDs, so we don't index out of range
Stop  <- 0           # Index that each batch should stop at
j     <- 0           # Keeps track of how many calls we have made

while (N > 0 && j <= 180) {

  tryCatch({

    # Set the Stop value so that hitting the end of the list doesn't
    # give an index that is out of range
    Stop <<- min(count + 99, Len)

    # Keep track of how many calls we have made
    j <<- j + 1
    User_Data <- lookupUsers(Data$user_id_str[count:Stop], includeNA = TRUE)

    # ... CODE THAT STORES DATA AS NEEDED

    # Update for the next iteration
    N     <<- N - 100
    count <<- count + 100
    message(paste("Users Searched: ", (count - 1), "/", Len))

  },

  error = function(e) {

    message("Twitter sent back a 403 error; trying again with half as many IDs")

    # FIRST SECOND TRY: the first half-batch of 50 IDs
    Stop <<- min(count + 49, Len)
    j    <<- j + 1
    User_Data <- lookupUsers(Data$user_id_str[count:Stop], includeNA = TRUE)

    # ... CODE THAT STORES DATA AS NEEDED
    N     <<- N - 50
    count <<- count + 50
    message(paste("Users Searched: ", Stop, "/", Len))

    # SECOND SECOND TRY: the second half-batch of 50 IDs
    Stop <<- min(count + 49, Len)
    j    <<- j + 1
    User_Data <- lookupUsers(Data$user_id_str[count:Stop], includeNA = TRUE)

    # ... CODE THAT STORES DATA AS NEEDED
    N     <<- N - 50
    count <<- count + 50
    message(paste("Users Searched: ", Stop, "/", Len))
  })

}
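For reference, the same halve-on-failure idea can also be written as a small recursive helper instead of the unrolled retries above. This is only a sketch: lookup_with_retry is a hypothetical name, it assumes an authenticated twitteR session, and it does not pace calls against the rate limit.

# Sketch of the halve-on-failure idea as a recursive helper (hypothetical name).
lookup_with_retry <- function(ids) {
  tryCatch(
    lookupUsers(ids, includeNA = TRUE),
    error = function(e) {
      if (length(ids) == 1) {
        # A single ID that still fails is skipped rather than retried forever
        message("Skipping ID ", ids, ": ", conditionMessage(e))
        return(list())
      }
      # Split the failing batch in half and retry each half
      mid <- ceiling(length(ids) / 2)
      c(lookup_with_retry(ids[1:mid]),
        lookup_with_retry(ids[(mid + 1):length(ids)]))
    }
  )
}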

According to the post "Blocked by Twitter's rate limit at my very first request", Twitter limits not only the total number of users you can look up, but also the number of calls per 15-minute window. With 100 users per call, looking up 6,000 users takes 60 calls, while you are only allowed 15. Try letting the program sleep and issuing the remaining calls after 15 minutes.
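A minimal sketch of that pacing, reusing the userNames vector from the question and taking the 15-calls-per-window figure above at face value (so at most 15 × 100 = 1,500 users per window):

# Look up at most 1500 users per window, then sleep until the window resets
batches <- split(userNames, ceiling(seq_along(userNames) / 1500))
userObjects <- ldply(batches, function(batch) {
  out <- twListToDF(lookupUsers(batch))
  message("Sleeping 15 minutes to let the rate-limit window reset...")
  Sys.sleep(15 * 60)
  out
})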
