Avoiding Twitter API rate limits with Tweepy

Question


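I have seen in several Stack Exchange questions that the limit can be a function of the number of requests per 15 minutes, and that it also depends on the complexity of the algorithm, but this is not a complex algorithm.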

So I used this code:

import tweepy
import sqlite3
import time

db = sqlite3.connect('data/MyDB.db')

# Get a cursor object
cursor = db.cursor()
cursor.execute('''CREATE TABLE IF NOT EXISTS MyTable(id INTEGER PRIMARY KEY, name TEXT, geo TEXT, image TEXT, source TEXT, timestamp TEXT, text TEXT, rt INTEGER)''')
db.commit()

consumer_key = ""
consumer_secret = ""
key = ""
secret = ""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(key, secret)

api = tweepy.API(auth)

search = "#MyHashtag"

for tweet in tweepy.Cursor(api.search,
                           q=search,
                           include_entities=True).items():
    while True:
        try:
            cursor.execute('''INSERT INTO MyTable(name, geo, image, source, timestamp, text, rt) VALUES(?,?,?,?,?,?,?)''',(tweet.user.screen_name, str(tweet.geo), tweet.user.profile_image_url, tweet.source, tweet.created_at, tweet.text, tweet.retweet_count))
        except tweepy.TweepError:
            time.sleep(60 * 15)
            continue
        break
db.commit()
db.close()

I frequently get the Twitter rate limit error:

Traceback (most recent call last):
  File "stream.py", line 25, in <module>
    include_entities=True).items():
  File "/usr/local/lib/python2.7/dist-packages/tweepy/cursor.py", line 153, in next
    self.current_page = self.page_iterator.next()
  File "/usr/local/lib/python2.7/dist-packages/tweepy/cursor.py", line 98, in next
    data = self.method(max_id = max_id, *self.args, **self.kargs)
  File "/usr/local/lib/python2.7/dist-packages/tweepy/binder.py", line 200, in _call
    return method.execute()
  File "/usr/local/lib/python2.7/dist-packages/tweepy/binder.py", line 176, in execute
    raise TweepError(error_msg, resp)
tweepy.error.TweepError: [{'message': 'Rate limit exceeded', 'code': 88}]

- 4m1nh4j1

Possible duplicate of https://dev59.com/HHrZa4cB1Zd3GeqPzjVu - Ashoka Lella

6 answers


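The problem is that your `try: except:` block is in the wrong place. Inserting data into the database will never raise a `TweepError`; iterating over `Cursor.items()` is what can. I suggest refactoring your code to call the `next` method of `Cursor.items()` in an infinite loop. That call should be placed inside the `try: except:` block, since it is the one that can raise the error.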

Here is (roughly) what the code would look like:

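# above omitted for brevity
c = tweepy.Cursor(api.search,
                  q=search,
                  include_entities=True).items()
while True:
    try:
        # Fetching the next tweet is the call that can hit the rate limit;
        # next(c) works on both Python 2 and 3
        tweet = next(c)
        # Insert into db
        cursor.execute('''INSERT INTO MyTable(name, geo, image, source, timestamp, text, rt) VALUES(?,?,?,?,?,?,?)''',
                       (tweet.user.screen_name, str(tweet.geo), tweet.user.profile_image_url,
                        tweet.source, tweet.created_at, tweet.text, tweet.retweet_count))
    except tweepy.TweepError:
        # Rate limited: wait out the 15-minute window, then retry the same request
        time.sleep(60 * 15)
        continue
    except StopIteration:
        # No more search results
        break
db.commit()
db.close()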

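This works because when Tweepy raises a TweepError, it has not updated any of the cursor's data. The next time it makes a request, it uses the same parameters as the request that triggered the rate limit, effectively repeating that request until it succeeds.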
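The same retry idea can be pulled out into a small generic helper. This is a minimal sketch rather than Tweepy code: `iterate_with_retry`, `FlakyPages` and `FakeRateLimitError` are hypothetical names, the fake iterator stands in for `Cursor.items()`, and the sketch relies on the same assumption the answer makes about Tweepy, namely that the iterator does not advance when `next()` raises.

```python
import time


def iterate_with_retry(iterator, retry_exc, wait_seconds=15 * 60, sleep=time.sleep):
    """Yield items from `iterator`, sleeping and retrying when next() raises `retry_exc`.

    Assumes the iterator keeps its position when next() raises, so a retry
    repeats the same request instead of skipping it.
    """
    while True:
        try:
            item = next(iterator)
        except retry_exc:
            # Rate limited: wait for the window to reset, then retry the same call
            sleep(wait_seconds)
            continue
        except StopIteration:
            # No more results
            return
        yield item


class FakeRateLimitError(Exception):
    """Hypothetical stand-in for tweepy.TweepError in this demo."""


class FlakyPages(object):
    """Class-based iterator that fails once before each position in `fail_before`."""

    def __init__(self, items, fail_before=()):
        self.items = list(items)
        self.pos = 0
        self.pending_failures = set(fail_before)

    def __iter__(self):
        return self

    def __next__(self):
        if self.pos >= len(self.items):
            raise StopIteration
        if self.pos in self.pending_failures:
            # Fail without advancing, like a rate-limited request
            self.pending_failures.discard(self.pos)
            raise FakeRateLimitError("Rate limit exceeded")
        item = self.items[self.pos]
        self.pos += 1
        return item

    next = __next__  # Python 2 compatibility


if __name__ == "__main__":
    waits = []
    pages = FlakyPages(["a", "b", "c"], fail_before={1})
    print(list(iterate_with_retry(pages, FakeRateLimitError, sleep=waits.append)))  # ['a', 'b', 'c']
    print(waits)  # [900]
```

The fake iterator is written as a class on purpose: a Python generator that raises an exception is finished, so calling `next()` on it again would simply end the iteration instead of retrying.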

- Aaron Hill


Thanks @Aaron. With tweepy, would adding monitor_rate_limit=True, wait_on_rate_limit=True work instead of catching the exception? - 4m1nh4j1


The wait_on_rate_limit option prevents the exception from happening: Tweepy automatically sleeps for however long it takes for the rate limit to replenish. - Aaron Hill


@jenn: Pass it as a keyword argument when you create the API instance. - Aaron Hill


As of this writing, the latest Tweepy version has a RateLimitError exception. Source: https://github.com/tweepy/tweepy/pull/611 - Hamman Samuel


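Using wait_on_rate_limit=True is the right way to do it. If you keep hitting the rate limit and sleeping, Twitter will eventually blacklist your account. That has happened to me many times. - sudo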
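Another way to avoid hitting the limit over and over is to throttle on the client side so the per-window budget is never exceeded. Below is a minimal fixed-window sketch, assuming the 15-minute (900-second) window mentioned in the question; `FixedWindowLimiter` is a hypothetical class, `max_calls` has to be set from the documented limit of the endpoint you call (no real value is assumed here), and the clock and sleep functions are injectable so it can be tested without waiting. Because the local window will not line up exactly with Twitter's, treat it as an approximation rather than a replacement for wait_on_rate_limit=True.

```python
import time


class FixedWindowLimiter(object):
    """Allow at most `max_calls` calls per `window` seconds (fixed-window counter)."""

    def __init__(self, max_calls, window=15 * 60, clock=time.time, sleep=time.sleep):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self.window_start = clock()
        self.calls = 0

    def acquire(self):
        """Block until a call is allowed in the current window, then count it."""
        now = self.clock()
        if now - self.window_start >= self.window:
            # The previous window has elapsed: start a new one
            self.window_start = now
            self.calls = 0
        if self.calls >= self.max_calls:
            # Budget used up: sleep until the current window ends, then start fresh
            self.sleep(self.window_start + self.window - now)
            self.window_start = self.clock()
            self.calls = 0
        self.calls += 1
```

Call `acquire()` once before each API request; the limiter then sleeps only when the budget for the current window is exhausted.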



Just replace

api = tweepy.API(auth)

with

api = tweepy.API(auth, wait_on_rate_limit=True)

- Mayank Khullar


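If you want to avoid errors and respect the rate limit, you can use the following function, which takes your api object as an argument. It retrieves the number of remaining requests of the same type as the last request and, if necessary, waits for the rate limit to reset before continuing.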

from datetime import datetime
from time import sleep


def test_rate_limit(api, wait=True, buffer=.1):
    """
    Tests whether the rate limit of the last request has been reached.
    :param api: The `tweepy` api instance.
    :param wait: A flag indicating whether to wait for the rate limit reset
                 if the rate limit has been reached.
    :param buffer: A buffer time in seconds that is added on to the waiting
                   time as an extra safety margin.
    :return: True if it is ok to proceed with the next request. False otherwise.
    """
    # Get the number of remaining requests
    # (older tweepy; newer versions expose api.last_response.headers[...] instead)
    remaining = int(api.last_response.getheader('x-rate-limit-remaining'))
    # Check if we have reached the limit
    if remaining == 0:
        limit = int(api.last_response.getheader('x-rate-limit-limit'))
        reset = int(api.last_response.getheader('x-rate-limit-reset'))
        # Convert the UTC epoch timestamp to a local datetime
        reset = datetime.fromtimestamp(reset)
        # Let the user know we have reached the rate limit
        print("0 of {} requests remaining until {}.".format(limit, reset))

        if wait:
            # Determine the delay and sleep (sleep() rejects negative durations)
            delay = max(0, (reset - datetime.now()).total_seconds()) + buffer
            print("Sleeping for {}s...".format(delay))
            sleep(delay)
            # We have waited for the rate limit reset. OK to proceed.
            return True
        else:
            # We have reached the rate limit. The user needs to handle the rate limit manually.
            return False

    # We have not reached the rate limit
    return True

- Till Hoffmann

Thanks for the answer. It was very helpful for working with another API where I also wanted to respect the rate limit :) - Jimi Oke


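Note that in the latest tweepy versions the getheader() function has been replaced by a headers dict, so api.last_response.getheader('x-rate-limit-remaining') needs to be replaced with api.last_response.headers['x-rate-limit-remaining'], and likewise for the other headers. - xro7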
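To run the same code against both older and newer Tweepy versions, a small helper can hide the difference. This is a sketch that assumes only what the comment above describes (older responses expose `getheader()`, newer ones a `headers` mapping); `get_rate_limit_header` is a hypothetical name, and the two stub classes merely stand in for real response objects.

```python
def get_rate_limit_header(response, name):
    """Return header `name` as an int from either response style (hypothetical helper)."""
    headers = getattr(response, "headers", None)
    if headers is not None:
        # Newer Tweepy: response.headers is a dict-like mapping
        return int(headers[name])
    # Older Tweepy: httplib-style response with getheader()
    return int(response.getheader(name))


class OldStyleResponse(object):
    """Stub for an older response object that only offers getheader()."""

    def __init__(self, headers):
        self._headers = headers

    def getheader(self, name):
        return self._headers[name]


class NewStyleResponse(object):
    """Stub for a newer response object with a headers mapping."""

    def __init__(self, headers):
        self.headers = headers
```

For example, `get_rate_limit_header(api.last_response, 'x-rate-limit-remaining')` would then replace either form of the lookup.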

I would use delay = abs(reset - datetime.datetime.now()).total_seconds() + buffer instead, because for some reason I was getting a negative value for delay. - salvob
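A negative delay means the reset time is already in the past according to the local clock (clock skew between your machine and Twitter's servers is one possible cause), so there is nothing left to wait for; `abs()` would instead sleep for as long as the reset is overdue. A sketch of an alternative, assuming `x-rate-limit-reset` holds UTC epoch seconds: clamp the wait at zero and compute it directly against `time.time()`, which avoids converting through `datetime.fromtimestamp()`. `seconds_until_reset` is a hypothetical helper.

```python
import time


def seconds_until_reset(reset_epoch, buffer=0.1, now=None):
    """Seconds to sleep until the rate-limit window resets, never negative.

    reset_epoch: value of the x-rate-limit-reset header (UTC epoch seconds).
    now: current epoch time; defaults to time.time() (injectable for testing).
    """
    if now is None:
        now = time.time()
    # If the reset time has already passed, don't wait for it;
    # only the safety buffer remains.
    return max(0.0, reset_epoch - now) + buffer
```

Usage would be along the lines of `time.sleep(seconds_until_reset(reset))`, with `reset` being the integer header value before any datetime conversion.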


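import tweepy
consumer_key, consumer_secret = "", ""  # fill in your credentials
access_token, access_token_secret = "", ""
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
# Tweepy waits by itself when the rate limit is hit and prints a notice; no manual sleep needed.
# (wait_on_rate_limit_notify is Tweepy 3.x only; 4.0 removed it and logs a warning instead.)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)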

- Malik Faiq


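Hi! Please explain how this code solves the problem. Since this is an old question with many existing answers, please elaborate on how this solution differs from the other answers already posted. Thanks! - From Review. - d_kennetz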


I simply added the Pythonic way of initializing the tweepy API so that it handles rate limits. - Malik Faiq


I suggest using the new API v2 with a Client object and the flag wait_on_rate_limit=True, since v1 will be deprecated soon.

# auth: the OAuthHandler from earlier; twitter_bearer_token: your app's bearer token
client = tweepy.Client(consumer_key=auth.consumer_key, consumer_secret=auth.consumer_secret,
                       access_token=auth.access_token, access_token_secret=auth.access_token_secret,
                       bearer_token=twitter_bearer_token, wait_on_rate_limit=True)

This handles the rate limit fully automatically.

- badr

Page content provided by Stack Overflow.

- dancow · Accepted Answer

For anyone who stumbles upon this on Google: tweepy 3.2+ added extra parameters to the tweepy.API class, in particular:

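wait_on_rate_limit – Whether or not to automatically wait for rate limits to replenish
wait_on_rate_limit_notify – Whether or not to print a notification when Tweepy is waiting for rate limits to replenish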

Setting these flags to True delegates the waiting to the API instance, which is good enough for most simple use cases.