How to add a location filter to the tweepy module

Question

23

I found the following code, which works well in the Python shell for viewing the standard 1% sample of the Twitter firehose.

import sys
import tweepy

consumer_key=""
consumer_secret=""
access_key = ""
access_secret = "" 


auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)


# tweepy.StreamListener is the Tweepy 3.x listener API
class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.text)

    def on_error(self, status_code):
        print('Encountered error with status code:', status_code, file=sys.stderr)
        return True  # Don't kill the stream

    def on_timeout(self):
        print('Timeout...', file=sys.stderr)
        return True  # Don't kill the stream

sapi = tweepy.streaming.Stream(auth, CustomStreamListener())
sapi.filter(track=['manchester united'])

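How do I add a filter so that only tweets from a certain location are parsed? I have seen other Twitter-related Python code that adds GPS coordinates, but I can't find anything specific to sapi in the Tweepy module.

Any ideas?

Thanks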

- gdogg371

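I think my problem is one of concatenation. The syntax for filtering by GPS should be 'sapi.filter(locations=[-122.75,36.8,-121.75,37.8])', but combining it with the keyword track filter using the syntax I have doesn't seem to work. - gdogg371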

4 Answers

20

Juan gave the correct answer. I'm using this filter to show only Germany:

# Bounding boxes for geolocations
# Online-Tool to create boxes (c+p as raw CSV): http://boundingbox.klokantech.com/
GEOBOX_WORLD = [-180,-90,180,90]
GEOBOX_GERMANY = [5.0770049095, 47.2982950435, 15.0403900146, 54.9039819757]

stream.filter(locations=GEOBOX_GERMANY)

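This is a fairly crude box that also includes parts of some other countries. If you need a finer-grained area, you can combine multiple boxes to cover the location you want.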
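To illustrate combining boxes: the `locations` parameter takes one flat list of numbers, four per box, with the south-west corner (longitude, latitude) first. The sketch below flattens several boxes into such a list. `combine_boxes` is a hypothetical helper, not part of tweepy, and the two boxes are made-up values rather than real borders.

```python
# Hypothetical helper (not part of tweepy): flatten several bounding boxes
# into the single flat list passed as `locations`.
def combine_boxes(*boxes):
    flat = []
    for box in boxes:
        # Each box is [west_lon, south_lat, east_lon, north_lat].
        if len(box) != 4:
            raise ValueError("each box needs 4 values: west, south, east, north")
        flat.extend(box)
    return flat


# Illustrative, made-up boxes (not accurate borders of any region).
BOX_A = [6.0, 47.3, 10.0, 51.0]
BOX_B = [10.0, 47.3, 15.0, 55.0]

locations = combine_boxes(BOX_A, BOX_B)
print(locations)

# Assumed usage, with `stream` being the stream object from the snippet above:
# stream.filter(locations=locations)
```

A tweet only needs to fall inside one of the boxes to match, so several small boxes can approximate an irregular outline more closely than one large rectangle.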

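Note, however, that filtering by geotag greatly limits the number of tweets. The following is from my test database of about 5 million tweets (the query returns the fraction of tweets that actually contain a geolocation):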

> db.tweets.find({coordinates:{$ne:null}}).count() / db.tweets.count()
0.016668392651547598

Only 1.67% of the tweets in my sample of the 1% stream carry a geotag. However, there are other ways to determine a user's location: http://arxiv.org/ftp/arxiv/papers/1403/1403.2345.pdf
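For readers without MongoDB, the same ratio can be computed in plain Python. This is a minimal sketch that assumes each tweet is a dict parsed from the API's JSON, where `coordinates` is either missing, null, or a GeoJSON point. `geotagged_fraction` and the sample data are hypothetical.

```python
def geotagged_fraction(tweets):
    """Return the share of tweets whose `coordinates` field is present and not null."""
    tweets = list(tweets)
    if not tweets:
        return 0.0
    tagged = sum(1 for t in tweets if t.get("coordinates") is not None)
    return tagged / len(tweets)


# Made-up sample: one geotagged tweet out of four.
sample = [
    {"text": "a", "coordinates": None},
    {"text": "b", "coordinates": {"type": "Point", "coordinates": [13.4, 52.5]}},
    {"text": "c"},
    {"text": "d", "coordinates": None},
]
print(geotagged_fraction(sample))
```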

- Kristian Rother

The location prediction paper is very useful. - Hamman Samuel

0

You can't filter while streaming, but if you are writing the tweets to a file you can filter at the output stage.
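Filtering at the output stage could look roughly like the sketch below. It assumes the tweets were saved one JSON object per line and that geotagged tweets carry a GeoJSON `coordinates` point in `[longitude, latitude]` order. `filter_saved_tweets` is a hypothetical name, and an in-memory `io.StringIO` stands in for the saved file.

```python
import io
import json


def filter_saved_tweets(lines, box):
    """Yield saved tweets whose point coordinates fall inside `box`."""
    west, south, east, north = box
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        geo = tweet.get("coordinates")
        if not geo:
            continue  # no exact geotag on this tweet
        lon, lat = geo["coordinates"]
        if west <= lon <= east and south <= lat <= north:
            yield tweet


# Stand-in for a file of saved tweets (made-up data).
saved = io.StringIO(
    '{"text": "in Berlin", "coordinates": {"type": "Point", "coordinates": [13.4, 52.5]}}\n'
    '{"text": "no geotag", "coordinates": null}\n'
    '{"text": "in New York", "coordinates": {"type": "Point", "coordinates": [-74.0, 40.7]}}\n'
)
GEOBOX_GERMANY = [5.0770049095, 47.2982950435, 15.0403900146, 54.9039819757]

kept = [t["text"] for t in filter_saved_tweets(saved, GEOBOX_GERMANY)]
print(kept)
```

With a real file, the `io.StringIO` object would be replaced by an open file handle, since the function only iterates over lines.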

- Clovis

-3

sapi.filter(track=['manchester united'],locations=['GPS coordinates'])

- gdogg371


- Juan E. · Accepted Answer

The streaming API cannot filter by location and keyword at the same time.

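Quote: Bounding boxes do not act as filters for other filter parameters. For example, track=twitter&locations=-122.75,36.8,-121.75,37.8 would match any tweets containing the term Twitter (even non-geo tweets) OR coming from the San Francisco area.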

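Source: https://dev.twitter.com/docs/streaming-apis/parameters#locations

What you can do is ask the streaming API for keyword or located tweets, then filter the resulting stream in your application by inspecting each tweet.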
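The difference between the OR behaviour in the quote and the AND behaviour the question asks for can be sketched with two small predicates. This is a simplified model, not Twitter's actual matching logic: it assumes each tweet is reduced to its text plus an optional `(longitude, latitude)` point, and `in_box`, `api_match`, and `wanted` are hypothetical names.

```python
def in_box(point, box):
    """True if a (lon, lat) point lies inside [west, south, east, north]."""
    if point is None:
        return False
    lon, lat = point
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def api_match(text, point, keyword, box):
    # Simplified model of the quoted behaviour: keyword OR location.
    return keyword in text.lower() or in_box(point, box)


def wanted(text, point, keyword, box):
    # What the question asks for: keyword AND location, checked in the app.
    return keyword in text.lower() and in_box(point, box)


SF_BOX = [-122.75, 36.8, -121.75, 37.8]
examples = [
    ("I love Twitter", None),              # keyword only
    ("Foggy morning", (-122.4, 37.77)),    # location only
    ("Twitter HQ visit", (-122.4, 37.77)), # both
]
for text, point in examples:
    print(text,
          api_match(text, point, "twitter", SF_BOX),
          wanted(text, point, "twitter", SF_BOX))
```

All three made-up tweets pass the OR check, while only the last one passes the AND check, which is why the second condition has to be applied in the application.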

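If you modify the code as follows, you will capture tweets from the United Kingdom, which are then filtered to show only those that contain "manchester united".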

import sys
import tweepy

consumer_key=""
consumer_secret=""
access_key=""
access_secret=""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)


class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        # Keyword check done in the app; the stream itself filters by location
        if 'manchester united' in status.text.lower():
            print(status.text)

    def on_error(self, status_code):
        print('Encountered error with status code:', status_code, file=sys.stderr)
        return True  # Don't kill the stream

    def on_timeout(self):
        print('Timeout...', file=sys.stderr)
        return True  # Don't kill the stream

sapi = tweepy.streaming.Stream(auth, CustomStreamListener())    
sapi.filter(locations=[-6.38,49.87,1.77,55.81])