Tweepy StreamListener to CSV

4

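I'm new to Python and am trying to build an application that retrieves data from Twitter using Tweepy and the Streaming API and converts it to a CSV file.

The problem is that this code never creates the output CSV file. Maybe that is because I should make the code stop after, for example, 1000 tweets, but I don't know how to set that stopping point.

Here is the code: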

import sys
import tweepy
import csv

#pass security information to variables
consumer_key=""
consumer_secret=""
access_key = ""
access_secret = ""


#use variables to access twitter
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)

#create an object called 'customStreamListener'

class CustomStreamListener(tweepy.StreamListener):

    def on_status(self, status):
        print (status.author.screen_name, status.created_at, status.text)


    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True # Don't kill the stream

    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True # Don't kill the stream


streamingAPI = tweepy.streaming.Stream(auth, CustomStreamListener())
streamingAPI.filter(track=['Dallas', 'NewYork'])

def on_status(self, status):
    with open('OutputStreaming.txt', 'w') as f:
        f.write('Author,Date,Text')
        writer = csv.writer(f)
        writer.writerow([status.author.screen_name, status.created_at, status.text])

Any suggestions?

1
Your second on_status function is not inside the CustomStreamListener class. - Selcuk
1 Answer

7
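The function you wrote for writing to the CSV file is never called. I think you meant to put this code in CustomStreamListener.on_status. Also, the header row has to be written first, outside the stream listener. Have a look at the following code: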
import sys
import tweepy
import csv

#pass security information to variables
consumer_key=""
consumer_secret=""
access_key = ""
access_secret = ""


#use variables to access twitter
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)

#create an object called 'customStreamListener'

class CustomStreamListener(tweepy.StreamListener):

    def on_status(self, status):
        print (status.author.screen_name, status.created_at, status.text)
        # Writing status data
        with open('OutputStreaming.txt', 'a') as f:
            writer = csv.writer(f)
            writer.writerow([status.author.screen_name, status.created_at, status.text])


    def on_error(self, status_code):
        # 'print >> sys.stderr' is Python 2 syntax; on Python 3 pass file= instead
        print('Encountered error with status code:', status_code, file=sys.stderr)
        return True # Don't kill the stream

    def on_timeout(self):
        print('Timeout...', file=sys.stderr)
        return True # Don't kill the stream

# Writing csv titles
with open('OutputStreaming.txt', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(['Author', 'Date', 'Text'])

streamingAPI = tweepy.streaming.Stream(auth, CustomStreamListener())
streamingAPI.filter(track=['Dallas', 'NewYork'])
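
The question also asks how to stop after, say, 1000 tweets, which the code above does not cover. Here is a minimal sketch of one way to do it, assuming tweepy 3.x, where returning False from on_status makes the stream disconnect. The TweetCsvSink class, the fake_status helper and the demo file name are hypothetical, and fake status objects stand in for real tweets so the counting logic runs without credentials:

```python
import csv
import os
import tempfile
from types import SimpleNamespace


class TweetCsvSink:
    """Hypothetical helper: appends tweets to a CSV file and counts them."""

    def __init__(self, path, limit=1000):
        self.path = path
        self.limit = limit
        self.count = 0
        # Write the header row once, truncating any previous file.
        with open(self.path, 'w', newline='', encoding='utf-8') as f:
            csv.writer(f).writerow(['Author', 'Date', 'Text'])

    def handle(self, status):
        """Append one tweet; return False once the limit is reached."""
        with open(self.path, 'a', newline='', encoding='utf-8') as f:
            csv.writer(f).writerow(
                [status.author.screen_name, status.created_at, status.text])
        self.count += 1
        return self.count < self.limit


def fake_status(i):
    # Stand-in object with the same attributes the listener reads.
    return SimpleNamespace(
        author=SimpleNamespace(screen_name='user%d' % i),
        created_at='2016-01-01 00:00:%02d' % i,
        text='tweet number %d' % i)


demo_path = os.path.join(tempfile.gettempdir(), 'OutputStreamingDemo.csv')
sink = TweetCsvSink(demo_path, limit=3)
results = [sink.handle(fake_status(i)) for i in range(3)]
print(results)
```

In the listener, on_status would then reduce to "return sink.handle(status)", with the sink created before the stream starts. The third call returns False, which is the value that would end the stream under the assumption stated above.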

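You're right, I was wrong, but now there seems to be a character problem. When I run the code, only some tweet data is printed, and then it fails with "UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f44d' in position 111: character maps to <undefined>". This happens when it hits certain tweets containing special characters. How can I fix this? Also, how can I stop collecting tweets, for example after 1000 tweets? - Andrea Angeli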
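This is probably because the Unicode part of the tweet text can't be printed. Try using status.text.encode('utf-8') instead of just status.text (change it both in the print line and in the line with writerow). Also, open the file with utf-8 encoding: with open('OutputStreaming.txt', 'w', encoding="utf8") as f:. As for the other question (limiting to 1000 tweets), please post a separate question. - Or B
Also, if this answer helped you, please accept it so that future users with a similar problem can easily find the solution. - Or B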
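
The file-encoding part of that advice can be tried without Twitter access. This is a minimal sketch, assuming Python 3, where csv.writer takes the text as str once the file is opened with encoding='utf-8'; calling .encode() on the text first would instead put a bytes repr such as b'...' into the file. The sample rows and the file name are made up for illustration:

```python
import csv
import os
import tempfile

# Sample rows standing in for tweets; '\U0001f44d' is the character
# named in the error message.
rows = [
    ['alice', '2016-01-01 12:00:00', 'Great game in Dallas \U0001f44d'],
    ['bob', '2016-01-01 12:00:05', 'NewYork, "quoted" text, with commas'],
]

path = os.path.join(tempfile.gettempdir(), 'OutputStreamingUtf8.csv')

# newline='' is what the csv module documentation recommends for files
# passed to csv.writer; encoding='utf-8' avoids the platform default codec.
with open(path, 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Author', 'Date', 'Text'])
    writer.writerows(rows)

# Read the file back with the same encoding to check the round trip.
with open(path, newline='', encoding='utf-8') as f:
    read_back = list(csv.reader(f))

print(len(read_back))
```

The same encoding argument would be needed on the 'a' open inside on_status as well as on the 'w' open that writes the header. The print call in on_status is a separate matter: it can still fail on a console whose code page lacks the character.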
