Writing a list of objects to a CSV file

4
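I'm writing a Python program that loops through Reddit submissions, extracts data from each one, and stores it as an object in a list. However, I'm having trouble writing that list to a CSV file. The file is created, but it only contains what look like ID labels for the objects. How should I change my CSV code? Code: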
import praw
from datetime import datetime
import pandas as pd

class Submission:
    def __init__(self, time, score, title, text, ofReddit, serious):
        self.time = time
        self.score = score
        self.title = title
        self.text = text
        self.ofReddit = ofReddit
        self.serious = serious
data = []

reddit = praw.Reddit(client_id=id, client_secret=secret,
                     user_agent='testscript by /u/SilentButtDeadlies')
subreddit = reddit.subreddit('AskReddit')
for submission in subreddit.new(limit=50):
    time = datetime.utcfromtimestamp(submission.created_utc).hour
    score = submission.score
    title = len(submission.title)
    text = len(submission.selftext)
    if 'of reddit' in submission.title.lower():
        ofReddit = 1
    else:
        ofReddit = 0
    if '[serious]' in submission.title.lower():
        serious = 1
    else:
        serious = 0
    data.append(Submission(time, score, title, text, ofReddit, serious))
df = pd.DataFrame(data)
filename = 'AskRedditData' + str(datetime.now()) + '.csv'
df.to_csv(filename, index=False, encoding='utf-8')

CSV file:

0
<__main__.Submission instance at 0x1118f6ef0>
<__main__.Submission instance at 0x1118f68c0>
<__main__.Submission instance at 0x1118f6950>
<__main__.Submission instance at 0x1118c3758>
<__main__.Submission instance at 0x11239c638>
<__main__.Submission instance at 0x11239c5f0>
<__main__.Submission instance at 0x112398908>
<__main__.Submission instance at 0x112398998>
<__main__.Submission instance at 0x112398878>
<__main__.Submission instance at 0x1123989e0>
<__main__.Submission instance at 0x112398c68>
<__main__.Submission instance at 0x11239fe18>
<__main__.Submission instance at 0x11239fe60>
<__main__.Submission instance at 0x11239fea8>
<__main__.Submission instance at 0x11239fef0>
<__main__.Submission instance at 0x11239ff38>
<__main__.Submission instance at 0x11239ff80>
<__main__.Submission instance at 0x11239ffc8>
<__main__.Submission instance at 0x112404050>
<__main__.Submission instance at 0x112404098>
<__main__.Submission instance at 0x1124040e0>
<__main__.Submission instance at 0x112404128>
<__main__.Submission instance at 0x112404170>
<__main__.Submission instance at 0x1124041b8>
<__main__.Submission instance at 0x112404200>
<__main__.Submission instance at 0x112404248>
<__main__.Submission instance at 0x112404290>
<__main__.Submission instance at 0x1124042d8>
<__main__.Submission instance at 0x112404320>
<__main__.Submission instance at 0x112404368>
<__main__.Submission instance at 0x1124043b0>
<__main__.Submission instance at 0x1124043f8>
<__main__.Submission instance at 0x112404440>
<__main__.Submission instance at 0x112404488>
<__main__.Submission instance at 0x1124044d0>
<__main__.Submission instance at 0x112404518>
<__main__.Submission instance at 0x112404560>
<__main__.Submission instance at 0x1124045a8>
<__main__.Submission instance at 0x1124045f0>
<__main__.Submission instance at 0x112404638>
<__main__.Submission instance at 0x112404680>
<__main__.Submission instance at 0x1124046c8>
<__main__.Submission instance at 0x112404710>
<__main__.Submission instance at 0x112404758>
<__main__.Submission instance at 0x1124047a0>
<__main__.Submission instance at 0x1124047e8>
<__main__.Submission instance at 0x112404830>
<__main__.Submission instance at 0x112404878>
<__main__.Submission instance at 0x1124048c0>
<__main__.Submission instance at 0x112404908>

What did you expect it to write? That is the default __str__ implementation that every object inherits from "object". - juanpa.arrivillaga
2
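Also, are you using pandas only to write a csv file? That seems like overkill. You should use the csv module. - juanpa.arrivillaga
Sorry, I'm very new to all of this. Would it be better to just use csv? I'd like to be able to write the objects out like this: {time: ####, score: #### ...} - Marjorie Pickard
Try changing df = pd.DataFrame(data) to df = pd.DataFrame([obj.__dict__ for obj in data]). A pandas DataFrame needs to be built from objects that pandas understands, and a list of dictionaries is one such option. - calico_
@MarjoriePickard Try the approach from my answer below, including the brackets and colons. - juanpa.arrivillaga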
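The dictionary idea from calico_'s comment can be sketched without pandas as well. The example below is only an illustration under a few assumptions: it targets Python 3, swaps pandas for the standard-library csv.DictWriter, uses made-up sample values instead of real Reddit data, and writes to an in-memory buffer rather than a file.

```python
import csv
import io


class Submission:
    """Plain record class mirroring the one in the question."""

    def __init__(self, time, score, title, text, ofReddit, serious):
        self.time = time
        self.score = score
        self.title = title
        self.text = text
        self.ofReddit = ofReddit
        self.serious = serious


# Made-up sample rows standing in for the data pulled from Reddit.
data = [
    Submission(14, 3, 52, 0, 1, 0),
    Submission(15, 1, 38, 120, 0, 1),
]

# vars(obj) returns obj.__dict__, e.g. {'time': 14, 'score': 3, ...}
rows = [vars(obj) for obj in data]

buffer = io.StringIO()  # in-memory stand-in for a real file
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()    # column names taken from the attribute names
writer.writerows(rows)  # one CSV line per object

csv_text = buffer.getvalue()
print(csv_text)
```

Each attribute name becomes a column header and each object becomes one row, which is close to the {time: ####, score: #### ...} layout asked for above.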
1 Answer

4

Your Submission class appears to be used only as a record type. You probably just need a namedtuple. So replace your class definition with:

from collections import namedtuple
Submission = namedtuple('Submission', ['time', 'score', 'title', 'text', 'ofReddit', 'serious'])

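Now the rest of your code should work as is. Pandas doesn't know how to interpret the Submission class you originally wrote, so it simply created a single column of Submission objects and, when writing, used the default str(Submission()), which is the object __str__, because you didn't define another __str__. What you really want is a sequence. The namedtuple function is actually a class factory that creates a record type derived from tuple, so it has all the conveniences you need along with a very handy constructor.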
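To make the "record type derived from tuple" point concrete, here is a small sketch of how such a namedtuple behaves. The field values are invented for illustration, and the variable names (row, as_list and so on) are not part of the original code.

```python
from collections import namedtuple

Submission = namedtuple(
    'Submission',
    ['time', 'score', 'title', 'text', 'ofReddit', 'serious'],
)

# Made-up values; in the question these come from the Reddit loop.
row = Submission(time=14, score=3, title=52, text=0, ofReddit=1, serious=0)

is_tuple = isinstance(row, tuple)  # a namedtuple instance is a real tuple
as_list = list(row)                # so it iterates like any other sequence
by_name = row.score                # fields can be read by name
by_index = row[1]                  # or by position
field_names = Submission._fields   # field names, in declaration order

print(row)
```

Because each row is an ordinary sequence with known field names, tools that expect rows of values can consume it directly, which an instance of the hand-written class does not allow.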
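Now, since you're using Python 2, I haven't changed the way you use pandas, even though using it only to write a csv seems a bit like overkill. That said, getting the Python 2 csv module to work with Unicode is a real hassle, so it's probably best left as is. If you can switch to Python 3, you can simply replace the pandas part with: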
import csv
with open(filename, 'w', newline='', encoding='utf8') as f:
    writer = csv.writer(f)
    writer.writerow(Submission._fields)  # header row; namedtuple breaks convention here: _fields is public despite the leading underscore
    writer.writerows(data)
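For reference, the snippet above can be tried end to end roughly as follows. This is a sketch that assumes Python 3; the sample rows are made up, and the file goes into a temporary directory under a hypothetical name so that nothing is left behind.

```python
import csv
import os
import tempfile
from collections import namedtuple

Submission = namedtuple(
    'Submission',
    ['time', 'score', 'title', 'text', 'ofReddit', 'serious'],
)

# Made-up sample rows standing in for the data pulled from Reddit.
data = [
    Submission(14, 3, 52, 0, 1, 0),
    Submission(15, 1, 38, 120, 0, 1),
]

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name inside a throwaway directory.
    filename = os.path.join(tmp_dir, 'AskRedditData.csv')

    with open(filename, 'w', newline='', encoding='utf8') as f:
        writer = csv.writer(f)
        writer.writerow(Submission._fields)  # header row
        writer.writerows(data)               # each namedtuple is one row

    # Read the file back to check what was written.
    with open(filename, newline='', encoding='utf8') as f:
        rows_read = list(csv.reader(f))

print(rows_read)
```

Note that csv.reader returns every value as a string, so the numbers come back as '14', '3' and so on.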

Thank you so much! - Marjorie Pickard
2
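When you find yourself writing a class that is meant to create a series of objects that essentially act as records (i.e. only data attributes, no methods), you can probably just use a namedtuple. It will write a very efficient class for you! - juanpa.arrivillaga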
1
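@MarjoriePickard When I say it writes the class for you, I mean it generates a class definition and executes it. You can look at print(Submission._source) to see exactly what your Submission namedtuple class looks like. - juanpa.arrivillaga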
