I want to count how many times a given word occurs in a review string.
I am reading a CSV file and storing it in a pandas DataFrame with the following code.
reviews = pd.read_csv("amazon_baby.csv")
The following lines work when I apply them to a single review.
print reviews["review"][1]
a = reviews["review"][1].split("disappointed")
print a
b = len(a)
print b
The output of the code above is:
it came early and was not disappointed. i love planet wise bags and now my wipe holder. it keps my osocozy wipes moist and does not leak. highly recommend it.
['it came early and was not ', '. i love planet wise bags and now my wipe holder. it keps my osocozy wipes moist and does not leak. highly recommend it.']
2
When I try to apply the same logic to the whole DataFrame with the line below, I get an error message:
reviews['disappointed'] = len(reviews["review"].split("disappointed"))-1
Error message:
Traceback (most recent call last):
File "C:/Users/gouta/PycharmProjects/MLCourse1/Classifier.py", line 12, in <module>
reviews['disappointed'] = len(reviews["review"].split("disappointed"))-1
File "C:\Users\gouta\Anaconda2\lib\site-packages\pandas\core\generic.py", line 2360, in __getattr__
(type(self).__name__, name))
AttributeError: 'Series' object has no attribute 'split'
lambda x: len(x["review"].split("disappointed")) - 1. Here x is the row passed to the function, not the whole DataFrame itself. - hoyland
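As a minimal sketch of what the comment describes, the lambda can be passed to DataFrame.apply with axis=1 so that x is one row at a time. The small DataFrame below is a hypothetical stand-in for amazon_baby.csv, and the column name disappointed_count is made up for illustration. The sketch also assumes every review is a string; if the real file has missing reviews, something like reviews["review"].fillna("") would probably be needed first. The second assignment shows a vectorized alternative using the .str accessor, which is likely the simpler route here.

```python
import pandas as pd

# Hypothetical stand-in for amazon_baby.csv; only the "review" column matters here.
reviews = pd.DataFrame({
    "review": [
        "it came early and was not disappointed.",
        "disappointed at first, then even more disappointed.",
        "highly recommend it.",
    ]
})

# Row-wise: with axis=1, x is a single row, so x["review"] is a plain str
# and str.split is available (a Series has no .split, hence the AttributeError).
reviews["disappointed"] = reviews.apply(
    lambda x: len(x["review"].split("disappointed")) - 1, axis=1
)

# Vectorized alternative: Series.str.count counts pattern matches per element.
# The pattern is treated as a regular expression; "disappointed" has no special characters.
reviews["disappointed_count"] = reviews["review"].str.count("disappointed")

print(reviews[["disappointed", "disappointed_count"]])
```

Both columns should hold the same per-review counts; the .str.count version avoids building the intermediate lists that split creates.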