AttributeError: 'float' object has no attribute 'lower'

Question

19

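I am getting this AttributeError and I am stuck on how to handle the float values that appear among the tweets. The streamed tweets have to be lowercased and tokenized, so I call lower() and then split() on each one.

Can someone help me deal with this, with either a workaround or a proper solution?

Here is the error I get: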

AttributeError                            Traceback (most recent call last)
<ipython-input-28-fa278f6c3171> in <module>()
      1 stop_words = []
----> 2 negfeats = [(word_feats(x for x in p_test.SentimentText[f].lower().split() if x not in stop_words), 'neg') for f in l]
      3 posfeats = [(word_feats(x for x in p_test.SentimentText[f].lower().split() if x not in stop_words), 'pos') for f in p]
      4 
      5 trainfeats = negfeats+ posfeats

AttributeError: 'float' object has no attribute 'lower'

Here is my code:

import random

import pandas as pd

p_test = pd.read_csv('TrainSA.csv')

stop_words = []

def word_feats(words):
    return dict([(word, True) for word in words])

# Row indices of the negative tweets (Sentiment == 0)
l = []
for f in range(len(p_test)):
    if p_test.Sentiment[f] == 0:
        l.append(f)

# Row indices of the positive tweets (Sentiment == 1)
p = []
for f in range(len(p_test)):
    if p_test.Sentiment[f] == 1:
        p.append(f)

# The AttributeError is raised on the next line
negfeats = [(word_feats(x for x in p_test.SentimentText[f].lower().split() if x not in stop_words), 'neg') for f in l]
posfeats = [(word_feats(x for x in p_test.SentimentText[f].lower().split() if x not in stop_words), 'pos') for f in p]

trainfeats = negfeats + posfeats
print(len(trainfeats))

random.shuffle(trainfeats)
print(len(trainfeats))

p_train = pd.read_csv('TrainSA.csv')

l_t = []
for f in range(len(p_train)):
    if p_train.Sentiment[f] == 0:
        l_t.append(f)

p_t = []
for f in range(len(p_train)):
    if p_train.Sentiment[f] == 1:
        p_t.append(f)

print(len(l_t))
print(len(p_t))

I have tried many approaches, but I still cannot get lower() and split() to work on these values.

- Vishal Kharde

3

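Evidently, p_test.SentimentText[f] is a float rather than a string. You cannot call lower() on a float. - Kevin

It is usually best to include the actual error text and traceback rather than just mentioning the error; otherwise people have to guess where it might come from. - Lav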
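
One way to check Kevin's diagnosis is to list the rows whose text is not a string. The snippet below is only a sketch: the inline CSV is made-up sample data standing in for TrainSA.csv, and only the column names Sentiment and SentimentText are taken from the question.

```python
import io

import pandas as pd

# Made-up sample data; the second row has an empty text field.
csv_text = "Sentiment,SentimentText\n0,Bad day\n1,\n1,Great!\n"
df = pd.read_csv(io.StringIO(csv_text))

# True where the cell holds a real string, False otherwise (for example NaN).
is_text = df["SentimentText"].map(lambda v: isinstance(v, str)).astype(bool)

# Rows that would make .lower() fail.
bad_rows = df[~is_text]
print(bad_rows)

# Number of missing values in the text column.
print(df["SentimentText"].isna().sum())
```

If the rows printed here all show NaN, the floats are missing values (empty fields in the file) rather than numeric tweets.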

6 Answers

20

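I suspect the root of your problem is the pd.read_csv('TrainSA.csv') call. You did not post that routine, but I assume it is the Pandas read_csv function. That function tries to be smart and converts the input to Python data types, which also means that in your case some values may end up as floats. You can prevent this smart(?) behaviour by specifying the expected data type for each column.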
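
A minimal sketch of that suggestion, using made-up inline data in place of TrainSA.csv. One caveat, as far as I can tell: dtype on its own does not turn empty fields into strings, because read_csv still reads them as NaN (a float). Passing keep_default_na=False as well is my assumption about what is wanted here; note that it applies to every column, not just the text column.

```python
import io

import pandas as pd

# Made-up sample data; the second row has an empty text field.
csv_text = "Sentiment,SentimentText\n0,Bad day\n1,\n1,Great!\n"

# dtype alone: non-empty values are strings, the empty field is still missing.
typed = pd.read_csv(io.StringIO(csv_text), dtype={"SentimentText": str})
print(typed["SentimentText"].isna().tolist())

# dtype plus keep_default_na=False: the empty field is kept as an empty string.
as_text = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"SentimentText": str},
    keep_default_na=False,
)
print(as_text["SentimentText"].tolist())

# Every value now supports .lower().split().
tokens = [text.lower().split() for text in as_text["SentimentText"]]
print(tokens)
```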

- Dick Kniep

4

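I ran into a similar error with my own dataset. Setting the dtype parameter did not help; I had to prepare the dataset itself. The problem was NaN values in a column. Part of the dataset looked like this: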

Id,Category,Text
1,contract,"Some text with commas, and other "
2,contract,

So my workaround was to put dummy text in place of the empty field before calling read_csv:

Id,Category,Text
1,contract,"Some text with commas, and other "
2,contract,"NaN"

Now my application works fine.

- feeeper

4

If you are working with a DataFrame, you can drop the NA values like this:

df = df.dropna()

- Vishrant

1

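This is not a good solution when we need all of the data. With this answer we lose every row that contains an NA value. - Balive13

@Balive13, you can set a default value instead. dropna is one solution; replacing the missing values with a default, where that is applicable, is another. - Vishrant

You can use df.fillna("0") instead. - Hajar Homayouni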
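
To make the trade-off in these comments concrete, here is a small sketch on made-up data. Restricting both calls to the text column, and using an empty string as the default, are my own choices for illustration; df.fillna("0") as written above would fill missing values in every column.

```python
import pandas as pd

# Made-up sample data; the second row has no text.
df = pd.DataFrame(
    {"Sentiment": [0, 1, 1], "SentimentText": ["Bad day", None, "Great!"]}
)

# Option 1: drop only the rows whose text is missing (loses one row).
dropped = df.dropna(subset=["SentimentText"])
print(len(dropped))

# Option 2: keep every row and substitute a default for the missing text.
filled = df.fillna({"SentimentText": ""})
print(filled["SentimentText"].tolist())

# Either result can be lowercased and split without an AttributeError.
tokens = [text.lower().split() for text in filled["SentimentText"]]
print(tokens)
```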

1

df = pd.read_excel(r"location\file.xlsx")  # raw string, so "\f" is not read as a form feed
df.characters = df.characters.astype(str)

I tried this and it solved the problem for me.

- MK MR ANSARI

0

You can make sure that the DataFrame column contains no empty or missing values.

Before performing any operation on it, run the following step.

df = df[df['ColumnName'].notna()]

- Sachin Prasad H S

- Vishal Kharde · Accepted Answer

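Thanks @Dick Kniep. Yes, it is the Pandas CSV reader, and your suggestion worked. Below is the Python code that worked for me; it converts the field to the required data type (a string in this case).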

p_test = pd.read_csv('TrainSA.csv')
p_test.SentimentText=p_test.SentimentText.astype(str)
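
For completeness, a sketch of the whole fix applied to the feature extraction from the question, on made-up inline data. Filling missing values before the cast is my addition, not part of the accepted answer: depending on the pandas version, astype(str) alone either turns a missing value into the literal text 'nan' or leaves it missing, so filling first avoids both outcomes.

```python
import io

import pandas as pd


def word_feats(words):
    # Same helper as in the question: a bag-of-words feature dict.
    return dict([(word, True) for word in words])


# Made-up sample data standing in for TrainSA.csv.
csv_text = "Sentiment,SentimentText\n0,Bad day\n1,\n1,Great!\n"
p_test = pd.read_csv(io.StringIO(csv_text))

# Replace missing text with an empty string, then make sure the column is str.
p_test["SentimentText"] = p_test["SentimentText"].fillna("").astype(str)

# Every value is now a string, so .lower().split() is safe.
feats = [word_feats(text.lower().split()) for text in p_test["SentimentText"]]
print(feats)
```

A row with no text simply yields an empty feature dict, which can be filtered out later if it is not wanted.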