pandas - AttributeError 'dataframe' object has no attribute

Question

python pandas dataframe indexing attributeerror

22

I am trying to filter a DataFrame that contains a list of products. However, whenever I run the code, I get an error.

Here is the line of code:

include_clique = log_df.loc[log_df['Product'].str.contains("Product A")]

Product is an object dtype.

import pandas as pd
import numpy as np

data = pd.read_csv("FILE.csv", header = None)

headerName = ["DRID", "Product", "M24", "M23", "M22", "M21"] 
data.columns = [headerName]

log_df = np.log(1 + data[["M24", "M23", "M22", "M21"]])
copy = data[["DRID", "Product"]].copy()
log_df = copy.join(log_df)

include_clique = log_df.loc[log_df['Product'].str.contains("Product A")]

Here is the head of the DataFrame:

       ID  PRODUCT       M24       M23       M22  M21
0  123421        A  0.000000  0.000000  1.098612  0.0   
1  141840        A  0.693147  1.098612  0.000000  0.0   
2  212006        A  0.693147  0.000000  0.000000  0.0   
3  216097        A  1.098612  0.000000  0.000000  0.0   
4  219517        A  1.098612  0.693147  1.098612  0.0

- David Luong

4

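Your code should work. Are you sure you aren't using log_df.str somewhere (instead of log_df['Product'].str)? Or could there be duplicate labels with the name Product (for example, two columns with the same name)? - rafaelc

@jpp Yes, I'm sure. I'll share the whole code with you. - David Luong

Please post just the first five rows of your df and change any confidential information to foo, bar, blablabla, etc. I just want to understand the structure of your df. - rafaelc

@RafaelC Sorry, I'm still fairly new to Stack Overflow. I'll post it in the body. - David Luong

The problem is in the np.log line, log_df = np.log(1+data[["M24","M23","M22","M21","M20","M19","M18","M17","M16","M15","M14","M13","M12","M11","M10","M9","M8","M7","M6","M5","M4","M3","M2","M1"]]). - jits_on_moon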

2 Answers

1

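The error AttributeError: 'DataFrame' object has no attribute ... occurs when you try to access an attribute that a DataFrame does not have.

A common case is selecting a column with . instead of [] when the column name contains a space (e.g. 'col1 ').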

df.col1       # <--- error
df['col1 ']   # <--- no error
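To make the trailing-space case concrete, here is a minimal sketch. The frame and its values are made up for illustration, and stripping the column names with rename is just one possible cleanup, not something the original answer prescribes.

```python
import pandas as pd

# Hypothetical frame whose only column name ends with a space.
df = pd.DataFrame({"col1 ": [1, 2, 3]})

try:
    df.col1                       # attribute access looks for "col1", which does not exist
except AttributeError as exc:
    message = str(exc)            # keep the error text for inspection

values = df["col1 "].tolist()     # bracket access with the exact name works

# One possible cleanup (an assumption, not part of the answer):
# strip whitespace from every column name, after which df.col1 works.
cleaned = df.rename(columns=str.strip)
cleaned_values = cleaned.col1.tolist()
```

After the cleanup, both cleaned.col1 and cleaned['col1'] select the same column.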

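Another common case is calling a Series method on a DataFrame. For example, tolist() (and, in pandas versions before 2.1, map()) is a Series method, so it must be called on a column. If you call it on a DataFrame, you get: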

AttributeError: 'DataFrame' object has no attribute 'tolist'

AttributeError: 'DataFrame' object has no attribute 'map'
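A small sketch of the tolist() case, using a made-up two-column frame. Going through df.values to get a nested list for the whole frame is one option among several and is only shown for illustration.

```python
import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

has_tolist = hasattr(df, "tolist")   # False: DataFrame does not define tolist()
column_list = df["a"].tolist()       # Series.tolist() on a single column
nested_list = df.values.tolist()     # whole frame via the underlying NumPy array
```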

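As hoang tran explained, this is also what is happening to the OP. .str is a Series accessor and is not implemented for DataFrames.

Another case is when you have a typo and try to call or access an attribute that is simply not defined; for example, if you try to call rows() instead of iterrows(), you get the following error: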

AttributeError: 'DataFrame' object has no attribute 'rows'

You can use the following comprehension to check the full list of attributes.

[x for x in dir(pd.DataFrame) if not x.startswith('_')]
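As a possible extension of that comprehension, the sketch below checks whether a name is defined and asks difflib (from the standard library) for near matches. The misspelled name "iterrow" is a made-up example, and using difflib here is my own suggestion rather than part of the answer.

```python
import difflib

import pandas as pd

public_attrs = [x for x in dir(pd.DataFrame) if not x.startswith("_")]

rows_defined = "rows" in public_attrs          # rows() is not a DataFrame method
iterrows_defined = "iterrows" in public_attrs  # iterrows() is

# Suggest close matches for a hypothetical typo.
suggestions = difflib.get_close_matches("iterrow", public_attrs, n=3)
```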

When you assign column names as df.columns = [['col1', 'col2']], df becomes a DataFrame with MultiIndex columns, so to access each column you need to pass a tuple:

df['col1'].str.contains('Product A')    # <---- error
df['col1',].str.contains('Product A')   # <---- no error; note the trailing comma

In fact, you can pass a tuple to select a column of any DataFrame with MultiIndex columns, for example:

df['level_1_colname', 'level_2_colname'].str.contains('Product A')
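Here is a minimal sketch of tuple selection on a two-level column index. The level names ("info", "sales") and the data are hypothetical and chosen only to show the difference between a partial key and a full tuple.

```python
import pandas as pd

# Hypothetical two-level column index: (group, field).
columns = pd.MultiIndex.from_tuples(
    [("info", "Product"), ("info", "DRID"), ("sales", "M24")]
)
df = pd.DataFrame(
    [["Product A", 1, 0.0], ["Product B", 2, 1.5]],
    columns=columns,
)

sub_frame = df["info"]             # first level only: still a DataFrame
product = df["info", "Product"]    # full tuple: a Series, so .str is available

mask = product.str.contains("Product A")
filtered = df[mask]                # rows where the product matches
```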

You can also flatten MultiIndex column names by mapping a "flattener" function over them. A common one is '_'.join:

df.columns = df.columns.map('_'.join)
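A short sketch of what the flattening does, with made-up column labels. It assumes every label is a string, since '_'.join fails on non-string labels.

```python
import pandas as pd

# Hypothetical two-level columns.
columns = pd.MultiIndex.from_tuples([("info", "Product"), ("sales", "M24")])
df = pd.DataFrame([["Product A", 0.0]], columns=columns)

df.columns = df.columns.map("_".join)   # each tuple of labels becomes one string
flat_names = list(df.columns)

# Single-level MultiIndex, as produced by a list wrapped in another list.
single = pd.DataFrame([["Product A", 0.0]])
single.columns = [["Product", "M24"]]
single.columns = single.columns.map("_".join)   # ('Product',) -> 'Product'
product_is_series = isinstance(single["Product"], pd.Series)
```

With flat column names, single['Product'].str works again.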

- cottontail

- hoang tran · Accepted Answer

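Short answer: change data.columns=[headerName] to data.columns=headerName

Explanation: when you set data.columns=[headerName], the columns become a MultiIndex object. As a result, your log_df['Product'] is a DataFrame, and a DataFrame has no str attribute.

When you set data.columns=headerName, your log_df['Product'] is a single column, and you can use the str attribute.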
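The difference can be reproduced with a small sketch. The rows below are made up (the question reads them from FILE.csv), and header_name is a shortened stand-in for the question's headerName.

```python
import pandas as pd

header_name = ["DRID", "Product", "M24"]
rows = [[123421, "Product A", 0.0], [141840, "Product B", 0.69]]  # made-up data

nested = pd.DataFrame(rows)
nested.columns = [header_name]   # list inside a list: columns become a MultiIndex

flat = pd.DataFrame(rows)
flat.columns = header_name       # plain list: regular Index

nested_is_multi = isinstance(nested.columns, pd.MultiIndex)
nested_type = type(nested["Product"]).__name__   # 'DataFrame', so no .str
flat_type = type(flat["Product"]).__name__       # 'Series', so .str works

include_clique = flat.loc[flat["Product"].str.contains("Product A")]
```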

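If for any reason you need to keep the data as a MultiIndex object, there is another solution: first convert your log_df['Product'] to a Series. After that, the str attribute is available.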

# Flatten the one-column 'Product' DataFrame into a plain Series so that .str is available
products = pd.Series(log_df['Product'].values.flatten())
include_clique = products[products.str.contains("Product A")]
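Note that the snippet above yields a Series of matching product names rather than the matching rows of log_df. If the rows are what you want, a possible variation is to keep the original index on the flattened Series and use the result as a boolean mask. This is a sketch with made-up data, not the answer's own code.

```python
import pandas as pd

# Made-up data with single-level MultiIndex columns, as in the question.
log_df = pd.DataFrame(
    [[1, "Product A", 0.0], [2, "Product B", 0.69], [3, "Product A", 1.1]]
)
log_df.columns = [["DRID", "Product", "M24"]]

# Keep log_df's index so the mask lines up with its rows.
products = pd.Series(log_df["Product"].values.flatten(), index=log_df.index)
mask = products.str.contains("Product A")

include_clique = log_df.loc[mask]   # full rows, columns still a MultiIndex
```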

However, I guess the first solution is the one you are looking for.