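I am following the advice here, pandas create new column based on values from other columns, but I still get an error. Basically, my pandas DataFrame has many columns, and I want to group it by a new categorical column whose value depends on two existing columns (AMP, Time).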
df
df['Time'] = pd.to_datetime(df['Time'])
#making sure Time column read from the csv file is time object
import datetime as dt
day_1 = dt.date.today()
day_2 = dt.date.today() - dt.timedelta(days = 1)
def f(row):
    if (row['AMP'] > 100) & (row['Time'] > day_1):
        val = 'new_positives'
    elif (row['AMP'] > 100) & (day_2 <= row['Time'] <= day_1):
        val = 'rec_positives'
    elif (row['AMP'] > 100 & row['Time'] < day_2):
        val = 'old_positives'
    else:
        val = 'old_negatives'
    return val
df['GRP'] = df.apply(f, axis=1) #this gives the following error:
TypeError: ("Cannot compare type 'Timestamp' with type 'date'", 'occurred at index 0')
df[(df['AMP'] > 100) & (df['Time'] > day_1)] #this works fine
df[(df['AMP'] > 100) & (day_2 <= df['Time'] <= day_1)] #this works fine
df[(df['AMP'] > 100) & (df['Time'] < day_2)] #this works fine
#df = df.groupby('GRP')
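I can select the correct sub-DataFrames with the conditions above, but applying the function to each row raises an error. What is the correct way to group the DataFrame based on these conditions?
EDIT:
Unfortunately, I cannot provide a sample of my DataFrame. However, here is a simple DataFrame that produces the same kind of error: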
import numpy as np
import pandas as pd
mydf = pd.DataFrame({'a':np.arange(10),
                     'b':np.random.rand(10)})
def f1(row):
    if row['a'] < 5 & row['b'] < 0.5:
        value = 'less'
    elif row['a'] < 5 & row['b'] > 0.5:
        value = 'more'
    else:
        value = 'same'
    return value
mydf['GRP'] = mydf.apply(f1, axis=1)
TypeError: ("unsupported operand type(s) for &: 'int' and 'float'", 'occurred at index 0')
EDIT 2: As suggested below, wrapping the comparisons in parentheses fixes this made-up example. That problem is solved.
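For reference, here is a minimal sketch of the toy example with each comparison parenthesized. In Python, `&` binds more tightly than `<` and `>`, so `5 & row['b']` is evaluated first, which matches the "unsupported operand type(s) for &" message. The fixed seed and the early `return` statements are my own additions for illustration, not part of the original code.

```python
import numpy as np
import pandas as pd

# Same toy frame as in the question; the seed is an assumption added for reproducibility.
rng = np.random.default_rng(0)
mydf = pd.DataFrame({'a': np.arange(10), 'b': rng.random(10)})

def f1(row):
    # Each comparison gets its own parentheses so that & combines two booleans.
    # For scalar row values, plain `and` would work as well.
    if (row['a'] < 5) & (row['b'] < 0.5):
        return 'less'
    elif (row['a'] < 5) & (row['b'] > 0.5):
        return 'more'
    return 'same'

mydf['GRP'] = mydf.apply(f1, axis=1)
print(mydf)
```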
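However, in my real example I still get the same error. By the way, if I use the 'AMP' column together with a different column from the table, everything works and I can create df['GRP'] by applying the function f to each row. This suggests the problem is related to using df['Time']. But then why can I select df[(df['AMP'] > 100) & (df['Time'] > day_1)]? Why does it work in that context but not as a condition inside the function?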
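One thing that might be worth trying, going only by the wording of the TypeError: inside `f`, `row['Time']` is a pandas `Timestamp`, while `day_1` and `day_2` are `datetime.date` objects. The sketch below builds the cutoffs as `Timestamp` values instead, so both sides of each comparison have the same type. The sample rows are hypothetical, since the real DataFrame is not shown, and using `and` for the scalar checks is my own choice.

```python
import pandas as pd

# Hypothetical sample data standing in for the real frame.
today = pd.Timestamp.today().normalize()  # midnight today, as a Timestamp
df = pd.DataFrame({
    'AMP': [150, 150, 150, 50],
    'Time': [today + pd.Timedelta(days=1),
             today - pd.Timedelta(hours=12),
             today - pd.Timedelta(days=3),
             today],
})

# Cutoffs as Timestamps (not datetime.date) to match the type of row['Time'].
day_1 = today
day_2 = day_1 - pd.Timedelta(days=1)

def f(row):
    # Scalar comparisons, each one parenthesized, combined with `and`.
    if (row['AMP'] > 100) and (row['Time'] > day_1):
        return 'new_positives'
    elif (row['AMP'] > 100) and (day_2 <= row['Time'] <= day_1):
        return 'rec_positives'
    elif (row['AMP'] > 100) and (row['Time'] < day_2):
        return 'old_positives'
    return 'old_negatives'

df['GRP'] = df.apply(f, axis=1)
print(df)
```

With GRP in place, `df.groupby('GRP')` should then split the rows into the four categories.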