How to compare rows within a DataFrame in pandas

Question

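I'd like to compare rows that share the same ID number (for example, rows 0 and 1) and then drop the row whose absolute Income is smaller. Is there a way to do this with pandas functions alone, rather than looping over the rows with .itertuples()? I was thinking of using .shift and .apply, but I'm not sure how.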

 Index   ID             Income  
 0       2011000070      55019   
 1       2011000070          0   
 2       2011000074      23879   
 3       2011000074          0   
 4       2011000078          0   
 5       2011000078          0   
 6       2011000118     -32500   
 7       2011000118          0

Desired output:

 Index   ID             Income  
 0       2011000070      55019     
 2       2011000074      23879     
 4       2011000078          0     
 6       2011000118     -32500
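To make the snippets in the answers reproducible, the sample table can be rebuilt roughly as follows. This is a sketch that assumes `Index` is an ordinary column (that is how the answers' printed output displays it), with pandas supplying its own default row index 0..7.

```python
import pandas as pd

# Sample data from the question. 'Index' is assumed to be a regular column;
# pandas adds its own default RangeIndex (0..7) on top of it.
df = pd.DataFrame({
    'Index': range(8),
    'ID': [2011000070, 2011000070, 2011000074, 2011000074,
           2011000078, 2011000078, 2011000118, 2011000118],
    'Income': [55019, 0, 23879, 0, 0, 0, -32500, 0],
})
print(df)
```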

- stav

3 Answers

This works:

In [458]: df.groupby('ID', as_index=False).apply(lambda x: x.loc[x.Income.abs().idxmax()])
Out[458]:
   Index          ID  Income
0      0  2011000070   55019
1      2  2011000074   23879
2      4  2011000078       0
3      6  2011000118  -32500

- Zero

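Sorting by ID and by the absolute value of Income, then calling pandas.DataFrame.drop_duplicates, should solve your problem. The keep parameter of drop_duplicates defaults to "first", which is exactly what you want: after sorting, the first row of each ID is the one with the largest absolute Income.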

df['Income_abs'] = df['Income'].abs()

df.sort_values(['ID', 'Income_abs'], ascending=[True,False]).drop_duplicates(['ID']).drop('Income_abs',axis=1)
Out[26]: 
   Index          ID  Income
0      0  2011000070   55019
2      2  2011000074   23879
4      4  2011000078       0
6      6  2011000118  -32500

- blacksite
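A variant of the sort-and-deduplicate approach avoids the temporary Income_abs column by passing a key function to sort_values (the key parameter exists in pandas 1.1 and later). This is a sketch; kind='stable' is an addition here so that tied rows, such as the two zero-income rows for ID 2011000078, keep their original order and the first one is retained.

```python
import pandas as pd

df = pd.DataFrame({
    'Index': range(8),
    'ID': [2011000070, 2011000070, 2011000074, 2011000074,
           2011000078, 2011000078, 2011000118, 2011000118],
    'Income': [55019, 0, 23879, 0, 0, 0, -32500, 0],
})

result = (
    # Sort by |Income|, largest first, without a helper column.
    # kind='stable' preserves the original order among equal values.
    df.sort_values('Income', key=lambda s: s.abs(), ascending=False, kind='stable')
      # keep='first' (the default) keeps the largest |Income| per ID.
      .drop_duplicates('ID')
      # Restore the original row order.
      .sort_index()
)
print(result)
```

The final sort_index call is optional; it only puts the surviving rows back in their original order.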

Page content provided by Stack Overflow.

- jezrael · Accepted Answer

Use Series.abs together with idxmax within each ID group to find the index label of the row with the largest absolute Income, then select those rows with loc:

print (df.groupby('ID')['Income'].apply(lambda x: x.abs().idxmax()))
ID
2011000070    0
2011000074    2
2011000078    4
2011000118    6
Name: Income, dtype: int64

df = df.loc[df.groupby('ID')['Income'].apply(lambda x: x.abs().idxmax())]
print (df)
   Index          ID  Income
0      0  2011000070   55019
2      2  2011000074   23879
4      4  2011000078       0
6      6  2011000118  -32500

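Alternatively, take the absolute values first and group them by df['ID'], so the built-in SeriesGroupBy.idxmax can be called without a lambda: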

df = df.loc[df['Income'].abs().groupby(df['ID']).idxmax()]
print (df)
   Index          ID  Income
0      0  2011000070   55019
2      2  2011000074   23879
4      4  2011000078       0
6      6  2011000118  -32500
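The question also asked about .shift. A shift-based comparison can work, but only under a strong assumption: every ID occurs in exactly two adjacent rows, as in the sample data. For groups of three or more rows, or non-adjacent duplicates, the groupby/idxmax answers above are the safer choice. The following is a sketch under that assumption, not a general solution.

```python
import pandas as pd

df = pd.DataFrame({
    'Index': range(8),
    'ID': [2011000070, 2011000070, 2011000074, 2011000074,
           2011000078, 2011000078, 2011000118, 2011000118],
    'Income': [55019, 0, 23879, 0, 0, 0, -32500, 0],
})

abs_income = df['Income'].abs()

# ASSUMPTION: each ID appears in exactly two adjacent rows.
first_of_pair = df['ID'].eq(df['ID'].shift(-1))   # next row has the same ID
second_of_pair = df['ID'].eq(df['ID'].shift(1))   # previous row has the same ID

# Drop the first row of a pair if its partner's |Income| is strictly larger;
# drop the second row if its |Income| is not larger, so ties keep the first row.
# Comparisons against the NaN introduced by shift evaluate to False.
drop = (
    (first_of_pair & (abs_income < abs_income.shift(-1)))
    | (second_of_pair & (abs_income <= abs_income.shift(1)))
)
result = df[~drop]
print(result)
```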