How to remove duplicates from a DataFrame while taking another column's value into account

Question


When I drop duplicates using 'name' as the column, the repeated John rows are removed as duplicates:

import pandas as pd   
data = {'name':['Bill','Steve','John','John','John'], 'age':[21,28,22,30,29]}
df = pd.DataFrame(data)
df = df.drop_duplicates('name')

pandas drops all the matching entries and keeps only the first (left-most) one:

   age   name
0   21   Bill
1   28  Steve
2   22   John

Instead, I would like to keep the row where John's age is the highest (30 in this example). How can I achieve this?

- alphanumeric

Try this: df.drop_duplicates('name', keep='last') or df.sort_values('age').drop_duplicates('name', keep='last') - MaxU - stand with Ukraine

1 Answer


- MaxU - stand with Ukraine · Accepted Answer

Try this:

In [75]: df
Out[75]:
   age   name
0   21   Bill
1   28  Steve
2   22   John
3   30   John
4   29   John

In [76]: df.sort_values('age').drop_duplicates('name', keep='last')
Out[76]:
   age   name
0   21   Bill
1   28  Steve
3   30   John
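
For anyone who wants to run this outside an IPython session, here is a minimal, self-contained sketch of the same idea. The variable name `oldest` and the trailing `sort_index()` call are my additions, not part of the answer above; `sort_index()` only restores the original row order and can be dropped if order does not matter.

```python
import pandas as pd

# Same sample data as in the question.
data = {'name': ['Bill', 'Steve', 'John', 'John', 'John'],
        'age': [21, 28, 22, 30, 29]}
df = pd.DataFrame(data)

# Sort by age ascending so the oldest row for each name comes last,
# then keep only the last occurrence of each name.
oldest = (
    df.sort_values('age')
      .drop_duplicates('name', keep='last')
      .sort_index()  # optional: put the rows back in their original order
)

print(oldest)
```

Sorting in descending order with `ascending=False` and keeping the default `keep='first'` should select the same rows.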

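Or, depending on your goals, this one, which keeps the last occurrence of each name in the existing row order regardless of age: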

In [77]: df.drop_duplicates('name', keep='last')
Out[77]:
   age   name
0   21   Bill
1   28  Steve
4   29   John
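
As a rough illustration of how the `keep` argument changes the result when no sorting is done first, the sketch below compares the three values `drop_duplicates` accepts. The variable names are hypothetical and chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Bill', 'Steve', 'John', 'John', 'John'],
                   'age': [21, 28, 22, 30, 29]})

# keep='first' is the default: the first John row (age 22) survives.
keep_first = df.drop_duplicates('name')

# keep='last': the last John row in the current order (age 29) survives.
keep_last = df.drop_duplicates('name', keep='last')

# keep=False: every row whose name is duplicated is dropped, so no John remains.
keep_none = df.drop_duplicates('name', keep=False)

print(keep_first, keep_last, keep_none, sep='\n\n')
```

None of these looks at the age column on its own, which is why the sort by 'age' is needed to keep the oldest John.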