Select the first row when a column has multiple duplicate values

Question



I want to select the first row whenever the same value repeats across several rows of a column.

For example:

import pandas as pd
df = pd.DataFrame({'col1':['one', 'one', 'one', 'one', 'one', 'one', 'one', 'one'], 
                   'col2':['ID=ABCD1234', 'ID=ABCD1234', 'ID=ABCD1234', 'ID=ABCD5678', 
                           'ID=ABCD5678', 'ID=ABCD5678', 'ID=ABCD9102', 'ID=ABCD9102']})

This is what the pandas DataFrame looks like:

print(df)
  col1         col2
0  one  ID=ABCD1234
1  one  ID=ABCD1234
2  one  ID=ABCD1234
3  one  ID=ABCD5678
4  one  ID=ABCD5678
5  one  ID=ABCD5678
6  one  ID=ABCD9102
7  one  ID=ABCD9102

I want to select rows 0, 3 and 6 and output them as a new DataFrame.

Expected output:

  col1         col2
0  one  ID=ABCD1234
3  one  ID=ABCD5678
6  one  ID=ABCD9102

- botloggy


Use df = df.drop_duplicates(). - jezrael
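jezrael's suggestion works here because `col1` holds the same value in every row, so rows that repeat `col2` are also fully repeated rows. The sketch below assumes the question's data and a hypothetical modified copy `df2` (not part of the question). It shows where the plain call and a `subset=['col2']` call start to differ:

```python
import pandas as pd

# The question's example frame, rebuilt compactly (same values)
df = pd.DataFrame({'col1': ['one'] * 8,
                   'col2': ['ID=ABCD1234'] * 3 + ['ID=ABCD5678'] * 3
                           + ['ID=ABCD9102'] * 2})

# No subset: a row is dropped only if *every* column matches an earlier row.
# col1 is constant here, so this amounts to de-duplicating on col2.
print(df.drop_duplicates().index.tolist())                  # [0, 3, 6]

# Hypothetical variation: give row 1 a different col1 value
df2 = df.copy()
df2.loc[1, 'col1'] = 'two'

# The full-row comparison now keeps row 1; subset=['col2'] still drops it
print(df2.drop_duplicates().index.tolist())                 # [0, 1, 3, 6]
print(df2.drop_duplicates(subset=['col2']).index.tolist())  # [0, 3, 6]
```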

2 Answers


Just group by the column's values, then use first() to take the first row of each group:

df.groupby('col2').first()
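`groupby('col2').first()` moves `col2` into the index. As a result, the original row labels 0, 3 and 6 from the expected output are not kept. `first()` also takes the first non-null value per column within each group, which only matters if the data has missing values. The following sketch assumes the question's `df`, and its variable names are illustrative. It shows the resulting shape and uses `reset_index()` to turn the key back into a column:

```python
import pandas as pd

# The question's example frame, rebuilt compactly (same values)
df = pd.DataFrame({'col1': ['one'] * 8,
                   'col2': ['ID=ABCD1234'] * 3 + ['ID=ABCD5678'] * 3
                           + ['ID=ABCD9102'] * 2})

# first(): one row per distinct col2 value, with col2 moved into the index
firsts = df.groupby('col2').first()
print(firsts.index.tolist())    # ['ID=ABCD1234', 'ID=ABCD5678', 'ID=ABCD9102']
print(firsts.columns.tolist())  # ['col1']

# reset_index() inserts col2 back as the first column with a fresh 0..n-1 index;
# reorder the columns to match the original layout
flat = firsts.reset_index()[['col1', 'col2']]
print(flat.index.tolist())      # [0, 1, 2]
```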

You may also decide to group by multiple columns:

df.groupby(['col1', 'col2']).first()
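You may want to group on several columns but keep the original rows and their labels (0, 3 and 6 in the expected output). In that case, `groupby(...).head(1)` is one alternative to `first()`, and it should select the same rows as `drop_duplicates` with the same `subset`. The sketch below assumes the question's `df`; `keys` is an illustrative name:

```python
import pandas as pd

# The question's example frame, rebuilt compactly (same values)
df = pd.DataFrame({'col1': ['one'] * 8,
                   'col2': ['ID=ABCD1234'] * 3 + ['ID=ABCD5678'] * 3
                           + ['ID=ABCD9102'] * 2})

keys = ['col1', 'col2']

# head(1): first original row of each (col1, col2) group, index and order preserved
via_groupby = df.groupby(keys).head(1)

# The same selection expressed with drop_duplicates
via_dedup = df.drop_duplicates(subset=keys, keep='first')

print(via_groupby.index.tolist())     # [0, 3, 6]
print(via_groupby.equals(via_dedup))  # True
```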

- filbranden


I like this solution for its elegant syntax. - Ram Narasimhan

Content provided by Stack Overflow.

- Joe · Accepted Answer

You can use:

df.drop_duplicates(subset=['col2'], keep='first', inplace=True)
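With `inplace=True`, `df` is modified directly and the call returns `None`, so its result should not be assigned back to `df`. The sketch below assumes the question's `df`, and its variable names are illustrative. It shows the non-in-place form and an equivalent boolean mask built with `DataFrame.duplicated`, and checks that all three agree:

```python
import pandas as pd

# The question's example frame, rebuilt compactly (same values)
df = pd.DataFrame({'col1': ['one'] * 8,
                   'col2': ['ID=ABCD1234'] * 3 + ['ID=ABCD5678'] * 3
                           + ['ID=ABCD9102'] * 2})

# Returns a new DataFrame; keep='first' is the default, spelled out for clarity
first_rows = df.drop_duplicates(subset=['col2'], keep='first')

# Equivalent mask: duplicated() flags every repeat after the first occurrence
mask_rows = df[~df.duplicated(subset=['col2'], keep='first')]

# In-place variant on a copy: the copy is modified and the call returns None
work = df.copy()
returned = work.drop_duplicates(subset=['col2'], keep='first', inplace=True)

print(first_rows.index.tolist())                  # [0, 3, 6]
print(mask_rows.equals(first_rows))               # True
print(returned is None, work.equals(first_rows))  # True True
```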