Using pandas, how do I remove the last row of each group?

Question

19

I have a DataFrame that looks like this:

import pandas as pd
df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'], 'B': range(6)})
grouped = df.groupby('A')
print(grouped.head())

             A  B
A                
one   0    one  0
      1    one  1
      5    one  5
three 3  three  3
      4  three  4
two   2    two  2

I can easily select the last row of each group with:

print(grouped.agg(lambda x: x.iloc[-1]))

      B
A       
one    5
three  4
two    2

How can I remove the last row of each group? The result should be:

       A  B
0    one  0
1    one  1
3  three  3

I have tried filter, but it does not seem to have any effect:

print(grouped.filter(lambda x: x.iloc[-1]))

       A  B
0    one  0
1    one  1
5    one  5
3  three  3
4  three  4
2    two  2

Thanks!

- user3465658
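On the filter attempt above: groupby filter expects a function that returns a single True/False per group, and it keeps or drops whole groups accordingly, so it cannot trim individual rows inside a group. A minimal sketch of what it is designed for, using the question's data (the len(g) > 1 condition is only an illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'], 'B': range(6)})

# filter() calls the function once per group and expects one bool back.
# Groups where it returns True are kept whole; the others are dropped whole.
kept = df.groupby('A').filter(lambda g: len(g) > 1)
print(kept)  # every row of 'one' and 'three'; the single-row group 'two' is gone
```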

4 Answers

12

How about:

>>> df.groupby("A", as_index=False).apply(lambda x: x.iloc[:-1])
       A  B
0    one  0
1    one  1
3  three  3

[3 rows x 2 columns]

- DSM

In pandas 0.12.0, the index is still 'A' (with that label) for me. - ely

1

You can do it like this:

df.drop(df.groupby('A').tail(1).index, axis=0)

- user10302409
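Spelled out in two steps, the one-liner above might look like the sketch below. It assumes the index labels are unique: drop removes rows by label, so with repeated labels it would also remove other rows that share a label with a group's last row (calling reset_index(drop=True) first is one way around that).

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'], 'B': range(6)})

# tail(1) returns the last row of each group and keeps the original index labels.
last_rows = df.groupby('A').tail(1)

# Drop exactly those labels from the original frame.
result = df.drop(last_rows.index)
print(result)
```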

0

You can use the duplicated method:

df[df.duplicated('A', keep='last')]

Output:

       A  B
0    one  0
1    one  1
3  three  3

- Mykola Zotko
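To see why this works, it may help to print the mask itself. With keep='last', duplicated marks every occurrence of a value in 'A' as True except the last one, so the mask is True for exactly the rows that are not the last of their group. A group with a single row yields False and disappears entirely.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'], 'B': range(6)})

# True for every row that has a later row with the same value in 'A'.
mask = df.duplicated('A', keep='last')
print(mask.tolist())  # [True, True, False, True, False, False]

result = df[mask]
print(result)
```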

- Andy Hayden · Accepted Answer

15

You may find it faster to use cumcount:

In [11]: df[grouped.cumcount(ascending=False) > 0]
Out[11]: 
       A  B
0    one  0
1    one  1
3  three  3

- Andy Hayden
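As a self-contained sketch of the cumcount approach: with ascending=False, rows are numbered from the end of each group, so the last row of every group gets 0 and the comparison > 0 keeps everything else.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'], 'B': range(6)})

# Position of each row counted backwards within its group (last row == 0).
counts = df.groupby('A').cumcount(ascending=False)
print(counts.tolist())  # [2, 1, 0, 1, 0, 0]

# Keep every row that is not the last one in its group.
result = df[counts > 0]
print(result)
```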

Nice. And it should be several times faster. - DSM

@DSM I did wonder whether this should be available as head(-2) or tail(-2)... probably not. - Andy Hayden

timeit on a 40m-record df: %timeit temp=dfd.groupby('bucket', as_index=False).apply(lambda x: x.iloc[:-1]) 1 loops, best of 3: 17.1 s per loop - clg4

%timeit temp=dfd.groupby('bucket', as_index=False).cumcount(ascending=False) 1 loops, best of 3: 4.24 s per loop - clg4
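Note that the second timing above covers only the cumcount call, not the boolean indexing that follows it. A rough way to compare the two complete approaches outside IPython is sketched below. The data is synthetic, the 'value' column and the sizes are made up for illustration, and the apply variant slices only that one column, so absolute numbers will differ from the ones quoted above and from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the commenter's frame (sizes and 'value' are assumptions).
rng = np.random.default_rng(0)
dfd = pd.DataFrame({
    'bucket': rng.integers(0, 100, size=10_000),
    'value': np.arange(10_000),
})

def via_apply():
    # Slice each group in Python; only the 'value' column is passed to the lambda.
    return dfd.groupby('bucket', group_keys=False)['value'].apply(lambda s: s.iloc[:-1])

def via_cumcount():
    # Vectorised: number rows from the end of each group, keep all but row 0.
    return dfd[dfd.groupby('bucket').cumcount(ascending=False) > 0]

for fn in (via_apply, via_cumcount):
    seconds = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{fn.__name__}: {seconds:.4f} s")
```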