Python Pandas sorting by MultiIndex and column

Question

8

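In Pandas 0.17 I am trying to sort by a specific column while keeping the hierarchical index (A and B). B is a running number created when the dataframe was set up through concatenation. My data looks like this: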

          C      D
A   B
bar one   shiny  10
    two   dull   5
    three glossy 8
foo one   dull   3
    two   shiny  9
    three matt   12

What I need is:

          C      D
A   B
bar two   dull   5
    three glossy 8
    one   shiny  10
foo one   dull   3
    three matt   12
    two   shiny  9

The code I am using and its result are shown below. Note: Pandas 0.17 warns that DataFrame.sort will be deprecated.

df.sort_values(by="C", ascending=True)
          C      D
A   B
bar two   dull   5
foo one   dull   3
bar three glossy 8
foo three matt   12
bar one   shiny  10
foo two   shiny  9

Adding .groupby produces the same result:

df.sort_values(by="C", ascending=True).groupby(axis=0, level=0, as_index=True)

Likewise, sorting by the index first and then grouping by the column does not give a useful result:

df.sort_index(axis=0, level=0, as_index=True).groupby(C, as_index=True)

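I am not sure about reindexing. I need to keep the first index A; the second index B can be reassigned, but does not have to be. It would surprise me if there were no easy solution; I guess I just can't find it. Any suggestions are appreciated.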

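Edit: In the meantime I dropped the second index B, turned the first index A into a column instead of an index, sorted by multiple columns, and then re-indexed: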

df.index = df.index.droplevel(1)
df.reset_index(level=0, inplace=True)
df_sorted = df.sort_values(["A", "C"], ascending=[1,1]) #A is a column here, not an index.
df_reindexed = df_sorted.set_index("A")

Still pretty verbose.
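For comparison, a shorter variant of the workaround above. In newer pandas releases (0.23 and later, as far as I know) `sort_values` also accepts index level names in `by`, so the index does not have to be dropped and rebuilt. This is a minimal sketch under two assumptions: the sample data is rebuilt by hand from the table in the question, and the pandas version is newer than the 0.17 used here. Unlike the workaround, it keeps level B.

```python
import pandas as pd

# Sample data rebuilt by hand from the table in the question (an assumption).
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("bar", "three"),
     ("foo", "one"), ("foo", "two"), ("foo", "three")],
    names=["A", "B"],
)
df = pd.DataFrame(
    {"C": ["shiny", "dull", "glossy", "dull", "shiny", "matt"],
     "D": [10, 5, 8, 3, 9, 12]},
    index=index,
)

# "A" is an index level and "C" is a column; both can be named in `by`.
df_sorted = df.sort_values(["A", "C"])
print(df_sorted)
```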

- raummensch

2 Answers

1

Based on chrisb's code:

Note that in my case it is a Series rather than a DataFrame:

s.groupby(level='A', group_keys=False).apply(lambda x: x.sort_values(ascending=False))
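
A small runnable sketch of the one-liner above, assuming a hypothetical Series `s` that holds the D values from the question under the same (A, B) index. With `group_keys=False` the group label is not prepended again, so the result keeps the original two index levels.

```python
import pandas as pd

# Hypothetical Series: the D values from the question, indexed by (A, B).
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("bar", "three"),
     ("foo", "one"), ("foo", "two"), ("foo", "three")],
    names=["A", "B"],
)
s = pd.Series([10, 5, 8, 3, 9, 12], index=index, name="D")

# Sort each A group in descending order without adding an extra index level.
s_sorted = s.groupby(level="A", group_keys=False).apply(
    lambda x: x.sort_values(ascending=False)
)
print(s_sorted)
```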

- G. Cheng

- chrisb · Accepted Answer

Feels like there could be a better way, but here's one approach:

In [163]: def sorter(sub_df):
     ...:     sub_df = sub_df.sort_values('C')
     ...:     sub_df.index = sub_df.index.droplevel(0)
     ...:     return sub_df

In [164]: df.groupby(level='A').apply(sorter)
Out[164]: 
                C   D
A   B                
bar two      dull   5
    three  glossy   8
    one     shiny  10
foo one      dull   3
    three    matt  12
    two     shiny   9
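
The same idea as a self-contained script, for anyone who wants to run it outside IPython. This is only a sketch: the data is rebuilt by hand from the question, and the helper name `sort_within_groups` is made up for illustration. It passes `group_keys=False` rather than dropping the level inside the function, which should give the same (A, B) index on recent pandas versions.

```python
import pandas as pd

# Sample data rebuilt by hand from the table in the question (an assumption).
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("bar", "three"),
     ("foo", "one"), ("foo", "two"), ("foo", "three")],
    names=["A", "B"],
)
df = pd.DataFrame(
    {"C": ["shiny", "dull", "glossy", "dull", "shiny", "matt"],
     "D": [10, 5, 8, 3, 9, 12]},
    index=index,
)


def sort_within_groups(frame, level, column):
    """Sort rows by `column` inside each group of index level `level`."""
    # group_keys=False stops groupby from prepending the group label,
    # so no droplevel call is needed inside the applied function.
    return frame.groupby(level=level, group_keys=False).apply(
        lambda sub_df: sub_df.sort_values(column)
    )


result = sort_within_groups(df, "A", "C")
print(result)
```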