Pandas pivot table: preserving order

Question

11

>>> df
   A   B   C      D
0  foo one small  1
1  foo one large  2
2  foo one large  2
3  foo two small  3
4  foo two small  3
5  bar one large  4
6  bar one small  5
7  bar two small  6
8  bar two large  7
>>> table = pivot_table(df, values='D', index=['A', 'B'],
...                     columns=['C'], aggfunc=np.sum)
>>> table
          small  large
foo  one  1      4
     two  6      NaN
bar  one  5      4
     two  6      7

I want the output to look exactly like the one shown above, but what I get is sorted: bar appears above foo, and so on.

- Rahul Ranjan

3 Answers

10

Since pandas 1.3.0, you can pass sort=False to pd.pivot_table:

>>> import pandas as pd
>>> df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
...                    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
...                    "C": ["small", "large", "large", "small","small", "large", "small", "small", "large"],
...                    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
...                    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
>>> pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'],
...                aggfunc='sum', sort=False)
C        large  small
A   B                
foo one    4.0    1.0
    two    NaN    6.0
bar one    4.0    5.0
    two    7.0    6.0

- Eric Duminil

2

Thanks for this answer, Eric. It's useful. - LunkRat

3

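When you create a pivot_table, the index is sorted alphabetically by default. This affects not only foo and bar: you can see that small and large are sorted too. To get foo on top, you need to sort the result again with sort_index (older pandas versions offered sortlevel for this, which has since been removed). To get the same output as the example in the question, sort both the rows (A) and the columns (C) in descending order:

table.sort_index(level=["A", "B"], ascending=[False, True], sort_remaining=False, inplace=True)
table.sort_index(axis=1, ascending=False, inplace=True)
print(table)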

Output:

C        small  large
A   B                
foo one  1.0    4.0  
    two  6.0    NaN   
bar one  5.0    4.0  
    two  6.0    7.0
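
Sorting in descending order works here only because the desired order happens to be reverse alphabetical. As a more general sketch (not part of the original answer), you could reindex the pivoted table by the order in which the keys first appear in df. The names row_order, col_order and ordered are illustrative, and the DataFrame is the one from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

# Default behaviour: rows and columns come back sorted alphabetically.
table = pd.pivot_table(df, values="D", index=["A", "B"], columns="C", aggfunc="sum")

# Row order: unique (A, B) pairs in the order they first appear in df.
row_order = pd.MultiIndex.from_frame(df[["A", "B"]].drop_duplicates())
# Column order: unique values of C in the order they first appear in df.
col_order = pd.Index(df["C"].unique(), name="C")

# Reindexing rearranges existing rows/columns without recomputing anything.
ordered = table.reindex(index=row_order, columns=col_order)
print(ordered)
```

Because the order is taken from the data itself, this does not depend on the labels sorting conveniently in either direction.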

Update:

To remove the index names A, B and C, do the following:

table.columns.name = None
table.index.names = (None, None)
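
If you would rather not modify table in place, rename_axis can do the same thing and return a new object. The following is a minimal sketch on a small made-up DataFrame (the data and the name unnamed are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "bar"],
    "B": ["one", "two", "one"],
    "C": ["small", "small", "large"],
    "D": [1, 6, 4],
})
table = pd.pivot_table(df, values="D", index=["A", "B"], columns="C", aggfunc="sum")

# rename_axis returns a new DataFrame; `table` keeps its axis names.
# One name per index level is required, hence the two-element list.
unnamed = table.rename_axis(index=[None, None], columns=None)
print(unnamed)
```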

- niraj

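How can I remove C, A and B from the solution above? small large foo one 1 4 two 6 NaN bar one 5 4 two 6 7 - Rahul Ranjan

The index has multiple levels (you have A and B), so you need to use index.names. See https://dev59.com/6Irda4cB1Zd3GeqPSOF0#30254337 . I see you mentioned you are a beginner, so the best approach is to experiment: for example, check what table.index returns, what table.columns.name returns, and so on. - niraj

+1 for the suggestion on how to remove the index labels (obvious in hindsight). If using None gives you NaN, just use an empty string instead. - RobM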


- ayhan · Accepted Answer

I don't think pivot_table has an option for sorting, but groupby does:

df.groupby(['A', 'B', 'C'], sort=False)['D'].sum().unstack('C')
Out: 
C        small  large
A   B                
foo one    1.0    4.0
    two    6.0    NaN
bar one    5.0    4.0
    two    6.0    7.0

You pass the grouping columns to groupby and call unstack on the one whose values you want as columns. If you don't want the index names, rename them to None:

df.groupby(['A', 'B', 'C'], sort=False)['D'].sum().rename_axis([None, None, None]).unstack(level=2)
Out: 
         small  large
foo one    1.0    4.0
    two    6.0    NaN
bar one    5.0    4.0
    two    6.0    7.0
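
The groupby approach above can be wrapped in a small reusable function. This is only a sketch: pivot_in_appearance_order is a hypothetical name, not a pandas function, and it assumes that groupby(sort=False) yields groups in order of first appearance, as in the output shown above.

```python
import pandas as pd


def pivot_in_appearance_order(df, index, columns, values, aggfunc="sum"):
    """Hypothetical helper: pivot via groupby(sort=False) followed by unstack."""
    keys = list(index) + [columns]
    # sort=False keeps the groups in the order their keys first appear in df.
    grouped = df.groupby(keys, sort=False)[values].agg(aggfunc)
    # Move the `columns` level of the row index onto the column axis.
    return grouped.unstack(columns)


df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

result = pivot_in_appearance_order(df, index=["A", "B"], columns="C", values="D")
print(result)
```

Passing aggfunc as a string such as "sum" or "mean" keeps the helper independent of NumPy.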