Suppose I have a DataFrame with a MultiIndex like this:
import pandas as pd
import numpy as np
my_index = pd.MultiIndex.from_product(
[(3,1,2), ("small", "tall", "medium"), ("B", "A", "C")],
names=["number", "size", "letter"]
)
df_0 = pd.DataFrame(np.random.rand(27, 2), columns=["x", "y"], index=my_index)
x y
number size letter
3 small B 0.950073 0.599918
A 0.014450 0.472736
C 0.208064 0.778538
tall B 0.979631 0.367234
A 0.832459 0.449875
C 0.761929 0.053144
medium B 0.460764 0.800131
A 0.355746 0.573813
C 0.078924 0.058865
1 small B 0.405209 0.354636
A 0.536242 0.012904
C 0.458910 0.723627
tall B 0.859898 0.442954
A 0.109729 0.885598
C 0.378363 0.220695
medium B 0.652191 0.685181
A 0.503525 0.400973
C 0.454671 0.188798
2 small B 0.407654 0.168782
A 0.393451 0.083023
C 0.073432 0.165209
tall B 0.678226 0.108497
A 0.718348 0.077935
C 0.595500 0.146271
medium B 0.719985 0.422167
A 0.950950 0.532390
C 0.687721 0.920229
Now I want to sort the index by its levels: first by number, then by size, and finally by letter.
If I do this...
df_1 = df_0.sort_index(level=["number", "size", "letter"], inplace=False)
... the size level is sorted alphabetically:
x y
number size letter
1 medium A 0.503525 0.400973
B 0.652191 0.685181
C 0.454671 0.188798
small A 0.536242 0.012904
B 0.405209 0.354636
C 0.458910 0.723627
tall A 0.109729 0.885598
B 0.859898 0.442954
C 0.378363 0.220695
2 medium A 0.950950 0.532390
B 0.719985 0.422167
C 0.687721 0.920229
small A 0.393451 0.083023
B 0.407654 0.168782
C 0.073432 0.165209
tall A 0.718348 0.077935
B 0.678226 0.108497
C 0.595500 0.146271
3 medium A 0.355746 0.573813
B 0.460764 0.800131
C 0.078924 0.058865
small A 0.014450 0.472736
B 0.950073 0.599918
C 0.208064 0.778538
tall A 0.832459 0.449875
B 0.979631 0.367234
C 0.761929 0.053144
But I would like the size level to be sorted with a custom key. I know I can sort on the size level alone with a custom key function, like this:
custom_key = np.vectorize(lambda x: {"small": 0, "medium": 1, "tall": 2}[x])
df_2 = df_0.sort_index(level=1, key=custom_key, inplace=False)
x y
number size letter
1 small A 0.536242 0.012904
B 0.405209 0.354636
C 0.458910 0.723627
2 small A 0.393451 0.083023
B 0.407654 0.168782
C 0.073432 0.165209
3 small A 0.014450 0.472736
B 0.950073 0.599918
C 0.208064 0.778538
1 medium A 0.503525 0.400973
B 0.652191 0.685181
C 0.454671 0.188798
2 medium A 0.950950 0.532390
B 0.719985 0.422167
C 0.687721 0.920229
3 medium A 0.355746 0.573813
B 0.460764 0.800131
C 0.078924 0.058865
1 tall A 0.109729 0.885598
B 0.859898 0.442954
C 0.378363 0.220695
2 tall A 0.718348 0.077935
B 0.678226 0.108497
C 0.595500 0.146271
3 tall A 0.832459 0.449875
B 0.979631 0.367234
C 0.761929 0.053144
But how can I sort on all levels, as for df_1, while applying the custom key to the second level? Expected output:
x y
number size letter
1 small A 0.536242 0.012904
B 0.405209 0.354636
C 0.458910 0.723627
medium A 0.503525 0.400973
B 0.652191 0.685181
C 0.454671 0.188798
tall A 0.109729 0.885598
B 0.859898 0.442954
C 0.378363 0.220695
2 small A 0.393451 0.083023
B 0.407654 0.168782
C 0.073432 0.165209
medium A 0.950950 0.532390
B 0.719985 0.422167
C 0.687721 0.920229
tall A 0.718348 0.077935
B 0.678226 0.108497
C 0.595500 0.146271
3 small A 0.014450 0.472736
B 0.950073 0.599918
C 0.208064 0.778538
medium A 0.355746 0.573813
B 0.460764 0.800131
C 0.078924 0.058865
tall A 0.832459 0.449875
B 0.979631 0.367234
C 0.761929 0.053144
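One possible way to get this ordering, sketched under the assumption that pandas 1.1 or later is in use (the versions where sort_index accepts key). For a MultiIndex, the key function is called once per level being sorted and receives that level's values as an Index, so it can check idx.name and remap only the size level. The names size_order and key_by_level are hypothetical helpers, not part of the original post.

```python
import numpy as np
import pandas as pd

my_index = pd.MultiIndex.from_product(
    [(3, 1, 2), ("small", "tall", "medium"), ("B", "A", "C")],
    names=["number", "size", "letter"],
)
df_0 = pd.DataFrame(np.random.rand(27, 2), columns=["x", "y"], index=my_index)

# Hypothetical helper: desired rank of each size label.
size_order = {"small": 0, "medium": 1, "tall": 2}


def key_by_level(idx):
    """Remap only the 'size' level; return every other level unchanged."""
    if idx.name == "size":
        # Index.map returns an Index that keeps the level name.
        return idx.map(size_order)
    return idx


df_3 = df_0.sort_index(level=["number", "size", "letter"], key=key_by_level)
print(df_3.head(9))
```

Because the key only changes the values used for comparison, the sorted frame still shows the original small/medium/tall labels.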
And how should I define the custom key function so that I can also refer to levels by name in sort_index?
df_3 = df_0.sort_index(level="size", key=custom_key, inplace=False)
This raises a KeyError: 'Level size not found'
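A plausible explanation for this KeyError, hedged because it depends on pandas internals: np.vectorize returns a plain NumPy array, which carries no level name, so once the key has been applied the size level can no longer be looked up by name. A key that returns an Index with its name intact, for example via Index.map, avoids the problem in the sketch below. The names order, vectorized_key and named_key are hypothetical.

```python
import numpy as np
import pandas as pd

my_index = pd.MultiIndex.from_product(
    [(3, 1, 2), ("small", "tall", "medium"), ("B", "A", "C")],
    names=["number", "size", "letter"],
)
df_0 = pd.DataFrame(np.random.rand(27, 2), columns=["x", "y"], index=my_index)

order = {"small": 0, "medium": 1, "tall": 2}
size_values = my_index.get_level_values("size")

# The key from the question: the result is an ndarray, which has no name.
vectorized_key = np.vectorize(lambda x: order[x])
print(type(vectorized_key(size_values)))


# Hypothetical replacement: Index.map returns an Index that keeps the name.
def named_key(idx):
    return idx.map(order)


print(named_key(size_values).name)

# Sorting by level name now works; remaining levels are sorted afterwards.
df_by_name = df_0.sort_index(level="size", key=named_key)
print(df_by_name.head())
```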
Is index.levels[1].values better than index.get_level_values(1)? - william_grisaitis
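The two expressions compared in the comment above are not interchangeable, which a quick check on the index from the question makes visible: levels[1] holds each distinct label once, while get_level_values(1) returns one label per row.

```python
import pandas as pd

my_index = pd.MultiIndex.from_product(
    [(3, 1, 2), ("small", "tall", "medium"), ("B", "A", "C")],
    names=["number", "size", "letter"],
)

unique_labels = my_index.levels[1].values      # one entry per distinct label
per_row_labels = my_index.get_level_values(1)  # one entry per row

print(len(unique_labels))   # 3
print(len(per_row_labels))  # 27
```

A key function passed to sort_index must return one value per row, so it operates on the per-row form rather than on the distinct labels.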