使用 Pandas 数据透视表设置多级索引。

3
我有以下数据。
,DISTRICT,STATUS,DEM,REP,CON,GRE,WOR,IND,WEP,REF,OTH,BLANK
3,1,Active,144470,162308,11778,1108,2066,23245,150.0,41.0,394,128547
4,1,Inactive,10086,8290,643,96,130,1545,5.0,3.0,28,8436
12,2,Active,159400,155996,8457,830,1712,18008,137.0,43.0,291,114789
13,2,Inactive,9440,7752,428,61,119,1043,6.0,3.0,13,6832
24,3,Active,188902,148599,5872,832,1193,19370,144.0,115.0,228,127907
25,3,Inactive,12367,8835,305,68,69,1255,3.0,4.0,16,8315
30,4,Active,204544,160861,4911,888,1286,17455,148.0,173.0,162,113486
31,4,Inactive,15085,10302,315,74,119,1269,4.0,2.0,6,8442
39,5,Active,298385,33244,1468,483,1251,8662,175.0,27.0,43,62106
40,5,Inactive,20071,2273,88,30,126,643,9.0,3.0,2,4380
45,6,Active,198024,52511,1951,592,849,8905,161.0,38.0,83,87250
46,6,Inactive,15016,3699,145,35,77,789,4.0,0.0,9,6769
57,7,Active,271911,22736,803,899,1369,6971,203.0,22.0,113,67437
58,7,Inactive,31143,3355,133,155,219,1166,10.0,1.0,20,9357
66,8,Active,324586,32846,1122,713,1666,8649,202.0,25.0,54,62021
67,8,Inactive,36652,3183,141,109,271,1245,11.0,2.0,20,8419
72,9,Active,313949,30234,1034,823,1474,7952,139.0,21.0,90,62772
73,9,Inactive,33517,3184,119,115,257,1160,8.0,1.0,11,8696
81,10,Active,247623,45926,1128,743,570,9287,128.0,24.0,152,79918
82,10,Inactive,38988,9685,177,143,114,2749,9.0,0.0,57,16564
90,11,Active,183564,110062,5299,654,1362,12817,168.0,67.0,137,88130
91,11,Inactive,16846,7921,409,69,187,1259,5.0,2.0,19,8107
102,12,Active,259877,54431,883,1009,622,13703,128.0,32.0,210,91923
103,12,Inactive,44238,12638,200,244,136,3629,10.0,2.0,87,20458
111,13,Active,327895,16808,738,840,1235,7730,238.0,37.0,82,54421
112,13,Inactive,50766,3815,154,167,304,1816,18.0,1.0,27,11201
120,14,Active,214570,33020,1694,650,993,7544,200.0,28.0,73,61960
121,14,Inactive,21175,3395,241,70,145,946,9.0,0.0,17,6736
126,15,Active,282788,13663,1069,426,1509,5539,284.0,37.0,23,46081
127,15,Inactive,44336,2904,203,53,393,1077,28.0,1.0,12,8909
135,16,Active,251841,62717,4039,584,1140,12216,153.0,72.0,92,76433
136,16,Inactive,29544,6369,369,54,181,1430,6.0,0.0,16,9342
144,17,Active,201872,102969,7728,807,1393,18013,131.0,206.0,97,106682
145,17,Inactive,16792,8085,520,110,134,1658,3.0,3.0,13,10190
159,18,Active,154331,135552,8585,1126,1764,22973,157.0,104.0,350,105443
160,18,Inactive,12972,9506,574,133,196,1995,9.0,4.0,39,9418
195,19,Active,141289,138473,10576,1867,2155,26423,167.0,85.0,360,114937
196,19,Inactive,11503,8552,656,209,245,2350,9.0,4.0,37,10392
213,20,Active,178650,107575,10258,1452,2098,25382,186.0,99.0,295,107500
214,20,Inactive,17920,7738,768,189,353,2570,25.0,1.0,75,11487
252,21,Active,117878,166307,5996,1243,1589,24589,96.0,65.0,201,83284
253,21,Inactive,10560,11493,502,175,245,2646,8.0,8.0,25,10898
279,22,Active,127765,158621,6587,1179,1818,23179,139.0,90.0,377,80746
280,22,Inactive,11143,9472,478,144,262,2183,23.0,11.0,49,8998
315,23,Active,128565,153949,7342,1452,1986,21646,136.0,93.0,357,84173
316,23,Inactive,11331,8808,510,196,232,2032,14.0,8.0,44,9399
330,24,Active,143319,138353,8370,1394,1844,21997,147.0,55.0,521,103922
331,24,Inactive,15962,9745,642,211,325,2496,16.0,6.0,74,12229
336,25,Active,178181,120433,7400,1180,1404,19377,163.0,43.0,607,103153
337,25,Inactive,15180,6838,404,158,146,1533,19.0,3.0,61,8252
345,26,Active,234929,90506,7508,1351,2372,19155,192.0,87.0,444,74621
346,26,Inactive,19306,5621,405,155,284,1602,22.0,7.0,47,8109
372,27,Active,142703,183641,12825,1337,2299,25525,102.0,73.0,627,97801
373,27,Inactive,7283,7959,550,102,165,1433,8.0,0.0,42,6377

这些数据可以在这里查看和下载。

我试图将这个数据框进行透视,使得“DISTRICT”成为索引的第一级,而“BLANK”,“CON”,“DEM”,“GRE”,“IND”,“OTH”,“REF”,“REP”,“WEP”和“WOR”列成为多级索引的第二级。然后我想要将“STATUS”列设置为水平列“Active”和“Inactive”。我编写了以下代码:

active_inactive8.pivot(index=['DISTRICT', 'DEM', 'REP', 'CON', 'GRE',
                              'WOR', 'IND', 'WEP', 'REF', 'OTH', 'BLANK'],
                       columns='STATUS')

但是却出现了错误:ValueError: all arrays must be same length
我也尝试过使用 .melt,但没有成功。
1个回答

1

我认为需要使用set_indexunstack,但接着需要进行转置,swaplevel,最后进行sort_index

df = pd.read_csv('active_inactive8.csv', index_col=0)

df2 = df.set_index(['STATUS','DISTRICT']).unstack().T.swaplevel(1,0).sort_index()

df2 = df.pivot(index='STATUS', columns='DISTRICT').T.swaplevel(1,0).sort_index()


print (df2.head(10))
STATUS            Active  Inactive
DISTRICT                          
1        BLANK  128547.0    8436.0
         CON     11778.0     643.0
         DEM    144470.0   10086.0
         GRE      1108.0      96.0
         IND     23245.0    1545.0
         OTH       394.0      28.0
         REF        41.0       3.0
         REP    162308.0    8290.0
         WEP       150.0       5.0
         WOR      2066.0     130.0

1
这两个选项都可以 - 非常感谢。我会阅读有关.swaplevel()的内容。 - thedatasleuth

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接