Pandas透视表中的排序

5

我一直在努力寻找如何正确排序我的数据透视表,但是一直没有成功。

    client          unit    task                hours   month
0   A               DVADA   Account Management  6.50    January     
1   A               DVADA   Buying              1.25    January 
2   A               DVADA   Meeting / Call      0.50    January 
3   A               DVADA   Account Management  3.00    January 
4   A               DVADA   Billing             2.50    February    
5   A               DVADA   Account Management  6.50    February        
6   A               DVADA   Buying              1.25    February    
7   A               DVADA   Meeting / Call      0.50    February    
8   A               DVADA   Account Management  3.00    February    
9   A               DVADA   Billing             2.50    February
10  A               DVADA   Billing             2.50    December    
11  A               DVADA   Account Management  6.50    December        
12  A               DVADA   Buying              1.25    December    
13  A               DVADA   Meeting / Call      0.50    December    
14  A               DVADA   Account Management  3.00    December    
15  A               DVADA   Billing             2.50    December
16  A               DVADA   Account Management  6.50    August      
17  A               DVADA   Buying              1.25    August  
18  A               DVADA   Meeting / Call      0.50    August  
19  A               DVADA   Account Management  3.00    August
20  A               DVADA   Account Management  6.50    April       
21  A               DVADA   Buying              1.25    April   
22  A               DVADA   Meeting / Call      0.50    April   
23  A               DVADA   Account Management  3.00    April
24  B               DVADA   Account Management  6.50    January     
25  B               DVADA   Buying              1.25    January 
26  B               DVADA   Meeting / Call      0.50    January 
27  B               DVADA   Account Management  3.00    January 
28  B               DVADA   Billing             2.50    February    
29  B               DVADA   Account Management  6.50    February        
30  B               DVADA   Buying              1.25    February    
31  B               DVADA   Meeting / Call      0.50    February    
32  B               DVADA   Account Management  3.00    February    
33  B               DVADA   Billing             2.50    February
34  B               DVADA   Billing             2.50    December    
35  B               DVADA   Account Management  6.50    December        
36  B               DVADA   Buying              1.25    December    
37  B               DVADA   Meeting / Call      0.50    December    
38  B               DVADA   Account Management  3.00    December    
39  B               DVADA   Billing             2.50    December
40  B               DVADA   Account Management  6.50    August      
41  B               DVADA   Buying              1.25    August  
42  B               DVADA   Meeting / Call      0.50    August  
43  B               DVADA   Account Management  3.00    August
44  B               DVADA   Account Management  6.50    April       
45  B               DVADA   Buying              1.25    April   
46  B               DVADA   Meeting / Call      0.50    April   
47  C               DVADA   Account Management  3.00    April
48  C               DVADA   Account Management  6.50    January     
49  C               DVADA   Buying              1.25    January 
50  C               DVADA   Meeting / Call      0.50    January 
51  C               DVADA   Account Management  3.00    January 
52  C               DVADA   Billing             2.50    February    
53  C               DVADA   Account Management  6.50    February        
54  C               DVADA   Buying              1.25    February    
55  C               DVADA   Meeting / Call      0.50    February    
56  C               DVADA   Account Management  3.00    February    
57  C               DVADA   Billing             2.50    February
58  C               DVADA   Billing             2.50    December    
59  C               DVADA   Account Management  6.50    December        
60  C               DVADA   Buying              1.25    December    
61  C               DVADA   Meeting / Call      0.50    December    
62  C               DVADA   Account Management  3.00    December    
63  C               DVADA   Billing             2.50    December
64  C               DVADA   Account Management  6.50    August      
65  C               DVADA   Buying              1.25    August  
66  C               DVADA   Meeting / Call      0.50    August  
67  C               DVADA   Account Management  3.00    August
68  C               DVADA   Account Management  6.50    April       
69  C               DVADA   Buying              1.25    April   
70  C               DVADA   Meeting / Call      0.50    April   
71  C               DVADA   Account Management  3.00    April

df = pd.pivot_table(vp_clients, values='hours', index=['client', 'month'], aggfunc=sum)

这段代码返回一个数据透视表,包含三列(client,month,hours)。每个客户有12个月份(1月至12月),每个月份都有对应的工作小时数。

                        hours
client          month

A               April   203.50
                August  227.75
                December 159.75
                February 203.25
                January 199.25

B               April   203.50
                August  227.75
                December 159.75
                February 203.25
                January 199.25

C               April   203.50
                August  227.75
                December 159.75
                February 203.25
                January 199.25

我希望你能帮我将这个数据透视表按月份排序,但同时保留客户列的数据不变。
                           hours
client           month

A               January 203.50
                February 227.75
                March    159.75
                April    203.25
                May     199.90

B               January 203.50
                February 227.75
                March    159.75
                April    203.25
                May     199.90

C               January 203.50
                February 227.75
                March    159.75
                April    203.25
                May     199.90

下面是Scott提供的解决方案,已经修复了排序问题。现在我想为每个客户添加一行,显示使用的总小时数。

                           hours
client           month

A               January    203.50
                February   227.75
                March      159.75
                April      203.25
                May        199.90
                Total     1000.34

B               January    203.50
                February   227.75
                March      159.75
                April      203.25
                May       199.90
                Total     1000.34

C               January   203.50
                February   227.75
                March      159.75
                April      203.25
                May       199.90
                Total     1000.34

任何帮助都将不胜感激。

1
请展示您的数据和输出。在没有的情况下使用pivot_table是奇怪的。只需使用groupby即可。 - Parfait
我在我的数据中确实有列 @Parfait - Timothy Mcwilliams
@Parfait,我会尝试使用“groupby”,谢谢! - Timothy Mcwilliams
准备一些数据,让我们帮助您。 :) - Scott Boston
@ScottBoston,好主意! :) 我会为你们制作一些。 - Timothy Mcwilliams
显示剩余4条评论
2个回答

5

更新:在每个客户端末尾添加总计

vp_clients['month'] = pd.Categorical(vp_clients['month'], 
                                     ordered=True, 
                                     categories=['January','February','March',
                                                 'April','May','June','July',
                                                 'August','September','October',
                                                 'November','December','Total'])

df = pd.pivot_table(vp_clients, values='hours', index=['client', 'month'], aggfunc=sum)

df = df.dropna()

pd.concat([df,df.sum(level=0).assign(month='Total').set_index('month', append=True)]).sort_index()

输出:

                 hours
client month          
A      January   11.25
       February  16.25
       April     11.25
       August    11.25
       December  16.25
       Total     66.25
B      January   11.25
       February  16.25
       April      8.25
       August    11.25
       December  16.25
       Total     63.25
C      January   11.25
       February  16.25
       April     14.25
       August    11.25
       December  16.25
       Total     69.25

让我们使用 pd.Categorical:
vp_clients['month'] = pd.Categorical(vp_clients['month'], 
                                     ordered=True, 
                                     categories=['January','February','March',
                                                 'April','May','June','July',
                                                 'August','September','October',
                                                 'November','December'])

df = pd.pivot_table(vp_clients, values='hours', index=['client', 'month'], aggfunc=sum)

df.dropna()

输出:

                 hours
client month          
A      January   11.25
       February  16.25
       April     11.25
       August    11.25
       December  16.25
B      January   11.25
       February  16.25
       April      8.25
       August    11.25
       December  16.25
C      January   11.25
       February  16.25
       April     14.25
       August    11.25
       December  16.25

成功了!非常感谢!!:) - Timothy Mcwilliams
1
这是一种方法,另一种方法是将这些月份转换为日期时间数据类型,让日期时间对结果进行排序。 - Scott Boston
愉快编程。谢谢。 - Scott Boston
你知道如何在每个客户端的末尾添加一个名为“总计”的行,该行是小时数的总和吗? - Timothy Mcwilliams

1
此外,正如评论所述,由于您未将值旋转到宽格式的新列中,请考虑简单地使用groupby()。并重新考虑reindex()以获取自定义1月至12月顺序,指定level并与Python内置的calendar模块进行交互。
import calendar
...

grp_df = df.groupby(['client', 'month']).agg({'hours': 'sum'})\
           .reindex(level=1, labels=calendar.month_name)

#                  hours
# client month          
# A      January   11.25
#        February  16.25
#        April     11.25
#        August    11.25
#        December  16.25
# B      January   11.25
#        February  16.25
#        April      8.25
#        August    11.25
#        December  16.25
# C      January   11.25
#        February  16.25
#        April     14.25
#        August    11.25
#        December  16.25

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接