Pandas dataframe: group and sort by weekday

14

I have a pandas dataframe that includes a day-of-week column.

df_weekday = df.groupby(['Day of Week']).sum()
df_weekday[['Spent', 'Clicks', 'Impressions']].plot(figsize=(16,6), subplots=True);
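The plot of df_weekday shows the 'Day of Week' values in alphabetical order: 'Friday', 'Monday', 'Saturday', 'Sunday', 'Thursday', 'Tuesday', 'Wednesday'. How can I sort df_weekday so that it is displayed in proper weekday order: 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'?
2 Answers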

36
You can convert the column to an ordered categorical first, so that groupby sorts by the category order instead of alphabetically:
cats = [ 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

df['Day of Week'] = df['Day of Week'].astype('category', categories=cats, ordered=True)

In pandas 0.21.0+, use this instead:

from pandas.api.types import CategoricalDtype
cat_type = CategoricalDtype(categories=cats, ordered=True)
df['Day of Week'] = df['Day of Week'].astype(cat_type)
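
To illustrate the effect on the groupby result, here is a minimal sketch on made-up data. The sample rows are hypothetical (only the column names come from the question), and passing observed=True is my own choice to keep just the days that actually occur in the data:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    'Day of Week': ['Wednesday', 'Monday', 'Friday', 'Monday', 'Sunday'],
    'Clicks': [5, 3, 8, 2, 1],
})

cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
cat_type = CategoricalDtype(categories=cats, ordered=True)
df['Day of Week'] = df['Day of Week'].astype(cat_type)

# groupby sorts group keys by category order, not alphabetically.
# observed=True keeps only the categories present in the data.
df_weekday = df.groupby('Day of Week', observed=True)[['Clicks']].sum()
print(df_weekday)
```

With observed=False the result would instead contain a row for every category in cats, including days with no data.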

Or use reindex on the grouped result:

df_weekday = df.groupby(['Day of Week']).sum().reindex(cats) 
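
One thing to be aware of with reindex: any day in cats that is missing from the data shows up as a NaN row. A small sketch, again on hypothetical sample data, showing that and the fill_value option:

```python
import pandas as pd

# Hypothetical sample data with several weekdays missing.
df = pd.DataFrame({
    'Day of Week': ['Wednesday', 'Monday', 'Friday', 'Monday'],
    'Spent': [4.0, 1.5, 2.0, 0.5],
})

cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

# Plain groupby sorts the string labels alphabetically.
by_day = df.groupby('Day of Week')['Spent'].sum()

# reindex puts the rows in the order of cats; absent days become NaN.
with_gaps = by_day.reindex(cats)

# fill_value replaces those NaN entries, here with 0.
filled = by_day.reindex(cats, fill_value=0)
print(filled)
```

The labels in cats have to match the values in the column exactly (spelling and case), otherwise those rows come back empty as well.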

I used the "reindex" solution and it worked right away, thanks. - Nour
1
Awesome solution, jezrael!! I love it!! - ASH

0
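A simple and robust solution is to include the day number in a MultiIndex, so that the groups are sorted automatically by day number, and then drop that level.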
birthdays = df.groupby([df['date'].dt.day_of_week,df['date'].dt.day_name()])['births'].sum()
birthdays = birthdays.droplevel(0,'index')
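
For anyone who wants to try this without downloading anything, here is a minimal offline sketch of the same idea. The two weeks of daily counts are made up, and the level names day_num and day_name are labels I added for readability:

```python
import pandas as pd

# Hypothetical sample: 14 consecutive days starting on Thursday 2024-02-01.
df = pd.DataFrame({
    'date': pd.date_range('2024-02-01', periods=14, freq='D'),
    'births': range(14),
})

# Level 0 is the day number (Monday=0 ... Sunday=6), level 1 is the day name.
# groupby sorts by level 0 first, which gives calendar order.
day_num = df['date'].dt.day_of_week.rename('day_num')
day_name = df['date'].dt.day_name().rename('day_name')
grouped = df.groupby([day_num, day_name])['births'].sum()

# Drop the numeric level; the row order is kept.
birthdays = grouped.droplevel('day_num')
print(birthdays)
```

Note that dt.day_of_week needs pandas 1.1 or later; on older versions dt.dayofweek is the equivalent accessor.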

Full example on the US births data

# group and sort by day-of-week

import pandas as pd
host = 'raw.github.com'
user = 'fivethirtyeight'
repo = 'data'
branch = 'master'
file = 'births/US_births_2000-2014_SSA.csv'
url = f'https://{host}/{user}/{repo}/{branch}/{file}'
df = pd.read_csv(url,sep=',',header=0)
df['date'] = df[['year','month','date_of_month']].astype(str).apply('-'.join,axis=1)
df['date'] = pd.to_datetime(df['date'])
df = df[['date','births']]
df.head()

import seaborn as sns
birthdays = df.groupby([df['date'].dt.day_of_week,df['date'].dt.day_name()])['births'].sum()
birthdays = birthdays.droplevel(0,'index')
sns.barplot(data=birthdays.reset_index(),x='date',y='births')
