I think you can convert the day column to a categorical. Then groupby + mean returns NaN for every category that has no rows in the data:
import pandas as pd

df = pd.DataFrame({
    'day': ['Monday','Tuesday','Tuesday','Tuesday','Thursday'],
    'price': list(range(5))
})
print(df)
        day  price
0    Monday      0
1   Tuesday      1
2   Tuesday      2
3   Tuesday      3
4  Thursday      4
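cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
df['day'] = pd.Categorical(df['day'], categories=cats, ordered=True)
# pass observed=False explicitly so that categories with no rows are kept in the result
print(df.groupby("day", as_index=False, observed=False).price.mean())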
         day  price
0     Monday    0.0
1    Tuesday    2.0
2  Wednesday    NaN
3   Thursday    4.0
4     Friday    NaN
5   Saturday    NaN
6     Sunday    NaN
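For reference, the categorical approach above can be wrapped into a small helper. This is only a minimal sketch: the names `WEEKDAYS` and `mean_price_per_weekday` are made up for illustration, and it returns a Series indexed by day rather than using `as_index=False`.

```python
import pandas as pd

# Hypothetical constant: the full set of categories we want in the result.
WEEKDAYS = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
            'Friday', 'Saturday', 'Sunday']

def mean_price_per_weekday(frame):
    """Mean price per weekday, with NaN for weekdays that have no rows."""
    out = frame.copy()  # leave the caller's DataFrame untouched
    out['day'] = pd.Categorical(out['day'], categories=WEEKDAYS, ordered=True)
    # observed=False keeps categories that never occur in the data
    return out.groupby('day', observed=False)['price'].mean()

df = pd.DataFrame({
    'day': ['Monday', 'Tuesday', 'Tuesday', 'Tuesday', 'Thursday'],
    'price': list(range(5))
})
result = mean_price_per_weekday(df)
print(result)
```

Because the categorical is ordered, the result comes back in weekday order rather than alphabetical order.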
Another solution is to reindex the result by all possible categories:
cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
print(df.groupby("day").price.mean().reindex(cats))
day
Monday       0.0
Tuesday      2.0
Wednesday    NaN
Thursday     4.0
Friday       NaN
Saturday     NaN
Sunday       NaN
Name: price, dtype: float64
print(df.groupby("day").price.mean().reindex(cats, fill_value=0))
day
Monday       0.0
Tuesday      2.0
Wednesday    0.0
Thursday     4.0
Friday       0.0
Saturday     0.0
Sunday       0.0
Name: price, dtype: float64
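The reindex variant also works when day is left as a plain string column, which may be handy if you do not want to change the dtype. A minimal self-contained sketch (the names `WEEKDAYS`, `means`, `with_nan` and `with_zero` are just illustrative):

```python
import pandas as pd

WEEKDAYS = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
            'Friday', 'Saturday', 'Sunday']

df = pd.DataFrame({
    'day': ['Monday', 'Tuesday', 'Tuesday', 'Tuesday', 'Thursday'],
    'price': list(range(5))
})

# Only the days present in the data appear here, sorted alphabetically.
means = df.groupby('day')['price'].mean()

# reindex puts the days in weekday order and adds the missing ones.
with_nan = means.reindex(WEEKDAYS)                 # missing days become NaN
with_zero = means.reindex(WEEKDAYS, fill_value=0)  # missing days become 0

print(with_nan)
print(with_zero)
```

Note that mean returns floats, so the filled Series keeps dtype float64. Also keep in mind that filling with 0 makes a day without data indistinguishable from a day whose mean really is 0 (Monday in this sample).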