Here is my approach:
Data:
counts = [1839,1334,2241,2063,1216,1409,1614,1860,1298,1140,1122,2153,971,1650,1835,889,653,484,2078,1198,426,684,910,701,851,360,763,402,1853,400,1159]
Convert to an array:
import numpy as np
counts = np.array(counts)
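Reshape into rows of 7 days, padding the end with zeros (thanks to https://stackoverflow.com/users/4427777/daniel-f):
def shapeshifter(num_col, my_array):
    # Pad with trailing zeros up to a multiple of num_col, then reshape into rows of num_col
    return np.pad(my_array, (0, -len(my_array) % num_col), mode='constant', constant_values=0).reshape(-1, num_col)
data = shapeshifter(7, counts)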
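Zero padding means the next step has to turn every 0 into NaN, which would also hide a day whose real count is 0 (there is none in this data). A possible alternative, sketched here under the assumption that rows of 7 days are wanted, is to pad with NaN from the start. The helper name pad_weeks_nan is hypothetical, not part of the original answer.

```python
import numpy as np

counts = np.array([1839, 1334, 2241, 2063, 1216, 1409, 1614, 1860, 1298, 1140, 1122,
                   2153, 971, 1650, 1835, 889, 653, 484, 2078, 1198, 426, 684, 910,
                   701, 851, 360, 763, 402, 1853, 400, 1159])

def pad_weeks_nan(values, num_col=7):
    """Hypothetical helper: pad with NaN (not 0) up to a multiple of num_col, then reshape."""
    values = np.asarray(values, dtype=float)
    n_rows = -(-len(values) // num_col)    # ceiling division: 31 days -> 5 rows
    out = np.full(n_rows * num_col, np.nan)
    out[:len(values)] = values             # real counts first, NaN in the padded tail
    return out.reshape(n_rows, num_col)

weeks = pad_weeks_nan(counts)
print(weeks.shape)            # (5, 7)
print(np.isnan(weeks).sum())  # 4
```

With this version the zero-to-NaN conversion is no longer needed, and pd.DataFrame(weeks) can go straight to fillna.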
array([[1839, 1334, 2241, 2063, 1216, 1409, 1614],
       [1860, 1298, 1140, 1122, 2153,  971, 1650],
       [1835,  889,  653,  484, 2078, 1198,  426],
       [ 684,  910,  701,  851,  360,  763,  402],
       [1853,  400, 1159,    0,    0,    0,    0]])
Build a DataFrame and turn the zero-padded cells into NaN:
import pandas as pd
df = pd.DataFrame(data)
df[df == 0] = np.nan
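The same conversion can be written without in-place assignment. This sketch uses DataFrame.replace, which returns a new frame and leaves the original untouched; the variable name df_nan is illustrative only.

```python
import numpy as np
import pandas as pd

# The zero-padded 5x7 array produced by the reshape step
data = np.array([
    [1839, 1334, 2241, 2063, 1216, 1409, 1614],
    [1860, 1298, 1140, 1122, 2153,  971, 1650],
    [1835,  889,  653,  484, 2078, 1198,  426],
    [ 684,  910,  701,  851,  360,  763,  402],
    [1853,  400, 1159,    0,    0,    0,    0],
])
df = pd.DataFrame(data)

# replace returns a new DataFrame with every 0 turned into NaN; df itself keeps its zeros
df_nan = df.replace(0, np.nan)
print(df_nan.isna().sum().sum())  # 4
```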
Fill the missing values with the month's mean (the mean of the 31 daily counts):
df.fillna(counts.mean())
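fillna does not modify the DataFrame in place by default, so the filled values only persist if the result is assigned. A minimal sketch, rebuilding the frame so it runs on its own (df_filled and month_mean are illustrative names):

```python
import numpy as np
import pandas as pd

counts = np.array([1839, 1334, 2241, 2063, 1216, 1409, 1614, 1860, 1298, 1140, 1122,
                   2153, 971, 1650, 1835, 889, 653, 484, 2078, 1198, 426, 684, 910,
                   701, 851, 360, 763, 402, 1853, 400, 1159])

# Rebuild the 5x7 frame: zero-pad to a multiple of 7, reshape, then turn the padding into NaN
padded = np.pad(counts, (0, -len(counts) % 7)).reshape(-1, 7)
df = pd.DataFrame(padded).replace(0, np.nan)

month_mean = counts.mean()  # mean of the 31 real days, about 1211.48

# Keep the returned frame; df itself still contains NaN afterwards
df_filled = df.fillna(month_mean)

print(df.isna().sum().sum())         # 4
print(df_filled.isna().sum().sum())  # 0
print(df_filled.sum(axis=1))
```

With the padded days filled, the last week's total becomes 3412 + 4 × 1211.48... ≈ 8257.94 instead of 3412. Whether those four synthetic days should be counted depends on what the weekly totals are meant to represent.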
      0     1     2            3            4            5            6
0  1839  1334  2241  2063.000000  1216.000000  1409.000000  1614.000000
1  1860  1298  1140  1122.000000  2153.000000   971.000000  1650.000000
2  1835   889   653   484.000000  2078.000000  1198.000000   426.000000
3   684   910   701   851.000000   360.000000   763.000000   402.000000
4  1853   400  1159  1211.483871  1211.483871  1211.483871  1211.483871
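Get the sum of each row, i.e. of each week. Note that fillna returned a new DataFrame that was not assigned, so df still has NaN in the last row; sum skips NaN, which is why week 4 totals only its three real days (1853 + 400 + 1159 = 3412):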
df.sum(axis=1)
0    11716.0
1    10194.0
2     7563.0
3     4671.0
4     3412.0
dtype: float64
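Alternative (from ansev): if the data has a day-of-month column named days, the week number can be computed directly, without reshaping:
df.days.sub(1)//7
This maps days 1-7 to week 0, days 8-14 to week 1, and so on.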
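A sketch of how that week number could drive a groupby, assuming a long-format frame with one row per day. The frame name daily and the columns days, counts and week are illustrative assumptions, since the original data layout is not shown.

```python
import numpy as np
import pandas as pd

counts = [1839, 1334, 2241, 2063, 1216, 1409, 1614, 1860, 1298, 1140, 1122,
          2153, 971, 1650, 1835, 889, 653, 484, 2078, 1198, 426, 684, 910,
          701, 851, 360, 763, 402, 1853, 400, 1159]

# Hypothetical long-format frame: one row per day of the month
daily = pd.DataFrame({"days": np.arange(1, len(counts) + 1), "counts": counts})

# Days 1-7 -> week 0, 8-14 -> week 1, ..., 29-31 -> week 4
daily["week"] = daily.days.sub(1) // 7

weekly = daily.groupby("week")["counts"].sum()
print(weekly.tolist())  # [11716, 10194, 7563, 4671, 3412]
```

This gives the same weekly totals as the reshape approach, with no padding or NaN handling, because the short final week simply has fewer rows in its group.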