Pandas group by 7 days

Question

3

How can I sum my counts per week and, if the last week is incomplete, "normalize" it using the average? Suppose these are my lists:

days = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]
counts = [1839,1334,2241,2063,1216,1409,1614,1860,1298,1140,1122,2153,971,1650,1835,889,653,484,2078,1198,426,684,910,701,851,360,763,402,1853,400,1159]

Thanks

- Islam Fahmy

3 Answers

1

I don't know how to use pandas for this, but I would do it with built-in Python modules:

from collections import defaultdict
from statistics import mean

days = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]
counts = [1839,1334,2241,2063,1216,1409,1614,1860,1298,1140,1122,2153,971,1650,1835,889,653,484,2078,1198,426,684,910,701,851,360,763,402,1853,400,1159]
weeks = [(d - 1) // 7 for d in days]  # day 1 starts week 0, so days 1-7 share a week
avg_count = int(mean(counts))
pad = (-len(days)) % 7  # number of days missing from the last week
weeks = weeks + [weeks[-1]] * pad  # pad weeks to a multiple of 7
counts = counts + [avg_count] * pad  # pad counts with the average daily count
count_per_week = defaultdict(int)
for w, c in zip(weeks, counts):
    count_per_week[w] += c
print(dict(count_per_week))

Output:

{0: 11716, 1: 10194, 2: 7563, 3: 4671, 4: 8256}

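Note that I assume the mean is a reasonable fill value, which is not always the case. Also, when defaultdict(int) is asked for a key that does not exist, it sets that key's value to int(), i.e. 0.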
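
If the overall mean is not a suitable fill value, one alternative is to scale each week's sum by that week's own number of days instead of padding. The following is only a minimal sketch under the assumption that day 1 starts week 0; the names `per_week` and `normalized` are illustrative.

```python
from collections import defaultdict

days = list(range(1, 32))
counts = [1839, 1334, 2241, 2063, 1216, 1409, 1614, 1860, 1298, 1140, 1122,
          2153, 971, 1650, 1835, 889, 653, 484, 2078, 1198, 426, 684, 910,
          701, 851, 360, 763, 402, 1853, 400, 1159]

# Collect the counts that belong to each week; day 1 starts week 0.
per_week = defaultdict(list)
for d, c in zip(days, counts):
    per_week[(d - 1) // 7].append(c)

# Scale every week's sum to a full 7 days.
# Complete weeks are unchanged; a partial week becomes (its daily mean) * 7.
normalized = {w: sum(v) * 7 / len(v) for w, v in per_week.items()}
print(normalized)
```

With this approach the last week is extrapolated from its own three days rather than from the whole month, which may or may not be what is wanted.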

- Daweo

0

Here is my approach:

Data:

counts = [1839,1334,2241,2063,1216,1409,1614,1860,1298,1140,1122,2153,971,1650,1835,889,653,484,2078,1198,426,684,910,701,851,360,763,402,1853,400,1159]

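Convert to an array:

import numpy as np
import pandas as pd

counts = np.array(counts)

Reshape (thanks to https://stackoverflow.com/users/4427777/daniel-f):

def shapeshifter(num_col, my_array):
    # Zero-pad the end so the length is a multiple of num_col, then reshape into rows of num_col
    pad = (-len(my_array)) % num_col
    return np.pad(my_array, (0, pad), 'constant', constant_values=0).reshape(-1, num_col)

data = shapeshifter(7, counts)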


array([[1839, 1334, 2241, 2063, 1216, 1409, 1614],
       [1860, 1298, 1140, 1122, 2153,  971, 1650],
       [1835,  889,  653,  484, 2078, 1198,  426],
       [ 684,  910,  701,  851,  360,  763,  402],
       [1853,  400, 1159,    0,    0,    0,    0]])

Convert to a DataFrame and replace the zeros with NaN:

df = pd.DataFrame(data)
df[df == 0] = np.nan

Fill the missing values with the mean of the month:

df = df.fillna(counts.mean())  # fillna returns a new DataFrame, so assign it back
df

      0     1     2            3            4            5            6
0  1839  1334  2241  2063.000000  1216.000000  1409.000000  1614.000000
1  1860  1298  1140  1122.000000  2153.000000   971.000000  1650.000000
2  1835   889   653   484.000000  2078.000000  1198.000000   426.000000
3   684   910   701   851.000000   360.000000   763.000000   402.000000
4  1853   400  1159  1211.483871  1211.483871  1211.483871  1211.483871

Sum by row, i.e. by week:

df.sum(axis=1)

0    11716.000000
1    10194.000000
2     7563.000000
3     4671.000000
4     8257.935484
dtype: float64
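
Padding with 0 and then turning every 0 into NaN would also erase any genuine zero counts. A possible variation, sketched here under the assumption that the counts can be stored as floats, pads with NaN directly; the names `pad`, `weeks`, `filled` and `weekly_totals` are illustrative.

```python
import numpy as np

counts = np.array([1839, 1334, 2241, 2063, 1216, 1409, 1614, 1860, 1298, 1140,
                   1122, 2153, 971, 1650, 1835, 889, 653, 484, 2078, 1198, 426,
                   684, 910, 701, 851, 360, 763, 402, 1853, 400, 1159], dtype=float)

# Number of days missing from the last week (0 if it is already complete).
pad = (-len(counts)) % 7

# Pad with NaN rather than 0 so real zero counts are not mistaken for gaps.
weeks = np.pad(counts, (0, pad), constant_values=np.nan).reshape(-1, 7)

# Replace the NaN gaps with the mean of the observed days, then sum each week.
filled = np.where(np.isnan(weeks), np.nanmean(weeks), weeks)
weekly_totals = filled.sum(axis=1)
print(weekly_totals)
```

Using `np.nansum(weeks, axis=1)` instead of filling would give the raw, unnormalized weekly sums.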

- Nand0san

- Poe Dator · Accepted Answer

Here is a solution using pandas:

1) Create the DataFrame:

import pandas as pd

df = pd.DataFrame({'days': days, 'counts': counts})
df['week'] = df.days.sub(1) // 7  # add a week column; days 1-7 become week 0

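2) Compute the sum and mean per week, then derive the normalized sum:

d2 = df.groupby('week').agg({'counts': ['sum', 'mean']})  # sum and mean of counts per week
d2['norm_sum'] = d2[('counts', 'mean')] * 7  # daily mean scaled to a full 7-day week

3) Output:

print(d2)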

     counts                   norm_sum
        sum         mean              
week                                  
0     11716  1673.714286  11716.000000
1     10194  1456.285714  10194.000000
2      7563  1080.428571   7563.000000
3      4671   667.285714   4671.000000
4      3412  1137.333333   7961.333333
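
When the data carries real dates rather than day numbers, the same 7-day grouping can be sketched with `resample`. This assumes a start date of 2020-01-01, chosen purely for illustration since the question gives no dates; the names `s` and `weekly` are likewise hypothetical.

```python
import pandas as pd

counts = [1839, 1334, 2241, 2063, 1216, 1409, 1614, 1860, 1298, 1140, 1122,
          2153, 971, 1650, 1835, 889, 653, 484, 2078, 1198, 426, 684, 910,
          701, 851, 360, 763, 402, 1853, 400, 1159]

# Assumed start date, for illustration only.
index = pd.date_range("2020-01-01", periods=len(counts), freq="D")
s = pd.Series(counts, index=index)

# Bin into consecutive 7-day windows starting at the first date.
weekly = s.resample("7D").agg(["sum", "mean", "count"])

# Daily mean scaled to 7 days; equals the plain sum for complete weeks.
weekly["norm_sum"] = weekly["mean"] * 7
print(weekly)
```

The `count` column makes it easy to see which windows are incomplete and were therefore extrapolated.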