How to iterate over a Pandas DataFrame while considering only a subset of rows

Question

Consider a DataFrame like this:

import numpy as np
import pandas as pd

size = 10
d = {
    'id': np.random.randint(1, 10, size),
    'value': np.random.randint(10, 100, size)
}
df = pd.DataFrame(data=d)

# For each row, count the rows seen so far with the same id, including the row itself
df['others_count'] = df.groupby(['id']).cumcount()+1

This produces something like:

   id  value  others_count
0   3     76             1
1   4     12             1
2   1     96             1
3   6     33             1
4   4     49             2
5   8     72             1
6   8     68             2
7   7     78             1
8   9     99             1
9   1     66             2
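
Because the ids and values come from np.random.randint, every run gives a different table. As a minimal sketch, the ten rows shown above can be hard-coded so that the rest of the thread is reproducible. The name df_fixed is introduced here for illustration and is not part of the original question.

```python
import pandas as pd

# Hard-coded copy of the sample table above (df_fixed is an illustrative name).
df_fixed = pd.DataFrame({
    'id':    [3, 4, 1, 6, 4, 8, 8, 7, 9, 1],
    'value': [76, 12, 96, 33, 49, 72, 68, 78, 99, 66],
})

# cumcount() numbers the rows of each id group 0, 1, 2, ... in order of appearance;
# adding 1 makes the first occurrence of an id count as 1.
df_fixed['others_count'] = df_fixed.groupby('id').cumcount() + 1

print(df_fixed)
```

Rows 4, 6 and 9 end up with others_count equal to 2, matching the table above.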

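For the rows that share an id with at least one earlier row (rows 4, 6 and 9 in my example), I have to add another column containing the mean of the value column over all the rows above them that belong to the same id.

I came up with this solution, which is very inefficient and, I suspect, also buggy: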

for row in range(0, df.shape[0]):
    if df['id'][row] > 1:
        address = df['id'][row]
        others = df['others_count'][row]
        df.loc[row, 'value_estimated'] = df.loc[(df['id']==address)&(df['others_count']<others), 'value'].mean()

This produces the following output:

   id  value  others_count  value_estimated
0   3     76             1              NaN
1   4     12             1              NaN
2   1     96             1              NaN
3   6     33             1              NaN
4   4     49             2             12.0
5   8     72             1              NaN
6   8     68             2             72.0
7   7     78             1              NaN
8   9     99             1              NaN
9   1     66             2              NaN

This is correct for rows 4 and 6, but not for the last row (row 9), where the estimated value should be 96.
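
One likely cause of the NaN in the last row, judging from the loop above: the condition tests df['id'][row] > 1, and row 9 has id 1, so it is skipped. Testing others_count instead appears to give the expected 96. The sketch below keeps the slow row-by-row loop and only changes that condition. It runs on a hard-coded copy of the sample data, and df_fixed is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Hard-coded copy of the sample data from the question (illustrative name).
df_fixed = pd.DataFrame({
    'id':    [3, 4, 1, 6, 4, 8, 8, 7, 9, 1],
    'value': [76, 12, 96, 33, 49, 72, 68, 78, 99, 66],
})
df_fixed['others_count'] = df_fixed.groupby('id').cumcount() + 1
df_fixed['value_estimated'] = np.nan

for row in df_fixed.index:
    others = df_fixed.loc[row, 'others_count']
    # Test the per-id counter, not the id itself: an id of 1 is a valid id.
    if others > 1:
        same_id = df_fixed['id'] == df_fixed.loc[row, 'id']
        earlier = df_fixed['others_count'] < others
        df_fixed.loc[row, 'value_estimated'] = df_fixed.loc[same_id & earlier, 'value'].mean()

print(df_fixed)
```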

Is there a better solution?

- espogian

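If you are computing the mean of the values that share an id, shouldn't the mean for id 4 be 30.5, and likewise for the other ids? - NOOB

@NOOB The mean should only consider rows with the same id whose others_count is smaller than that of the row under consideration. So when others_count = 2, the mean comes from a single number. - espogian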
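
To make the rule in this comment concrete, here is a small plain-Python sketch for a single id. The first two values (12 and 49) are the ones id 4 has in the sample table; the third value (30) is invented here purely to show what happens when an id occurs three times.

```python
from statistics import mean

# Values of one id in order of appearance; the third value is hypothetical.
values = [12, 49, 30]

# For each position, average only the values that come before it.
# The first occurrence has nothing above it, so it gets no estimate.
estimates = [mean(values[:i]) if i > 0 else None for i in range(len(values))]

print(estimates)  # [None, 12, 30.5]
```

The second row sees only 12, which is why its estimate is 12 rather than 30.5; the value 30.5 would only show up as the estimate for a third row.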

1 Answer

- anky · Accepted Answer

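If I understand correctly, you can group by id, take an expanding mean() of value within each group, and shift the result down by one row so that each row only sees the rows above it: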

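# transform returns a result aligned with df's index, so it can be assigned directly;
# shift() moves each expanding mean down one row within its id group
df['value_estimated'] = df.groupby('id')['value'].transform(
    lambda x: x.expanding().mean().shift())
print(df)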

   id  value  others_count  value_estimated
0   3     76             1              NaN
1   4     12             1              NaN
2   1     96             1              NaN
3   6     33             1              NaN
4   4     49             2             12.0
5   8     72             1              NaN
6   8     68             2             72.0
7   7     78             1              NaN
8   9     99             1              NaN
9   1     66             2             96.0
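
As a possible alternative that avoids the Python-level lambda, the same quantity can be sketched from two grouped cumulative operations: the sum of the earlier values divided by the number of earlier rows. This is not part of the accepted answer. It runs on a hard-coded copy of the sample data, and the names df_fixed, n_prev and sum_prev are introduced here for illustration.

```python
import pandas as pd

# Hard-coded copy of the sample data from the question (illustrative name).
df_fixed = pd.DataFrame({
    'id':    [3, 4, 1, 6, 4, 8, 8, 7, 9, 1],
    'value': [76, 12, 96, 33, 49, 72, 68, 78, 99, 66],
})

grouped = df_fixed.groupby('id')['value']

# Number of earlier rows with the same id (0 for the first occurrence).
n_prev = grouped.cumcount()

# Running sum within the id, minus the current row's own value,
# leaves the sum of the earlier values only.
sum_prev = grouped.cumsum() - df_fixed['value']

# Replace a zero count with NaN so first occurrences come out as NaN.
df_fixed['value_estimated'] = sum_prev / n_prev.where(n_prev > 0)

print(df_fixed)
```

Both cumcount() and cumsum() return results aligned with the original index, so the column can be assigned without any reindexing.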