Group by and reference shifted values

Question


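I'm trying to track the inventory level of individual items over time, comparing projected outbound against availability. Sometimes projected outbound exceeds availability, and when that happens I want Post Available to be 0. I'm trying to create the following Pre Available and Post Available columns: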

 Item  Week  Inbound  Outbound  Pre Available  Post Available 
 A        1      500       200            500             300 
 A        2        0       400            300               0 
 A        3      100         0            100             100 
 B        1       50        50             50               0 
 B        2        0        80              0               0 
 B        3        0        20              0               0 
 B        4       20        20             20               0

I have tried the following code:

def custsum(x):
    total = 0
    for i, v in x.iterrows():
        total += df['Inbound'] - df['Outbound']
        x.loc[i, 'Post Available'] = total
        if total < 0:
            total = 0
    return x

df.groupby('Item').apply(custsum)

But I get the following error:

ValueError: Incompatible indexer with Series

I'm relatively new to Python and would appreciate any help. Thanks!

- cpe5

Please copy and paste the mock dataset as text so that we can easily reproduce your DataFrame. - unutbu

How do I add it as text? When I try, the information looks weird. - cpe5

Week Inbound Outbound Pre Available Post Available A 1 500 200 500 300 A 2 0 400 300 0 A 3 100 0 100 0 B 1 50 50 50 0 B 2 0 80 0 0 B 3 0 20 0 0 B 4 20 20 20 0 - cpe5

@Charles Add it to your question using the edit function. - BENY

Pre Available should be the previous week's Post Available plus the current row's Inbound. I also want to make sure Post Available never goes below 0. Thanks! - cpe5
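Restated per item as a recurrence over weeks t (with Post_0 = 0 before the first week), the rule described in the question and in this comment is:

```latex
\begin{aligned}
\text{Pre}_t  &= \text{Post}_{t-1} + \text{Inbound}_t, \qquad \text{Post}_0 = 0,\\
\text{Post}_t &= \max\bigl(\text{Pre}_t - \text{Outbound}_t,\ 0\bigr).
\end{aligned}
```

For item A, week 1 gives Pre = 0 + 500 = 500 and Post = 500 - 200 = 300; week 2 gives Pre = 300 + 0 = 300 and Post = max(300 - 400, 0) = 0, matching the table.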

2 Answers

You could use:

import numpy as np
import pandas as pd
df = pd.DataFrame({'Inbound': [500, 0, 100, 50, 0, 0, 20],
                   'Item': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Outbound': [200, 400, 0, 50, 80, 20, 20],
                   'Week': [1, 2, 3, 1, 2, 3, 4]})
df = df[['Item', 'Week', 'Inbound', 'Outbound']]


def custsum(x):
    total = 0
    for i, v in x.iterrows():
        total += x.loc[i, 'Inbound'] - x.loc[i, 'Outbound']
        if total < 0:
            total = 0
        x.loc[i, 'Post Available'] = total
    x['Pre Available'] = x['Post Available'].shift(1).fillna(0) + x['Inbound']
    return x

result = df.groupby('Item', group_keys=False).apply(custsum)  # group_keys=False: don't prepend Item as an extra index level (pandas >= 2.0)
result = result[['Item', 'Week', 'Inbound', 'Outbound', 'Pre Available', 'Post Available']]
print(result)

This produces:

  Item  Week  Inbound  Outbound  Pre Available  Post Available
0    A     1      500       200          500.0           300.0
1    A     2        0       400          300.0             0.0
2    A     3      100         0          100.0           100.0
3    B     1       50        50           50.0             0.0
4    B     2        0        80            0.0             0.0
5    B     3        0        20            0.0             0.0
6    B     4       20        20           20.0             0.0

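The main difference between this code and the code you posted is:

total += x.loc[i, 'Inbound'] - x.loc[i, 'Outbound']

x.loc[i, 'Inbound'] and x.loc[i, 'Outbound'] each select the single numeric value in that column for the row labelled i. So the difference is a number, and total remains a number. In contrast,

total += df['Inbound'] - df['Outbound']

adds an entire Series to total. That is what causes the ValueError later on (see below for why this happens).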

The conditional

if total < 0:
    total = 0

was moved before x.loc[i, 'Post Available'] = total so that Post Available is always non-negative.
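The effect of that ordering can be checked without pandas. Below is a plain-Python sketch of the running total for item A (net flows taken from the table in the question); `running_stock` and its `clamp_before_recording` flag are hypothetical names used only for this illustration.

```python
# Net flow (Inbound - Outbound) for item A, weeks 1-3, from the question's table.
net = [500 - 200, 0 - 400, 100 - 0]


def running_stock(net_flows, clamp_before_recording):
    """Accumulate net flows, resetting the running total to 0 when it goes negative.

    clamp_before_recording=True mirrors the answer's loop (clamp, then record);
    False mirrors the question's loop (record, then clamp).
    """
    total = 0
    recorded = []
    for flow in net_flows:
        total += flow
        if clamp_before_recording:
            if total < 0:
                total = 0
            recorded.append(total)
        else:
            recorded.append(total)
            if total < 0:
                total = 0
    return recorded


print(running_stock(net, clamp_before_recording=True))   # [300, 0, 100]
print(running_stock(net, clamp_before_recording=False))  # [300, -100, 100]
```

Only the recorded week-2 value differs: in both versions the running total is reset to 0 before week 3, but recording it before the reset stores -100 for that week.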

If you did not need this conditional, the entire for-loop could be replaced with

x['Post Available'] = (x['Inbound'] - x['Outbound']).cumsum()

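Because column-wise arithmetic and cumsum are both vectorized operations, that version would run faster. Unfortunately, the conditional prevents us from replacing the for-loop with a plain cumsum.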
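For this particular clamping rule, though, the loop can still be avoided. Within one item, resetting the running total to 0 whenever it goes negative gives the same result as taking the unclamped cumulative sum and subtracting its running minimum (capped at 0). The sketch below applies that identity with groupby; it is an alternative not taken from either answer, and it assumes the rows are already sorted by Week within each Item.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({'Item': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Week': [1, 2, 3, 1, 2, 3, 4],
                   'Inbound': [500, 0, 100, 50, 0, 0, 20],
                   'Outbound': [200, 400, 0, 50, 80, 20, 20]})

# Net flow per row, and its running (unclamped) total within each item.
net = df['Inbound'] - df['Outbound']
running = net.groupby(df['Item']).cumsum()

# Lowest running total reached so far within each item, capped at 0.
# Subtracting it lifts the running total back to 0 every time it would go
# negative, which reproduces total = max(total + net, 0) without a Python loop.
floor = running.groupby(df['Item']).cummin().clip(upper=0)
df['Post Available'] = running - floor

# Pre Available = previous week's Post Available (0 in each item's first week) + Inbound.
prev_post = df.groupby('Item')['Post Available'].shift().fillna(0)
df['Pre Available'] = prev_post + df['Inbound']

print(df)
```

Here `running - floor` plays the role of `total` in the loop: each time the unclamped sum reaches a new low below 0, `floor` drops by the same amount, so the difference bottoms out at 0 instead of going negative.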

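In your original code, the error

ValueError: Incompatible indexer with Series

occurs on the line

x.loc[i, 'Post Available'] = total

because total is a Series rather than a simple numeric value. Pandas tries to align the Series on the right-hand side with the indexer (i, 'Post Available') on the left. That indexer is converted to a positional tuple such as (0, 4), since Post Available is the column at position 4. But (0, 4) is not a suitable index for the 1-dimensional Series on the right.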

You can confirm that total is a Series by putting print(total) inside the for-loop, or by noticing that the right-hand side of

total += df['Inbound'] - df['Outbound']

is a Series.
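Doing that check on a cut-down copy of the sample data (item A only) shows the two kinds of result side by side; the variable names here are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Item': ['A', 'A', 'A'],
                   'Inbound': [500, 0, 100],
                   'Outbound': [200, 400, 0]})

i = df.index[0]

# Row-and-column lookup: a single number, as in the fixed loop.
scalar_diff = df.loc[i, 'Inbound'] - df.loc[i, 'Outbound']

# Whole-column arithmetic: one value per row, i.e. a Series, as in the original loop.
series_diff = df['Inbound'] - df['Outbound']

print(type(scalar_diff))  # a NumPy integer scalar, not a Series
print(type(series_diff))  # a pandas Series
```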

- unutbu

Content provided by Stack Overflow.

- BENY · Accepted Answer

No custom function is needed: you can build PreAvailable with groupby + shift, and PostAvailable with clip (setting the lower bound to 0).

df['PostAvailable']=(df.Inbound-df.Outbound).clip(lower=0)
df['PreAvailable']=df.groupby('item').apply(lambda x  : x['Inbound'].add(x['PostAvailable'].shift(),fill_value=0)).values
df
Out[213]: 
  item  Week  Inbound  Outbound  PreAvailable  PostAvailable
0    A     1      500       200         500.0            300
1    A     2        0       400         300.0              0
2    A     3      100         0         100.0            100
3    B     1       50        50          50.0              0
4    B     2        0        80           0.0              0
5    B     3        0        20           0.0              0
6    B     4       20        20          20.0              0
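A caveat worth checking before relying on this: `clip(lower=0)` is applied to each row's `Inbound - Outbound` on its own, so stock left over from one week is not carried into the next, unlike the rule in the question (Pre Available = previous Post Available + Inbound). It reproduces the sample output because the only week there that follows a positive Post Available (A, week 2) has an outbound of 400, more than the 300 carried over. The sketch below uses made-up numbers (not from the question) where the two approaches disagree; `carried_post` is a hypothetical helper that simply restates the question's rule with a loop.

```python
import pandas as pd

# Hypothetical data: week 2's outbound is smaller than the stock carried over from week 1.
df = pd.DataFrame({'item': ['A', 'A'],
                   'Week': [1, 2],
                   'Inbound': [500, 0],
                   'Outbound': [200, 100]})

# Row-wise clip, as in the answer above: each week is looked at in isolation.
clip_post = (df.Inbound - df.Outbound).clip(lower=0)


def carried_post(group):
    """Post Available per the question's rule: stock carries over, floored at 0."""
    total, out = 0, []
    for inbound, outbound in zip(group['Inbound'], group['Outbound']):
        total = max(total + inbound - outbound, 0)
        out.append(total)
    return pd.Series(out, index=group.index)


loop_post = pd.concat([carried_post(g) for _, g in df.groupby('item')])

print(clip_post.tolist())  # [300, 0]
print(loop_post.tolist())  # [300, 200]
```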