使用 Pandas 在每行迭代时间序列数据时计算当前行之前的总和

3
假设我有以下代码,用于计算在我的预算内可以购买多少产品 -
import math
import pandas as pd

data = [['2021-01-02', 5.5], ['2021-02-02', 10.5], ['2021-03-02', 15.0], ['2021-04-02', 20.0]]
df = pd.DataFrame(data, columns=['Date', 'Current_Price'])

df.Date = pd.to_datetime(df.Date)
mn = df.Date.min()
mx = df.Date.max()
dr = pd.date_range(mn - pd.tseries.offsets.MonthBegin(), mx + pd.tseries.offsets.MonthEnd(), name="Date")
df = df.set_index("Date").reindex(dr).reset_index()
df['Current_Price'] = df.groupby(
    pd.Grouper(key='Date', freq='1M'))['Current_Price'].ffill().bfill()

# The dataframe below shows the current price of the product
# I'd like to buy at the specific date_range
print(df)

# Create 'Day' column to know which day of the month
df['Day'] = pd.to_datetime(df['Date']).dt.day

# Create 'Deposit' column to record how much money is
# deposited in, say, my bank account to buy the product.
# 'Withdrawal' column is to record how much I spent in
# buying product(s) at the current price on a specific date.
# 'Num_of_Products_Bought' shows how many items I bought
# on that specific date.
#
# Please note that the calculate below takes into account
# the left over money, which remains after I've purchased a 
# product, for future purchase. For example, if you observe 
# the resulting dataframe at the end of this code, you'll 
# notice that I was able to purchase 7 products on March 1, 2021
# although my deposit on that day was $100. That is because 
# on the days leading up to March 1, 2021, I have been saving 
# the spare change from previous product purchases and that 
# extra money allows me to buy an extra product on March 1, 2021 
# despite my budget of $100 should only allow me to purchase 
# 6 products.
df[['Deposit', 'Withdrawal', 'Num_of_Products_Bought']] = 0.0

# Suppose I save $100 at the beginning of every month in my bank account
df.loc[df['Day'] == 1, 'Deposit'] = 100.0

for index, row in df.iterrows():
    if df.loc[index, 'Day'] == 1:
        # num_prod_bought = (sum_of_deposit_so_far - sum_of_withdrawal)/current_price
        df.loc[index, 'Num_of_Products_Bought'] = math.floor(
            (sum(df.iloc[0:(index + 1)]['Deposit'])
             - sum(df.iloc[0:(index + 1)]['Withdrawal']))
            / df.loc[index, 'Current_Price'])
        # Record how much I spent buying the product on specific date
        df.loc[index, 'Withdrawal'] = df.loc[index, 'Num_of_Products_Bought'] * df.loc[index, 'Current_Price']

print(df)
# This code above is working as intended,
# but how can I make it more efficient/pandas-like?
# In particular, I don't like to idea of having to
# iterate the rows and having to recalculate
# the running (sum of) deposit amount and
# the running (sum of) the withdrawal.

如代码中所述,我想知道如何在不逐行迭代并计算当前迭代行之前所有行的总和的情况下完成相同的操作(我在StackOverflow上读到了cumsum()函数,但我不认为cumsum有迭代过程中当前行的概念)。

非常感谢您提前提出的建议/答案!


1
我有点困惑,因为你说你的当前代码按预期工作,但你提到了一个运行总和。你的输出目前只填充了day==1的行,其他所有行都是0。你是想将其扩展到多天吗?还是你想以更pandas的方式生成完全相同的输出? - Henry Ecker
嗨@HenryEcker,如果我的问题不清楚,我很抱歉。我希望找到一种更pandas的方法来完成上面代码中实现的相同功能。基本上,我担心我的代码效率不够高,因为我正在遍历行,并且对于每一行,我都在重新计算“存款”和“提款”之前或之前的总和。希望有人能分享一种模式(如果有的话),可以在pandas中实现类似的功能。谢谢! - user1330974
1
你的实现中第59行说你买了7个,花费了105美元,这会让你亏钱。我的实现产生了6个项目,总共90美元。难道你不想要购买总额少于你拥有的金额吗? - Henry Ecker
@HenryEcker 我应该表述得更清楚。我的实现购买了7个产品(价值105美元),因为它计算了我一直在存储的额外零钱。换句话说,在2021年3月1日之前的几天里,我花费的金额少于存款金额,这些额外的钱允许我在2021年3月1日购买一个额外的产品。我很快会更新我的问题以使其更清晰。有没有一种方法可以在pandas中实现这一点,而不必逐行迭代并为每一行计算“sum”? - user1330974
您可以定义自己的函数,应用于数据帧中的每一行,然后使用df.apply(function)函数。 - flurble
1个回答

1
一种使用 .apply 的解决方案:
def fn():
    leftover = 0
    amount, deposit = yield
    while True:
        new_amount, new_deposit = yield (deposit + leftover) // amount
        leftover = (deposit + leftover) % amount
        amount, deposit = new_amount, new_deposit


df = df.set_index("Date")
s = fn()
next(s)
m = df.index.day == 1
df.loc[m, "Deposit"] = 100
df.loc[m, "Num_of_Products_Bought"] = df.loc[
    m, ["Current_Price", "Deposit"]
].apply(lambda x: s.send((x["Current_Price"], x["Deposit"])), axis=1)
df.loc[m, "Withdrawal"] = (
    df.loc[m, "Num_of_Products_Bought"] * df.loc[m, "Current_Price"]
)
print(df.fillna(0).reset_index())

输出:

          Date  Current_Price  Deposit  Num_of_Products_Bought  Withdrawal
0   2021-01-01            5.5    100.0                    18.0        99.0
1   2021-01-02            5.5      0.0                     0.0         0.0
2   2021-01-03            5.5      0.0                     0.0         0.0
3   2021-01-04            5.5      0.0                     0.0         0.0
4   2021-01-05            5.5      0.0                     0.0         0.0
5   2021-01-06            5.5      0.0                     0.0         0.0
6   2021-01-07            5.5      0.0                     0.0         0.0
7   2021-01-08            5.5      0.0                     0.0         0.0
8   2021-01-09            5.5      0.0                     0.0         0.0
9   2021-01-10            5.5      0.0                     0.0         0.0
10  2021-01-11            5.5      0.0                     0.0         0.0
11  2021-01-12            5.5      0.0                     0.0         0.0
12  2021-01-13            5.5      0.0                     0.0         0.0
13  2021-01-14            5.5      0.0                     0.0         0.0
14  2021-01-15            5.5      0.0                     0.0         0.0
15  2021-01-16            5.5      0.0                     0.0         0.0
16  2021-01-17            5.5      0.0                     0.0         0.0
17  2021-01-18            5.5      0.0                     0.0         0.0
18  2021-01-19            5.5      0.0                     0.0         0.0
19  2021-01-20            5.5      0.0                     0.0         0.0
20  2021-01-21            5.5      0.0                     0.0         0.0
21  2021-01-22            5.5      0.0                     0.0         0.0
22  2021-01-23            5.5      0.0                     0.0         0.0
23  2021-01-24            5.5      0.0                     0.0         0.0
24  2021-01-25            5.5      0.0                     0.0         0.0
25  2021-01-26            5.5      0.0                     0.0         0.0
26  2021-01-27            5.5      0.0                     0.0         0.0
27  2021-01-28            5.5      0.0                     0.0         0.0
28  2021-01-29            5.5      0.0                     0.0         0.0
29  2021-01-30            5.5      0.0                     0.0         0.0
30  2021-01-31            5.5      0.0                     0.0         0.0
31  2021-02-01           10.5    100.0                     9.0        94.5
32  2021-02-02           10.5      0.0                     0.0         0.0
33  2021-02-03           10.5      0.0                     0.0         0.0
34  2021-02-04           10.5      0.0                     0.0         0.0
35  2021-02-05           10.5      0.0                     0.0         0.0
36  2021-02-06           10.5      0.0                     0.0         0.0
37  2021-02-07           10.5      0.0                     0.0         0.0
38  2021-02-08           10.5      0.0                     0.0         0.0
39  2021-02-09           10.5      0.0                     0.0         0.0
40  2021-02-10           10.5      0.0                     0.0         0.0
41  2021-02-11           10.5      0.0                     0.0         0.0
42  2021-02-12           10.5      0.0                     0.0         0.0
43  2021-02-13           10.5      0.0                     0.0         0.0
44  2021-02-14           10.5      0.0                     0.0         0.0
45  2021-02-15           10.5      0.0                     0.0         0.0
46  2021-02-16           10.5      0.0                     0.0         0.0
47  2021-02-17           10.5      0.0                     0.0         0.0
48  2021-02-18           10.5      0.0                     0.0         0.0
49  2021-02-19           10.5      0.0                     0.0         0.0
50  2021-02-20           10.5      0.0                     0.0         0.0
51  2021-02-21           10.5      0.0                     0.0         0.0
52  2021-02-22           10.5      0.0                     0.0         0.0
53  2021-02-23           10.5      0.0                     0.0         0.0
54  2021-02-24           10.5      0.0                     0.0         0.0
55  2021-02-25           10.5      0.0                     0.0         0.0
56  2021-02-26           10.5      0.0                     0.0         0.0
57  2021-02-27           10.5      0.0                     0.0         0.0
58  2021-02-28           10.5      0.0                     0.0         0.0
59  2021-03-01           15.0    100.0                     7.0       105.0
60  2021-03-02           15.0      0.0                     0.0         0.0
61  2021-03-03           15.0      0.0                     0.0         0.0
62  2021-03-04           15.0      0.0                     0.0         0.0
63  2021-03-05           15.0      0.0                     0.0         0.0
64  2021-03-06           15.0      0.0                     0.0         0.0
65  2021-03-07           15.0      0.0                     0.0         0.0
66  2021-03-08           15.0      0.0                     0.0         0.0
67  2021-03-09           15.0      0.0                     0.0         0.0
68  2021-03-10           15.0      0.0                     0.0         0.0
69  2021-03-11           15.0      0.0                     0.0         0.0
70  2021-03-12           15.0      0.0                     0.0         0.0
71  2021-03-13           15.0      0.0                     0.0         0.0
72  2021-03-14           15.0      0.0                     0.0         0.0
73  2021-03-15           15.0      0.0                     0.0         0.0
74  2021-03-16           15.0      0.0                     0.0         0.0
75  2021-03-17           15.0      0.0                     0.0         0.0
76  2021-03-18           15.0      0.0                     0.0         0.0
77  2021-03-19           15.0      0.0                     0.0         0.0
78  2021-03-20           15.0      0.0                     0.0         0.0
79  2021-03-21           15.0      0.0                     0.0         0.0
80  2021-03-22           15.0      0.0                     0.0         0.0
81  2021-03-23           15.0      0.0                     0.0         0.0
82  2021-03-24           15.0      0.0                     0.0         0.0
83  2021-03-25           15.0      0.0                     0.0         0.0
84  2021-03-26           15.0      0.0                     0.0         0.0
85  2021-03-27           15.0      0.0                     0.0         0.0
86  2021-03-28           15.0      0.0                     0.0         0.0
87  2021-03-29           15.0      0.0                     0.0         0.0
88  2021-03-30           15.0      0.0                     0.0         0.0
89  2021-03-31           15.0      0.0                     0.0         0.0
90  2021-04-01           20.0    100.0                     5.0       100.0
91  2021-04-02           20.0      0.0                     0.0         0.0
92  2021-04-03           20.0      0.0                     0.0         0.0
93  2021-04-04           20.0      0.0                     0.0         0.0
94  2021-04-05           20.0      0.0                     0.0         0.0
95  2021-04-06           20.0      0.0                     0.0         0.0
96  2021-04-07           20.0      0.0                     0.0         0.0
97  2021-04-08           20.0      0.0                     0.0         0.0
98  2021-04-09           20.0      0.0                     0.0         0.0
99  2021-04-10           20.0      0.0                     0.0         0.0
100 2021-04-11           20.0      0.0                     0.0         0.0
101 2021-04-12           20.0      0.0                     0.0         0.0
102 2021-04-13           20.0      0.0                     0.0         0.0
103 2021-04-14           20.0      0.0                     0.0         0.0
104 2021-04-15           20.0      0.0                     0.0         0.0
105 2021-04-16           20.0      0.0                     0.0         0.0
106 2021-04-17           20.0      0.0                     0.0         0.0
107 2021-04-18           20.0      0.0                     0.0         0.0
108 2021-04-19           20.0      0.0                     0.0         0.0
109 2021-04-20           20.0      0.0                     0.0         0.0
110 2021-04-21           20.0      0.0                     0.0         0.0
111 2021-04-22           20.0      0.0                     0.0         0.0
112 2021-04-23           20.0      0.0                     0.0         0.0
113 2021-04-24           20.0      0.0                     0.0         0.0
114 2021-04-25           20.0      0.0                     0.0         0.0
115 2021-04-26           20.0      0.0                     0.0         0.0
116 2021-04-27           20.0      0.0                     0.0         0.0
117 2021-04-28           20.0      0.0                     0.0         0.0
118 2021-04-29           20.0      0.0                     0.0         0.0
119 2021-04-30           20.0      0.0                     0.0         0.0

1
你的解决方案很好!虽然我还需要一点时间才能立即掌握它(我相信在我将这种方法应用于更多的解决方案之后,它会变得更加容易理解),但我从你的答案中学会了如何使用yield。谢谢! - user1330974

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接