How do I sum values in one dataframe and add the result to another dataframe?

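I have two dataframes: one holds individual transactions and the other is a chart of accounts.
I want to sum all transactions from the previous month (March, in this case) for each CompanyKey. I then want to add each result to the chart of accounts dataframe as a new column, with the CompanyKey in the column header.
Here is a small sample of the transaction data (the real data has thousands of transactions):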
import pandas as pd

df = pd.DataFrame({
        'CompanyKey': ["1","1","1","1","1","1","1","2","2","2"],
        'DateOccurred': ["31/12/2021","25/02/2022","15/03/2022","31/03/2022","31/12/2021","22/02/2022","16/03/2022","31/12/2021","25/02/2022","31/03/2022"],
        'Account.Name': ["Cash at Bank","Cash at Bank","Cash at Bank","Cash at Bank","GST Paid","GST Paid","GST Paid","Cash at Bank","Cash at Bank","Cash at Bank"],
        'Amount': [150,112200,234065,19167.08,-39080.03,-10200,-27.5,15000,-234567,340697]})

Here is the corresponding chart of accounts:

df1 = pd.DataFrame({
            'ConsolidatedAccountName': ["Cash at Bank","GST Paid", "Cash at Bank", "GST Paid"],
            'Level 1': ["Fund Statement","Fund Statement", "Cash Flow Statement", "Cash Flow Statement"],
            'Level 2': ["Cash at Bank","GST Paid", "Cash at Bank", "GST Paid"]})

This is the result I expect. I only want the sum applied to rows where df1['Level 1'] == "Fund Statement".
+──────────────────────────+──────────────────────+───────────────+────────────────+────────────────+
| ConsolidatedAccountName  | Level 1              | Level 2       | Company 1 Sum  | Company 2 Sum  |
+──────────────────────────+──────────────────────+───────────────+────────────────+────────────────+
| Cash at Bank             | Fund Statement       | Cash at Bank  | 253,232.08     | 340,697        |
| GST Paid                 | Fund Statement       | GST Paid      | -27.50         | 0              |
| Cash at Bank             | Cash Flow Statement  | Cash at Bank  | NaN            | NaN            |
| GST Paid                 | Cash Flow Statement  | GST Paid      | NaN            | NaN            |
+──────────────────────────+──────────────────────+───────────────+────────────────+────────────────+

This is roughly as far as I got before running into a problem.

company_keys = [1, 2]
    
for company in company_keys:
    d1['Company 1 Sum'] = np.where((d3['CompanyKey'] == company) &
                                       (d3['DateOccurred'] >= '01/03/2022') & 
                                       (d3['DateOccurred'] <= '31/03/2022') &
                                       (d1['Level 1'] == 'Fund Statement'),
                                        d3['Amount'].sum(),
                                        0)

This is the error I get.

    ValueError: Length of values (10) does not match length of index (4)
1 Answer

Here is one way to do it using Pandas groupby and apply:

# Setup
df["DateOccurred"] = pd.to_datetime(df["DateOccurred"], format="%d/%m/%Y")

# Sum transactions per companies and accounts
df_sum = (
    df.loc[df["DateOccurred"].dt.month == 3, :]
    .groupby(["CompanyKey", "Account.Name"])
    .agg({"Amount": "sum"})
)

# Add new columns
for idx in df["CompanyKey"].unique():
    df1[f"Company {idx} Sum"] = df1.apply(
        lambda x: df_sum.loc[(idx, x["ConsolidatedAccountName"]), "Amount"]
        if (x["ConsolidatedAccountName"] in df_sum.loc[(idx), :].index.unique())
        and (x["Level 1"] == "Fund Statement")
        else None,
        axis=1,
    )

# Cleanup
df1.loc[df1["Level 1"] == "Fund Statement"] = df1.loc[
    df1["Level 1"] == "Fund Statement"
].fillna(0)

Then:

print(df1)
# Output

  ConsolidatedAccountName              Level 1       Level 2  Company 1 Sum  Company 2 Sum
0            Cash at Bank       Fund Statement  Cash at Bank      253232.08       340697.0
1                GST Paid       Fund Statement      GST Paid         -27.50            0.0
2            Cash at Bank  Cash Flow Statement  Cash at Bank            NaN            NaN
3                GST Paid  Cash Flow Statement      GST Paid            NaN            NaN
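
As a possible alternative to the row-wise apply above, the same table can probably be built without a Python-level loop by pivoting the March sums into one column per company and left-joining them onto the chart of accounts. This is only a sketch under a few assumptions: the year 2022 is hard-coded (the `dt.month == 3` filter above would also match March of any other year), and the names `in_march`, `sums`, `sum_cols`, `is_fund` and `result` are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "CompanyKey": ["1", "1", "1", "1", "1", "1", "1", "2", "2", "2"],
    "DateOccurred": ["31/12/2021", "25/02/2022", "15/03/2022", "31/03/2022",
                     "31/12/2021", "22/02/2022", "16/03/2022", "31/12/2021",
                     "25/02/2022", "31/03/2022"],
    "Account.Name": ["Cash at Bank", "Cash at Bank", "Cash at Bank",
                     "Cash at Bank", "GST Paid", "GST Paid", "GST Paid",
                     "Cash at Bank", "Cash at Bank", "Cash at Bank"],
    "Amount": [150, 112200, 234065, 19167.08, -39080.03, -10200, -27.5,
               15000, -234567, 340697],
})
df1 = pd.DataFrame({
    "ConsolidatedAccountName": ["Cash at Bank", "GST Paid",
                                "Cash at Bank", "GST Paid"],
    "Level 1": ["Fund Statement", "Fund Statement",
                "Cash Flow Statement", "Cash Flow Statement"],
    "Level 2": ["Cash at Bank", "GST Paid", "Cash at Bank", "GST Paid"],
})

df["DateOccurred"] = pd.to_datetime(df["DateOccurred"], format="%d/%m/%Y")

# Assumption: "last month" means March 2022, so filter on year and month.
in_march = (df["DateOccurred"].dt.year == 2022) & (df["DateOccurred"].dt.month == 3)

# One row per account, one column per company, holding the summed amounts.
sums = df[in_march].pivot_table(
    index="Account.Name", columns="CompanyKey", values="Amount", aggfunc="sum"
)
sums.columns = [f"Company {key} Sum" for key in sums.columns]
sum_cols = list(sums.columns)
sums = sums.rename_axis("ConsolidatedAccountName").reset_index()

# Left join keeps every row of the chart of accounts, in its original order.
result = df1.merge(sums, on="ConsolidatedAccountName", how="left")

# Accounts with no March transactions get 0; rows outside "Fund Statement" get NaN.
is_fund = result["Level 1"] == "Fund Statement"
result[sum_cols] = result[sum_cols].fillna(0.0)
result.loc[~is_fund, sum_cols] = float("nan")

print(result)
```

Unlike the loop above, this sketch leaves df1 untouched and returns a new dataframe; assign `result` back to df1 if the columns should live there.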
