How to add data to a Pandas multi-index DataFrame

7
How can I add data to a Pandas multi-index DataFrame? Currently, I use the following code to successfully create a DataFrame from my data.
df = pd.DataFrame.from_dict(output, orient='index')

I was thinking of something like this...

df = pd.DataFrame['MMM', 'IncomeStatement'].from_dict(output, orient='index')

DataFrame to merge

                                            0          1          2
Total Revenue                           182795000  170910000  156508000
Cost of Revenue                         112258000  106606000   87846000
Gross Profit                             70537000   64304000   68662000
Research Development                      6041000    4475000    3381000
Selling General and Administrative       11993000   10830000   10040000
Non Recurring                                   0          0          0
Others                                          0          0          0
Total Operating Expenses                        0          0          0
Operating Income or Loss                 52503000   48999000   55241000
Total Other Income/Expenses Net            980000    1156000     522000
Earnings Before Interest And Taxes       53483000   50155000   55763000
Interest Expense                                0          0          0
Income Before Tax                        53483000   50155000   55763000
Income Tax Expense                       13973000   13118000   14030000
Minority Interest                               0          0          0
Net Income From Continuing Ops           39510000   37037000   41733000
Discontinued Operations                         0          0          0
Extraordinary Items                             0          0          0
Effect Of Accounting Changes                    0          0          0
Other Items                                     0          0          0
Net Income                               39510000   37037000   41733000
Preferred Stock And Other Adjustments           0          0          0
Net Income Applicable To Common Shares   39510000   37037000   41733000

Multi-Index / Parent DataFrame

MMM     IncomeStatement
        BalanceSheet   
        CashFlows      
ABT     IncomeStatement
        BalanceSheet   
        CashFlows      
ABBV    IncomeStatement
        BalanceSheet   
        CashFlows      
ACN     IncomeStatement
        BalanceSheet   
        CashFlows    

Result

MMM     IncomeStatement        Total Revenue                           182795000  170910000  156508000
                               Cost of Revenue                         112258000  106606000   87846000
                               Gross Profit                             70537000   64304000   68662000
                               Research Development                      6041000    4475000    3381000
                               Selling General and Administrative       11993000   10830000   10040000
                               Non Recurring                                   0          0          0
                               Others                                          0          0          0
                               Total Operating Expenses                        0          0          0
                               Operating Income or Loss                 52503000   48999000   55241000
                               Total Other Income/Expenses Net            980000    1156000     522000
                               Earnings Before Interest And Taxes       53483000   50155000   55763000
                               Interest Expense                                0          0          0
                               Income Before Tax                        53483000   50155000   55763000
                               Income Tax Expense                       13973000   13118000   14030000
                               Minority Interest                               0          0          0
                               Net Income From Continuing Ops           39510000   37037000   41733000
                               Discontinued Operations                         0          0          0
                               Extraordinary Items                             0          0          0
                               Effect Of Accounting Changes                    0          0          0
                               Other Items                                     0          0          0
                               Net Income                               39510000   37037000   41733000
                               Preferred Stock And Other Adjustments           0          0          0
                               Net Income Applicable To Common Shares   39510000   37037000   41733000
        BalanceSheet   
        CashFlows      
ABT     IncomeStatement
        BalanceSheet   
        CashFlows      
ABBV    IncomeStatement
        BalanceSheet   
        CashFlows      
ACN     IncomeStatement
        BalanceSheet   
        CashFlows    

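What does the DataFrame look like initially, and what should it end up looking like? - Ami Tavory
@AmiTavory - I have included an example - Aran Freel
Great, but the example still leaves a lot of room for guessing, and your follow-up questions make it worse. Sorry, I'm downvoting this question as very unclear. If you edit it into something better, I'll happily retract the vote. - Ami Tavory
1 Answer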

3

I'll use simplified versions of your DataFrames.

Say you start with:

import pandas as pd
import numpy as np

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
    np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]

s = pd.DataFrame(index=arrays)

so that
>> s
bar one
    two
baz one
    two
foo one
    two
qux one
    two

(this is your parent DataFrame)

and also

c = pd.DataFrame(index=['one', 'two'], data=[23, 33])

so that
>> c
      0
one  23
two  33

(this is your first DataFrame, the one to merge)

Then a merge followed by a groupby gives

>> pd.merge(s.reset_index(), c, left_on='level_1', right_index=True).groupby(['level_0', 'level_1']).sum()
                  0
level_0 level_1
bar     one      23
        two      33
baz     one      23
        two      33
foo     one      23
        two      33
qux     one      23
        two      33
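
For reference, here is a minimal self-contained sketch of the same merge idea. It assumes the parent index is built with pd.MultiIndex.from_product and given level names (the names "outer" and "inner" and the column name "value" are made up for illustration), so that reset_index produces readable column names instead of level_0 / level_1. It also uses set_index rather than groupby(...).sum() to restore the multi-index, which should be equivalent here because each (outer, inner) pair occurs only once.

```python
import pandas as pd

# Parent: an empty DataFrame that only carries a two-level index.
# The level names "outer" and "inner" are illustrative assumptions.
index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["outer", "inner"]
)
parent = pd.DataFrame(index=index)

# Child: data keyed by the inner level only.
child = pd.DataFrame({"value": [23, 33]}, index=["one", "two"])

# Turn the index levels into columns, merge on the inner level,
# then restore the two-level index.
merged = pd.merge(parent.reset_index(), child, left_on="inner", right_index=True)
result = merged.set_index(["outer", "inner"]).sort_index()

print(result)
```

Each inner label picks up its value under every outer label, which mirrors the output shown above.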

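I used fin_data = pd.MultiIndex.from_product(iterables, names=['Ticker', 'Financials']) to create the multi-index, but it has no reset_index available... any ideas? - Aran Freel
Oh, you're only creating an index, without a DataFrame. There are many ways to go from there. The simplest is to create a DataFrame that uses your fin_data as its index. - Ami Tavory
@AranFreel Not sure about what? pd.DataFrame(index=pd.MultiIndex.from_product([[0,1],['a','b','c']])).reset_index() works fine. I have to say, though, that this is a very inefficient way to ask a question. If you want people to help you, put some effort into writing a short piece of code that demonstrates the problem. Where does pd.MultiIndex.from_product even appear in your question? Sorry, but I can't guess what you mean. - Ami Tavory
If I run a for loop, is there a way to add to and build up the DataFrame as I go? I guess I could append each ticker to a list of arrays and then build the DataFrame from there :-D (aha moment) - Aran Freel
You actually did answer my question; my multi-index code was just wrong. - Aran Freel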
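
The loop idea from the last comments could be sketched roughly as follows. This is not the asker's actual code: the nested dict raw and its layout are assumptions standing in for however output is obtained per ticker and statement. The MMM income statement figures are copied from the question, while every other value is a dummy placeholder. The sketch collects one DataFrame per (ticker, statement) pair and lets pd.concat build the multi-index from the dict keys.

```python
import pandas as pd

# Hypothetical input layout: {ticker: {statement: {line item: [values]}}}.
# Only the MMM IncomeStatement numbers come from the question;
# everything else is a dummy placeholder.
raw = {
    "MMM": {
        "IncomeStatement": {
            "Total Revenue": [182795000, 170910000, 156508000],
            "Cost of Revenue": [112258000, 106606000, 87846000],
            "Gross Profit": [70537000, 64304000, 68662000],
        },
        "BalanceSheet": {"Placeholder Item": [0, 0, 0]},
    },
    "ABT": {
        "IncomeStatement": {"Placeholder Item": [0, 0, 0]},
    },
}

# Build one small DataFrame per (ticker, statement) inside the loop.
frames = {}
for ticker, statements in raw.items():
    for statement, output in statements.items():
        frames[(ticker, statement)] = pd.DataFrame.from_dict(output, orient="index")

# The tuple keys become the outer index levels; the line items stay innermost.
combined = pd.concat(frames, names=["Ticker", "Financials"])

# Select one statement for one ticker.
mmm_income = combined.xs(("MMM", "IncomeStatement"))
print(mmm_income)
```

Collecting the pieces first and concatenating once is generally cheaper than growing a DataFrame inside the loop.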
