Appending a DataFrame to a multi-indexed DataFrame

Question


I have a DataFrame with three index levels that looks like this:

                  stat1  stat2
sample  env run
sample1 0   0    36.214     71
            1    31.808     71
            2    28.376     71
            3    20.585     71
sample2 0   0     2.059     29
            1     2.070     29
            2     2.038     29

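This represents a process that is run over different data samples. The process is run several times in different environments to confirm the results.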

It may sound simple, but I am having trouble adding the results for a new environment, which arrive as a DataFrame:

     stat1  stat2
run
0    0.686     29
1    0.660     29
2    0.663     29

This should be indexed as df.loc[["sample1", 1]]. I have tried:

df.loc[["sample1", 1]] = result

and I have also tried DataFrame.append. The first approach raises a KeyError, and the second does not seem to modify the DataFrame at all.

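What am I missing here? Note that with something like df.loc["sample"].append(result), the problem is that it breaks the MultiIndex. The index is converted to a single level in which the previous MultiIndex entries are merged into tuples such as (0, 0) or (0, 1) (environment 0, run 1, and so on), while the index of the appended DataFrame (the run number) becomes a new, unwanted set of index values.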

- dabadaba

1 Answer


- sgDysregulation · Answer 1

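The core problem here is that the two indexes differ. One way around it is to change the index of result so that it also carries the level 0 and level 1 values, and then use concat to append it to df. See the example below: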

In [68]: result.index = list(zip(["sample1"]*len(result), [1]*len(result), result.index))

In [69]: df = pd.concat([df,result])
         df
Out[69]: 
                  stat1  stat2
sample  env run               
sample1 0   0    36.214     71
            1    31.808     71
            2    28.376     71
            3    20.585     71
sample2 0   0     2.059     29
            1     2.070     29
            2     2.038     29
sample1 1   0     0.686     29
            1     0.660     29
            2     0.663     29
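
For readers who want to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds df and result from the tables in the question. The names labelled, combined and env1 are made up for illustration, and building the index with pd.MultiIndex.from_arrays (so the level names can be passed explicitly) is my substitution for the list(zip(...)) assignment above, not part of the original answer. The final sort_index() is optional; it only groups the new rows next to the existing sample1 rows.

```python
import pandas as pd

# Rebuild the frame from the question: three index levels (sample, env, run).
index = pd.MultiIndex.from_tuples(
    [("sample1", 0, 0), ("sample1", 0, 1), ("sample1", 0, 2), ("sample1", 0, 3),
     ("sample2", 0, 0), ("sample2", 0, 1), ("sample2", 0, 2)],
    names=["sample", "env", "run"],
)
df = pd.DataFrame(
    {"stat1": [36.214, 31.808, 28.376, 20.585, 2.059, 2.070, 2.038],
     "stat2": [71, 71, 71, 71, 29, 29, 29]},
    index=index,
)

# The new environment's results, indexed by run only.
result = pd.DataFrame(
    {"stat1": [0.686, 0.660, 0.663], "stat2": [29, 29, 29]},
    index=pd.Index([0, 1, 2], name="run"),
)

# Give result the same three levels as df: constant sample and env, plus run.
labelled = result.copy()  # hypothetical name; keeps the original result intact
labelled.index = pd.MultiIndex.from_arrays(
    [["sample1"] * len(result), [1] * len(result), result.index],
    names=df.index.names,
)

# concat returns a new frame; sorting groups env 1 under sample1.
combined = pd.concat([df, labelled]).sort_index()

# Partial-tuple lookup selects every run for sample1 in environment 1.
env1 = combined.loc[("sample1", 1)]
print(env1)
```

Passing names=df.index.names keeps the sample, env and run labels aligned on both frames, so the concatenated index does not depend on how a given pandas version reconciles named and unnamed levels.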

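Edit: once the index has been changed, you can even use append. (DataFrame.append returns a new DataFrame rather than modifying df in place, and it was removed in pandas 2.0, so on current versions use pd.concat as above.)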

In [21]: result.index = list(zip(["sample1"]*len(result), [1]*len(result), result.index))

In [22]: df.append(result)
Out[22]: 
                  stat1  stat2
sample  env run               
sample1 0   0    36.214     71
            1    31.808     71
            2    28.376     71
            3    20.585     71
sample2 0   0     2.059     29
            1     2.070     29
            2     2.038     29
sample1 1   0     0.686     29
            1     0.660     29
            2     0.663     29
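
As an alternative to overwriting result.index by hand, the missing levels can be added as ordinary columns and then moved into the index. This is only a sketch of one possible variant, not something the answer proposes. It assumes the index of result is named "run", as in the question, and the names labelled and combined are hypothetical.

```python
import pandas as pd

# Existing data, as in the question.
df = pd.DataFrame(
    {"stat1": [36.214, 31.808, 28.376, 20.585, 2.059, 2.070, 2.038],
     "stat2": [71, 71, 71, 71, 29, 29, 29]},
    index=pd.MultiIndex.from_tuples(
        [("sample1", 0, 0), ("sample1", 0, 1), ("sample1", 0, 2), ("sample1", 0, 3),
         ("sample2", 0, 0), ("sample2", 0, 1), ("sample2", 0, 2)],
        names=["sample", "env", "run"],
    ),
)

# New results, indexed by run only (index name assumed to be "run").
result = pd.DataFrame(
    {"stat1": [0.686, 0.660, 0.663], "stat2": [29, 29, 29]},
    index=pd.Index([0, 1, 2], name="run"),
)

labelled = (
    result.assign(sample="sample1", env=1)      # add the missing levels as columns
    .set_index(["sample", "env"], append=True)  # index levels are now run, sample, env
    .reorder_levels(["sample", "env", "run"])   # match the level order used by df
)

# pd.concat returns a new DataFrame, so the result must be assigned.
combined = pd.concat([df, labelled])
print(combined)
```

Because both frames carry the same named levels in the same order, the concatenated frame keeps the sample, env and run level names.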