Merging two pandas DataFrames with float indexes

Question

3

I have two DataFrames with different columns, and I want to merge them by aligning them on their rows. Say I have these two DataFrames:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape(6, 2), index=np.arange(6)*0.1, columns=['a', 'b'])

df1
      a   b
0.0   0   1
0.1   2   3
0.2   4   5
0.3   6   7
0.4   8   9
0.5  10  11

df2 = pd.DataFrame(np.arange(8).reshape(4, 2), index=[0.07, 0.21, 0.43, 0.54], columns=['c', 'd'])

df2
      c  d
0.07  0  1
0.21  2  3
0.43  4  5
0.54  6  7

I want to merge df2 into df1 so that each row of df2 is aligned with its nearest-neighbor index in df1. The final result would be:

      a   b   c    d
0.0   0   1   NaN  NaN
0.1   2   3   0    1
0.2   4   5   2    3
0.3   6   7   NaN  NaN
0.4   8   9   4    5
0.5  10  11   6    7

I would appreciate any ideas on how to solve this efficiently.

- Gerges

1

Is df1 guaranteed to have a row at every 0.1 increment? If so, you could set df2.index = df2.index.round(1) and then do a plain join. - Matthias Fripp

2 Answers

2

Since you mentioned "closest", you can map each df2 index value to the closest df1 index value:

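# Snap each df2 label to the closest df1 label, then concatenate column-wise.
df2.index = [min(df1.index, key=lambda x: abs(x - y)) for y in df2.index]
pd.concat([df1, df2], axis=1)  # pass axis by keyword; positional axis is not accepted by recent pandas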
Out[535]: 
      a   b    c    d
0.0   0   1  NaN  NaN
0.1   2   3  0.0  1.0
0.2   4   5  2.0  3.0
0.3   6   7  NaN  NaN
0.4   8   9  4.0  5.0
0.5  10  11  6.0  7.0

- BENY
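Since the question asks about efficiency, here is a minimal sketch of the same nearest-label idea without the Python-level loop. It uses `Index.get_indexer(..., method='nearest')`, which assumes df1's index is unique and sorted. The names `pos`, `snapped`, and `merged` are illustrative and do not come from the answer above.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape(6, 2),
                   index=np.arange(6) * 0.1, columns=['a', 'b'])
df2 = pd.DataFrame(np.arange(8).reshape(4, 2),
                   index=[0.07, 0.21, 0.43, 0.54], columns=['c', 'd'])

# For every df2 label, find the integer position of the nearest df1 label.
# Assumption: df1.index is unique and monotonic, as in the question's data.
pos = df1.index.get_indexer(df2.index, method='nearest')

# Relabel a copy of df2 with the exact df1 labels, so the join matches exactly.
snapped = df2.copy()
snapped.index = df1.index[pos]

# Left join keeps every df1 row; unmatched rows get NaN in c and d.
merged = df1.join(snapped)
print(merged)
```

Because the new labels are taken from df1's own index, no floating-point rounding is involved in the match. Like the list-comprehension version, this sketch assumes no two df2 rows snap to the same df1 label.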

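Thanks! I accepted this answer because it compares the two indexes directly instead of rounding. However, it does not work if you end up with duplicate index values (which seems to be what happens in my real application). I think combining it with @Paul H's answer could solve that, but I couldn't get it to work. For now, I drop the duplicates before calling pd.concat. - Gerges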
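One possible way around the duplicate-index problem described in this comment is `pd.merge_asof` with `direction='nearest'`, a different technique from either answer in this thread. With df1 on the left, each df1 row receives at most one df2 row, so duplicates cannot arise. The sketch below adds a hypothetical extra row at 0.22 to provoke a collision, and the `tolerance=0.05` value (half of df1's spacing) is an assumption rather than something stated in the question.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape(6, 2),
                   index=np.arange(6) * 0.1, columns=['a', 'b'])
# Hypothetical data: 0.21 and 0.22 both lie closest to df1's 0.2 label.
df2 = pd.DataFrame(np.arange(10).reshape(5, 2),
                   index=[0.07, 0.21, 0.22, 0.43, 0.54], columns=['c', 'd'])

# Both indexes must be sorted for merge_asof.
# For each df1 row, take the nearest df2 row, but only if it lies
# within 0.05 of the df1 label (assumed tolerance); otherwise leave NaN.
merged = pd.merge_asof(
    df1, df2,
    left_index=True, right_index=True,
    direction='nearest',
    tolerance=0.05,
)
print(merged)
```

Here the 0.2 row picks up the df2 row at 0.21 because it is closer than 0.22, and the row at 0.22 is simply not used. Whether silently dropping the farther row is acceptable depends on the application.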

- Paul H · Accepted Answer

I would temporarily redefine df2's index to be a rounded version of its actual index:

merged = (
    df2.assign(idx=np.round(df2.index, 1)) # compute the rounded index
       .reset_index(drop=True)             # drop the existing index 
       .set_index('idx')                   # new, rounded index
       .join(df1, how='right')             # right join 
       .sort_index(axis='columns')         # sort the columns
)

That gives me:

      a   b    c    d
0.0   0   1  NaN  NaN
0.1   2   3  0.0  1.0
0.2   4   5  2.0  3.0
0.3   6   7  NaN  NaN
0.4   8   9  4.0  5.0
0.5  10  11  6.0  7.0
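One caveat when joining on rounded floats: an index built as `np.arange(6) * 0.1` does not hold exactly 0.3 at position 3, because 3 * 0.1 evaluates to 0.30000000000000004 in double precision. The sample data never snaps to 0.3, so the problem does not show up there. The sketch below uses a hypothetical df2 row at 0.29 to show the mismatch, and rounds both indexes as one possible fix. The names `left` and `right` are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape(6, 2),
                   index=np.arange(6) * 0.1, columns=['a', 'b'])
# Hypothetical row whose rounded label is 0.3.
df2 = pd.DataFrame([[0, 1]], index=[0.29], columns=['c', 'd'])

# The label stored in df1 is not exactly 0.3.
print(df1.index[3] == 0.3)  # False

# Round BOTH indexes to one decimal so the labels compare equal.
left = df1.copy()
left.index = np.round(left.index.to_numpy(), 1)
right = df2.copy()
right.index = np.round(right.index.to_numpy(), 1)

merged = left.join(right)
print(merged)
```

Rounding only df2 would leave the 0.29 row unmatched here, because the rounded 0.3 is not equal to the unrounded label stored in df1.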