Combining multiple DataFrame columns into one

Question


3

I am trying to convert a DataFrame with a dynamic number of a_P columns, which looks like this:

             a1_P       a2_P     weight  
0        33297.81   17407.93   14733.23  
1        58895.18   43013.57   86954.04

into a new DataFrame that looks like this (sorted by P):

                P     weight  
0        17407.93   14733.23
1        33297.81   14733.23  
2        43013.57   86954.04
3        58895.18   86954.04

What I have tried so far is:

names = ["a1", "a2"]
p = pd.DataFrame(columns=["P", "weight"])
for i in range(0, len(names)):
  p += df[["{}_P".format(names[i]), "weight"]]

I hoped this would get the data into a shape I could sort, but the operation fails because the column names are not the same.

- Peter Klauke

2 Answers

1

Using pandas concat (http://pandas.pydata.org/pandas-docs/stable/merging.html) might be one solution:

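import pandas as pd

df = pd.DataFrame.from_dict({'a1_P': [123.123, 342.123],
                             'a2_P': [232.12, 32.23],
                             'weight': [12312.23, 16232.3]})

# Pick every value column by its "_P" marker
cols = [x for x in df.columns if '_P' in x]

# Stack the value columns on top of each other; the original row labels are kept
new = pd.concat([df[col] for col in cols])
# Look up the matching weight for each stacked value via those row labels
weights = df.loc[new.index, 'weight'].tolist()

new_df = pd.DataFrame.from_dict({'P': new,
                                 'weight': weights})
# DataFrame.sort() no longer exists in pandas; sort_values() is its replacement
new_df.sort_values(by='P', inplace=True)
new_df.reset_index(drop=True, inplace=True)

print(new_df)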

         P    weight                                                                          
0   32.230  16232.30
1  123.123  12312.23
2  232.120  12312.23
3  342.123  16232.30

There is still room to improve performance, but it should be faster than a solution that uses an explicit loop.
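The same concat idea can also be written so that each piece carries its own weight column, which removes the separate index lookup. The following is only a sketch under assumptions: it assumes every value column ends in "_P", and the names p_cols, pieces, and stacked are illustrative, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a1_P': [123.123, 342.123],
                   'a2_P': [232.12, 32.23],
                   'weight': [12312.23, 16232.3]})

# Assumption: all value columns share the "_P" suffix.
p_cols = [c for c in df.columns if c.endswith('_P')]

# One two-column frame per value column, with that column renamed to "P"
# so that all pieces share the same column names.
pieces = [df[[c, 'weight']].rename(columns={c: 'P'}) for c in p_cols]

# Stack the pieces, sort by P, and renumber the rows from 0.
stacked = (pd.concat(pieces, ignore_index=True)
             .sort_values('P')
             .reset_index(drop=True))
print(stacked)
```

Renaming each piece before concatenating is what the attempt in the question was missing: concat aligns on column names, so the pieces need matching names to end up in a single P column.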

- chris-sc

- firelynx · Accepted Answer

The pandas.melt function does what you want:

pd.melt(df, id_vars=['weight'], value_vars=['a1_P', 'a2_P'], value_name='P')
     weight variable         P
0  14733.23     a1_P  33297.81
1  86954.04     a1_P  58895.18
2  14733.23     a2_P  17407.93
3  86954.04     a2_P  43013.57

Sorting by P is then easy: just append .sort_values('P') to the end of the melt statement.

pd.melt(df, id_vars=['weight'], value_vars=['a1_P', 'a2_P'], value_name='P').sort_values('P')
     weight variable         P
2  14733.23     a2_P  17407.93
0  14733.23     a1_P  33297.81
3  86954.04     a2_P  43013.57
1  86954.04     a1_P  58895.18

If you want this to be more dynamic, you could generate value_vars like this:

n_values = 2
value_vars = ["a{}_P".format(i+1) for i in range(0, n_values)]
pd.melt(df, id_vars=['weight'], value_vars=value_vars, value_name='P').sort_values('P')
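If the number of columns is not known in advance, value_vars could instead be read off the DataFrame itself. This is a sketch only, assuming that every value column ends in "_P" and that no other column does; the sample data is taken from the question.

```python
import pandas as pd

df = pd.DataFrame({'a1_P': [33297.81, 58895.18],
                   'a2_P': [17407.93, 43013.57],
                   'weight': [14733.23, 86954.04]})

# Assumption: the "_P" suffix identifies exactly the columns to melt.
value_vars = [c for c in df.columns if c.endswith('_P')]

melted = pd.melt(df, id_vars=['weight'], value_vars=value_vars, value_name='P')
result = melted.sort_values('P')
print(result)
```

This avoids hard-coding n_values, at the cost of relying on the naming convention.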

To make the index [0, 1, 2, 3, ...], just use .reset_index(drop=True), either chained onto the end of the expression or like this:

df = pd.melt(df, id_vars=['weight'], value_vars=value_vars, value_name='P')
df.sort_values('P', inplace=True)
df.reset_index(drop=True, inplace=True)

Personally, I prefer in-place operations because they are more memory-efficient.
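For comparison, the chained form could be sketched as follows. The final column selection is an addition for illustration, not part of the answer above: it drops the variable column so the result has the P and weight layout asked for in the question.

```python
import pandas as pd

df = pd.DataFrame({'a1_P': [33297.81, 58895.18],
                   'a2_P': [17407.93, 43013.57],
                   'weight': [14733.23, 86954.04]})

result = (
    pd.melt(df, id_vars=['weight'], value_vars=['a1_P', 'a2_P'], value_name='P')
      .sort_values('P')            # order rows by the combined column
      .reset_index(drop=True)      # renumber rows 0, 1, 2, ...
      .loc[:, ['P', 'weight']]     # keep only the two requested columns
)
print(result)
```

Each step returns a new DataFrame, so the original df is left unchanged.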