How do I add a column to a multi-index DataFrame?

Question

python python-2.7 python-3.x pandas dataframe

I have a multi-level DataFrame that looks like this:

       ACA FP Equity            UCG IM Equity            
          LAST PRICE     VOLUME    LAST PRICE      VOLUME
date                                                         
2010-01-04        12.825  5879617.0       15.0292  10844639.0
2010-01-05        13.020  6928587.0       14.8092  16456228.0
2010-01-06        13.250  5290631.0       14.6834  10446450.0
2010-01-07        13.255  5328586.0       15.0292  31900341.0
2010-01-08        13.470  7160295.0       15.1707  40750768.0
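To experiment with the answers below, the sample frame can be rebuilt from the values shown above. This is a sketch that assumes the two column levels are the ticker and the field name; `pd.MultiIndex.from_product` is one convenient way to build such columns, not necessarily how the original data was loaded.

```python
import pandas as pd

# Rebuild the question's sample data: level 0 = ticker, level 1 = field
columns = pd.MultiIndex.from_product(
    [["ACA FP Equity", "UCG IM Equity"], ["LAST PRICE", "VOLUME"]]
)
dates = pd.to_datetime(
    ["2010-01-04", "2010-01-05", "2010-01-06", "2010-01-07", "2010-01-08"]
)
df = pd.DataFrame(
    [
        [12.825, 5879617.0, 15.0292, 10844639.0],
        [13.020, 6928587.0, 14.8092, 16456228.0],
        [13.250, 5290631.0, 14.6834, 10446450.0],
        [13.255, 5328586.0, 15.0292, 31900341.0],
        [13.470, 7160295.0, 15.1707, 40750768.0],
    ],
    index=pd.Index(dates, name="date"),
    columns=columns,
)
print(df)
```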

If I want to add a third column for each stock, what is the syntax? For example:

df['ACA FP Equity']['PriceVolume'] = df['ACA FP Equity']['LAST PRICE']*3

But I want to do this for every stock at once, without adding each one by hand.

Thanks in advance.

- dsugasa

2 Answers

If you need to multiply all of the LAST PRICE columns by 3, select them with slicers (pd.IndexSlice) and rename the column label:

import pandas as pd

idx = pd.IndexSlice
df1 = df.loc[:, idx[:, 'LAST PRICE']].rename(columns={'LAST PRICE':'PriceVolume'}) * 3
print (df1)
           ACA FP Equity UCG IM Equity
             PriceVolume   PriceVolume
2010-01-04        38.475       45.0876
2010-01-05        39.060       44.4276
2010-01-06        39.750       44.0502
2010-01-07        39.765       45.0876
2010-01-08        40.410       45.5121
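The slicer selects every column whose second level is LAST PRICE, whatever the ticker. As a sketch (rebuilt on two rows of the question's data), the same selection can also be written with `DataFrame.xs`, which takes a label and a level; this is an alternative spelling, not part of the answer above.

```python
import pandas as pd

# Two rows of the question's data, rebuilt so this snippet runs on its own
columns = pd.MultiIndex.from_product(
    [["ACA FP Equity", "UCG IM Equity"], ["LAST PRICE", "VOLUME"]]
)
df = pd.DataFrame(
    [[12.825, 5879617.0, 15.0292, 10844639.0],
     [13.020, 6928587.0, 14.8092, 16456228.0]],
    index=pd.to_datetime(["2010-01-04", "2010-01-05"]),
    columns=columns,
)

idx = pd.IndexSlice
# ':' on level 0 means "every ticker"; 'LAST PRICE' is matched on level 1
via_slicer = df.loc[:, idx[:, "LAST PRICE"]]
# xs does the same cross-section; drop_level=False keeps both column levels
via_xs = df.xs("LAST PRICE", axis=1, level=1, drop_level=False)

# rename() relabels matching labels on any column level
df1 = via_slicer.rename(columns={"LAST PRICE": "PriceVolume"}) * 3
print(df1.columns.tolist())
```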

Then use concat to join the result back onto the original DataFrame:

print (pd.concat([df,df1], axis=1))
           ACA FP Equity            UCG IM Equity             ACA FP Equity  \
              LAST PRICE     VOLUME    LAST PRICE      VOLUME   PriceVolume   
2010-01-04        12.825  5879617.0       15.0292  10844639.0        38.475   
2010-01-05        13.020  6928587.0       14.8092  16456228.0        39.060   
2010-01-06        13.250  5290631.0       14.6834  10446450.0        39.750   
2010-01-07        13.255  5328586.0       15.0292  31900341.0        39.765   
2010-01-08        13.470  7160295.0       15.1707  40750768.0        40.410   

           UCG IM Equity  
             PriceVolume  
2010-01-04       45.0876  
2010-01-05       44.4276  
2010-01-06       44.0502  
2010-01-07       45.0876  
2010-01-08       45.5121
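`pd.concat` appends the new columns at the far right, so each ticker's columns end up split apart. If you would rather keep them grouped, one option (a sketch on two rows of the question's data, not something the answer itself does) is to sort the column labels afterwards:

```python
import pandas as pd

# Two rows of the question's data, rebuilt so this snippet runs on its own
columns = pd.MultiIndex.from_product(
    [["ACA FP Equity", "UCG IM Equity"], ["LAST PRICE", "VOLUME"]]
)
df = pd.DataFrame(
    [[12.825, 5879617.0, 15.0292, 10844639.0],
     [13.020, 6928587.0, 14.8092, 16456228.0]],
    index=pd.to_datetime(["2010-01-04", "2010-01-05"]),
    columns=columns,
)

idx = pd.IndexSlice
df1 = df.loc[:, idx[:, "LAST PRICE"]].rename(columns={"LAST PRICE": "PriceVolume"}) * 3

# sort_index(axis=1) orders columns by ticker first, then by field name,
# so every ticker's columns sit next to each other
out = pd.concat([df, df1], axis=1).sort_index(axis=1)
print(out.columns.tolist())
```

Note that sorting is alphabetical on the field level, so PriceVolume lands between LAST PRICE and VOLUME.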

Another solution, without concat, is to build a list of column tuples from the columns of selected_df and assign the output to them directly:

idx = pd.IndexSlice
selected_df = df.loc[:, idx[:, 'LAST PRICE']]

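# One (ticker, 'PriceVolume') tuple per selected column, in the same order as selected_df;
# .levels[0] would list all unique level values, which can include unused ones
new_cols = [(x, 'PriceVolume') for x in selected_df.columns.get_level_values(0)]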
print (new_cols)
[('ACA FP Equity', 'PriceVolume'), ('UCG IM Equity', 'PriceVolume')]

df[new_cols] = selected_df * 3
print(df)
           ACA FP Equity            UCG IM Equity             ACA FP Equity  \
              LAST PRICE     VOLUME    LAST PRICE      VOLUME   PriceVolume   
2010-01-04        12.825  5879617.0       15.0292  10844639.0        38.475   
2010-01-05        13.020  6928587.0       14.8092  16456228.0        39.060   
2010-01-06        13.250  5290631.0       14.6834  10446450.0        39.750   
2010-01-07        13.255  5328586.0       15.0292  31900341.0        39.765   
2010-01-08        13.470  7160295.0       15.1707  40750768.0        40.410   

           UCG IM Equity  
             PriceVolume  
2010-01-04       45.0876  
2010-01-05       44.4276  
2010-01-06       44.0502  
2010-01-07       45.0876  
2010-01-08       45.5121
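The same tuple keys also work in a plain loop over the tickers, which may read more simply when the derived column depends on several fields. This is a sketch on two rows of the question's data; the loop is an alternative to the single assignment above, not part of the answer.

```python
import pandas as pd

# Two rows of the question's data, rebuilt so this snippet runs on its own
columns = pd.MultiIndex.from_product(
    [["ACA FP Equity", "UCG IM Equity"], ["LAST PRICE", "VOLUME"]]
)
df = pd.DataFrame(
    [[12.825, 5879617.0, 15.0292, 10844639.0],
     [13.020, 6928587.0, 14.8092, 16456228.0]],
    index=pd.to_datetime(["2010-01-04", "2010-01-05"]),
    columns=columns,
)

# Add one derived column per ticker; the (ticker, field) tuple key creates
# the new column directly in df's MultiIndex columns
for ticker in df.columns.get_level_values(0).unique():
    df[(ticker, "PriceVolume")] = df[(ticker, "LAST PRICE")] * 3

print(df.columns.tolist())
```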

- jezrael

Page content provided by Stack Overflow.

- Floyd · Accepted Answer

The most elegant way I can think of is:

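# Use a (ticker, field) tuple key: chained indexing like df['ACA FP Equity']['PriceVolume'] = ...
# assigns into the temporary sub-frame returned by df['ACA FP Equity'], so df itself never gains the column
df[('ACA FP Equity', 'PriceVolume')] = df[('ACA FP Equity', 'LAST PRICE')].apply(lambda x: x * 3)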

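The `apply` call lets you run a given function, in this case a lambda expression that multiplies each value of the selected column by three. Calling `apply` returns a pandas Series, which can then be added to the DataFrame as a column.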
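For a simple multiplication, `apply` with a lambda and plain vectorised arithmetic should produce the same Series. The sketch below compares them on a few prices from the question; vectorised `s * 3` is usually the more idiomatic choice in pandas, while `apply` is useful when the per-element function has no vectorised equivalent.

```python
import pandas as pd

s = pd.Series([12.825, 13.020, 13.250])

via_apply = s.apply(lambda x: x * 3)  # calls the lambda once per element
via_vector = s * 3                    # one vectorised operation over the whole column

print(via_apply.equals(via_vector))
```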

Here is a minimal example showing the same method on a simple DataFrame:

import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})
print(df)

# Output:
#    a  b
# 0  1  4
# 1  2  5
# 2  3  6


# Add column 'c'
df['c'] = pd.Series(df['b'].apply(lambda x: x*3))
print(df)

# Output:
#    a  b   c
# 0  1  4  12
# 1  2  5  15
# 2  3  6  18