Percentage difference between any two columns of a pandas DataFrame

Question

13

I would like to define a function that calculates the percentage difference between any two pandas columns. Suppose my DataFrame is defined as follows:

R1  R2    R3    R4   R5    R6
 A   B     1     2    3     4

I would like my calculations to be defined as

df['R7'] = df[['R3','R4']].apply( method call to calculate perc diff)

and

df['R8'] = df[['R5','R6']].apply(same method call to calculate perc diff)

How can I do this?

I have tried the following:

df['perc_cnco_error'] = df[['CumNetChargeOffs_x','CumNetChargeOffs_y']].apply(lambda x,y: percCalc(x,y))

def percCalc(x,y):
    if x<1e-9:
        return 0
    else:
        return (y - x)*100/x

and it gives me the error

TypeError: ('<lambda>() takes exactly 2 arguments (1 given)', u'occurred at index CumNetChargeOffs_x')

- user1124702

Unless you specify the axis keyword argument as 1, apply performs a row-wise operation. So try lambda x: percCalc(x['R3'], x['R4']) and see if it works! - spicypumpkin

With one small change, lambda x: percCalc(x['R3'], x['R4']), axis=1, it works. Thanks! - user1124702

1

Oh, oops... I had the axis flipped. My bad! - spicypumpkin
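
The fix agreed on in the comments can be sketched as below. This is a minimal example rather than the asker's actual code: it reuses percCalc from the question and assumes a one-row frame with the R1..R6 layout shown there. With axis=1, apply hands each row to the lambda as a single Series, so the lambda takes one argument and looks the two columns up by name.

```python
import pandas as pd

def percCalc(x, y):
    # Same logic as in the question: return 0 when the base value is (near) zero.
    if x < 1e-9:
        return 0
    return (y - x) * 100 / x

# Sample frame assumed from the layout in the question.
df = pd.DataFrame({'R1': ['A'], 'R2': ['B'],
                   'R3': [1], 'R4': [2], 'R5': [3], 'R6': [4]})

# axis=1 passes one row at a time; the lambda receives that row as a Series.
df['R7'] = df[['R3', 'R4']].apply(lambda row: percCalc(row['R3'], row['R4']), axis=1)
df['R8'] = df[['R5', 'R6']].apply(lambda row: percCalc(row['R5'], row['R6']), axis=1)

print(df)
```

The original attempt failed because the default axis=0 passes each column as one argument, while the lambda expected two.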

3 Answers

3

To calculate the percentage difference between R3 and R4, you can use:

df['R7'] = (df.R3 - df.R4) / df.R3 * 100

- Daniil Mashkin

1

This gives you the deviation in percent:

df.apply(lambda row: (row.iloc[0]-row.iloc[1])/row.iloc[0]*100, axis=1)

If you have more than two columns, try:

df[['R3', 'R5']].apply(lambda row: (row.iloc[0]-row.iloc[1])/row.iloc[0]*100, axis=1)

- pdubucq

- sophocles · Accepted Answer

In the simplest terms:

def percentage_change(col1,col2):
    return ((col2 - col1) / col1) * 100

You can apply it to any two columns of your DataFrame:

df['a'] = percentage_change(df['R3'], df['R4'])
df['b'] = percentage_change(df['R6'], df['R5'])

>>> print(df)
 
  R1 R2  R3  R4  R5  R6      a     b
0  A  B   1   2   3   4  100.0 -25.0

Equivalently, using the pandas arithmetic methods:

def percentage_change(col1,col2):
    return col2.sub(col1).div(col1).mul(100)
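
Neither version of percentage_change guards against a zero base value the way percCalc in the question does. A vectorized sketch of that guard might look like the following; percentage_change_safe and its eps parameter are hypothetical names introduced here for illustration, and the 1e-9 threshold is taken from the question.

```python
import pandas as pd

def percentage_change_safe(col1, col2, eps=1e-9):
    # Hypothetical helper: mirrors the `x < 1e-9` check in the question's percCalc.
    small = col1 < eps
    # Swap tiny bases for a placeholder so the division never hits ~0.
    safe_base = col1.mask(small, 1)
    result = col2.sub(safe_base).div(safe_base).mul(100)
    # Rows whose base was below eps are reported as 0, as percCalc does.
    return result.mask(small, 0)

# Illustrative data (not from the question): the second row has a zero base.
demo = pd.DataFrame({'R3': [1, 0], 'R4': [2, 5]})
demo['R7'] = percentage_change_safe(demo['R3'], demo['R4'])
print(demo)
```

Returning 0 for a zero base follows the question's convention; leaving such rows as NaN is another reasonable choice, depending on how the result is used downstream.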

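You can also use the built-in pandas pct_change method, which computes the percentage change across all the columns passed in; then select the column you want returned. Note that it returns a fraction rather than a percentage (1.0 means 100%).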

df['R7'] = df[['R3', 'R4']].pct_change(axis=1)['R4']
df['R8'] = df[['R6', 'R5']].pct_change(axis=1)['R5']

>>> print(df)

  R1 R2  R3  R4  R5  R6      a     b   R7    R8
0  A  B   1   2   3   4  100.0 -25.0  1.0 -0.25
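
In the output above, R7 and R8 hold 1.0 and -0.25 while a and b hold 100.0 and -25.0, because pct_change reports the change as a fraction. A small sketch of scaling it to match percentage_change, assuming the same one-row frame as in the setup:

```python
import pandas as pd

df = pd.DataFrame({'R1': 'A', 'R2': 'B',
                   'R3': 1, 'R4': 2, 'R5': 3, 'R6': 4},
                  index=[0])

# pct_change(axis=1) compares each column with the one to its left,
# so the first column of each pair comes back as NaN and is not selected.
# Multiplying by 100 converts the fraction to a percentage.
df['R7'] = df[['R3', 'R4']].pct_change(axis=1)['R4'] * 100
df['R8'] = df[['R6', 'R5']].pct_change(axis=1)['R5'] * 100

print(df)
```

The column order in the selection matters: df[['R6', 'R5']] measures the change from R6 to R5, whereas df[['R5', 'R6']] would measure the change from R5 to R6.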

Setup:

import pandas as pd

df = pd.DataFrame({'R1':'A','R2':'B',
                   'R3':1,'R4':2,'R5':3,'R6':4},
                  index=[0])