Data frame difference between consecutive rows within a group, and creating a string that states the difference

3

Data frame:

col1  col_entity col2
a        a1       50
b        b1       40
a        a2       40
a        a3       30
b        b2       20
a        a4       20
b        b3       30
b        b4       50

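I need to group the rows by col1, sort each group by col2 from highest to lowest, then compute the difference between consecutive rows within each group and create a column holding a string statement about that difference. Expected data frame: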

col1  col_entity col2   diff   col_statement
a        a1       50     10     difference between a1 and a2 is 10
b        a2       40     10     difference between a2 and a3 is 10
a        a3       30     10     difference between a3 and a4 is 10
a        a4       20     nan    **will drop this row**
b        b1       40     10     difference between b1 and b4 is 10
a        b4       50     10     difference between b4 and b3 is 10
b        b3       30     10     difference between b3 and b2 is 10
b        b2       20     nan    **will drop this row**

Please help me solve this. Thanks in advance.


Did I answer your question? If so, please click the check mark next to my solution. If it helped, please upvote. Thanks! - David Erickson
1 Answer

0
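You can use a couple of np.where statements:
  1. Use diff().abs().shift(-1) to get the absolute difference between a row and the row below it.
  2. Return NaN if the extracted letter does not match between a row and the next row.
  3. In the col_statement column, build a string from the other columns, using np.where() to substitute a placeholder wherever diff is NaN.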

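import numpy as np

# Here 'col1' holds the entity labels (a1, a2, ...) and 'col_entity col2' holds the values,
# matching the frame printed below.
# expand=False makes str.extract return a Series, so the comparison stays one-dimensional.
group = df['col1'].str.extract('([a-z])', expand=False)
df['diff'] = np.where(group == group.shift(-1),
                      df['col_entity col2'].diff().abs().shift(-1), np.nan)
df['col_statement'] = np.where(df['diff'].isnull(),
                               '**will drop this row**',
                               'difference between ' + df['col1'] + ' and '
                               + df['col1'].shift(-1) + ' is ' + df['diff'].astype(str))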
df
Out[1]: 
  col1  col_entity col2  diff                         col_statement
a   a1               50  10.0  difference between a1 and a2 is 10.0
b   a2               40  10.0  difference between a2 and a3 is 10.0
a   a3               30  10.0  difference between a3 and a4 is 10.0
a   a4               20   NaN                **will drop this row**
b   b1               40  10.0  difference between b1 and b4 is 10.0
a   b4               50  10.0  difference between b4 and b3 is 10.0
b   b3               30  10.0  difference between b3 and b2 is 10.0
b   b2               20   NaN                **will drop this row**
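For comparison, here is a minimal sketch that starts from the question's original three-column frame (col1, col_entity, col2) and uses groupby instead of extracting the letter from the label. The helper column name next_entity is made up for illustration, and dropping the last row of each group is an assumption based on the "will drop this row" note in the question. Note that sorting strictly from high to low puts b4 (50) ahead of b1 (40), so the b rows come out in a different order than in the question's expected table.

```python
import pandas as pd

# Input frame as posted in the question.
df = pd.DataFrame({
    "col1": ["a", "b", "a", "a", "b", "a", "b", "b"],
    "col_entity": ["a1", "b1", "a2", "a3", "b2", "a4", "b3", "b4"],
    "col2": [50, 40, 40, 30, 20, 20, 30, 50],
})

# Sort by group, then by col2 from high to low inside each group.
out = (df.sort_values(["col1", "col2"], ascending=[True, False])
         .reset_index(drop=True))

# diff(-1) is "this row minus the next row" and never crosses a group boundary.
out["diff"] = out.groupby("col1")["col2"].diff(-1)

# Hypothetical helper column: the entity on the next row of the same group.
out["next_entity"] = out.groupby("col1")["col_entity"].shift(-1)

# The last row of each group has nothing below it, so drop it.
out = out.dropna(subset=["diff"]).copy()
out["diff"] = out["diff"].astype(int)

out["col_statement"] = (
    "difference between " + out["col_entity"] + " and " + out["next_entity"]
    + " is " + out["diff"].astype(str)
)
out = out.drop(columns="next_entity").reset_index(drop=True)
print(out)
```

Because the grouping key is col1 itself, this does not depend on the entity labels starting with the group letter.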
