Flag a column based on grouped data

4
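I am trying to create a column that holds a single value per id (each id has many rows associated with it). If any row for an id is labeled answered, every row for that id should be marked answered. If none of the rows for an id is labeled answered, all of its rows should be marked not_answered (which is what I currently get).
Here is the code I wrote: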
import numpy as np
conds = [file.data__answered_at.isna(), file.data__answered_at.notna()]
choices = ["not_answered", "answered"]
file['call_status'] = np.select(conds, choices, default=np.nan)

 data__id   call_status       rank
  1            answered        1
  1          not_answered      2
  1            answered        3
  2          not_answered      1
  2             answered       2
  3          not_answered      1
  4            answered        1
  4          not_answered      2
  5          not_answered      1
  5          not_answered      2

In this case, the desired result would be:
   data__id   call_status       rank
  1            answered        1
  1            answered        2
  1            answered        3
  2            answered        1
  2            answered        2
  3          not_answered      1
  4            answered        1
  4            answered        2
  5          not_answered      1
  5          not_answered      2
2 Answers

5

Use GroupBy.transform with GroupBy.any to test whether each group contains at least one answered row, then set the values with DataFrame.loc:

mask = df['call_status'].eq('answered').groupby(df['data__id']).transform('any')

Alternatively, select every data__id that has an answered row and test membership with Series.isin:

mask = df['data__id'].isin(df.loc[df['call_status'].eq('answered'), 'data__id'].unique())

df.loc[mask, 'call_status'] = 'answered'
print(df)
   data__id   call_status  rank
0         1      answered     1
1         1      answered     2
2         1      answered     3
3         2      answered     1
4         2      answered     2
5         3  not_answered     1
6         4      answered     1
7         4      answered     2
8         5  not_answered     1
9         5  not_answered     2
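
For anyone who wants to run both variants end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample table in the question; the names is_answered, mask_transform, mask_isin, and answered_ids are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "data__id": [1, 1, 1, 2, 2, 3, 4, 4, 5, 5],
    "call_status": [
        "answered", "not_answered", "answered",
        "not_answered", "answered",
        "not_answered",
        "answered", "not_answered",
        "not_answered", "not_answered",
    ],
    "rank": [1, 2, 3, 1, 2, 1, 1, 2, 1, 2],
})

# Row-level flag: is this particular row answered?
is_answered = df["call_status"].eq("answered")

# Variant 1: broadcast "any row in the group is answered" to every row.
mask_transform = is_answered.groupby(df["data__id"]).transform("any")

# Variant 2: collect the ids with at least one answered row,
# then test each row's id for membership.
answered_ids = df.loc[is_answered, "data__id"].unique()
mask_isin = df["data__id"].isin(answered_ids)

# Both masks select the same rows, so either can drive the update.
df.loc[mask_transform, "call_status"] = "answered"
result = df["call_status"].tolist()
print(df)
```

The two masks should be interchangeable on this data; which one reads better is largely a matter of taste.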

3
We can use groupby here and check whether any row in each group equals answered.
Then we use np.where to conditionally fill in answered or not_answered:
m = file.groupby('data__id')['call_status'].transform(lambda x: x.eq('answered').any())

file['call_status'] = np.where(m, 'answered', 'not_answered')

Output

  data__id   call_status  rank
0         1      answered     1
1         1      answered     2
2         1      answered     3
3         2      answered     1
4         2      answered     2
5         3  not_answered     1
6         4      answered     1
7         4      answered     2
8         5  not_answered     1
9         5  not_answered     2
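
As a possible shortcut, the same idea can be applied directly to the data__answered_at column from the question, skipping the intermediate per-row label. This is only a sketch: the timestamps below are made up for illustration, and any_answered is a hypothetical name. It also passes the string 'any' to transform instead of a lambda.

```python
import numpy as np
import pandas as pd

# Hypothetical data: only the missing / non-missing pattern matters here.
file = pd.DataFrame({
    "data__id": [1, 1, 2, 3],
    "data__answered_at": pd.to_datetime(
        ["2021-01-01 10:00", None, None, None]
    ),
})

# True for every row whose id has at least one non-missing timestamp.
any_answered = (
    file["data__answered_at"].notna()
    .groupby(file["data__id"])
    .transform("any")
)

# Fill the label for all rows of each id in one step.
file["call_status"] = np.where(any_answered, "answered", "not_answered")
print(file)
```

Here both rows of id 1 end up as answered because one of them has a timestamp, while ids 2 and 3 stay not_answered.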
