Python DataFrame: create a new column based on another column

Question

python pandas

3

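I want to create another column in a DataFrame.

The DataFrame looks like the one below. sub_id is part of id: each id is the "parent" of its sub_ids, and a group contains the id itself plus the items that belong to it.

The id has no name of its own, but every sub_id has a corresponding name.

I want to match each id against the sub_id values and use the matching name as the name of the id.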

import pandas as pd
df = pd.DataFrame({'id':[1,1,1,2,2],
                   'sub_id':[12,1,13,23,2],
                   'name':['pear','fruit','orange','cat','animal']})
   id  sub_id    name
0   1      12    pear
1   1       1   fruit
2   1      13  orange
3   2      23     cat
4   2       2  animal

I want to create another column, id_name, to get:

   id  sub_id    name id_name
0   1      12    pear   fruit
1   1       1   fruit   fruit
2   1      13  orange   fruit
3   2      23     cat  animal
4   2       2  animal  animal

I don't know how to do this efficiently. The only idea I had was to merge the DataFrame twice, but I think there must be a better way.

- Joyce

2 Answers

1

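Are your IDs unique?

If so, you can use GroupBy.transform to get the smallest sub_id in each id group, then map that value to its name. This works when the parent row has the smallest sub_id in its group, as it does in your example.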

df['id_name'] = (df.groupby('id')['sub_id'].transform('min')
                   .map(df.set_index('sub_id')['name'])
                )

Output:

   id  sub_id    name id_name
0   1      12    pear   fruit
1   1       1   fruit   fruit
2   1      13  orange   fruit
3   2      23     cat  animal
4   2       2  animal  animal
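
The snippet above relies on two properties of the sample data that the question does not guarantee: sub_id values are unique, and the smallest sub_id in each group is the group's own id. A minimal sketch that makes both assumptions explicit before mapping is shown here; the intermediate name min_sub is introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2],
                   'sub_id': [12, 1, 13, 23, 2],
                   'name': ['pear', 'fruit', 'orange', 'cat', 'animal']})

# Assumption 1: sub_id values are unique, so they can act as a lookup index.
assert df['sub_id'].is_unique

# Smallest sub_id per id group, broadcast back to every row of the group.
min_sub = df.groupby('id')['sub_id'].transform('min')

# Assumption 2: the smallest sub_id in each group is the group's own id.
assert min_sub.eq(df['id']).all()

# Look up the name that belongs to that sub_id.
df['id_name'] = min_sub.map(df.set_index('sub_id')['name'])
print(df)
```

If either assertion fails on real data, matching id against sub_id directly is likely the safer route.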

- mozway

- jezrael · Accepted Answer

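Use Series.where to replace the name with a missing value in every row where id does not match sub_id. GroupBy.transform with 'first' then works, because it returns the first non-missing value in each group.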

df['id_name'] = (df['name'].where(df['id'].eq(df['sub_id']))
                           .groupby(df['id'])
                           .transform('first'))

Alternatively, filter the rows with a boolean mask, build a helper Series, and map it onto id with Series.map:

s = df[df['id'].eq(df['sub_id'])].set_index('id')['name']
df['id_name'] = df['id'].map(s)
print(df)
   id  sub_id    name id_name
0   1      12    pear   fruit
1   1       1   fruit   fruit
2   1      13  orange   fruit
3   2      23     cat  animal
4   2       2  animal  animal

Details:

print(df['name'].where(df['id'].eq(df['sub_id'])))
0       NaN
1     fruit
2       NaN
3       NaN
4    animal
Name: name, dtype: object


print(s)
id
1     fruit
2    animal
Name: name, dtype: object
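
To see how the two approaches behave when a group has no row where id equals sub_id, here is a small self-contained sketch. The extra row with id 3 and the names is_parent, via_where and via_map are hypothetical and not part of the original data; the point is only that both variants should leave id_name missing for such a group instead of picking an unrelated name.

```python
import pandas as pd

# Sample data from the question plus one hypothetical group (id 3)
# that has no row where id == sub_id.
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                   'sub_id': [12, 1, 13, 23, 2, 31],
                   'name': ['pear', 'fruit', 'orange', 'cat', 'animal', 'oak']})

# Rows that describe the parent itself.
is_parent = df['id'].eq(df['sub_id'])

# Variant 1: blank out non-parent names, then take the first
# non-missing name within each id group.
via_where = (df['name'].where(is_parent)
                       .groupby(df['id'])
                       .transform('first'))

# Variant 2: build an id -> name lookup from the parent rows and map it.
s = df[is_parent].set_index('id')['name']
via_map = df['id'].map(s)

df['id_name'] = via_map
print(df)
```

The mapping variant assumes at most one parent row per id; if an id could match several rows, the helper Series would need deduplicating first.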