Add a DataFrame column based on another column, where multiple values map to the same new value

Question

5

I have a DataFrame like this:

import pandas as pd
df1 = pd.DataFrame({'col1' : ['cat', 'cat', 'dog', 'green', 'blue']})

I want a new column that gives the category, like this:

dfoutput = pd.DataFrame({'col1' : ['cat', 'cat', 'dog', 'green', 'blue'],
                         'col2' : ['animal', 'animal', 'animal', 'color', 'color']})

I know I can do this, inefficiently, with .loc:

df1.loc[df1['col1'] == 'cat','col2'] = 'animal'
df1.loc[df1['col1'] == 'dog','col2'] = 'animal'

How can I combine cat and dog into animal? This does not work:

df1.loc[df1['col1'] == 'cat' | df1['col1'] == 'dog','col2'] = 'animal'

- Liquidity

3 Answers

3

Since several items can belong to a single category, I suggest starting with a dictionary that maps each category to its items:

cat_item = {'animal': ['cat', 'dog'], 'color': ['green', 'blue']}

You will likely find this easier to maintain. Then invert the dictionary with a dict comprehension and apply it with pd.Series.map:

item_cat = {w: k for k, v in cat_item.items() for w in v}

df1['col2'] = df1['col1'].map(item_cat)

print(df1)

    col1    col2
0    cat  animal
1    cat  animal
2    dog  animal
3  green   color
4   blue   color

You could also use pd.Series.replace, but this will generally be less efficient.
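Besides speed, map and replace also differ in how they treat values that are missing from the dictionary. The sketch below illustrates this with a made-up extra value ('fish') and a made-up fallback label ('unknown'); both are assumptions for illustration, not part of the original data.

```python
import pandas as pd

# 'fish' is a hypothetical value with no entry in the mapping
s = pd.Series(['cat', 'dog', 'green', 'fish'])
item_cat = {'cat': 'animal', 'dog': 'animal', 'green': 'color'}

# map: values without a dictionary entry become NaN
mapped = s.map(item_cat)

# replace: values without a dictionary entry are left unchanged
replaced = s.replace(item_cat)

# map plus fillna: give unmapped values an explicit fallback label
filled = s.map(item_cat).fillna('unknown')

print(pd.DataFrame({'map': mapped, 'replace': replaced, 'filled': filled}))
```

If every value is guaranteed to appear in the dictionary, the two give the same result and map is usually the better choice.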

- jpp

0

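You can also try np.select, like this:

import numpy as np

# One boolean condition per category (str.contains matches substrings)
options = [(df1.col1.str.contains('cat|dog')),
           (df1.col1.str.contains('green|blue'))]

settings = ['animal', 'color']

# 'other' is a placeholder for rows matching neither pattern (the implicit default is 0)
df1['setting'] = np.select(options, settings, default='other')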

I have found this approach runs very fast, even on very large DataFrames.
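One caveat: str.contains does substring matching, so a value such as 'catfish' would also be classified as an animal. A minimal sketch of a variant that uses isin for exact matches is shown below. The extra 'catfish' row and the 'other' fallback label are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# 'catfish' is a hypothetical value added to show the difference from str.contains
df1 = pd.DataFrame({'col1': ['cat', 'cat', 'dog', 'green', 'blue', 'catfish']})

# isin tests for exact membership rather than substring matches
conditions = [df1['col1'].isin(['cat', 'dog']),
              df1['col1'].isin(['green', 'blue'])]
choices = ['animal', 'color']

# An explicit string default avoids mixing the implicit integer 0 with string choices
df1['col2'] = np.select(conditions, choices, default='other')

print(df1)
```

Conditions are checked in order and the first match wins, so overlapping conditions should be listed from most to least specific.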

- Mara

- BENY · Accepted Answer

Build your dict, then use map.

d = {'dog': 'ani', 'cat': 'ani', 'green': 'color', 'blue': 'color'}
df1['col2'] = df1.col1.map(d)
df1
    col1   col2
0    cat    ani
1    cat    ani
2    dog    ani
3  green  color
4   blue  color
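
As for the .loc expression in the question that does not work: in Python, | binds more tightly than ==, so without parentheses the expression is not parsed as "equals cat OR equals dog". A minimal sketch of two ways to write the combined condition follows; it is an illustration using the question's data, not a replacement for the map approach above.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['cat', 'cat', 'dog', 'green', 'blue']})

# Parenthesize each comparison so that | combines two boolean Series
mask = (df1['col1'] == 'cat') | (df1['col1'] == 'dog')
df1.loc[mask, 'col2'] = 'animal'

# isin expresses the same "any of these values" test more compactly
df1.loc[df1['col1'].isin(['green', 'blue']), 'col2'] = 'color'

print(df1)
```

This works for a handful of categories, but the dictionary with map scales better as the number of values grows.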