Remove duplicate words from a column

3

I have a DataFrame like the one shown below:

import pandas as pd

df3 = pd.DataFrame({'ID': ['Stay home, T5006, T5006, Stay home', 'Go for walk, T5007, T5007, Go for walk'],
                    'Name': ['Stay home, Go for walk,  Stay home', 'Go outside, Go outside, Go outside']
                    })


    ID                                      Name
0   Stay home, T5006, T5006, Stay home      Stay home, Go for walk, Stay home
1   Go for walk, T5007, T5007, Go for walk  Go outside, Go outside, Go outside

I want to remove the duplicates from the ID column. Expected result:
    ID                  Name
0   Stay home,T5006     Stay home,  Go for walk, Stay home
1   Go for walk,T5007   Go outside, Go outside,  Go outside

Any ideas?

1 Answer

2

Use the dict.fromkeys trick to remove duplicates from the split values, then join them back together inside a lambda function:

df3['ID'] = df3['ID'].apply(lambda x: ', '.join(dict.fromkeys(x.split(', '))))

Or use a list comprehension:

df3['ID'] = [', '.join(dict.fromkeys(x.split(', '))) for x in df3['ID']]

print(df3)
                   ID                                Name
0    Stay home, T5006  Stay home, Go for walk,  Stay home
1  Go for walk, T5007  Go outside, Go outside, Go outside
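To see why this works, here is a minimal sketch on a single string (the variable names are illustrative and not part of the original answer). dict.fromkeys builds a dict whose keys are the list items; keys are unique and, in Python 3.7+, dicts keep insertion order, so only the first occurrence of each item survives. Iterating over a dict yields its keys, which is why it can be passed straight to join.

```python
s = 'Stay home, T5006, T5006, Stay home'

# Split the cell into its comma-separated items
parts = s.split(', ')

# Keys are unique and keep first-seen order; every value defaults to None
unique = dict.fromkeys(parts)

# join iterates over the dict, i.e. over its keys
result = ', '.join(unique)
print(result)
```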

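If the order doesn't matter, you can use set instead (either of the following two lines works; a set has no guaranteed order, so your output may differ from the one shown here):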
df3['ID'] = df3['ID'].apply(lambda x: ', '.join(set(x.split(', '))))
df3['ID'] = [', '.join(set(x.split(', '))) for x in df3['ID']]
print(df3)
                   ID                                Name
0    Stay home, T5006  Stay home, Go for walk,  Stay home
1  T5007, Go for walk  Go outside, Go outside, Go outside
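All of the variants above split on the exact string ', ', so they assume every separator is a comma followed by exactly one space. The Name column in the question contains a double space after one comma, and a piece such as ' Stay home' would not be recognised as a duplicate of 'Stay home'. A possible sketch that tolerates irregular spacing is shown below; dedupe_words is a hypothetical helper name, not something from the original answer.

```python
def dedupe_words(text, sep=','):
    """Drop repeated items from a delimited string, keeping first-seen order."""
    # Split on the bare separator and trim whitespace around each piece
    items = (part.strip() for part in text.split(sep))
    # Skip empty pieces (e.g. from a trailing separator), then deduplicate
    return ', '.join(dict.fromkeys(item for item in items if item))


example = dedupe_words('Stay home, Go for walk,  Stay home')
print(example)
```

Assuming the df3 from the question, it could be applied with df3['ID'] = df3['ID'].apply(dedupe_words).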

Could you explain in more detail what dict.fromkeys does? - Karthik S
1
@KarthikS - Sure, check this - jezrael
1
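Thanks, so basically it converts a list or set into unique key-value pairs, with each value defaulting to None if no value is given. I didn't know that ''.join would join a dict's keys. Thanks! - Karthik S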
