I would like to remove every substring in a DataFrame (df) column that is not in a given list. For example:
mylist = ['good', 'like', 'bad', 'hated', 'terrible', 'liked']
Current:                                  Desired:
index  content                            index  content
0      a very good idea, I like it        0      good like
1      was the bad thing to do            1      bad
2      I hated it, it was terrible        2      hated terrible
...                                       ...
k      Why do you think she liked it      k      liked
I have defined a function that keeps all the words that are not in the list, but I don't know how to invert it to get the result I want:
pat = r'\b(?:{})\b'.format('|'.join(mylist))
df['column1'] = df['column1'].str.replace(pat, '', regex=True)  # removes the listed words, keeps the rest
Any help would be appreciated.
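
One possible way to invert the logic, offered as a minimal sketch rather than a definitive answer: instead of replacing the listed words, collect every match of the same pattern and join the matches back together. The sample DataFrame below is built from the example rows, the column name `content` is taken from the table (the snippet above uses `column1`), and the `re.escape` call is an added precaution that is not in the original pattern.

```python
import re
import pandas as pd

# Words to keep (from the example above).
mylist = ['good', 'like', 'bad', 'hated', 'terrible', 'liked']

# Sample data rebuilt from the example rows; the column name is an assumption.
df = pd.DataFrame({'content': [
    'a very good idea, I like it',
    'was the bad thing to do',
    'I hated it, it was terrible',
    'Why do you think she liked it',
]})

# Same whole-word pattern as before; re.escape guards against regex metacharacters.
pat = r'\b(?:{})\b'.format('|'.join(map(re.escape, mylist)))

# findall returns the list of matched words per row; join turns it back into a string.
df['content'] = df['content'].str.findall(pat).str.join(' ')
print(df)
```

Rows with no matching words should end up as empty strings, and the kept words stay in the order they appear in the original text.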