Removing words from a Python list

Question

3

I have a list of strings named 'abc'. I am trying to remove the words in the list 'stop' from it, and also to remove all numbers from the strings in 'abc'.

abc=[ 'issues in performance 421',
 'how are you doing',
 'hey my name is abc, 143 what is your name',
 'attention pleased',
 'compliance installed 234']
stop=['attention', 'installed']

I am using a list comprehension to remove them, but the code below does not remove the words.

new_word=[word for word in abc if word not in stop ]

Result (note that the words are still there):

['issues in performance',
 'how are you doing',
 'hey my name is abc, what is your name',
 'attention pleased',
 'compliance installed']

Expected output:

 ['issues in performance',
     'how are you doing',
     'hey my name is abc, what is your name',
     'pleased',
     'compliance']

Thanks

- user15051990

1

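In your code, word is not a word but a whole sentence. - Ankit Jaiswal

Do you want to remove just the word, or the entire list element? - user3483203

I want to remove only the words in 'stop' from the sentences in the list. - user15051990

Your solution fails because you are looking up the whole sentence in the stop-word list. - Varun Maurya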
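
The point made in these comments can be checked directly. This is a minimal sketch using the abc and stop lists from the question; the names phrase_in_stop and words_in_stop are only illustrative.

```python
abc = ['issues in performance 421',
       'how are you doing',
       'hey my name is abc, 143 what is your name',
       'attention pleased',
       'compliance installed 234']
stop = ['attention', 'installed']

# Each element of abc is a whole phrase, so `in` compares the full phrase
# against the entries of stop and never finds a match.
phrase_in_stop = [phrase in stop for phrase in abc]

# Splitting a phrase yields individual words, which can match stop entries.
words_in_stop = [[w for w in phrase.split() if w in stop] for phrase in abc]

print(phrase_in_stop)
print(words_in_stop)
```

Every entry of phrase_in_stop is False, which is why the original comprehension returns the list unchanged, while words_in_stop picks out the stop words in the last two phrases.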

5 Answers

1

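You can solve this with sets. Each item may contain several words, so a plain in test does not work; instead, split the item into a set of words and use & to get the words it shares with stop. If there are any shared words, the intersection is a non-empty set, which is truthy. Since we only care about the remaining items, we can use if not here.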

new_word=[word for word in abc if  not set(word.split(' ')) & set(stop)]

Update

If you also want to drop every item that contains a number, you can simply use the following:

new_word=[word for word in abc if  not (set(word.split(' ')) & set(stop) or any([i.strip().isdigit() for i in word.split(' ')]))]
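
To see what the set-based condition does on the question's data, here is a small sketch. It assumes the abc and stop lists from the question; stop_set, common, kept and kept_no_digits are illustrative names. Note that the filter keeps or drops whole phrases; it does not remove individual words from inside a phrase.

```python
abc = ['issues in performance 421',
       'how are you doing',
       'hey my name is abc, 143 what is your name',
       'attention pleased',
       'compliance installed 234']
stop = ['attention', 'installed']
stop_set = set(stop)

# The intersection is itself a set: empty (falsy) when the phrase has no
# stop word, non-empty (truthy) otherwise.
common = [set(phrase.split(' ')) & stop_set for phrase in abc]

# Filtering on that condition drops every phrase containing a stop word.
kept = [phrase for phrase in abc if not set(phrase.split(' ')) & stop_set]

# Adding the digit check also drops every phrase containing a numeric token.
kept_no_digits = [
    phrase for phrase in abc
    if not (set(phrase.split(' ')) & stop_set
            or any(tok.strip().isdigit() for tok in phrase.split(' ')))
]

print(common)
print(kept)
print(kept_no_digits)
```

On this input only 'how are you doing' survives both checks, so this approach fits the case where entire list elements should be discarded.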

- Frank AK

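Thanks Frank! I want to remove all the numbers that appear in the strings. Can you help me with that? - user15051990

@user15051990, you just need to append or any([i for i in word.split(' ') if i.isdigit()]) to the end of the if condition. I don't have a computer at hand, so I can't test it. Let me know if you run into any errors. - Frank AK

@user15051990 I have updated the answer; you can test it and see whether it does what you expect. - Frank AK

But aren't sets unordered? - AdminBenni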

1

Here is a solution that uses a simple regular expression and the re.sub method. It also removes the numbers.

import re

abc=[ 'issues in performance 421',
 'how are you doing',
 'hey my name is abc, 143 what is your name',
 'attention pleased',
 'compliance installed 234']
# Raw strings keep the regex escapes (\s) intact.
stop=[r'attention\s+', r'installed\s+', r'[0-9]']

[re.sub('|'.join(stop), '', x) for x in abc]


Output:
['issues in performance ',
'how are you doing',
 'hey my name is abc,  what is your name',
 'pleased',
 'compliance ']

- Ekaba Bisong

1

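Sorry, I made a mistake when copying the code from my editor to StackOverflow. Thanks for spotting it. I have updated the answer accordingly. - Ekaba Bisong

As a bonus, you could also match \s+ and replace it with a single space to avoid double spaces. - Zoran

If you insist on whole words, you could wrap each one as \bword\b to make sure you don't cut out part of a longer string. - Zoran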
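
The two suggestions in these comments could be combined roughly as follows. This is only a sketch, not the answerer's code: the helper name clean and the use of re.escape are assumptions added for illustration, and \d+ removes every run of digits, including digits embedded in a word.

```python
import re

abc = ['issues in performance 421',
       'how are you doing',
       'hey my name is abc, 143 what is your name',
       'attention pleased',
       'compliance installed 234']
stop = ['attention', 'installed']

# \b...\b matches the stop words only as whole words; \d+ matches digit runs.
# re.escape guards against stop words that contain regex metacharacters.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, stop)) + r')\b|\d+')

def clean(phrase):
    """Remove stop words and digits, then tidy up the leftover whitespace."""
    without_matches = pattern.sub('', phrase)
    # Collapse runs of whitespace to one space and trim both ends.
    return re.sub(r'\s+', ' ', without_matches).strip()

cleaned = [clean(p) for p in abc]
print(cleaned)
```

Because of the word boundaries, a word such as 'attentions' is left untouched, whereas a plain substring match would cut 'attention' out of it.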

1

list1 = []
for word in abc:
    # Start from the original phrase and strip each stop word in turn.
    word1 = word
    for remove_word in stop:
        word1 = word1.replace(remove_word, '')
    list1.append(word1)

- Kiran

1

This is at least how I would do it:

abc=[ 'issues in performance 421',
    'how are you doing',
    'hey my name is abc, 143 what is your name',
    'attention pleased',
    'compliance installed 234'
]
stop=['attention', 'installed']
for x, elem in enumerate(abc):
    abc[x] = " ".join(filter(lambda x: x not in stop and not x.isdigit(), elem.split()))
print(abc)

result:

['issues in performance',
    'how are you doing',
    'hey my name is abc, what is your name',
    'pleased',
    'compliance']

Hope it helps in some way :)

- AdminBenni


- blhsing · Accepted Answer

You need to split each phrase into words, filter out the words that are in stop, and then join the remaining words back into a phrase.

[' '.join(w for w in p.split() if w not in stop) for p in abc]

Applied to the phrases with the numbers already removed (as in the question's result), this outputs:

['issues in performance', 'how are you doing', 'hey my name is abc, what is your name', 'pleased', 'compliance']
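
The one-liner above filters only the stop words. As a minimal sketch of how the same split-filter-join idea could also drop purely numeric tokens, so that it works on the original list from the question, consider the following. The function name strip_words is hypothetical, and turning stop into a set is an optional choice for faster lookups.

```python
def strip_words(phrases, stop_words):
    """Return phrases with stop words and all-digit tokens removed."""
    stop_set = set(stop_words)  # set membership tests are O(1) on average
    return [
        ' '.join(w for w in phrase.split()
                 if w not in stop_set and not w.isdigit())
        for phrase in phrases
    ]

abc = ['issues in performance 421',
       'how are you doing',
       'hey my name is abc, 143 what is your name',
       'attention pleased',
       'compliance installed 234']
stop = ['attention', 'installed']

result = strip_words(abc, stop)
print(result)
```

Since split() with no argument splits on any whitespace and join uses a single space, the result has no leftover double or trailing spaces.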