Loop through a list of strings and remove all banned words from each string item

Question

3

I have the following list:

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]

Here is a list of words that I want to remove from each string item in the list:

bannedWord = ['grated', 'zested', 'thinly', 'chopped', ',']

The resulting list I am trying to generate is:

cleaner_list = ["lemons", "cheddar cheese", "carrots"]

So far I have not been able to achieve this. My attempt is as follows:

import re

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
cleaner_list = []
    
def RemoveBannedWords(ing):
    pattern = re.compile("\\b(grated|zested|thinly|chopped)\\W", re.I)
    return pattern.sub("", ing)
    
for ing in dirtylist:
    cleaner_ing = RemoveBannedWords(ing)
    cleaner_list.append(cleaner_ing)
    
print(cleaner_list)

This returns:

['lemons zested', 'cheddar cheese', 'carrots, chopped']

I have also tried the following:

import re

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
cleaner_list = []

bannedWord = ['grated', 'zested', 'thinly', 'chopped']
re_banned_words = re.compile(r"\b(" + "|".join(bannedWord) + ")\\W", re.I)

def remove_words(ing):
    global re_banned_words
    return re_banned_words.sub("", ing)

for ing in dirtylist:
    cleaner_ing = remove_words(ing)
    cleaner_list.append(cleaner_ing)
  
print(cleaner_list)

This returns:

['lemons zested', 'cheddar cheese', 'carrots, chopped']

I'm a bit lost and not sure where I've gone wrong. Any help is much appreciated.

- JimmyStrings

Try exploring set to simplify this; it would be clearer... The question is, why is "," a banned word? - Daniel Hao

4 Answers

0

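def clearList(dirtyList, bannedWords, splitChar):
    clean = []
    for dirty in dirtyList:
        kept = []
        for w in dirty.split(splitChar):
            # Drop a trailing comma so that "carrots," is compared as "carrots"
            w = w.rstrip(",")
            # Keep the word only if it is non-empty and not banned
            if w and w not in bannedWords:
                kept.append(w)
        clean.append(splitChar.join(kept))

    return clean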

dirtyList is the list you want to clean

bannedWords is the list of words you don't want

splitChar is the character between words (" ")

- Sarper Makas

0

The following code seems to work (a simple nested loop)

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
bannedWords = ['grated', 'zested', 'thinly', 'chopped', ',']
result = []
for words in dirtylist:
    temp = words
    for bannedWord in bannedWords:
        temp = temp.replace(bannedWord, '')
    result.append(temp.strip())
print(result)

Output

['lemons', 'cheddar cheese', 'carrots']
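One caveat worth checking with this approach: str.replace works on substrings rather than whole words, so a banned word is also removed when it appears inside a longer word. The sketch below illustrates this; the helper name strip_banned and the input "integrated farming" are made up for the illustration and are not from the question.

```python
banned_words = ['grated', 'zested', 'thinly', 'chopped', ',']

def strip_banned(text):
    # Remove every occurrence of each banned string, wherever it appears
    for banned in banned_words:
        text = text.replace(banned, '')
    return text.strip()

print(strip_banned("lemons zested"))       # lemons
print(strip_banned("integrated farming"))  # inte farming
```

If inputs like that can occur, a regex with word boundaries (\b) is probably the safer choice.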

- balderman

0

I would drop "," from the bannedWord list and remove it with str.strip instead:

import re

dirtylist = [
    "lemons zested",
    "grated cheddar cheese",
    "carrots, thinly chopped",
]

bannedWord = ["grated", "zested", "thinly", "chopped"]

# Group the alternatives so that \b applies to every word, not just the first and last
pat = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in bannedWord) + r")\b", flags=re.I
)

for w in dirtylist:
    print("{:<30} {}".format(w, pat.sub("", w).strip(" ,")))

Output:

lemons zested                  lemons
grated cheddar cheese          cheddar cheese
carrots, thinly chopped        carrots

- Andrej Kesely

- trincot · Accepted Answer

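A few issues:

The final \W in your regex requires a character to follow, so it fails when the banned word is the last word in the input string. You can use \b again instead, as you did at the start of the regex.
Since you also want to remove the comma, you need to add it as an alternative. Make sure not to put it inside the same capture group, because the \b at the end would then require the comma to be followed by a word character (a letter, digit, or underscore). So it should go as an alternative at the very end (or start) of the regex.
You probably also want to call .strip() on the result to remove any whitespace left over after the banned words are removed.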
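The first point can be checked directly. This is a minimal sketch comparing the two endings on one of the inputs from the question; the variable names trailing_nonword and trailing_boundary are only illustrative.

```python
import re

# Same alternation as in the question, ending in \W versus \b
trailing_nonword = re.compile(r"\b(grated|zested|thinly|chopped)\W", re.I)
trailing_boundary = re.compile(r"\b(grated|zested|thinly|chopped)\b", re.I)

text = "lemons zested"

# \W needs one more character after "zested", and there is none
print(repr(trailing_nonword.sub("", text)))   # 'lemons zested'

# \b also matches at the end of the string; a trailing space is left behind
print(repr(trailing_boundary.sub("", text)))  # 'lemons '
```

The leftover space in the second result is what the .strip() call in the third point takes care of.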

So:

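import re

def RemoveBannedWords(ing):
    # \b on both sides matches whole words, even at the end of the string; "," is a separate alternative
    pattern = re.compile("\\b(grated|zested|thinly|chopped)\\b|,", re.I)
    return pattern.sub("", ing).strip()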
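If the banned words live in a list, as in the question's second attempt, the same pattern can be built from that list. The following is a sketch under a few assumptions: the names banned_words, alternatives and remove_banned_words are illustrative, re.escape is added in case a banned entry contains regex metacharacters, and whitespace is collapsed with split/join rather than .strip() so that a word removed from the middle of a string would not leave a double space.

```python
import re

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
banned_words = ["grated", "zested", "thinly", "chopped"]

# Escape each word, then wrap the alternation in a group between two \b anchors;
# the comma is kept outside the group as its own alternative
alternatives = "|".join(re.escape(w) for w in banned_words)
pattern = re.compile(r"\b(?:" + alternatives + r")\b|,", re.I)

def remove_banned_words(ing):
    cleaned = pattern.sub("", ing)
    # Collapse leading, trailing and repeated whitespace left by removed words
    return " ".join(cleaned.split())

cleaner_list = [remove_banned_words(ing) for ing in dirtylist]
print(cleaner_list)  # ['lemons', 'cheddar cheese', 'carrots']
```

Compiling the pattern once at module level avoids rebuilding it on every call.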