Python: return unique words from a list (case insensitive)

Question

8

I need help returning the unique words (case insensitive) from a list, in order. For example:

case_insensitive_unique_list(["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"])

will return:

["We", "are", "one", "the", "world", "UNIVERSE"]

This is what I have so far:

def case_insensitive_unique_list(list_string):

    uppercase = ["A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"]
    lowercase = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]

    temp_unique_list = []

    for i in list_string:
        if i not in list_string:
            temp_unique_list.append(i)

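I am having trouble comparing each word against temp_unique_list to tell whether it is a repeat, for example "to" and "To" (I am assuming the range function will be useful). The function also needs to return whichever spelling comes first in the original list it takes in.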

- user927584

7 Answers

3

You can use set() and a list comprehension:

>>> seen = set()
>>> lst = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]
>>> [x for x in lst if x.lower() not in seen and not seen.add(x.lower())]
['We', 'are', 'one', 'the', 'world', 'UNIVERSE']

- Mitul Shah
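
The comprehension above leans on two details that are easy to miss: `set.add` always returns `None`, and `and` short-circuits, so the `add` call only runs for words whose lowercase form has not been seen yet. A minimal sketch that wraps the same idiom in a function (the name `unique_via_comprehension` is made up for illustration):

```python
def unique_via_comprehension(words):
    """Keep the first spelling of each word, comparing case-insensitively."""
    seen = set()
    # `w.lower() not in seen` is evaluated first. Only when it is True does
    # Python evaluate `seen.add(...)`, which returns None, so
    # `not seen.add(...)` is always True and serves purely to record the key.
    return [w for w in words
            if w.lower() not in seen and not seen.add(w.lower())]


words = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]
print(unique_via_comprehension(words))
```

Keeping `seen` local to the function means repeated calls start from an empty set, which the interactive version does not guarantee.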

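How would I do this using a for loop? - Tim

@zmo Having side effects in a LC is not elegant. - thefourtheye

@zmo Even though a list comprehension uses the for keyword, I believe this is not what the OP wants. - Tim

I disagree with my colleagues; I think this is a perfectly good idiom for order-preserving uniqueness. I use it often. (Whether a list comprehension is within the scope of what the OP wants, I neither know nor care.) - roippi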

1

You can do it like this:

l = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]

a = []

for i in l:
    if i.lower() not in [j.lower() for j in a]:
        a.append(i)

>>> print(a)
['We', 'are', 'one', 'the', 'world', 'UNIVERSE']

- sshashank124

2

This is an extremely inefficient way to do this task. - roippi
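
To make the inefficiency concrete: the answer above rebuilds a lowercased copy of the result list for every input word, so the work grows quadratically with the number of distinct words. The sketch below counts `str.lower()` calls instead of timing anything; both helper names are hypothetical and exist only for this comparison.

```python
def lower_calls_list_rebuild(words):
    """Count lower() calls when the kept list is re-lowered for every word."""
    calls = 0
    kept = []
    for word in words:
        key = word.lower()
        calls += 1
        lowered = []
        for k in kept:          # same work as [j.lower() for j in a]
            lowered.append(k.lower())
            calls += 1
        if key not in lowered:
            kept.append(word)
    return calls


def lower_calls_with_set(words):
    """Count lower() calls when lowercase keys are remembered in a set."""
    calls = 0
    seen = set()
    for word in words:
        key = word.lower()
        calls += 1
        if key not in seen:
            seen.add(key)
    return calls


distinct = ["w%d" % n for n in range(1000)]  # 1000 distinct words
print(lower_calls_list_rebuild(distinct))    # 500500
print(lower_calls_with_set(distinct))        # 1000
```

For short lists the difference hardly matters; it starts to show once the input has thousands of distinct words.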

1

l=["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]
so=[]
for w in l:
    if w.lower() not in so:
        so.append(w.lower())

In [14]: so
Out[14]: ['we', 'are', 'one', 'the', 'world', 'universe']

- Padraic Cunningham

1

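You can use a set to ensure uniqueness. If you try to add a duplicate item to a set, it is simply discarded because it is already there.

You should also use the built-in lower() method to handle case insensitivity.

uniques = set()
for word in words:
    uniques.add(word.lower())  # lower it first and then add it

If this is a homework task and set is off limits, you can easily adapt it to use only lists: just loop and add a condition: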

uniques = list()
if word.lower() not in uniques:
    #etc

- Matt O
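
The list-only fragment above stops at `#etc`. One way it might be completed, assuming the goal is still to keep the first spelling of each word in its original order, is to keep a second list of lowercase keys alongside the result. The function name `unique_words_lists_only` is an assumption, not something from the answer.

```python
def unique_words_lists_only(words):
    """Order-preserving, case-insensitive de-duplication without set."""
    uniques = []   # original spellings, first occurrence only
    lowered = []   # lowercase keys already seen, parallel to `uniques`
    for word in words:
        key = word.lower()
        if key not in lowered:
            lowered.append(key)
            uniques.append(word)
    return uniques


words = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]
print(unique_words_lists_only(words))
```

Membership tests on a list scan it element by element, so this is slower than the set version on large inputs, but it uses nothing beyond loops and lists.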

1

You can use collections.OrderedDict like this.

from collections import OrderedDict
def case_insensitive_unique_list(data):
    d = OrderedDict()
    for word in data:
        d.setdefault(word.lower(), word)
    return list(d.values())  # list() so Python 3 returns a list, not a view

Output:

['We', 'are', 'one', 'the', 'world', 'UNIVERSE']

- Kei Minagawa
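
On Python 3.7 and later, a plain `dict` is guaranteed to preserve insertion order, so the same `setdefault` approach should work without `OrderedDict`. A minimal sketch under that assumption (the function name is hypothetical):

```python
def unique_words_plain_dict(data):
    """Same idea as the OrderedDict answer, relying on dict ordering (3.7+)."""
    first_seen = {}
    for word in data:
        # setdefault stores the word only if its lowercase key is new,
        # so the first spelling encountered wins.
        first_seen.setdefault(word.lower(), word)
    return list(first_seen.values())


data = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]
print(unique_words_plain_dict(data))
```

`OrderedDict` remains the safer choice if the code must also run on interpreters older than 3.7.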

Ah! I was about to post the same answer! - John Smith Optional

0

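OK, I removed my previous answer because I misread the OP's post. My apologies.

As an excuse, and for the fun of doing it a different way, here is another solution, though it is neither the most efficient nor the best:

>>> from functools import reduce
>>> lst = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]
>>> # compare lowercase to lowercase, otherwise "THE" slips through
>>> for it in reduce(lambda l, it: l if it.lower() in {i.lower() for i in l} else l + [it], lst, []):
...     print(it, end=", ")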

- zmo

1

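will return: ["We", "are", "one", "the", "world", "UNIVERSE"] - Padraic Cunningham

1

"And since he did not say he wants to keep the order of the tokens:" "I need help returning unique words (case insensitive) from a list in order." - roippi

OK, I misread :-) My bad, but I thought of another solution, so I am editing. - zmo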

- thefourtheye · Accepted Answer

You can do this with a for loop and the set data structure, like this.

def case_insensitive_unique_list(data):
    seen, result = set(), []
    for item in data:
        if item.lower() not in seen:
            seen.add(item.lower())
            result.append(item)
    return result

Output

['We', 'are', 'one', 'the', 'world', 'UNIVERSE']
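
If the words may contain non-ASCII text, `str.casefold()` (Python 3.3+) is generally a better key than `lower()` for caseless matching. The variant below is only a sketch of that substitution, not part of the accepted answer, and the function name is made up.

```python
def case_insensitive_unique_casefold(data):
    """Like the for-loop/set version, but keyed on str.casefold()."""
    seen, result = set(), []
    for item in data:
        key = item.casefold()  # more aggressive than lower(), e.g. "ß" -> "ss"
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result


# "Straße" and "STRASSE" share a casefolded key but not a lowercased one.
print(case_insensitive_unique_casefold(["Straße", "STRASSE", "We", "we"]))
```

For plain English input such as the list in the question, `lower()` and `casefold()` give the same result.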