Finding all words in Python whose characters match another word

3

Words such as umbellar = umbrella count as equal.

Input = ["umbellar","goa","umbrella","ago","aery","alem","ayre","gnu","eyra","egma","game","leam","amel","year","meal","yare","gun","alme","ung","male","lame","mela","mage"]

So the output should be:

Output = [ ["umbellar","umbrella"], ["ago","goa"], ["aery","ayre","eyra","yare","year"], ["alem","alme","amel","lame","leam","male","meal","mela"], ["gnu","gun","ung"], ["egma","game","mage"] ]


2
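Is this homework? If so, please tag it as homework. - Manuel Salvadores
1
Assuming equal words must have the same length, sort the characters of each string in the list and check for matches. - PrettyPrincessKitty FS
5 Answers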

7

from itertools import groupby

def group_words(word_list):
    sorted_words = sorted(word_list, key=sorted)
    grouped_words = groupby(sorted_words, sorted)
    for key, words in grouped_words:
        group = list(words)
        if len(group) > 1:
            yield group

Example:

>>> group_words(["umbellar","goa","umbrella","ago","aery","alem","ayre","gnu","eyra","egma","game","leam","amel","year","meal","yare","gun","alme","ung","male","lame","mela","mage" ])
<generator object group_words at 0x0297B5F8>
>>> list(_)
[['umbellar', 'umbrella'], ['egma', 'game', 'mage'], ['alem', 'leam', 'amel', 'meal', 'alme', 'male', 'lame', 'mela'], ['aery', 'ayre', 'eyra', 'year', 'yare'], ['goa', 'ago'], ['gnu', 'gun', 'ung']]

1
[list(g) for k,g in itertools.groupby(sorted(INPUT,key=sorted),sorted)] - Kabie
@Kabie: I deliberately used temporary variables to help readability. :) Also, going by the example, words without anagrams should not be returned at all. - shang
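
To illustrate the point made in these comments: the one-liner keeps every group, including words that have no anagram, while the generator in the answer filters those out. A minimal sketch, using a made-up three-word input (`sample` and the other variable names are illustrative, not from the original post):

```python
from itertools import groupby

sample = ["goa", "ago", "xyz"]  # hypothetical input; "xyz" has no anagram

# groupby only merges adjacent items, so sort by the same key first.
ordered = sorted(sample, key=sorted)

all_groups = [list(g) for _, g in groupby(ordered, key=sorted)]
anagram_groups = [g for g in all_groups if len(g) > 1]

print(all_groups)      # [['goa', 'ago'], ['xyz']]
print(anagram_groups)  # [['goa', 'ago']]
```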

4

They are not equal words; they are words made up of the same letters.

Such anagrams can be found by sorting the characters of each word:

sorted('umbellar') == sorted('umbrella')
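
As a small sketch of this check, the comparison can be wrapped in a helper. The names `is_anagram` and `is_anagram_counter` are chosen here for illustration and do not come from the original answer; `collections.Counter` is shown as an alternative that compares letter counts instead of sorted lists.

```python
from collections import Counter

def is_anagram(a, b):
    """Return True if a and b consist of exactly the same letters."""
    return sorted(a) == sorted(b)

def is_anagram_counter(a, b):
    """Same check, comparing letter counts instead of sorted lists."""
    return Counter(a) == Counter(b)

print(is_anagram("umbellar", "umbrella"))  # True
print(is_anagram("game", "gnu"))           # False
```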

1

collections.defaultdict comes in handy:

from collections import defaultdict

words = ["umbellar","goa","umbrella","ago","aery","alem","ayre","gnu",
"eyra","egma","game","leam","amel","year","meal","yare","gun",
"alme","ung","male","lame","mela","mage" ]

D = defaultdict(list)
for word in words:
    # Anagrams share the same sorted-letter key.
    key = ''.join(sorted(word))
    D[key].append(word)

output = list(D.values())

The output is [['umbellar', 'umbrella'], ['goa', 'ago'], ['gnu', 'gun', 'ung'], ['alem', 'leam', 'amel', 'meal', 'alme', 'male', 'lame', 'mela'], ['egma', 'game', 'mage'], ['aery', 'ayre', 'eyra', 'year', 'yare']]
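
The expected output in the question lists the words of each group alphabetically and contains only groups with more than one word. A possible sketch that wraps the defaultdict idea in a function and does both; the name `anagram_groups` and the sorting step are assumptions added here, not part of the original answer:

```python
from collections import defaultdict

def anagram_groups(words):
    """Group words by their sorted letters; keep groups of 2+ words."""
    groups = defaultdict(list)
    for word in words:
        # Words with the same letters share the same sorted-letter key.
        groups[''.join(sorted(word))].append(word)
    # Sort inside each group, then sort the groups for a stable result.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)

print(anagram_groups(["goa", "gnu", "ago", "gun", "ung", "solo"]))
# [['ago', 'goa'], ['gnu', 'gun', 'ung']]
```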


0
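As others have pointed out, you are looking for all the anagram groups in your word list. Here is one possible solution. The algorithm looks for candidates and picks one (the first element) as the canonical word, removing the rest from the set of possible words. Because being an anagram is transitive, once you find that a word belongs to an anagram group you do not need to compute it again.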
words = ["umbellar","goa","umbrella","ago","aery","alem","ayre","gnu",
"eyra","egma","game","leam","amel","year","meal","yare","gun",
"alme","ung","male","lame","mela","mage" ]
res = dict()
for word in words:
    res[word] = [word]
for word in words:
    # the len test just avoids sorting and comparing words of different lengths
    candidates = [x for x in res
                  if len(x) == len(word) and sorted(x) == sorted(word)]
    if candidates:
        canonical = candidates[0]
        for c in candidates[1:]:
            # delete every candidate except the canonical one
            del res[c]
            # and add it to the canonical member's group
            res[canonical].append(c)
print(list(res.values()))

This algorithm outputs the following (the exact ordering depends on the dictionary's iteration order):

[['year', 'ayre', 'aery', 'yare', 'eyra'], ['umbellar', 'umbrella'],
 ['lame', 'leam', 'mela', 'amel', 'alme', 'alem', 'male', 'meal'],
 ['goa', 'ago'], ['game', 'mage', 'egma'], ['gnu', 'gun', 'ung']]

0

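Shang's answer is correct, but I was challenged to do the same thing without using 'groupby()'. Here is the code. Adding print statements will help you debug it and follow the output at run time.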

def group_words(word_list):
    # Collect the distinct sorted-letter signatures of all words.
    signatures = set(''.join(sorted(word)) for word in word_list)
    new_list = []
    for signature in signatures:
        # Gather every word whose sorted letters equal this signature.
        group = [word for word in word_list
                 if len(word) == len(signature)
                 and ''.join(sorted(word)) == signature]
        new_list.append(group)
    return new_list

The output is

[['umbellar', 'umbrella'], ['goa', 'ago'], ['gnu', 'gun', 'ung'], ['alem', 'leam', 'amel', 'meal', 'alme', 'male', 'lame', 'mela'], ['egma', 'game', 'mage'], ['aery', 'ayre', 'eyra', 'year', 'yare']]
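
Since the answers in this thread return the same groups in different orders, comparing their results directly with `==` would fail. A small hypothetical helper (`normalize` is a name introduced here, not from any of the answers) makes the comparison independent of ordering:

```python
def normalize(groups):
    """Return the groups as a set of frozensets, ignoring all ordering."""
    return {frozenset(group) for group in groups}

a = [['goa', 'ago'], ['gnu', 'gun', 'ung']]
b = [['ung', 'gnu', 'gun'], ['ago', 'goa']]

print(a == b)                        # False: list order differs
print(normalize(a) == normalize(b))  # True: same groups
```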
