Mapping values from a dictionary of lists into a string in Python


I am working on building sentences like the following:

sentence = "PERSON is ADJECTIVE"
dictionary = {"PERSON": ["Alice", "Bob", "Carol"], "ADJECTIVE": ["cute", "intelligent"]}

I need to build the sentence from every possible combination of the dictionary values, for example:
Alice is cute
Alice is intelligent
Bob is cute
Bob is intelligent
Carol is cute
Carol is intelligent

This use case is fairly simple, and the code below handles it:
dictionary = {"PERSON": ["Alice", "Bob", "Carol"], "ADJECTIVE": ["cute", "intelligent"]}

for i in dictionary["PERSON"]:
    for j in dictionary["ADJECTIVE"]:
        print(f"{i} is {j}")

Can this approach be extended to longer sentences?

For example:

sentence = "PERSON is ADJECTIVE and is from COUNTRY" 
dictionary = {"PERSON": ["Alice", "Bob", "Carol"], "ADJECTIVE": ["cute", "intelligent"], "COUNTRY": ["USA", "Japan", "China", "India"]}

This should again give every possible combination, for example:
Alice is cute and is from USA
Alice is intelligent and is from USA
.
.
.
.
Carol is intelligent and is from India

I tried using https://www.pythonpool.com/python-permutations/, but the words all get mixed up. How can we keep some words fixed, like "and is from" in this example?

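Essentially, if any key in the dictionary matches a word in the string, that word should be replaced with each value from that key's list.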

Any ideas would be very helpful.


If you need help getting started, I suggest looking at the output of itertools.product(*dictionary.values()). - JonSG
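To make that suggestion concrete, here is a minimal sketch of what itertools.product returns for the two-key dictionary from the question. Each tuple holds one value per key, in the dictionary's key order.

```python
import itertools

# Two-key dictionary taken from the question
dictionary = {"PERSON": ["Alice", "Bob", "Carol"], "ADJECTIVE": ["cute", "intelligent"]}

# product() yields one tuple per combination, picking one value from each list
combos = list(itertools.product(*dictionary.values()))
for combo in combos:
    print(combo)
```

The first tuple is ('Alice', 'cute') and the last is ('Carol', 'intelligent'), six in total, which is exactly what the nested loops in the question iterate over.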
You might want to convert your sentence into something str.format() knows how to handle, so you don't have to do the replace work yourself. - Samwise
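The comment does not say how to do that conversion. One minimal sketch, assuming keys only ever appear as whole words, uses re.sub with word boundaries to wrap each key in braces; the resulting template can then be filled by name with str.format().

```python
import re

sentence = "PERSON is ADJECTIVE and is from COUNTRY"
dictionary = {"PERSON": ["Alice", "Bob", "Carol"], "ADJECTIVE": ["cute", "intelligent"],
              "COUNTRY": ["USA", "Japan", "China", "India"]}

# Pattern matching any key as a whole word, e.g. \b(PERSON|ADJECTIVE|COUNTRY)\b
pattern = r"\b(" + "|".join(map(re.escape, dictionary)) + r")\b"

# Wrap each matched key in braces so str.format() can fill it by name
template = re.sub(pattern, r"{\1}", sentence)
print(template)  # {PERSON} is {ADJECTIVE} and is from {COUNTRY}

filled = template.format(PERSON="Alice", ADJECTIVE="cute", COUNTRY="USA")
print(filled)  # Alice is cute and is from USA
```

If the original sentence could contain literal { or } characters, they would have to be doubled ({{ and }}) before calling format().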
2 Answers

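I would build the answer on two building blocks: itertools.product and zip. itertools.product gives us every combination of the dictionary's list values.
Zipping the original keys with each of those combinations gives (key, value) tuples that we can pass to str.replace.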
import itertools

sentence = "PERSON is ADJECTIVE and is from COUNTRY"
dictionary = {"PERSON": ["Alice", "Bob", "Carol"], "ADJECTIVE": ["cute", "intelligent"], "COUNTRY": ["USA", "Japan", "China", "India"]}

keys = dictionary.keys()
for values in itertools.product(*dictionary.values()):
    new_sentence = sentence
    for tpl in zip(keys, values):
        new_sentence = new_sentence.replace(*tpl)
    print(new_sentence)
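To see what the inner loop actually works with, here is a minimal sketch that takes only the first combination from product() and prints the zipped (key, value) pairs before applying them. Note that str.replace swaps every occurrence of the key, and that it matches substrings rather than whole words.

```python
import itertools

sentence = "PERSON is ADJECTIVE and is from COUNTRY"
dictionary = {"PERSON": ["Alice", "Bob", "Carol"], "ADJECTIVE": ["cute", "intelligent"],
              "COUNTRY": ["USA", "Japan", "China", "India"]}

keys = dictionary.keys()
# Take just the first combination produced by product()
first_values = next(itertools.product(*dictionary.values()))
pairs = list(zip(keys, first_values))
print(pairs)  # [('PERSON', 'Alice'), ('ADJECTIVE', 'cute'), ('COUNTRY', 'USA')]

new_sentence = sentence
for old, new in pairs:
    # replace(old, new) substitutes every occurrence of the key with its value
    new_sentence = new_sentence.replace(old, new)
print(new_sentence)  # Alice is cute and is from USA
```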

If you control the "sentence" template, you can write it like this:

sentence = "{PERSON} is {ADJECTIVE} and is from {COUNTRY}"

Then you can simplify it to:
import itertools

sentence = "{PERSON} is {ADJECTIVE} and is from {COUNTRY}"
dictionary = {"PERSON": ["Alice", "Bob", "Carol"], "ADJECTIVE": ["cute", "intelligent"], "COUNTRY": ["USA", "Japan", "China", "India"]}

keys = dictionary.keys()
for values in itertools.product(*dictionary.values()):
    new_sentence = sentence.format(**dict(zip(keys, values)))
    print(new_sentence)

Both should give the following result:

Alice is cute and is from USA
Alice is cute and is from Japan
...
Carol is intelligent and is from China
Carol is intelligent and is from India

Note that the order in which the placeholders appear in the template does not matter; both solutions work for this template:
sentence = "PERSON is from COUNTRY and is ADJECTIVE"

or, in the second case:

sentence = "{PERSON} is from {COUNTRY} and is {ADJECTIVE}"
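A quick sketch of the str.format variant with the reordered template. The named fields are looked up by key, so the dictionary's key order does not have to match the order of the placeholders in the sentence.

```python
import itertools

sentence = "{PERSON} is from {COUNTRY} and is {ADJECTIVE}"
dictionary = {"PERSON": ["Alice", "Bob", "Carol"], "ADJECTIVE": ["cute", "intelligent"],
              "COUNTRY": ["USA", "Japan", "China", "India"]}

keys = dictionary.keys()
# Each combination becomes a {key: value} mapping passed as keyword arguments
sentences = [sentence.format(**dict(zip(keys, values)))
             for values in itertools.product(*dictionary.values())]

print(sentences[0])    # Alice is from USA and is cute
print(len(sentences))  # 24
```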

Follow-up:

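What happens if the dictionary contains items that are not in the sentence template? The way product() currently generates sentences assumes that every key is present, so each unused key multiplies the output and produces duplicates. That is not ideal.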

The simplest fix is to make sure the dictionary contains only the keys of interest.

In the first case, you can do this:

dictionary = {key: value for key, value in dictionary.items() if key in sentence}

or, in the second case:
dictionary = {key: value for key, value in dictionary.items() if f"{{{key}}}" in sentence}
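A minimal sketch of the first-case filter applied to the example from Dragon Z's comment (a sentence without COUNTRY while the dictionary still contains it). Without the filter, product() would emit each sentence four times, once per COUNTRY value; with it, each appears once. The check `key in sentence` is a substring test, so a key that is part of a longer word would also count as present.

```python
import itertools

sentence = "PERSON is ADJECTIVE"
dictionary = {"PERSON": ["Alice", "Bob", "Carol"], "ADJECTIVE": ["cute", "intelligent"],
              "COUNTRY": ["USA", "Japan", "China", "India"]}

# Keep only the keys that actually occur in the sentence
used = {key: value for key, value in dictionary.items() if key in sentence}

sentences = []
for values in itertools.product(*used.values()):
    new_sentence = sentence
    for old, new in zip(used.keys(), values):
        new_sentence = new_sentence.replace(old, new)
    sentences.append(new_sentence)

print(sentences)
```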

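Thanks a lot, this works perfectly, but if the sentence does not contain every key in the dictionary, duplicates appear. sentence = "PERSON is ADJECTIVE" dictionary = {"PERSON": ["Alice", "Bob", "Carol"], "ADJECTIVE": ["cute", "intelligent"], "COUNTRY": ["USA", "Japan", "China", "India"]} Any ideas how to avoid this without using a set to remove duplicates? I want to keep the order, so I would rather not deduplicate with a set. - Dragon Z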
That's an interesting twist. Let me update the answer... - JonSG
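Thank you so much @jonsg for the follow-up solution. It's a quick, simple and elegant solution! I tried a sentence like "PERSON is ADJECTIVE and ADJECTIVE", intending to get combinations like "Alice is cute and intelligent", but because we use replace, the output gives "Alice is cute and cute", "Alice is intelligent and intelligent", and so on. That sentence is fine, but is it possible to get combinations like "Alice is cute and intelligent"? Any suggestions would be a great help, I really appreciate it. Thanks! - Dragon Z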
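The thread does not answer this last question. One possible approach, sketched here as an assumption rather than anything proposed in the answers: treat each occurrence of a key as its own slot, take the product over the slots, and skip combinations that give the same key the same value twice. It assumes placeholders are separated by spaces (str.split()), so a key with attached punctuation such as "ADJECTIVE." would not be recognized.

```python
import itertools

sentence = "PERSON is ADJECTIVE and ADJECTIVE"
dictionary = {"PERSON": ["Alice", "Bob", "Carol"], "ADJECTIVE": ["cute", "intelligent"]}

words = sentence.split()
# One slot per placeholder occurrence, in sentence order
slots = [w for w in words if w in dictionary]

results = []
for values in itertools.product(*(dictionary[s] for s in slots)):
    # A repeated (key, value) pair means one key received the same word twice
    if len(set(zip(slots, values))) < len(slots):
        continue
    it = iter(values)
    # Fill the slots left to right; non-key words are kept unchanged
    results.append(" ".join(next(it) if w in dictionary else w for w in words))

for r in results:
    print(r)
```

Both "Alice is cute and intelligent" and "Alice is intelligent and cute" are produced; if only one ordering is wanted, an extra filter would be needed.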


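You can first replace the dictionary keys in sentence with {}, so that the string is easy to format inside the loop. Then use itertools.product to build the Cartesian product of dictionary.values() and simply loop over it to create the desired sentences.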

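from itertools import product

sentence = "PERSON is ADJECTIVE and is from COUNTRY"
dictionary = {"PERSON": ["Alice", "Bob", "Carol"], "ADJECTIVE": ["cute", "intelligent"], "COUNTRY": ["USA", "Japan", "China", "India"]}

# Replace each dictionary key with a positional {} field; other words stay as-is
sentence = ' '.join([('{}' if w in dictionary else w) for w in sentence.split()])
mapped_sentences_generator = (sentence.format(*tple) for tple in product(*dictionary.values()))
for s in mapped_sentences_generator:
    print(s)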

Output:

Alice is cute and is from USA
Alice is cute and is from Japan
Alice is cute and is from China
Alice is cute and is from India
Alice is intelligent and is from USA
Alice is intelligent and is from Japan
Alice is intelligent and is from China
Alice is intelligent and is from India
Bob is cute and is from USA
Bob is cute and is from Japan
Bob is cute and is from China
Bob is cute and is from India
Bob is intelligent and is from USA
Bob is intelligent and is from Japan
Bob is intelligent and is from China
Bob is intelligent and is from India
Carol is cute and is from USA
Carol is cute and is from Japan
Carol is cute and is from China
Carol is cute and is from India
Carol is intelligent and is from USA
Carol is intelligent and is from Japan
Carol is intelligent and is from China
Carol is intelligent and is from India

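Note that this approach relies on dictionaries preserving insertion order (guaranteed from Python 3.7, and true in CPython 3.6 as an implementation detail), because it assumes the dictionary's key order matches the order of the placeholders in the sentence. For older versions of Python, use collections.OrderedDict instead of dict.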

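While this works in this particular case, it is somewhat fragile: it fails if the sentence is changed so that the placeholders no longer line up with the order of the keys. For example, sentence = "PERSON is from COUNTRY and is ADJECTIVE" would give Carol is from cute and is USA - JonSG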
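A sketch of one way to address this fragility (an assumption, not part of either answer): collect the keys in the order they appear in the sentence and build the product from those lists instead of from dictionary.values(). As a side effect, dictionary keys that are absent from the sentence are ignored. As with the original answer, literal { or } characters in the sentence would confuse str.format().

```python
from itertools import product

sentence = "PERSON is from COUNTRY and is ADJECTIVE"
dictionary = {"PERSON": ["Alice", "Bob", "Carol"], "ADJECTIVE": ["cute", "intelligent"],
              "COUNTRY": ["USA", "Japan", "China", "India"]}

words = sentence.split()
# Keys in the order they appear in the sentence, not the dictionary's order
slots = [w for w in words if w in dictionary]
# Positional template, here "{} is from {} and is {}"
template = " ".join("{}" if w in dictionary else w for w in words)

sentences = [template.format(*values)
             for values in product(*(dictionary[s] for s in slots))]

print(sentences[0])    # Alice is from USA and is cute
print(len(sentences))  # 24
```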

Page content provided by Stack Overflow.