Detect whether two strings contain the same words in a different order

Question

My goal is to detect whether two strings are the same but in a different order.

Example
"hello world my name is foobar" is the same as "my name is foobar world hello"

I have tried splitting both strings into lists and comparing them in a loop.

text = "hello world my name is foobar"
textSplit = text.split()

pattern = "foobar is my name world hello"
pattern = pattern.split()

count = 0
for substring in pattern:
    if substring in textSplit:
        count += 1

if (count == len(pattern)):
    print ("same string detected")

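It returns the result I want, but is this a correct and efficient approach? Maybe there are other ways. Pointers to literature on the topic would also be welcome.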

Edit 1: Duplicate words matter.

text = "fish the fish the fish fish fish"
pattern = "the fish"

It must return False.
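To see why this case matters, here is a minimal sketch that wraps the loop from the question in a hypothetical function `loop_says_same` (the name is ours, not the asker's). It still reports a match for this pair, because it only checks that each word of `pattern` occurs somewhere in `text`, without comparing how often:

```python
def loop_says_same(text, pattern):
    # Hypothetical wrapper around the question's loop: counts how many
    # words of `pattern` occur anywhere in `text`.
    text_split = text.split()
    pattern_split = pattern.split()
    count = 0
    for substring in pattern_split:
        if substring in text_split:
            count += 1
    return count == len(pattern_split)

print(loop_says_same("fish the fish the fish fish fish", "the fish"))
# True, although the required answer is False
```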

- nfl-x

What about duplicate words? Are "the fish" and "fish the fish the fish fish fish" the same? - Jon Clements

Would sorted(text) == sorted(pattern) work? It's not very efficient, but it's quite easy to implement. - Ozgur Vatansever

If duplicates don't matter: len(set(text).difference(pattern)) == 0. - Chen A.
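A minimal sketch of a caveat with that expression, assuming `text` and `pattern` are already word lists (as after `split()` in the question): `difference` is one-directional, so words that appear only in `pattern` go unnoticed, while the symmetric difference `^` checks both directions. Applied to the raw strings instead of lists, either expression would compare characters rather than words.

```python
text = "hello world".split()
pattern = "hello world extra".split()

# One-directional: only asks whether every word of text occurs in pattern.
one_way = len(set(text).difference(pattern)) == 0
# Symmetric difference: words present in exactly one of the two lists.
both_ways = len(set(text) ^ set(pattern)) == 0

print(one_way)    # True, even though "extra" only appears in pattern
print(both_ways)  # False
```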

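@OzgurVatansever: What's inefficient about sorted? O(n log n) is almost always good enough, and close to O(n). The problem with your suggestion is that 'abc' and 'cba' would be considered equal. - Eric Duminil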
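The 'abc'/'cba' problem comes from calling sorted() on a str, which sorts its characters. A small sketch of the difference, using the strings from that comment:

```python
text = "abc"
pattern = "cba"

# sorted() on a str sorts its characters, so different words compare equal.
print(sorted(text) == sorted(pattern))                  # True
# Splitting first makes sorted() work on whole words instead.
print(sorted(text.split()) == sorted(pattern.split()))  # False
```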

@JonClements I missed that case, thanks. I'll update the code and the question shortly. @OzgurVatansever thanks for the suggestion. @Vinny Duplicates matter. - nfl-x

4 Answers

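I think your implementation ignores extra words in the text (maybe that's intentional?).

For example, if text = "a b" and pattern = "a", your output is "same string detected".

I would go about it like this: comparing without regard to order made me think of sets, so a set-based solution looks like this: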

same = set(text.split()) == set(pattern.split())
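A short sketch of how that one-liner behaves, wrapped in a hypothetical helper `same_word_set`: because a set keeps only distinct words, it ignores repetition as well as order, so it accepts the "fish" pair that the question's edit says must be rejected.

```python
def same_word_set(text, pattern):
    # Compares the sets of distinct words; order and repetition are ignored.
    return set(text.split()) == set(pattern.split())

print(same_word_set("hello world my name is foobar", "my name is foobar world hello"))  # True
print(same_word_set("hello", "hello hello"))  # True
print(same_word_set("fish the fish the fish fish fish", "the fish"))  # True
```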

If duplicate words matter, compare word counts with collections.Counter instead:

from collections import Counter
split_text = text.split()
split_pattern = pattern.split()
same = (Counter(split_text) == Counter(split_pattern))
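The same Counter comparison, sketched as a hypothetical function `same_word_counts` and checked against the cases raised in this thread. A Counter maps each word to its number of occurrences, so the fish pair and the "a b"/"a" pair are both rejected:

```python
from collections import Counter

def same_word_counts(text, pattern):
    # Counter maps each word to its number of occurrences, so both the
    # words and their multiplicities must match; order is ignored.
    return Counter(text.split()) == Counter(pattern.split())

print(same_word_counts("hello world my name is foobar", "foobar is my name world hello"))  # True
print(same_word_counts("fish the fish the fish fish fish", "the fish"))  # False
print(same_word_counts("a b", "a"))  # False
```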

- Chris Charles

Your solution considers "hello" and "hello hello" equal. It's not clear whether that's the desired behavior. - Eric Duminil

You can build a list from each string and compute their intersection; if the intersection has the same length as the first list, the strings are the same.

text = "hello world my name is foobar"
pattern = "foobar is my name world hello"
text = text.split(" ")
pattern = pattern.split(" ")
# Assume a match until one of the checks below fails.
result = True
if len(text) != len(pattern):
    result = False
else:
    # Words the two lists have in common (set() drops duplicates).
    l = list(set(text) & set(pattern))
    if len(l) != len(text):
        result = False
if result:
    print("same string detected")
else:
    print("Not the same string")

- Mehdi Ben Hamida

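You need to be careful with your length check, if len(l) != len(text): l has had its duplicates removed while text can contain repeated words, so the check is unreliable. - Jon Clements

set(text) and set(pattern) remove the duplicate elements. - Mehdi Ben Hamida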
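Jon Clements's point can be made concrete with a small sketch that wraps the answer's logic in a hypothetical function `intersection_check`. Two strings with exactly the same words, including a repeated one, are rejected because the set intersection has fewer elements than the word list:

```python
def intersection_check(text, pattern):
    # Hypothetical wrapper around the answer's logic.
    text = text.split(" ")
    pattern = pattern.split(" ")
    if len(text) != len(pattern):
        return False
    # set() drops duplicates, so `common` can be shorter than `text`.
    common = set(text) & set(pattern)
    return len(common) == len(text)

print(intersection_check("b a", "a b"))      # True
print(intersection_check("a a b", "a b a"))  # False, although both have the same words
```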

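You can also build a combined word list str12 from the two strings you want to compare, then compare the length of str12 with twice the length of str12 with duplicates removed.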

str1 = "hello world my name is foobar"
str2 = "my name is foobar world hello"


str12 = (str1 + " " +str2).split(" ")

str12_remove_duplicate = list(set(str12))

if len(str12) == 2 * len(str12_remove_duplicate):
    print("String '%s' and '%s' are SAME but different order" % (str1, str2))
else: 
    print("String '%s' and '%s' are NOT SAME" % (str1, str2))
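A minimal sketch probing this length test, wrapped in a hypothetical function `str12_check`. The test only holds when every word appears exactly twice in the combined list, so repeated words can produce both a false positive and a false negative:

```python
def str12_check(str1, str2):
    # Hypothetical wrapper around the answer's length comparison.
    str12 = (str1 + " " + str2).split(" ")
    return len(str12) == 2 * len(set(str12))

print(str12_check("hello world my name is foobar", "my name is foobar world hello"))  # True
print(str12_check("a a", "b b"))  # True, a false positive
print(str12_check("a a", "a a"))  # False, a false negative
```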

- sslloo

- Eric Duminil · Accepted Answer

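If you want to check whether two sentences have the same words (with the same number of occurrences), you can split the sentences into words and sort them: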

>>> sorted("hello world my name is foobar".split())
['foobar', 'hello', 'is', 'my', 'name', 'world']
>>> sorted("my name is foobar world hello".split())
['foobar', 'hello', 'is', 'my', 'name', 'world']

You can wrap the check in a function:

def have_same_words(sentence1, sentence2):
    return sorted(sentence1.split()) == sorted(sentence2.split())

print(have_same_words("hello world my name is foobar", "my name is foobar world hello"))
# True

print(have_same_words("hello", "hello hello"))
# False

print(have_same_words("hello", "holle"))
# False

If case doesn't matter, you can compare the lowercased sentences:

def have_same_words(sentence1, sentence2):
    return sorted(sentence1.lower().split()) == sorted(sentence2.lower().split())

print(have_same_words("Hello world", "World hello"))
# True

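Note: you could also use collections.Counter instead of sorted. The complexity would be O(n) instead of O(n log n), but that doesn't make much difference in practice; import collections might even take longer than sorting the strings: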

from collections import Counter

def have_same_words(sentence1, sentence2):
    return Counter(sentence1.lower().split()) == Counter(sentence2.lower().split())

print(have_same_words("Hello world", "World hello"))
# True

print(have_same_words("hello world my name is foobar", "my name is foobar world hello"))
# True

print(have_same_words("hello", "hello hello"))
# False

print(have_same_words("hello", "holle"))
# False
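Rather than guessing about the O(n) vs O(n log n) trade-off, one can measure it. The sketch below, with hypothetical names `have_same_words_sorted` and `have_same_words_counter` for the two versions above, first checks that they agree on the examples from this thread and then times them with timeit. The numbers depend on the machine and Python version, so no particular result is implied:

```python
import timeit
from collections import Counter

def have_same_words_sorted(sentence1, sentence2):
    # O(n log n): sort both word lists, then compare them element by element.
    return sorted(sentence1.lower().split()) == sorted(sentence2.lower().split())

def have_same_words_counter(sentence1, sentence2):
    # O(n) on average: count word occurrences, then compare the counts.
    return Counter(sentence1.lower().split()) == Counter(sentence2.lower().split())

pairs = [
    ("hello world my name is foobar", "my name is foobar world hello"),
    ("fish the fish the fish fish fish", "the fish"),
    ("hello", "hello hello"),
    ("Hello world", "World hello"),
]
# Both versions compare words with multiplicity, so they should always agree.
for s1, s2 in pairs:
    assert have_same_words_sorted(s1, s2) == have_same_words_counter(s1, s2)

# Rough timing on one short pair; absolute numbers vary between machines.
s1, s2 = pairs[0]
t_sorted = timeit.timeit(lambda: have_same_words_sorted(s1, s2), number=100_000)
t_counter = timeit.timeit(lambda: have_same_words_counter(s1, s2), number=100_000)
print(f"sorted:  {t_sorted:.3f} s")
print(f"Counter: {t_counter:.3f} s")
```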