修改排列算法以避免重复打印的策略

Question

修改排列算法以避免重复打印的策略

6

我一直在复习算法练习，目前正在研究一种排列算法，我非常喜欢:

void permute(char* set, int begin, int end) {
    int range = end - begin;

    if (range == 1)
        cout << set << endl;
    else {
        for(int i = 0; i < range; ++i) {
            swap(&set[begin], &set[begin+i]);
            permute(set, begin+1, end);
            swap(&set[begin], &set[begin+i]);
        }
    }
}

我实际上想将其应用于存在许多重复字符的情况下，因此需要能够修改它以防止重复排列的打印。

我该如何检测到我正在生成重复的排列？我知道可以将其存储在哈希表或类似的数据结构中，但这不是最优解-我更喜欢不需要额外存储空间的解决方案。有人能给我一些建议吗？

PS：我不想使用STL排列机制，也不想引用其他“唯一排列算法”的参考资料。如果可能的话，我想了解用于防止重复的机制，以便将其构建到学习中。

- John Humphreys

1

请看这里的解决方案＃2---> http://n1b-algo.blogspot.com/2009/01/string-permutations.html - user405725

6个回答

2

一个简单的解决方案是将重复的字符随机更改为尚未出现的字符，然后在排列后将字符更改回来。仅接受字符顺序正确的排列。

例如，如果您有“a,b,b”

您原本会得到以下内容：

a b b
a b b
b a b
b a b
b b a
b b a

但是，如果我们以a、b、b为起点，并注意到重复的b，那么我们可以将第二个b改成c

现在我们有了a b c

a b c - accept because b is before c. change c back to b to get a b b
a c b - reject because c is before b
b a c - accept as b a b
b c a - accept as b b a
c b a - reject as c comes before b.
c a b - reject as c comes before b.

- Rusty Rob

2

如果交换的是两个相同的字符，你可以添加一个if语句来防止交换代码执行。然后for循环

for(int i = 0; i < range; ++i) {
    if(i==0 || set[begin] != set[begin+i]) {
      swap(&set[begin], &set[begin+i]);
      permute(set, begin+1, end);
      swap(&set[begin], &set[begin+i]);
    }
}

允许i==0的原因是确保递归调用仅发生一次，即使集合中的所有字符都相同。

- user537390

1

这个不行。算法之前可能已经交换了一个字符，这意味着后面它不一定是排序的。因此，在进一步的迭代中，两个相似的字符可能不一定相邻。 - John Humphreys

我不明白为什么这个方法行不通 - 你能给一个反例吗？ - Irit Katriel

1

假设您想为 'ABAB' 生成不同的排列。它应该有6个不同的排列，而上面的方法会生成11个。 - aamir

它只是防止从一个字符生成重复，而不能防止整个列表生成重复。对于情况"AABB"，它肯定会失败。 - Will Yu

1

选项1

一种选择是在堆栈上使用256位存储来存储for循环中尝试过的字符的位掩码，并且仅对新字符进行递归。

选项2

第二个选择是使用评论中建议的方法（http://n1b-algo.blogspot.com/2009/01/string-permutations.html），并将for循环更改为：

else {
    char last=0;
    for(int i = 0; i < range; ++i) {
        if (last==set[begin+i])
            continue;
        last = set[begin+i];
        swap(&set[begin], &set[begin+i]);
        permute(set, begin+1, end);
        swap(&set[begin], &set[begin+i]);
    }
}

然而，要使用这种方法，您还必须在函数入口处对set[begin]、set[begin+1]、...set[end-1]的字符集进行排序。

请注意，每次调用函数时都必须进行排序。（博客文章似乎没有提到这一点，但否则您将为输入字符串"aabbc"生成太多结果。问题在于，在使用swap后，字符串不会保持排序。）

这仍然不是非常高效的。例如，对于包含1个'a'和N个'b'的字符串，这种方法最终将调用N次排序，总复杂度为N^2logN。

选项3

对于包含大量重复内容的长字符串，更有效的方法是同时维护字符串"set"和一个字典，记录每种类型的字符剩余可用数量。for循环将变为遍历字典的键，因为这些是该位置允许的唯一字符。

这将具有与输出字符串数量相等的复杂度，并且只需要非常少量的额外存储来保存字典。

- Peter de Rivaz

1

只需将每个元素插入到一个集合中。它会自动删除重复项。将集合s声明为全局变量。

set <string>s;
void permute(string a, int l, int r) {
    int i;
    if (l == r)
        s.insert(a);
    else
    {
        for (i = l; i <= r; i++)
        {
            swap((a[l]), (a[i]));
            permute(a, l+1, r);
            swap((a[l]), (a[i])); //backtrack
        }
    }
}

最后使用函数打印，保留HTML标记。

void printr()
{
    set <string> ::iterator itr;
    for (itr = s.begin(); itr != s.end(); ++itr)
    {
        cout << '\t' << *itr;
    }
    cout << '\t' << *itr;
}

- Sankarsshan Sundararajan

0

关键是不要两次交换相同的字符。因此，您可以使用unordered_set来记忆已经交换的字符。

void permute(string& input, int begin, vector<string>& output) {
    if (begin == input.size()){
        output.push_back(input);
    }
    else {    
        unordered_set<char> swapped;
        for(int i = begin; i < input.size(); i++) {
            // Do not swap a character that has been swapped
            if(swapped.find(input[i]) == swapped.end()){
                swapped.insert(input[i]);
                swap(input[begin], input[i]);
                permute(input, begin+1, output);
                swap(input[begin], input[i]);
            }
        }
    }
}

你可以亲自检查原始代码，找出重复出现的情况是“与已经交换的字符进行交换”。

例如：输入 = "BAA"

索引 = 0，i = 0，输入 = "BAA"

----> 索引 = 1，i = 1，输入 = "BAA"

----> 索引 = 1，i = 2，输入 = "BAA"（重复）

索引 = 0，i = 1，输入 = "ABA"

----> 索引 = 1，i = 1，输入 = "ABA"

----> 索引 = 1，i = 2，输入 = "AAB"

索引 = 0，i = 2，输入 = "AAB"

----> 索引 = 1，i = 1，输入 = "AAB"（重复）

----> 索引 = 1，i = 2，输入 = "ABA"（重复）

- Moto Lee

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Reinstate Monica · Accepted Answer

没有通用的方法可以防止任意函数生成重复项。当然，你总可以过滤掉重复项，但出于非常好的原因，你不想这样做。因此，你需要一种特殊的方法来仅生成非重复项。

一种方法是按词典顺序递增生成排列。然后你只需比较一个“新”的排列是否与上一个相同，然后跳过它即可。更好的是：在http://en.wikipedia.org/wiki/Permutations#Generation_in_lexicographic_order给出的生成排列的算法中根本不会生成重复项！

然而，这不是你问题的答案，因为它是一个不同的算法（虽然也基于交换）。

因此，让我们更仔细地看一下你的算法。一个关键观察是：

一旦一个字符被交换到位置 begin ，它将在所有嵌套调用 permute 中保持在那里。

我们将把这个与关于排列的以下一般性观察相结合：

如果你对字符串 s 进行排列，但只在相同字符的位置进行排列，那么 s 将保持不变。实际上，所有重复排列在某个字符c的出现顺序不同的位置上都有不同的顺序。

好的，所以我们所要做的就是确保每个字符的出现顺序始终与开头相同。以下为代码，但......我实际上不会C ++，因此我将使用Python，并希望它能过得去。

我们从你原来的算法开始，改写成'伪代码'：

def permute(s, begin, end):
    if end == begin + 1:
        print(s)
    else:
        for i in range(begin, end):
            s[begin], s[i] = s[i], s[begin]
            permute(s, begin + 1, end)
            s[begin], s[i] = s[i], s[begin]

一个辅助函数，使得调用它更加容易：

def permutations_w_duplicates(s):
    permute(list(s), 0, len(s)) # use a list, as in Python strings are not mutable

现在我们通过一些记录来扩展排列函数，记录每个字符被交换到“begin”位置（即已经被“fixed”）的次数，并记住每个字符出现的原始顺序（char_number）。然后，我们尝试将每个要交换到“begin”位置的字符按照原始顺序依次交换，也就是说，一个字符的固定次数定义了该字符下一个可固定的原始出现次数，我称之为“next_fixable”。

def permute2(s, next_fixable, char_number, begin, end):
    if end == begin + 1:
        print(s)
    else:
        for i in range(begin, end):
            if next_fixable[s[i]] == char_number[i]: 
                next_fixable[s[i]] += 1
                char_number[begin], char_number[i] = char_number[i], char_number[begin]

                s[begin], s[i] = s[i], s[begin]
                permute2(s, next_fixable, char_number, begin + 1, end)
                s[begin], s[i] = s[i], s[begin]

                char_number[begin], char_number[i] = char_number[i], char_number[begin]
                next_fixable[s[i]] -= 1

再次，我们使用一个辅助函数：

def permutations_wo_duplicates(s):
    alphabet = set(s)
    next_fixable = dict.fromkeys(alphabet, 0)
    count = dict.fromkeys(alphabet, 0)
    char_number = [0] * len(s)
    for i, c in enumerate(s):
        char_number[i] = count[c]
        count[c] += 1

    permute2(list(s), next_fixable, char_number, 0, len(s))

这就是了！ 差不多了。如果您喜欢，可以在此停下并改用C++重写，但如果您对一些测试数据感兴趣，请继续阅读。

我使用了略微不同的代码进行测试，因为我不想打印所有的排列组合。在Python中，您可以将print替换为yield，这会使函数变成生成器函数，其结果可以用for循环迭代，并且只有在需要时才会计算排列组合。这是我使用的真实代码和测试：

def permute2(s, next_fixable, char_number, begin, end):
    if end == begin + 1:
        yield "".join(s) # join the characters to form a string
    else:
        for i in range(begin, end):
            if next_fixable[s[i]] == char_number[i]:
                next_fixable[s[i]] += 1
                char_number[begin], char_number[i] = char_number[i], char_number[begin]
                s[begin], s[i] = s[i], s[begin]
                for p in permute2(s, next_fixable, char_number, begin + 1, end):
                    yield p
                s[begin], s[i] = s[i], s[begin]
                char_number[begin], char_number[i] = char_number[i], char_number[begin]
                next_fixable[s[i]] -= 1

def permutations_wo_duplicates(s):
    alphabet = set(s)
    next_fixable = dict.fromkeys(alphabet, 0)
    count = dict.fromkeys(alphabet, 0)
    char_number = [0] * len(s)
    for i, c in enumerate(s):
        char_number[i] = count[c]
        count[c] += 1

    for p in permute2(list(s), next_fixable, char_number, 0, len(s)):
        yield p


s = "FOOQUUXFOO"
A = list(permutations_w_duplicates(s))
print("%s has %s permutations (counting duplicates)" % (s, len(A)))
print("permutations of these that are unique: %s" % len(set(A)))
B = list(permutations_wo_duplicates(s))
print("%s has %s unique permutations (directly computed)" % (s, len(B)))

print("The first 10 permutations       :", A[:10])
print("The first 10 unique permutations:", B[:10])

结果是：

FOOQUUXFOO has 3628800 permutations (counting duplicates)
permutations of these that are unique: 37800
FOOQUUXFOO has 37800 unique permutations (directly computed)
The first 10 permutations       : ['FOOQUUXFOO', 'FOOQUUXFOO', 'FOOQUUXOFO', 'FOOQUUXOOF', 'FOOQUUXOOF', 'FOOQUUXOFO', 'FOOQUUFXOO', 'FOOQUUFXOO', 'FOOQUUFOXO', 'FOOQUUFOOX']
The first 10 unique permutations: ['FOOQUUXFOO', 'FOOQUUXOFO', 'FOOQUUXOOF', 'FOOQUUFXOO', 'FOOQUUFOXO', 'FOOQUUFOOX', 'FOOQUUOFXO', 'FOOQUUOFOX', 'FOOQUUOXFO', 'FOOQUUOXOF']

请注意，排列顺序与原始算法相同，只是没有重复。请注意，37800 * 2！* 2！* 4！= 3628800，就像你所期望的一样。