找出一个字符串的所有唯一排列，避免生成重复项。

Question

找出一个字符串的所有唯一排列，避免生成重复项。

calgorithmcombinatorics

21

查找字符串的所有排列可以使用著名的Steinhaus-Johnson-Trotter算法。但如果字符串包含重复字符，例如
AABB，
那么可能的唯一组合将是4！/（2! * 2!）= 6

一种实现方法是我们可以将其存储在数组中，然后删除重复项。

是否有更简单的方法修改Johnson算法，以便我们永远不会生成重复的排列（以最有效的方式）？

- titan

置换的定义是什么？BA 是 AABB 的有效置换吗？ - ElKamina

2

不，BA不是AABB的有效排列。 - titan

排列是将字符串中的字符洗牌形成的一种序列。对于长度为n且字符唯一的字符串，我们有n!种可能的唯一排列。 - titan

你可以修改约翰逊算法，将每个字母的出现放在一个步骤中。 - asaelr

如果您无法避免生成重复项，那么您可以通过将排列存储在自平衡BST或类似的排序结构中，在生成它们时删除重复项来受益。 - Brian McFarland

澄清一下（或可能会让问题更加混乱）：如果n是包括重复排列的数量，事后删除重复项将导致O(n log n)，假设使用堆排序、快速排序等。如果在插入时进行删除，则最终结果会更接近O(n log m)，其中m是唯一排列的数量（小于n）。因此，随着m趋近于1（即更多像“AAAA”这样的重复项），您将从提前排序中受益更多，而不是在非常长的字符串和少量重复项中。 - Brian McFarland

5个回答

5

首先将字符串转换为一组唯一字符和它们出现的次数，例如BANANA -> (3, A), (1, B), (2, N)。这可以通过对字符串进行排序并分组字母来完成。然后，对于集合中的每个字母，在该字母少一个的所有排列前加上该字母（注意递归）。继续使用“BANANA”示例，我们有：permutations((3,A),(1,B),(2,N)) = A:(permutations((2,A),(1,B),(2,N)) ++ B:(permutations((3,A),(2,N)) ++ N:(permutations((3,A),(1,B),(1,N))

以下是Haskell的实现：

circularPermutations::[a]->[[a]]
circularPermutations xs = helper [] xs []
                          where helper acc [] _ = acc
                                helper acc (x:xs) ys =
                                  helper (((x:xs) ++ ys):acc) xs (ys ++ [x])

nrPermutations::[(Int, a)]->[[a]]
nrPermutations x | length x == 1 = [take (fst (head x)) (repeat (snd (head x)))]
nrPermutations xs = concat (map helper (circularPermutations xs))
  where helper ((1,x):xs) = map ((:) x)(nrPermutations xs)
        helper ((n,x):xs) = map ((:) x)(nrPermutations ((n - 1, x):xs))

- gcbenison

3

我认为这个问题本质上是产生多重集排列的问题。这篇论文似乎与此相关: J. F. Korsh P. S. LaFollette. Loopless array generation of multiset permutations. The Computer Journal, 47(5):612–621, 2004。

从摘要中可以看出，这篇论文提出了一种无循环算法来生成多重集的所有排列。每个排列都通过一个交换操作从前一个排列获得。它与以前的算法不同之处在于使用一个数组来存储排列，但只需要线性长度的存储空间。

- Kris

6

尝试自己写一下呢？ - Gangnus

1

在我的解决方案中，我递归生成选项，每次尝试添加我尚未使用的每个字母，直到我需要的次数为止。

#include <string.h>

void fill(char ***adr,int *pos,char *pref) {
    int i,z=1;
    //loop on the chars, and check if should use them
    for (i=0;i<256;i++)
        if (pos[i]) {
            int l=strlen(pref);
            //add the char
            pref[l]=i;
            pos[i]--;
            //call the recursion
            fill(adr,pos,pref);
            //delete the char
            pref[l]=0;
            pos[i]++;
            z=0;
        }
    if (z) strcpy(*(*adr)++,pref);
}

void calc(char **arr,const char *str) {
    int p[256]={0};
    int l=strlen(str);
    char temp[l+1];
    for (;l>=0;l--) temp[l]=0;
    while (*str) p[*str++]++;
    fill(&arr,p,temp);
}

使用示例：

#include <stdio.h>
#include <string.h>

int main() {
    char s[]="AABAF";
    char *arr[20];
    int i;
    for (i=0;i<20;i++) arr[i]=malloc(sizeof(s));
    calc(arr,s);
    for (i=0;i<20;i++) printf("%d: %s\n",i,arr[i]);
    return 0;
}

- asaelr

添加了一些注释。还有其他建议吗？ - asaelr

2

最重要的改进，甚至比注释更重要的是使用描述性的函数/变量名称。目前你有两个名为func和calc的函数，以及名为arr、pref、pos、adr、p、l、i、z、p、s和str的变量；它们的用途都不明显。使用更具描述性的变量名称将极大地提高代码的可读性。 - BlueRaja - Danny Pflughoeft

1

其他较小的改进：使用描述性类型（z 应该是 bool，#include <stdbool.h>）；不要使用魔法数字（arr 的大小，p 的大小）；永远不要使用 strcpy()；别忘了在 malloc() 上调用 free() :) - BlueRaja - Danny Pflughoeft

0

这是一个棘手的问题，我们需要使用递归来找出字符串的所有排列，例如 "AAB" 的排列将是 "AAB"、"ABA" 和 "BAA"。我们还需要使用Set来确保没有重复的值。

import java.io.*;
import java.util.HashSet;
import java.util.*;
class Permutation {

    static HashSet<String> set = new HashSet<String>();
    public static void main (String[] args) {
    Scanner in = new Scanner(System.in);
        System.out.println("Enter :");
        StringBuilder  str = new StringBuilder(in.nextLine());
        NONDuplicatePermutation("",str.toString());  //WITHOUT DUPLICATE PERMUTATION OF STRING
        System.out.println(set);
    }


    public static void NONDuplicatePermutation(String prefix,String str){
        //It is nlogn
        if(str.length()==0){
            set.add(prefix);
        }else{
            for(int i=0;i<str.length();i++){

                NONDuplicatePermutation(prefix+ str.charAt(i), str.substring(0,i)+str.substring(i+1));
            }
        }

    }

}

- mukta

我用Java编写了我的代码。我认为我的代码中给出的逻辑是相当易懂的。 - mukta

你仍在生成所有排列（包括重复的）。问题提到解决方案应该避免这样做。 - AnEnigmaticBug

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Gangnus · Accepted Answer

使用以下递归算法：

PermutList Permute(SymArray fullSymArray){
    PermutList resultList=empty;
    for( each symbol A in fullSymArray, but repeated ones take only once) {
       PermutList lesserPermutList=  Permute(fullSymArray without A)
       for ( each SymArray item in lesserPermutList){
            resultList.add("A"+item);
       }
    }
    return resultList;
}

正如您所看到的，这非常容易。