寻找带有重复字母的单词排列的排名

7
我发布这篇文章,尽管已经有很多关于这个问题的帖子了。我不想发表答案,因为它不起作用。这篇文章的答案 (在具有重复项的所有可能排列列表中查找给定字符串的排名) 对我没有用。
所以我尝试了这个(这是我抄袭的代码和处理重复的尝试的汇编)。不重复的情况下可以正常工作。 BOOKKEEPER 生成83863,而不是期望的10743。
(阶乘函数和字母计数数组“repeats”正在正确运行。我没有发布是为了节省空间。)
while (pointer != length)
{
    if (sortedWordChars[pointer] != wordArray[pointer])
    {
        // Swap the current character with the one after that
        char temp = sortedWordChars[pointer];
        sortedWordChars[pointer] = sortedWordChars[next];
        sortedWordChars[next] = temp;
        next++;

        //For each position check how many characters left have duplicates, 
        //and use the logic that if you need to permute n things and if 'a' things 
        //are similar the number of permutations is n!/a!


        int ct = repeats[(sortedWordChars[pointer]-64)];
        // Increment the rank
        if (ct>1) { //repeats?
            System.out.println("repeating " + (sortedWordChars[pointer]-64));
            //In case of repetition of any character use: (n-1)!/(times)!
            //e.g. if there is 1 character which is repeating twice,
            //x* (n-1)!/2!                      
                int dividend = getFactorialIter(length - pointer - 1);
                int divisor = getFactorialIter(ct);
                int quo = dividend/divisor;
                rank += quo;
        } else {
            rank += getFactorialIter(length - pointer - 1);                 
        }                       
    } else
    {
        pointer++;
        next = pointer + 1;
    }
}

我猜你想要字典序排名? - David Eisenstat
是的,David - 例如 QUESTION=24572(在我的代码中工作,因为没有重复项)。感谢您的回复。 - Max Tomlinson
6个回答

10
注意:本答案适用于从示例中隐含的基于1的排名。以下是一些在提供的两个示例中至少有效的Python示例代码。关键事实是suffixperms * ctr[y] // ctr[x]是长度为(i + 1)perm后缀的第一个字母为y的排列数量。
from collections import Counter

def rankperm(perm):
    rank = 1
    suffixperms = 1
    ctr = Counter()
    for i in range(len(perm)):
        x = perm[((len(perm) - 1) - i)]
        ctr[x] += 1
        for y in ctr:
            if (y < x):
                rank += ((suffixperms * ctr[y]) // ctr[x])
        suffixperms = ((suffixperms * (i + 1)) // ctr[x])
    return rank
print(rankperm('QUESTION'))
print(rankperm('BOOKKEEPER'))

Java版本:

public static long rankPerm(String perm) {
    long rank = 1;
    long suffixPermCount = 1;
    java.util.Map<Character, Integer> charCounts =
        new java.util.HashMap<Character, Integer>();
    for (int i = perm.length() - 1; i > -1; i--) {
        char x = perm.charAt(i);
        int xCount = charCounts.containsKey(x) ? charCounts.get(x) + 1 : 1;
        charCounts.put(x, xCount);
        for (java.util.Map.Entry<Character, Integer> e : charCounts.entrySet()) {
            if (e.getKey() < x) {
                rank += suffixPermCount * e.getValue() / xCount;
            }
        }
        suffixPermCount *= perm.length() - i;
        suffixPermCount /= xCount;
    }
    return rank;
}

排列的反序列化:

from collections import Counter

def unrankperm(letters, rank):
    ctr = Counter()
    permcount = 1
    for i in range(len(letters)):
        x = letters[i]
        ctr[x] += 1
        permcount = (permcount * (i + 1)) // ctr[x]
    # ctr is the histogram of letters
    # permcount is the number of distinct perms of letters
    perm = []
    for i in range(len(letters)):
        for x in sorted(ctr.keys()):
            # suffixcount is the number of distinct perms that begin with x
            suffixcount = permcount * ctr[x] // (len(letters) - i)
            if rank <= suffixcount:
                perm.append(x)
                permcount = suffixcount
                ctr[x] -= 1
                if ctr[x] == 0:
                    del ctr[x]
                break
            rank -= suffixcount
    return ''.join(perm)

谢谢您的快速回复,David!让我找一顶Python帽子(我不懂Python),并从这个优雅的代码中理解一些意义。我会发布更新的。再次感谢,Max。 - Max Tomlinson
有点让我困惑的是for循环的结束位置(隐含括号),所以for循环包含了直到返回等级的部分。通过字符串perm的索引实际上是从末尾到开头(对吧)?计数器在每次迭代中都会增加,而“for y”循环则会针对每次迭代进行,就像是动态阶乘? - Max Tomlinson
@MaxTomlinson 是的。当rank被更新时,numer * (i + 1) // denom是由perm的最后i + 1个元素组成的切片的排列数。numer * ctr[y] // denom是具有与perm相同长度的前缀len(perm) - 1 - i的排列数,后跟y,然后是剩下的任意顺序。现在我想起来了,可以将numerdenom合并为一个变量。 - David Eisenstat
1
@David Eisenstat,非常棒的解决方案!但是你的代码无法处理非常大的字符串,比如'adsfadfjkzcvzadfadfadfasdfqq',因为int和long会溢出。我知道,解决这个问题的方法之一是使用模乘逆元,但我不知道在找到排名问题时应该如何使用它。也许你有关于处理大字符串的想法,或者知道如何在这种情况下使用模乘逆元? - ivan_ochc
1
@David Eisenstat,感谢您的回复。您是指将此行中的xCount替换掉吗:rank += suffixPermCount * e.getValue() / xCount; 还是将其从 suffixPermCount /= xCount; 中替换掉?但更重要的是,我不知道在计算模乘逆时需要使用哪些变量(“模数”应该使用什么值,“a”和“b”将是什么)。我查阅了一些与此问题相关的数学知识,但我不明白它应该如何用于找到排名问题(具有大输入字符串)。 - ivan_ochc
显示剩余12条评论

3
如果我们使用数学,复杂性将会降低,我们将能够更快地找到等级。这对于大字符串尤其有帮助。 (更多细节可以在这里找到)
建议以编程方式定义所示方法(以下给出屏幕截图),如下所示enter image description here

很酷,但如何从数字排名中检索单词? - Lucas Sousa

1

我认为David的帖子(被接受的答案)非常棒。但是,我希望进一步提高速度。内部循环试图找到逆序对,并且对于每个这样的逆序对,它尝试对排名的增量做出贡献。如果我们在该位置使用有序映射结构(二叉搜索树或BST),则可以从第一个节点(左下角)开始进行中序遍历,直到到达BST中的当前字符,而不是遍历整个映射(BST)。在C ++中,std :: map是BST实现的完美选择。以下代码减少了循环中必要的迭代并删除了if检查。

long long rankofword(string s)
{
    long long rank = 1;
    long long suffixPermCount = 1;
    map<char, int> m;
    int size = s.size();
    for (int i = size - 1; i > -1; i--)
    {
        char x = s[i];
        m[x]++;
        for (auto it = m.begin(); it != m.find(x); it++)
                rank += suffixPermCount * it->second / m[x];

        suffixPermCount *= (size - i);
        suffixPermCount /= m[x];
    }
    return rank;
}

0
如果有k个不同的字符,第i个字符重复n_i次,则排列的总数为
            (n_1 + n_2 + ..+ n_k)!
------------------------------------------------ 
              n_1! n_2! ... n_k!

这是多项式系数。
现在我们可以使用它来计算给定排列的秩,方法如下:

考虑第一个字符(最左边的字符)。假设它是按字符排序后的第r个。

现在,如果您将第一个字符替换为1、2、3、..、(r-1)个字符中的任何一个,并考虑所有可能的排列,每个这些排列都将在给定排列之前。可以使用上述公式计算总数。

一旦计算出第一个字符的数字,请固定第一个字符,并重复执行第二个字符等操作。

以下是C++实现您的问题

#include<iostream>

using namespace std;

int fact(int f) {
    if (f == 0) return 1;
    if (f <= 2) return f;
    return (f * fact(f - 1));
}
int solve(string s,int n) {
    int ans = 1;
    int arr[26] = {0};
    int len = n - 1;
    for (int i = 0; i < n; i++) {
        s[i] = toupper(s[i]);
        arr[s[i] - 'A']++;
    }
    for(int i = 0; i < n; i++) {
        int temp = 0;
        int x = 1;
        char c = s[i];
        for(int j = 0; j < c - 'A'; j++) temp += arr[j];
        for (int j = 0; j < 26; j++) x = (x * fact(arr[j]));
        arr[c - 'A']--;
        ans = ans + (temp * ((fact(len)) / x));
        len--;
    }
    return ans;
}
int main() {
    int i,n;
    string s;
    cin>>s;
    n=s.size();
    cout << solve(s,n);
    return 0;
}

0
Java中的字符串无序组合排名函数的版本:
public static String unrankperm(String letters, int rank) {
    Map<Character, Integer> charCounts = new java.util.HashMap<>();
    int permcount = 1;
    for(int i = 0; i < letters.length(); i++) {
        char x = letters.charAt(i);
        int xCount = charCounts.containsKey(x) ? charCounts.get(x) + 1 : 1;
        charCounts.put(x, xCount);

        permcount = (permcount * (i + 1)) / xCount;
    }
    // charCounts is the histogram of letters
    // permcount is the number of distinct perms of letters
    StringBuilder perm = new StringBuilder();

    for(int i = 0; i < letters.length(); i++) {
        List<Character> sorted = new ArrayList<>(charCounts.keySet());
        Collections.sort(sorted);

        for(Character x : sorted) {
            // suffixcount is the number of distinct perms that begin with x
            Integer frequency = charCounts.get(x);
            int suffixcount = permcount * frequency / (letters.length() - i); 

            if (rank <= suffixcount) {
                perm.append(x);

                permcount = suffixcount;

                if(frequency == 1) {
                    charCounts.remove(x);
                } else {
                    charCounts.put(x, frequency - 1);
                }
                break;
            }
            rank -= suffixcount;
        }
    }
    return perm.toString();
}

另请参阅用于蛮力装箱并行化的n-th排列算法


0
@Dvaid Einstat,非常感谢您的帮助。由于我还在学习我的第一门语言(C#),所以花了很长时间才弄清楚您在做什么。我将其翻译成了C#,并想提供这个解决方案,因为这个列表对我帮助很大!

谢谢!

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Text.RegularExpressions;

namespace CsharpVersion
{
    class Program
    {
        //Takes in the word and checks to make sure that the word
        //is between 1 and 25 charaters inclusive and only
        //letters are used
        static string readWord(string prompt, int high)
        {
            Regex rgx = new Regex("^[a-zA-Z]+$");
            string word;
            string result;
            do
            {
                Console.WriteLine(prompt);
                word = Console.ReadLine();
            } while (word == "" | word.Length > high | rgx.IsMatch(word) == false);
            result = word.ToUpper();
            return result;
        }

        //Creates a sorted dictionary containing distinct letters 
        //initialized with 0 frequency
        static SortedDictionary<char,int> Counter(string word)
        {
            char[] wordArray = word.ToCharArray();
            int len = word.Length;
            SortedDictionary<char,int> count = new SortedDictionary<char,int>();
           foreach(char c in word)
           {
               if(count.ContainsKey(c))
               {
               }
               else
               {
                   count.Add(c, 0);
               }

           }
           return count;
        }

        //Creates a factorial function
        static int Factorial(int n)
        {
            if (n <= 1)
            {
                return 1;
            }
            else
            {
                return n * Factorial(n - 1);
            }
        }
        //Ranks the word input if there are no repeated charaters 
        //in the word
        static Int64 rankWord(char[] wordArray)
        {
            int n = wordArray.Length; 
            Int64 rank = 1; 
            //loops through the array of letters
            for (int i = 0; i < n-1; i++) 
            { 
                int x=0; 
            //loops all letters after i and compares them for factorial calculation
                for (int j = i+1; j<n ; j++) 
                { 
                    if (wordArray[i] > wordArray[j]) 
                    {
                        x++;
                    }
                }
                rank = rank + x * (Factorial(n - i - 1)); 
             }
            return rank;
        }

        //Ranks the word input if there are repeated charaters
        //in the word
        static Int64 rankPerm(String word) 
        {
        Int64 rank = 1;
        Int64 suffixPermCount = 1;
        SortedDictionary<char, int> counter = Counter(word);
        for (int i = word.Length - 1; i > -1; i--) 
        {
            char x = Convert.ToChar(word.Substring(i,1));
            int xCount;
            if(counter[x] != 0) 
            {
                xCount = counter[x] + 1; 
            }
            else
            {
               xCount = 1;
            }
            counter[x] = xCount;
            foreach (KeyValuePair<char,int> e in counter)
            {
                if (e.Key < x)
                {
                    rank += suffixPermCount * e.Value / xCount;
                }
            }

            suffixPermCount *= word.Length - i;
            suffixPermCount /= xCount;
        }
        return rank;
        }




        static void Main(string[] args)
        {
           Console.WriteLine("Type Exit to end the program.");
           string prompt = "Please enter a word using only letters:";
           const int MAX_VALUE = 25;
           Int64 rank = new Int64();
           string theWord;
           do
           {
               theWord = readWord(prompt, MAX_VALUE);
               char[] wordLetters = theWord.ToCharArray();
               Array.Sort(wordLetters);
               bool duplicate = false;
               for(int i = 0; i< theWord.Length - 1; i++)
               {
                 if(wordLetters[i] < wordLetters[i+1])
                 {
                     duplicate = true;
                 }
               }
               if(duplicate)
               {
               SortedDictionary<char, int> counter = Counter(theWord);
               rank = rankPerm(theWord);
               Console.WriteLine("\n" + theWord + " = " + rank);
               }
               else
               {
               char[] letters = theWord.ToCharArray();
               rank = rankWord(letters);
               Console.WriteLine("\n" + theWord + " = " + rank);
               }
           } while (theWord != "EXIT");

            Console.WriteLine("\nPress enter to escape..");
            Console.Read();
        }
    }
}

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接