迭代生成自然数排列

Question

迭代生成自然数排列

8

我有一个比较不寻常的问题，可能已经被问过了（我没有找到相关内容，但是我可能使用了错误的关键词）。我的任务非常简单：给定自然数列表 [0, 1, 2, ... N - 1]，我想对这个序列进行洗牌。例如，当我输入数字4时，一种可能的结果是[3、0、1、2]。随机性应该由某个种子确定（但这在大多数常用语言中是标准的伪随机数生成器）。最朴素的方法是只实例化一个大小为N的数组，填充它并使用任何洗牌算法。问题在于，这种方法的内存复杂度为O(n)，在我的特殊情况下是不可行的。我的想法是编写一个生成器，按顺序提供生成的列表中的数字。更明确地说，我想要一种“算法”，以迭代的方式提供数字。更进一步，这个概念类会像这样：

class Generator {
   // some state
   int nextNumber(...) {
      // some magic
   }
}

通过迭代调用nextNumber方法，可以提供序列的数字（即[0, 1，... N-1]的任何排列）。当然，这个生成器实例的状态应该具有比仅O（n）更好的内存复杂度（否则我将毫无收获）。

是否有任何算法可以实现我的愿望？

- PfannkuchenXD

原根模质数是一个不错的选择。但是对于您的需求来说，这个序列可能不够随机。 - jcai

我也考虑过只使用哈希函数并在每一步返回hash(x++)，但这可能会导致冲突，我想要一个精确的排列... - PfannkuchenXD

4

https://en.m.wikipedia.org/wiki/Format-preserving_encryption - David Eisenstat

1

顺便说一句，我对我的答案中的代码进行了一些改进，并添加了一些你可能会发现有用的信息。 - PM 2Ring

3个回答

6

您要寻找的是一种伪随机置换函数，比如f，它能够以伪随机双射方式将1到N之间的数字映射到1到N之间的数字。然后，要生成伪随机置换中的第n个数字，只需返回f(n)。

这本质上与加密问题相同。带密钥的分块密码是一种伪随机双射函数。如果按某种顺序恰好向其馈送所有可能的明文块，则它将以不同的伪随机顺序恰好返回所有可能的密文块。

因此，为了解决像您这样的问题，您需要创建一个针对1到N之间的数字而非256位块等的密码。您可以使用密码学工具来完成这项任务。

例如，您可以使用Feistel结构（https://en.wikipedia.org/wiki/Feistel_cipher）构造置换函数，如下所示：

设W为floor(sqrt(N))，函数的输入为x
如果x < W^2，则将x分成两个字段，从0到W-1：h = floor(x/W)和l = x%W。哈希h以产生从0到W-1的值，并设置l =（l + hash）％W。然后交换字段--让x = l * W + h
x =（x +（N-W ^ 2））％N
重复步骤（2）和（3）若干次。做得越多，结果看起来越随机。步骤（3）确保对于许多回合，x < W ^ 2为真。

由于此函数由许多步骤组成，每个步骤都会将数字从0到N-1映射到双射方式下的0到N-1数字上，整个函数也将具有此属性。如果您向其中输入从0到N-1的数字，则会以伪随机顺序将它们返回。

- Matt Timmermans

-1

我觉得你在处理一个排列的等级（我可能错了）。我已经在 Rosetta 代码上写了一个关于这个问题的任务，并且还在其他 SO 问题上提供了答案这里和这里。

这对你有用吗？

- Paddy3118

你知道第二个链接中你的算法的内存复杂度是多少吗？ - Joseph Wood

你不确定问题是什么？你是在讨论获取真正随机的二进制位吗？ - Paddy3118

不，我是指像这样内存复杂度？ - Joseph Wood

1

抱歉造成困惑。我的问题是，生成特定排列需要多少内存？它是否取决于输入的大小？我之所以问这个问题，是因为我已经发布了一个答案（现在已删除），它正是您建议的方法（即生成第n个字典序排列）。我的算法在内存复杂度方面是O(n)。OP要求解决方案小于O(n)。 - Joseph Wood

嗯，我认为问题可能已经澄清了，上面的方法也会失败。 - Paddy3118

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- PM 2Ring · Accepted Answer

这是我在两年前编写的Python 3实现格式保留加密的相当简单的实现，使用平衡的费斯特尔网络。它可以在32位系统上执行您想要的N个索引置换，N最大可达2⁶⁴，或在64位Python版本上达到2¹²⁸。这是由hash()函数返回的整数大小决定的。请查看sys.hash_info以找到您系统的限制。使用更复杂或更慢的哈希函数返回更长的比特长度的值并不困难，但我不想让这段代码变得更加复杂或更慢。

更新

我对之前的版本进行了一些小改进，并在注释中添加了更多信息。我们现在使用哈希函数返回的高位而不是低位，这通常可以提高随机性，特别是对于短位长度。我还添加了另一个哈希函数Yann Collet的xxhash，它比Python的hash在这种情况下表现得要好得多，特别是对于较短的位长度，尽管它稍微慢一些。xxhash算法的雪崩效应比内置的hash要高得多，因此得到的排列往往更加乱序。

虽然这段代码适用于小值的stop，但更适合处理stop >= 2**16。如果您需要对较小的范围进行排列，最好只使用list(range(stop))上的random.shuffle。它会更快，并且不会使用太多RAM：list(range(2**16))在32位机器上占用约1280千字节。

你会注意到我使用了一个字符串来生成随机数种子。对于这个应用程序，我们希望随机器有足够的熵，并且使用一个大字符串（或bytes）是一个简单的方法，正如random模块文档所述。即便如此，当stop很大时，该程序只能产生所有可能排列的一小部分。例如，当stop == 35时，有35！（35的阶乘）种不同的排列方式，而35! > 2¹³²，但我们密钥的总位数仅为128位，因此无法覆盖所有这些排列方式。我们可以增加Feistel轮数以获得更多的覆盖范围，但显然对于大的stop值来说是不切实际的。

''' Format preserving encryption using a Feistel network

    This code is *not* suitable for cryptographic use.

    See https://en.wikipedia.org/wiki/Format-preserving_encryption
    https://en.wikipedia.org/wiki/Feistel_cipher
    http://security.stackexchange.com/questions/211/how-to-securely-hash-passwords

    A Feistel network performs an invertible transformation on its input,
    so each input number produces a unique output number. The netword operates
    on numbers of a fixed bit width, which must be even, i.e., the numbers
    a particular network operates on are in the range(4**k), and it outputs a
    permutation of that range.

    To permute a range of general size we use cycle walking. We set the
    network size to the next higher power of 4, and when we produce a number
    higher than the desired range we simply feed it back into the network,
    looping until we get a number that is in range.

    The worst case is when stop is of the form 4**k + 1, where we need 4
    steps on average to reach a valid n. In the typical case, where stop is
    roughly halfway between 2 powers of 4, we need 2 steps on average.

    Written by PM 2Ring 2016.08.22
'''

from random import Random

# xxhash by Yann Collet. Specialised for a 32 bit number
# See http://fastcompression.blogspot.com/2012/04/selecting-checksum-algorithm.html

def xxhash_num(n, seed):
    n = (374761397 + seed + n * 3266489917) & 0xffffffff
    n = ((n << 17 | n >> 15) * 668265263) & 0xffffffff
    n ^= n >> 15
    n = (n * 2246822519) & 0xffffffff
    n ^= n >> 13
    n = (n * 3266489917) & 0xffffffff
    return n ^ (n >> 16)

class FormatPreserving:
    """ Invertible permutation of integers in range(stop), 0 < stop <= 2**64
        using a simple Feistel network. NOT suitable for cryptographic purposes.
    """
    def __init__(self, stop, keystring):
        if not 0 < stop <= 1 << 64:
            raise ValueError('stop must be <=', 1 << 64)

        # The highest number in the range
        self.maxn = stop - 1

        # Get the number of bits in each part by rounding
        # the bit length up to the nearest even number
        self.shiftbits = -(-self.maxn.bit_length() // 2)
        self.lowmask = (1 << self.shiftbits) - 1
        self.lowshift = 32 - self.shiftbits

        # Make 4 32 bit round keys from the keystring.
        # Create an independent random stream so we
        # don't intefere with the default stream.
        stream = Random()
        stream.seed(keystring)
        self.keys = [stream.getrandbits(32) for _ in range(4)]
        self.ikeys = self.keys[::-1]

    def feistel(self, n, keys):
        # Split the bits of n into 2 parts & perform the Feistel
        # transformation on them.
        left, right = n >> self.shiftbits, n & self.lowmask
        for key in keys:
            left, right = right, left ^ (xxhash_num(right, key) >> self.lowshift)
            #left, right = right, left ^ (hash((right, key)) >> self.lowshift) 
        return (right << self.shiftbits) | left

    def fpe(self, n, reverse=False):
        keys = self.ikeys if reverse else self.keys
        while True:
            # Cycle walk, if necessary, to ensure n is in range.
            n = self.feistel(n, keys)
            if n <= self.maxn:
                return n

def test():
    print('Shuffling a small number')
    maxn = 10
    fpe = FormatPreserving(maxn, 'secret key string')
    for i in range(maxn):
        a = fpe.fpe(i)
        b = fpe.fpe(a, reverse=True)
        print(i, a, b)

    print('\nShuffling a small number, with a slightly different keystring')
    fpe = FormatPreserving(maxn, 'secret key string.')
    for i in range(maxn):
        a = fpe.fpe(i)
        b = fpe.fpe(a, reverse=True)
        print(i, a, b)

    print('\nHere are a few values for a large maxn')
    maxn = 10000000000000000000
    print('maxn =', maxn)
    fpe = FormatPreserving(maxn, 'secret key string')
    for i in range(10):
        a = fpe.fpe(i)
        b = fpe.fpe(a, reverse=True)
        print('{}: {:19} {}'.format(i, a, b))

    print('\nUsing a set to test that there are no collisions...')
    maxn = 100000
    print('maxn', maxn)
    fpe = FormatPreserving(maxn, 'secret key string')
    a = {fpe.fpe(i) for i in range(maxn)}
    print(len(a) == maxn)

    print('\nTesting that the operation is bijective...')
    for i in range(maxn):
        a = fpe.fpe(i)
        b = fpe.fpe(a, reverse=True)
        assert b == i, (i, a, b)
    print('yes')

if __name__ == "__main__":
    test()

输出

Shuffling a small number
0 4 0
1 2 1
2 5 2
3 9 3
4 1 4
5 3 5
6 7 6
7 0 7
8 6 8
9 8 9

Shuffling a small number, with a slightly different keystring
0 9 0
1 8 1
2 3 2
3 5 3
4 2 4
5 6 5
6 1 6
7 4 7
8 7 8
9 0 9

Here are a few values for a large maxn
maxn = 10000000000000000000
0: 7071024217413923554 0
1: 5613634032642823321 1
2: 8934202816202119857 2
3:  296042520195445535 3
4: 5965959309128333970 4
5: 8417353297972226870 5
6: 7501923606289578535 6
7: 1722818114853762596 7
8:  890028846269590060 8
9: 8787953496283620029 9

Using a set to test that there are no collisions...
maxn 100000
True

Testing that the operation is bijective...
yes
0 4
1 2
2 5
3 9
4 1
5 3
6 7
7 0
8 6
9 8

这是如何使用它来制作生成器的方法：

def ipermute(stop, keystring):
    fpe = FormatPreserving(stop, keystring)
    for i in range(stop):
        yield fpe.fpe(i)

for i, v in enumerate(ipermute(10, 'secret key string')):
    print(i, v)

输出

它在Python中速度相当快，但绝对不适用于密码学。通过将Feistel轮数增加至至少5轮，并使用适当的加密哈希函数（例如Blake2），可以使其达到密码级别。此外，需要使用加密方法来生成Feistel密钥。当然，除非您确切地知道自己在做什么，否则不应编写密码软件，因为编写容易受到时间攻击等漏洞的代码。