解释 Python 模块 itertools 的组合函数

7
我经常在Python中使用itertools模块,但如果不知道其背后的逻辑就感觉像是作弊一样。

以下是用于寻找无序字符串组合的代码。

def combinations(iterable, r):
    # combinations('ABCD', 2) --> AB AC AD BC BD CD
    # combinations(range(4), 3) --> 012 013 023 123
    pool = tuple(iterable)
    n = len(pool)
    if r > n:
        return
    indices = list(range(r))
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != i + n - r:
                break
        else:
            return
        indices[i] += 1
        for j in range(i+1, r):
            indices[j] = indices[j-1] + 1
        yield tuple(pool[i] for i in indices)

请问有人能解释基本概念吗?尤其是第14行的内容。


第14行是一个else语句。你指的是哪一行? - Padraic Cunningham
1
你说“这是我的代码”,但它是直接复制了这里的例子:https://docs.python.org/2/library/itertools.html#itertools.combinations - Tom Dalton
7
如果他不知道它是如何工作的,那显然这不是他的代码。他的意思是“这是基于此问题的代码”。 - DBedrenko
3个回答

6
def combinations(iterable, r):
    # combinations('ABCD', 2) --> AB AC AD BC BD CD
    # combinations(range(4), 3) --> 012 013 023 123
    pool = tuple(iterable)
    # first you create a tuple of the original input which you can refer later with 
    # the corresponding indices
    n = len(pool)
    # get the length of the tuple
    if r > n:
        return
    # if the length of the desired permutation is higher than the length of the tuple 
    # it is not possible to create permutations so return without doing something

    indices = list(range(r))
    # create the first list of indices in normal order ( indices = [0,1,2,3,...,r])
    # up to the desired range r

    yield tuple(pool[i] for i in indices)
    # return the first permutation which is a tuple of the input with the original 
    # indices up to r tuple(tuple[0], tuple[1],....,tuple[r])

    while True:
        for i in reversed(range(r)):
            # i will go from r-1, r-2, r-3, ....,0

            if indices[i] != i + n - r:
                # if condition is true except for the case 
                # that at the position i in the tuple the last possible 
                # character appears then it is equal and proceed with the character 
                # before which means that this character is replaced by the next 
                # possible one

                # example: tuple='ABCDE' so n = 5, r=3 indices is [0,1,2] at start i=2
                # yield (A,B,C)
                # indices[i] is 2 and checks if 2 != 4 (2 +5-3) is true and break
                # increase indices[i]+1 and yield (A,B,D)
                # indices[i] is 3 and checks if 3 != 4 (2 +5-3) is true and break
                # increase indices[i]+1 and yield (A,B,E) 
                # indices[i] is 4 and checks if 4 != 4 (2 +5-3) is false so next loop 
                # iteration:  i = 1 indices[i] is 1 and checks if 4 != 3 (1 +5-3) 
                # is true and break .... and so on

                break
        else:
            # when the forloop completely finished then all possible character 
            # combinations are processed and the function ends
            return

        indices[i] += 1 # as written proceed with the next character which means the 
                        # index at i is increased
        for j in range(i+1, r): 
            indices[j] = indices[j-1] + 1 # all the following indexes are increased as 
                                          # well since we only want to at following 
                                          # characters and not at previous one or the
                                          # same which is index at indice[i]
        yield tuple(pool[i] for i in indices)
        # return the new tuple

4
def combinations(iterable, r):
    # first, we need to understand, this function is to record every possibility of indices
    # then return the elements with the indices

    pool = tuple(iterable)

    n = len(pool)

    if r > n:
        return
    indices = list(range(r))

    # yield the first permutation, 
    # cause in the "while" circle, we will start to change the indices by plus 1 consistently
    # for example: iterable is [1, 2, 3, 4, 5], and r = 3
    # this yield will return [1, 2, 3], but in the "while" loop, 
    # we will start to update last elements' index to 4, which will return [1, 2, 4]
    yield tuple(pool[i] for i in indices)

    while True:

        # in this for loop, we want to confirm whether indices[i] can be increased or not
        for i in reversed(range(r)):

            # after reversed, i will be r-1, r-2, r-3, ....,0
            # something we should know before we start the 'for' loop
            # the value of indices[r-1] should not greater than n-1
            # the value of indices[r-2] should not greater than n-2
            # and the maximum of indices[i] should be indices[r-1]
            # so the value of indices[r-1] should between r-1 and n-r + r-1, like this:
            #       r-1 <= indics[r-1] <= n-r + r-1
            # so, to r-2:
            #       r-2 <= indics[r-1] <= n-r + r-2
            # let's set i = r-1:
            #       i <= indices[i] <= n-r+i (n-1 is the maximum value)
            # since we will keep plusing the value of indices[i], let's ignore i <= indices[i]
            # and we just want to know if indices[i] can plus or not,
            # so indices[i] can be equal with n-r+i
            # then we got:
            #       indices[i] < i + n - r
            # the offical document give: indices[i] != i + n - r,
            # cause when indices[i] == i + n - r, it arrived the boundry, 
            # the "for" loop will get into indices[i-1], there will be no judgement for ">i+n-r"
            # so to use indices[i] != i + n - r is still a good way, 
            # but i prefer indices[i] < i + n - r, which is easier to understand for me.
            # so next question is "break" in here, 
            # it means the value of indices[i] doesn't reach the boundry still can plus one,
            # let break out to continue the iteration
            # when it hit the boundry, i will be r-2
            # So we can see the result:
            # 1, 2, 3
            # 1, 2, 4
            # 1, 2, 5
            # 1, 3, 4
            # always loop the last index, hit the boundry, check the last but one.
            if indices[i] < i + n - r:
                break
        else:
            # loop finished, return
            return

        # first of all, plus one for current indices[i], 
        # that's why we yield the first permutation, before the while loop
        # and increase every indices[i] by 1
        indices[i] = indices[i] + 1
        # this for loop is increase every indices which is after indices[i].
        # cause, current index increased, and we need to confirm every one behind is orderd
        # for example: current we got i = 2, indices[i]+1 will be 3, 
        # so the next loop will start with [1, 3, 4], not [1, 3, 3]
        for j in range(i+1, r):
            indices[j] = indices[j-1] + 1

        yield tuple(pool[i] for i in indices)

2

注意!代码会被拆分并且与其各个部分的缩进不匹配,因此我建议您查看问题本身/ itertools文档中的代码(相同代码)。

自从这个问题被提出已经过去了7年。哇。我对此很感兴趣,虽然上面的解释非常有帮助,但对我来说并没有真正触及要点,所以这里是我为自己做的总结。
由于我终于理解了它(或者至少我认为我理解了),我想在这里发布这个“版本”的解释,以防还有像我一样的人。那么让我们开始吧。

def combinations(iterable, r):
    pool = tuple(iterable)
    n = len(pool) 

在这个第一部分中,只需将可迭代对象制作成元组并获取其长度。这些稍后会很有用。
if r > n:
        return
    indices = list(range(r))
    yield tuple(pool[i] for i in indices)

这也很简单 - 如果所需组合的长度大于元素池的大小,则无法构建有效的组合(例如,你不能从4个元素中组合出5个元素),因此我们只需使用返回语句停止执行。我们还生成第一个组合(从可迭代对象中取前r个元素)。
接下来的部分稍微复杂一些,请仔细阅读。
while True:
    for i in reversed(range(r)):
        if indices[i] != n - (r - i):
            break
"""
The job of the while loop is to increment the indices one after
the other, and print out all the possible element combinations based
off all the possible valid indice combinations.

This for loop's job is to make sure we never over-increment any values.

In order for us to not run into any errors, the incremention of
the last element of the indice list must stop when it reaches one-less 
than the length of our element list, otherwise we'll run into an index error 
(trying to access an indice out of the list range).
How do we do that?
            
The range function will give us values cascading down from r-1 to 0
(r-1, r-2, r-3, ... , 0)
So first and foremost, the (r-1)st indice must not be greater than (n-1)
(remember, n is the length of our element pool), as that is the largest indice. 
We can then write

Indices[r - 1] < n - 1

Moreover, because we'll stop incrementing the r-1st indice when we reach it's
maximum value, we must also stop incrementing the (r-2)nd indice when we reach
it's maximum value. What's the (r-2)nd indice maximum value?

Since we'll also be incrementing the (r-1)st indice based on the 
(r-2)nd indice, and because the maximum value of the (r-1)st 
indice is (n-1), plus we want no duplicates, the maximum value the
(r-2)nd indice can reach would be (n-2).
This trend will continue. more generally:
            
Indices[r - k] < n - k

Now, since r - k is an arbitrary index generated by the reversed range function, 
namely (i), we can substitute:

r - k = i -----> k = r - i
Indices[r - k] < n - k -----> Indices[i] < n - (r - i)
            
That's our limit - for any indice i we generate, we must break the 
increment if this inequality { Indices[i] < n - (r - i) } is no longer 
true.
(In the documentation it's written as (Indice[i] != i + n - r), and it 
means the exact same thing. I simply find this version easier to visualize 
and understand).
"""
else:
    return
"""
When our for loop runs out - which means we've gone through and 
maximized each element in our indice list - we've gone through every 
single combination, and we can exit the function.

It's important to distinct that the else statement is not linked to 
the if statement in this case, but rather to the for loop. It's a 
for-else statement, meaning "If you've finished iterating through the 
entire loop, execute the else statement".
"""

如果我们成功跳出for循环,这意味着我们可以安全地增加索引以获取下一个组合(如下面的第一行)。 下面的for循环确保每次我们从新索引开始时, 将其他索引重置为其最小可能值, 以便不会错过任何组合。
例如,如果我们不这样做,那么一旦我们到达必须移动的点, 假设我们有(0,1,2,3,4)和组合索引是(0,1,4),当我们移动并将1递增到2时, 最后一个索引将保持不变-4,我们将错过(0,2,3),只注册(0,2,4)作为有效组合。 相反,在我们增加(1->2)之后,我们根据此更新后面的索引:(4->3),当我们再次运行while循环时,我们将3递增回4(请参考前面的部分)。
请注意,我们从未增加先前的索引,以避免创建重复项。
最后,对于每个迭代,yield语句生成与当前索引组合对应的元素组合。
indices[i] += 1
for j in range(i+1, r):
    indices[j] = indices[j-1] + 1
yield tuple(pool[i] for i in indices)

正如文档所述,由于我们正在处理位置,因此唯一组合是基于元素在可迭代对象中的位置而不是它们的值而确定的。

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接