All combinations of a NumPy 2D array filled with 0s and 1s

Question

python numpy vectorization numpy-ndarray

3

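Given K, I need all possible K x 2 numpy matrices in which every element is 0 except for two 1s placed in different rows and different columns. For example, with K = 5 the result should look like this: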

[[1,0],[0,1],[0,0],[0,0],[0,0]]
[[1,0],[0,0],[0,1],[0,0],[0,0]]
[[1,0],[0,0],[0,0],[0,1],[0,0]]
[[1,0],[0,0],[0,0],[0,0],[0,1]]
[[0,0],[1,0],[0,1],[0,0],[0,0]]
[[0,0],[1,0],[0,0],[0,1],[0,0]]
... and so on

So the result should be an array of shape K x 2 x (K*(K-1)/2). I want to avoid loops, because in my case they are not efficient once K gets large (e.g. K = 300).

- mariottidae

3 Answers

1

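This is a very specific problem, but an interesting one. I'd be curious to know the context.

You are looking for all permutations of a multiset, which Python's itertools does not currently support. The simplest solution is therefore to use the multiset tools in the sympy library.

The code below takes about 2.5 minutes to run on my machine, which is fairly fast for a single thread. For K=300 you get 179700 unique permutations.

(I took inspiration from https://dev59.com/lW015IYBdhLWcg3w_Aug#40289807)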

from collections import Counter
from math import factorial, prod

import numpy as np
from sympy.utilities.iterables import multiset_permutations
from tqdm import tqdm


def No_multiset_permutations(multiset: list) -> int:
    """Calculates the No. possible permutations given a multiset.
    See: https://en.wikipedia.org/wiki/Permutation#Permutations_of_multisets

    :param multiset: List representing a multiset.
    """
    value_counts = Counter(multiset).values()
    denominator = prod([factorial(val) for val in value_counts])
    # Integer division is exact here and avoids a round trip through float
    return factorial(len(multiset)) // denominator


def multiset_Kx2_permutations(K: int) -> np.ndarray:
    """This will generate all possible unique Kx2 permutations of an array
    with size K where two values are 1 and the rest are 0.

    :param K: The size of the array.
    """
    # Construct number multiset, e.g. K=5 gives [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    numbers = [1, 1] + [0] * (K - 1) * 2

    # Use sympy's multiset_permutations to get a multiset permutation generator
    generator = multiset_permutations(numbers)

    # Calculate the No. possible permutations
    number_of_perms = No_multiset_permutations(numbers)

    # Get all permutations, bonus progress bar is included :)
    unique_perms = [next(generator) for _ in tqdm(range(number_of_perms))]

    # Reshape each permutation to Kx2
    unique_perms = np.array(unique_perms, dtype=np.int8)
    return unique_perms.reshape(-1, K, 2)


if __name__ == "__main__":
    solution = multiset_Kx2_permutations(300)
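
A note on the count, as a rough sketch rather than part of the answer itself: permuting the flattened multiset enumerates every placement of two 1s among the 2K cells, which is C(2K, 2) (179700 for K = 300), whereas the question asks for K*(K-1)/2 matrices (44850 for K = 300). The brute-force helper below, `count_placements`, is a hypothetical name introduced only for illustration. It compares the counts for a small K and assumes the question's examples imply that the 1 in column 0 sits above the 1 in column 1.

```python
from itertools import combinations
from math import comb


def count_placements(K: int) -> tuple[int, int, int]:
    """Count placements of two 1s in a K x 2 grid under increasingly strict rules.

    Returns (all placements, placements with the 1s in different rows and
    different columns, placements where additionally the column-0 row is
    above the column-1 row).
    """
    total = different_rows_cols = col0_above_col1 = 0
    # Each pair of flat positions in the flattened K x 2 grid is one placement.
    for a, b in combinations(range(2 * K), 2):
        (row_a, col_a), (row_b, col_b) = divmod(a, 2), divmod(b, 2)
        total += 1
        if col_a != col_b and row_a != row_b:
            different_rows_cols += 1
            # Work out which 1 sits in column 0 and which in column 1.
            row_col0, row_col1 = (row_a, row_b) if col_a == 0 else (row_b, row_a)
            if row_col0 < row_col1:
                col0_above_col1 += 1
    return total, different_rows_cols, col0_above_col1


K = 5
total, different_rows_cols, col0_above_col1 = count_placements(K)
print(total == comb(2 * K, 2))               # True
print(different_rows_cols == K * (K - 1))    # True
print(col0_above_col1 == K * (K - 1) // 2)   # True
```

If only the K*(K-1)/2 matrices from the question are wanted, the full set of multiset permutations would need to be filtered down accordingly.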

- CB Madsen

1

Another possibility (with the axes rearranged for cleaner output):

from itertools import combinations
import numpy as np

k = 4
x = list(combinations(range(k), 2))
out = np.zeros((n := len(x), k, 2), dtype=int)
out[np.c_[:n], x, [0, 1]] = 1
print(out)

This gives:

[[[1 0]
  [0 1]
  [0 0]
  [0 0]]

 [[1 0]
  [0 0]
  [0 1]
  [0 0]]

 [[1 0]
  [0 0]
  [0 0]
  [0 1]]

 [[0 0]
  [1 0]
  [0 1]
  [0 0]]

 [[0 0]
  [1 0]
  [0 0]
  [0 1]]

 [[0 0]
  [0 0]
  [1 0]
  [0 1]]]
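
The index pairs above come from `itertools.combinations`, which builds a Python list. As a possible variation (a swap-in, not what this answer uses), `np.triu_indices` can produce the same row pairs directly as NumPy arrays. A minimal sketch, where the names `rows0`, `rows1` and `idx` are illustrative:

```python
import numpy as np

k = 4
# Row pairs (r0, r1) with r0 < r1, in the same order as combinations(range(k), 2)
rows0, rows1 = np.triu_indices(k, k=1)
n = rows0.size  # k * (k - 1) / 2

out = np.zeros((n, k, 2), dtype=int)
idx = np.arange(n)
out[idx, rows0, 0] = 1  # the 1 in column 0
out[idx, rows1, 1] = 1  # the 1 in column 1
print(out.shape)  # (6, 4, 2)
```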

- bb1

- Andras Deak -- Слава Україні · Accepted Answer

I can't think of an elegant solution, but here is a not-so-elegant pure numpy approach:

import numpy as np

def combination_matrices(K):
    # get combination indices
    i, j = np.indices((K, K))
    comb_indices = np.transpose((i < j).nonzero())  # (num_combs, 2) array where ones are
    num_combs = comb_indices.shape[0]  # K*(K-1)/2

    # create a matrix of the desired shape, first axis enumerates combinations
    matrices = np.zeros((num_combs, K, 2), dtype=int)
    # broadcasting assignment of ones
    comb_range, col_index = np.ogrid[:num_combs, :2]
    matrices[comb_range, comb_indices, col_index] = 1
    return matrices

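First, the indices of a (K, K)-shaped array are used to find the index pair for each combination (these indices encode the upper triangle of the array, excluding the diagonal). Then a somewhat tricky broadcasting assignment (heavy advanced indexing) sets the corresponding elements of the preallocated output array to 1.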
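
To make the broadcasting step easier to follow, here is a small sketch for K = 4 that prints the shapes of the three index arrays and compares the result with an explicit loop. The loop and the name `expected` are added purely for comparison and are not part of the approach above.

```python
import numpy as np

K = 4
i, j = np.indices((K, K))
comb_indices = np.transpose((i < j).nonzero())  # shape (6, 2): rows of the two 1s
num_combs = comb_indices.shape[0]

comb_range, col_index = np.ogrid[:num_combs, :2]
# (6, 1), (6, 2) and (1, 2) broadcast together to (6, 2): one index triple
# (combination, row, column) for each of the two 1s in every matrix.
print(comb_range.shape, comb_indices.shape, col_index.shape)  # (6, 1) (6, 2) (1, 2)

matrices = np.zeros((num_combs, K, 2), dtype=int)
matrices[comb_range, comb_indices, col_index] = 1

# Equivalent explicit loop, for comparison only
expected = np.zeros_like(matrices)
for n, (row0, row1) in enumerate(comb_indices):
    expected[n, row0, 0] = 1
    expected[n, row1, 1] = 1

print(np.array_equal(matrices, expected))  # True
```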

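Note that I put the axis of size K*(K-1)/2 first, because that makes the most sense with numpy's C-contiguous memory layout. This way, when you take the matrix for combination index 3, arr[3, ...] is a contiguous block of memory of shape (K, 2), which is fast to process in vectorized operations.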
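
The layout point can be checked with the `flags` attribute. The sketch below uses zero-filled placeholder arrays (`combs_first` and `combs_last` are illustrative names) to compare the two axis orders, and shows that `np.moveaxis` can give a view in the K x 2 x N order from the question if that shape is required.

```python
import numpy as np

K, num_combs = 4, 6
combs_first = np.zeros((num_combs, K, 2), dtype=int)  # combination axis first
combs_last = np.zeros((K, 2, num_combs), dtype=int)   # shape asked for in the question

one_first = combs_first[3]     # a single (K, 2) matrix
one_last = combs_last[..., 3]  # also (K, 2), but strided across memory

print(one_first.flags['C_CONTIGUOUS'])  # True
print(one_last.flags['C_CONTIGUOUS'])   # False

# A (K, 2, num_combs) view of the same data, without copying
as_question = np.moveaxis(combs_first, 0, -1)
print(as_question.shape)  # (4, 2, 6)
```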

Output for K = 4, i.e. combination_matrices(4):

[[[1 0]
  [0 1]
  [0 0]
  [0 0]]

 [[1 0]
  [0 0]
  [0 1]
  [0 0]]

 [[1 0]
  [0 0]
  [0 0]
  [0 1]]

 [[0 0]
  [1 0]
  [0 1]
  [0 0]]

 [[0 0]
  [1 0]
  [0 0]
  [0 1]]

 [[0 0]
  [0 0]
  [1 0]
  [0 1]]]