Using NumPy to build an array of all combinations of two arrays

Question

python arrays numpy multidimensional-array cartesian-product

209

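I'm trying to run over the parameter space of a six-parameter function to study its numerical behavior before trying to do anything complex with it, so I'm looking for an efficient way to do this.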

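My function takes float values given in a 6-dim NumPy array as input. What I tried to do initially was this:

First, I created a function that takes two arrays and generates an array with all combinations of values from the two arrays: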

from functools import reduce  # reduce is not a builtin in Python 3

import numpy as np

def comb(a, b):
    c = []
    for i in a:
        for j in b:
            c.append(np.r_[i, j])
    return c

Then, I used reduce() to apply that to m copies of the same array:

def combs(a, m):
    return reduce(comb, [a]*m)

And finally, I evaluate my function like this:

values = combs(np.arange(0, 1, 0.1), 6)
for val in values:
    print(F(val))

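This works, but it's way too slow. I know the space of parameters is huge, but this shouldn't be so slow. I have only sampled 10⁶ (a million) points in this example, and it took more than 15 seconds just to create the array values.

Is there a more efficient way of doing this with NumPy?

I can modify the way the function F takes its arguments if necessary.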

- Rafael S. Calsaverini

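For the fastest Cartesian product, see this answer. (Since the question is phrased quite differently from this one, I deem that the questions are not duplicates, but the best solution to the two questions is the same.) - senderle

10 Answers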

187

Here's a pure-NumPy implementation. It's about 5× faster than using itertools.

Python 3:

import numpy as np

def cartesian(arrays, out=None):
    """
    Generate a Cartesian product of input arrays.

    Parameters
    ----------
    arrays : list of array-like
        1-D arrays to form the Cartesian product of.
    out : ndarray
        Array to place the Cartesian product in.

    Returns
    -------
    out : ndarray
        2-D array of shape (M, len(arrays)) containing Cartesian products
        formed of input arrays.

    Examples
    --------
    >>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
    array([[1, 4, 6],
           [1, 4, 7],
           [1, 5, 6],
           [1, 5, 7],
           [2, 4, 6],
           [2, 4, 7],
           [2, 5, 6],
           [2, 5, 7],
           [3, 4, 6],
           [3, 4, 7],
           [3, 5, 6],
           [3, 5, 7]])

    """

    arrays = [np.asarray(x) for x in arrays]
    dtype = arrays[0].dtype

    n = np.prod([x.size for x in arrays])
    if out is None:
        out = np.zeros([n, len(arrays)], dtype=dtype)

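    # Number of rows spanned by each value of the first array.
    # Floor division keeps m an integer, which the slices below require.
    m = n // arrays[0].size
    out[:, 0] = np.repeat(arrays[0], m)
    if arrays[1:]:
        # Fill the first block recursively, then copy it to the other blocks.
        cartesian(arrays[1:], out=out[0:m, 1:])
        for j in range(1, arrays[0].size):
            out[j*m:(j+1)*m, 1:] = out[0:m, 1:]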
    return out

Python 2:

import numpy as np

def cartesian(arrays, out=None):
    arrays = [np.asarray(x) for x in arrays]
    dtype = arrays[0].dtype

    n = np.prod([x.size for x in arrays])
    if out is None:
        out = np.zeros([n, len(arrays)], dtype=dtype)

    m = n / arrays[0].size
    out[:,0] = np.repeat(arrays[0], m)
    if arrays[1:]:
        cartesian(arrays[1:], out=out[0:m, 1:])
        for j in xrange(1, arrays[0].size):
            out[j*m:(j+1)*m, 1:] = out[0:m, 1:]
    return out

- pv.

62

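Have you ever considered submitting this to be included in NumPy? This is not the first time I've gone looking for this functionality and found your post. - endolith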

50

This has been included in scikit-learn and can be called via from sklearn.utils.extmath import cartesian. - Gus

3

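I just realized: this is slightly different from itertools.combinations, since this function respects the ordering of values whereas combinations does not, so this function returns more values than combinations does. Still very impressive, but unfortunately not what I was looking for :( - David Marx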
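
To make the distinction in the comment above concrete, here is a minimal sketch using only the standard library. The variable names are illustrative and not from the thread. It compares unordered pairs from itertools.combinations with the ordered pairs of the Cartesian product from itertools.product:

```python
import itertools

values = [1, 2, 3]

# Unordered pairs without repetition: (1, 2), (1, 3), (2, 3).
pairs_unordered = list(itertools.combinations(values, 2))

# Cartesian product: every ordered pair, including (1, 1) and (2, 1).
pairs_ordered = list(itertools.product(values, repeat=2))

print(len(pairs_unordered))  # 3
print(len(pairs_ordered))    # 9
```

The Cartesian product is what the question's comb() builds, which is why it returns more rows than combinations would.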

9

cartesian(arrays[1:], out=out[0:m,1:]) raises TypeError: slice indices must be integers or None or have an __index__ method. - Boern

2

@Boern or anyone else hitting the TypeError: the function was written for Python 2 and you are probably running it under Python 3. You can change two lines: m = n / arrays[0].size --> m = int(n / arrays[0].size) and for j in xrange(1, arrays[0].size): --> for j in range(1, arrays[0].size): - barlaensdoonn

43

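itertools.combinations is in general the fastest way to get combinations from a Python container (if you do in fact want combinations, i.e., arrangements without repetitions and independent of order; that's not what your code appears to be doing, but I can't tell whether that's because your code is buggy or because you're using the wrong terminology).

If you want something different than combinations, perhaps other iterators in itertools, product or permutations, might serve you better. For example, it looks like your code is roughly the same as: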

for val in itertools.product(np.arange(0, 1, 0.1), repeat=6):
    print(F(val))

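All of these iterators yield tuples, not lists or NumPy arrays, so if your F is picky about getting specifically a NumPy array, you'll have to accept the extra overhead of constructing or clearing and refilling one at each step.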
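
One way to avoid building an array at every step is to collect all tuples into a single array once and pass that to a vectorized F. The sketch below assumes F can be rewritten to accept an (N, k) array. Both product_array and this stand-in F are hypothetical names, not part of the original answer:

```python
import itertools

import numpy as np

def product_array(values, repeat):
    """Collect itertools.product(values, repeat=repeat) into one 2-D float array."""
    # Flatten the stream of tuples into a single stream of scalars.
    flat = itertools.chain.from_iterable(itertools.product(values, repeat=repeat))
    count = len(values) ** repeat * repeat
    # Passing count lets np.fromiter allocate the result once.
    return np.fromiter(flat, dtype=float, count=count).reshape(-1, repeat)

def F(points):
    # Hypothetical stand-in for the real function: one result per row.
    return points.sum(axis=1)

grid = product_array(np.arange(0, 1, 0.25), 3)
results = F(grid)
print(grid.shape, results.shape)
```

The tuples are still generated in Python, so this is unlikely to match the pure-NumPy answers for speed. It only removes the per-step array construction.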

- Alex Martelli

18

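You can use np.array(list(itertools.product(a, b))). The list() call is needed because itertools.product returns an iterator, which np.array does not consume on its own.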
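
As a small illustration (the inputs a and b are made-up sample data), passing the iterator directly to np.array wraps it as a single Python object, while materializing it with list() first gives the expected 2-D array:

```python
import itertools

import numpy as np

a = [1, 2, 3]
b = [4, 5]

# The iterator itself becomes one element of a 0-d object array.
wrapped = np.array(itertools.product(a, b))
print(wrapped.shape, wrapped.dtype)  # () object

# Materializing the tuples first yields one row per pair.
pairs = np.array(list(itertools.product(a, b)))
print(pairs.shape)  # (6, 2)
```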

- William Song

6

Form all combinations of the elements of l and l2 and convert the result to a NumPy array. - ZirconCode

1

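An explanation would be in order. Isn't this already covered by previous answers? What is different? What is the gist? From the Help Center: "...always explain why the solution you're presenting is appropriate and how it works". Please respond by editing (changing) your answer, not here in comments (without "Edit:", "Update:", or similar; the answer should appear as if it was written today). - Peter Mortensen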

12

You can do something like this:

import numpy as np

def cartesian_coord(*arrays):
    grid = np.meshgrid(*arrays)
    coord_list = [entry.ravel() for entry in grid]
    points = np.vstack(coord_list).T
    return points

a = np.arange(4)  # Fake data
print(cartesian_coord(*6*[a]))

This gives

array([[0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 2],
       ...,
       [3, 3, 3, 3, 3, 1],
       [3, 3, 3, 3, 3, 2],
       [3, 3, 3, 3, 3, 3]])

- felippe

3

Is there a way to get NumPy to accept more than 32 arrays for meshgrid? This method works for me as long as I don't pass more than 32 arrays. - Joelmob

11

The following NumPy implementation should be roughly 2× as fast as the answers given previously:

def cartesian2(arrays):
    arrays = [np.asarray(a) for a in arrays]
    shape = (len(x) for x in arrays)

    ix = np.indices(shape, dtype=int)
    ix = ix.reshape(len(arrays), -1).T

    for n, arr in enumerate(arrays):
        ix[:, n] = arrays[n][ix[:, n]]

    return ix

- Stefan van der Walt

2

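Looks good. From my rudimentary tests, this looks faster than the original answer for all pairs, triples, and 4-tuples of {1,2,...,100}. Also, for future readers who want to generate all k-tuples of {1,...,n}, np.indices((n,...,n)).reshape(k,-1).T will do. - jme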
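
The one-liner in the comment above can be wrapped as follows. This is a sketch and all_k_tuples is a hypothetical helper name. Note that np.indices counts from 0, so the rows range over {0, ..., n-1}; add 1 to the result for {1, ..., n}.

```python
import itertools

import numpy as np

def all_k_tuples(n, k):
    """Return every k-tuple over {0, ..., n-1} as rows of an (n**k, k) array."""
    return np.indices((n,) * k).reshape(k, -1).T

tuples = all_k_tuples(3, 2)
print(tuples.shape)  # (9, 2)

# Same rows, in the same order, as itertools.product over range(n).
expected = np.array(list(itertools.product(range(3), repeat=2)))
print(np.array_equal(tuples, expected))  # True
```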

2

This only works for integers, while the accepted answer also works for floats. - FJC
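
The integer-only behavior comes from writing the looked-up values back into the integer index array. A possible variant, offered as a sketch rather than as the answer author's code, keeps the index array separate and writes into an output whose dtype is derived from the inputs. The name cartesian_indexed is hypothetical:

```python
import numpy as np

def cartesian_indexed(arrays):
    """Index-based Cartesian product that preserves float inputs."""
    arrays = [np.asarray(a) for a in arrays]
    shape = tuple(len(a) for a in arrays)

    # One row of indices per output row, last column varying fastest.
    ix = np.indices(shape, dtype=int).reshape(len(arrays), -1).T

    # Write into a separate array so float values are not truncated to int.
    out = np.empty(ix.shape, dtype=np.result_type(*arrays))
    for n, arr in enumerate(arrays):
        out[:, n] = arr[ix[:, n]]
    return out

points = cartesian_indexed([[0.5, 1.5], [10, 20, 30]])
print(points)
```

The extra output array costs memory, so this may not keep the speed advantage claimed for the original.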

10

It looks like you want a grid to evaluate your function on, in which case you can use numpy.ogrid (open) or numpy.mgrid (fleshed out):

import numpy

my_grid = numpy.mgrid[[slice(0, 1, 0.1)]*6]
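
numpy.mgrid returns an array whose first axis indexes the coordinate, not a list of points. If a list of points with one row per grid node is wanted, it can be obtained with a reshape. The sketch below uses three dimensions instead of six to keep the array small:

```python
import numpy as np

ndim = 3  # the question uses 6; 3 keeps this example small

# Shape (ndim, 10, 10, 10): grid[k] holds the k-th coordinate of every node.
grid = np.mgrid[tuple(slice(0, 1, 0.1) for _ in range(ndim))]

# Flatten the grid axes and put one point per row.
points = grid.reshape(ndim, -1).T
print(grid.shape, points.shape)
```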

- steabert

8

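Here's yet another way, using pure NumPy, no recursion, no list comprehension, and no explicit for loops. It's about 20% slower than the original answer, and it's based on np.meshgrid.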

def cartesian(*arrays):
    mesh = np.meshgrid(*arrays)  # Standard NumPy meshgrid
    dim = len(mesh)  # Number of dimensions
    elements = mesh[0].size  # Number of elements, any index will do
    flat = np.concatenate(mesh).ravel()  # Flatten the whole meshgrid
    reshape = np.reshape(flat, (dim, elements)).T  # Reshape and transpose
    return reshape

For example,

x = np.arange(3)
a = cartesian(x, x, x, x, x)
print(a)

gives

[[0 0 0 0 0]
 [0 0 0 0 1]
 [0 0 0 0 2]
 ...,
 [2 2 2 2 0]
 [2 2 2 2 1]
 [2 2 2 2 2]]

- étale-cohomology

7

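For a pure NumPy implementation of the Cartesian product of one-dimensional arrays (or flat Python lists), just use meshgrid(), roll the axes with transpose(), and reshape to the desired output: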

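import numpy as np

def cartprod(*arrays):
    N = len(arrays)
    # Stack the 'ij' grids, move the stacking axis to the end, then flatten.
    return np.transpose(np.meshgrid(*arrays, indexing='ij'),
                        np.roll(np.arange(N + 1), -1)).reshape(-1, N)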

Note this has the convention of the last axis changing the fastest ("C style" or "row-major").

In [88]: cartprod([1,2,3], [4,8], [100, 200, 300, 400], [-5, -4])
Out[88]:
array([[  1,   4, 100,  -5],
       [  1,   4, 100,  -4],
       [  1,   4, 200,  -5],
       [  1,   4, 200,  -4],
       [  1,   4, 300,  -5],
       [  1,   4, 300,  -4],
       [  1,   4, 400,  -5],
       [  1,   4, 400,  -4],
       [  1,   8, 100,  -5],
       [  1,   8, 100,  -4],
       [  1,   8, 200,  -5],
       [  1,   8, 200,  -4],
       [  1,   8, 300,  -5],
       [  1,   8, 300,  -4],
       [  1,   8, 400,  -5],
       [  1,   8, 400,  -4],
       [  2,   4, 100,  -5],
       [  2,   4, 100,  -4],
       [  2,   4, 200,  -5],
       [  2,   4, 200,  -4],
       [  2,   4, 300,  -5],
       [  2,   4, 300,  -4],
       [  2,   4, 400,  -5],
       [  2,   4, 400,  -4],
       [  2,   8, 100,  -5],
       [  2,   8, 100,  -4],
       [  2,   8, 200,  -5],
       [  2,   8, 200,  -4],
       [  2,   8, 300,  -5],
       [  2,   8, 300,  -4],
       [  2,   8, 400,  -5],
       [  2,   8, 400,  -4],
       [  3,   4, 100,  -5],
       [  3,   4, 100,  -4],
       [  3,   4, 200,  -5],
       [  3,   4, 200,  -4],
       [  3,   4, 300,  -5],
       [  3,   4, 300,  -4],
       [  3,   4, 400,  -5],
       [  3,   4, 400,  -4],
       [  3,   8, 100,  -5],
       [  3,   8, 100,  -4],
       [  3,   8, 200,  -5],
       [  3,   8, 200,  -4],
       [  3,   8, 300,  -5],
       [  3,   8, 300,  -4],
       [  3,   8, 400,  -5],
       [  3,   8, 400,  -4]])

If you want the first axis to change fastest instead ("Fortran style" or "column-major"), just change the order parameter of reshape() like this: reshape((-1, N), order='F').
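
A minimal sketch of that column-major variant follows. The name cartprod_f is hypothetical, and the axes are written out as a tuple instead of using roll(), purely for readability:

```python
import numpy as np

def cartprod_f(*arrays):
    """Cartesian product with the first column varying fastest."""
    n = len(arrays)
    # Stack the grids, then move the stacking axis (0) to the end.
    axes = tuple(range(1, n + 1)) + (0,)
    grid = np.array(np.meshgrid(*arrays, indexing='ij')).transpose(axes)
    return grid.reshape((-1, n), order='F')

print(cartprod_f([1, 2], [3, 4]))
# [[1 3]
#  [2 3]
#  [1 4]
#  [2 4]]
```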

- RBF06

2

pandas' merge() offers a naive but fast solution:

import numpy as np
import pandas as pd

# Given the lists
x, y, z = [1, 2, 3], [4, 5], [6, 7]

# Get dataframes with the same, constant index 
x = pd.DataFrame({'x': x}, index=np.repeat(0, len(x)))
y = pd.DataFrame({'y': y}, index=np.repeat(0, len(y)))
z = pd.DataFrame({'z': z}, index=np.repeat(0, len(z)))

# Get all permutations stored in a new dataframe
df = pd.merge(x, pd.merge(y, z, left_index=True, right_index=True),
              left_index=True, right_index=True)

- simone

Yes, but the question is tagged with NumPy. - Peter Mortensen

And the title contains "NumPy". - Peter Mortensen

- CT Zhu · Accepted Answer

In newer versions of NumPy (>1.8.x), numpy.meshgrid() provides a much faster implementation:

For pv's solution:

In [113]:

%timeit cartesian(([1, 2, 3], [4, 5], [6, 7]))
10000 loops, best of 3: 135 µs per loop
In [114]:

cartesian(([1, 2, 3], [4, 5], [6, 7]))

Out[114]:
array([[1, 4, 6],
       [1, 4, 7],
       [1, 5, 6],
       [1, 5, 7],
       [2, 4, 6],
       [2, 4, 7],
       [2, 5, 6],
       [2, 5, 7],
       [3, 4, 6],
       [3, 4, 7],
       [3, 5, 6],
       [3, 5, 7]])

numpy.meshgrid() used to be two-dimensional only; now it can handle multiple dimensions. In this case, three dimensions:

In [115]:

%timeit np.array(np.meshgrid([1, 2, 3], [4, 5], [6, 7])).T.reshape(-1,3)
10000 loops, best of 3: 74.1 µs per loop
In [116]:

np.array(np.meshgrid([1, 2, 3], [4, 5], [6, 7])).T.reshape(-1,3)

Out[116]:
array([[1, 4, 6],
       [1, 5, 6],
       [2, 4, 6],
       [2, 5, 6],
       [3, 4, 6],
       [3, 5, 6],
       [1, 4, 7],
       [1, 5, 7],
       [2, 4, 7],
       [2, 5, 7],
       [3, 4, 7],
       [3, 5, 7]])

Note that the order of the final result is slightly different.
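
If the row order of the recursive cartesian() matters, one option is to ask meshgrid for matrix ('ij') indexing and stack the grids along a new last axis. This is a sketch and cartesian_meshgrid is a hypothetical name. Its speed relative to the timings above has not been measured here:

```python
import itertools

import numpy as np

def cartesian_meshgrid(*arrays):
    """meshgrid-based Cartesian product with the last column varying fastest."""
    grids = np.meshgrid(*arrays, indexing='ij')
    # Shape (n1, ..., nk, k) -> one row per combination.
    return np.stack(grids, axis=-1).reshape(-1, len(arrays))

result = cartesian_meshgrid([1, 2, 3], [4, 5], [6, 7])
print(result[:3])
# [[1 4 6]
#  [1 4 7]
#  [1 5 6]]
```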