维度无关（通用）笛卡尔积

Question

维度无关（通用）笛卡尔积

6

我想生成相对较大数量的数组的笛卡尔积以跨越高维网格。由于高维度的限制，无法将笛卡尔积计算结果存储在内存中，而必须写入硬盘。因此，我需要在生成中间结果时访问它们。我迄今为止所做的是：

for x in xrange(0, 10):
    for y in xrange(0, 10):
        for z in xrange(0, 10):
            writeToHdd(x,y,z)

除了非常麻烦外，它还不可扩展（即需要编写与维度数量相同的循环）。我尝试使用这里提出的解决方案，但那是一种递归解决方案，因此很难在生成结果时立即获取结果。除了每个维度都有一个硬编码循环之外，是否有其他“简洁”的方法来做到这一点？

- danielvdende

3个回答

1

我想我找到了一种不错的方法，使用内存映射文件：

def carthesian_product_mmap(vectors, filename, mode='w+'):
    '''
    Vectors should be a tuple of `numpy.ndarray` vectors. You could
    also make it more flexible, and include some error checking
    '''        
    # Make a meshgrid with `copy=False` to create views
    grids = np.meshgrid(*vectors, copy=False, indexing='ij')

    # The shape for concatenating the grids from meshgrid
    shape = grid[0].shape + (len(vectors),)

    # Find the "highest" dtype neccesary
    dtype = np.result_type(*vectors)

    # Instantiate the memory mapped file
    M = np.memmap(filename, dtype, mode, shape=shape)

    # Fill the memmap with the grids
    for i, grid in enumerate(grids):
        M[...,i] = grid

    # Make sure the data is written to disk (optional?)
    M.flush()

    # Reshape to put it in the right format for Carthesian product
    return M.reshape((-1, len(vectors)))

但我想知道是否真的需要存储整个笛卡尔积（有很多数据重复）。生成所需的产品行是否不是一个选项？

- user2379410

你也可以通过广播来进行赋值，参见https://dev59.com/0Wgu5IYBdhLWcg3wso-j#11146645。 - user2379410

只是出于好奇，Numpy的memmap会比基于数据库的解决方案更优吗？我猜数据库会有一些开销来建立连接等，但我认为数据库提供了“智能”索引/压缩系统等？ - danielvdende

@danielvdende，我没有数据库方面的经验...也许当磁盘IO成为瓶颈时，它可能与Numpy一样快。 - user2379410

0

看起来你只想循环遍历任意数量的维度。我通常的解决方案是使用索引字段，增加索引并处理溢出。

例如：

n = 3 # number of dimensions
N = 1 # highest index value per dimension

idx = [0]*n
while True:
    print(idx)
    # increase first dimension
    idx[0] += 1
    # handle overflows
    for i in range(0, n-1):
        if idx[i] > N:
            # reset this dimension and increase next higher dimension
            idx[i] = 0
            idx[i+1] += 1
    if idx[-1] > N:
        # overflow in the last dimension, we are finished
        break

给出：

[0, 0, 0]
[1, 0, 0]
[0, 1, 0]
[1, 1, 0]
[0, 0, 1]
[1, 0, 1]
[0, 1, 1]
[1, 1, 1]

Numpy内置了类似的功能：ndenumerate。

- Trilarion

谢谢！回过头来看，似乎很简单，只是当时想不出来！ - danielvdende

Gareth Rees 的回答有一个优点，即它可以直接使用任意范围并使用内置函数。虽然我的回答在简单情况下也可以起作用，但我更喜欢他的回答。 - Trilarion

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Gareth Rees · Accepted Answer

在普通的Python中，您可以使用 itertools.product 生成一组可迭代对象的笛卡尔积。

>>> arrays = range(0, 2), range(4, 6), range(8, 10)
>>> list(itertools.product(*arrays))
[(0, 4, 8), (0, 4, 9), (0, 5, 8), (0, 5, 9), (1, 4, 8), (1, 4, 9), (1, 5, 8), (1, 5, 9)]

在Numpy中，您可以将numpy.meshgrid（传递sparse=True以避免在内存中扩展乘积）与numpy.ndindex结合使用：

>>> arrays = np.arange(0, 2), np.arange(4, 6), np.arange(8, 10)
>>> grid = np.meshgrid(*arrays, sparse=True)
>>> [tuple(g[i] for g in grid) for i in np.ndindex(grid[0].shape)]
[(0, 4, 8), (0, 4, 9), (1, 4, 8), (1, 4, 9), (0, 5, 8), (0, 5, 9), (1, 5, 8), (1, 5, 9)]