Merging NumPy arrays in order

Question


4

I have three different numpy arrays:

a = array([ 0,  3,  6,  9, 12])
b = array([ 1,  4,  7, 10, 13])
c = array([ 2,  5,  8, 11, 14])

How can I merge them together using numpy methods?

d = array([0, 1, 2, 3, 4, ..., 12, 13, 14])

I don't want to write a loop like this:

for i in range(len(a)):
 [...]

This is just an example from my project. The arrays are not sorted, and I want to keep their order.

- glethien

3 Answers

2

You can use:

d = np.vstack((a, b, c)).T.ravel()

This saves one copy compared to using .flatten(), so it is faster for large arrays.

Edit: as Sven Marnach pointed out, this does not actually save a copy in this case.
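To illustrate why no copy is saved here, the sketch below inspects the memory layout using the arrays from the question. The variable names `stacked` and `transposed` are only for illustration. The transposed result of vstack() is not C-contiguous, so ravel() has to copy it just as flatten() would.

```python
import numpy as np

a = np.array([0, 3, 6, 9, 12])
b = np.array([1, 4, 7, 10, 13])
c = np.array([2, 5, 8, 11, 14])

stacked = np.vstack((a, b, c))  # first copy: shape (3, 5), C-contiguous
transposed = stacked.T          # a view of shape (5, 3), no longer C-contiguous

# ravel() returns a view only when the data is already in the requested
# order; here it is not, so a second copy is made.
d = transposed.ravel()

print(transposed.flags['C_CONTIGUOUS'])  # False
print(np.shares_memory(d, stacked))      # False
print(d)
```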

For some reason, vstack is faster than array:

In [1]: a = np.ones(10000)

In [2]: b = np.ones(10000)

In [3]: c = np.ones(10000)

In [4]: %timeit np.vstack((a, b, c)).T.ravel()
1000 loops, best of 3: 265 us per loop

In [5]: %timeit np.vstack((a, b, c)).T.flatten()
1000 loops, best of 3: 268 us per loop

In [6]: %timeit np.array((a, b, c)).T.ravel()
100 loops, best of 3: 5.24 ms per loop

In [7]: def test(a, b, c):
    d = np.empty((len(a), 3), dtype=a.dtype)
    d.T[:] = a, b, c
    d = d.ravel()
    return d

In [8]: %timeit test(a, b, c)
100 loops, best of 3: 5.06 ms per loop

In [9]: def test2(a, b, c):
    d = np.empty((len(a), 3), dtype=a.dtype)
    d[:, 0], d[:, 1], d[:, 2] = a, b, c
    d = d.ravel()
    return d

In [10]: %timeit test2(a, b, c)
10000 loops, best of 3: 69.8 us per loop

- Nicolas Barbey

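In this case, using ravel() does not save a copy, because the data is not in the right order in memory. After vstack(), this copy cannot be avoided. The code in the current answers (both mine and yours) performs two copies in total. This can be reduced to a single copy with a different trick; I'll add some code to my answer. - Sven Marnach

I'm not sure why assigning to d.T[:] is so slow. Using d[:, 0], d[:, 1], d[:, 2] as assignment targets speeds up the code by a factor of 40 on my machine. I've updated my answer again. :) - Sven Marnach

You are absolutely right. Unfortunately, your new implementation is not faster than the first vstack + flatten / ravel version. - Nicolas Barbey

Indeed, your latest version is great :) - Nicolas Barbey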

1

Try this:

reduce(numpy.union1d, (a, b, c))

- doc Alexander
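A caveat worth checking before relying on this: numpy.union1d returns the sorted, unique values of its inputs, so it only happens to match the desired output for the example arrays. The sketch below uses made-up unsorted inputs (not taken from the question) to compare it with an interleaving approach. In Python 3, reduce has to be imported from functools.

```python
from functools import reduce

import numpy as np

# Illustrative unsorted inputs with a repeated value (9).
a = np.array([9, 0, 3])
b = np.array([1, 9, 4])
c = np.array([2, 5, 8])

# union1d sorts and removes duplicates, so the original order is lost.
merged = reduce(np.union1d, (a, b, c))
print(merged)       # [0 1 2 3 4 5 8 9]

# Interleaving keeps every element in its original position.
interleaved = np.vstack((a, b, c)).T.ravel()
print(interleaved)  # [9 1 2 0 9 5 3 4 8]
```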


- Sven Marnach · Accepted Answer

You can transpose and flatten the arrays:

d = numpy.array([a, b, c]).T.flatten()

Another way to combine the arrays is to use numpy.vstack():

d = numpy.vstack((a, b, c)).T.flatten()

I don't know which is faster, by the way.
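One way to find out is to check that both variants give the same result and then time them on your own machine. This is a minimal sketch using the standard timeit module; the array size and the repeat count are arbitrary choices, and the numbers will vary with the NumPy version and hardware.

```python
import timeit

import numpy as np

a = np.array([0, 3, 6, 9, 12])
b = np.array([1, 4, 7, 10, 13])
c = np.array([2, 5, 8, 11, 14])

via_array = np.array([a, b, c]).T.flatten()
via_vstack = np.vstack((a, b, c)).T.flatten()
print(np.array_equal(via_array, via_vstack))  # True

# Rough timing on larger inputs (size and repeat count chosen arbitrarily).
big = [np.ones(10_000) for _ in range(3)]
t_array = timeit.timeit(lambda: np.array(big).T.flatten(), number=200)
t_vstack = timeit.timeit(lambda: np.vstack(big).T.flatten(), number=200)
print(f"np.array:  {t_array:.4f} s for 200 runs")
print(f"np.vstack: {t_vstack:.4f} s for 200 runs")
```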

Edit: In response to Nicolas Barbey's answer, here is how to copy the data only once:

d = numpy.empty((len(a), 3), dtype=a.dtype)
d[:, 0], d[:, 1], d[:, 2] = a, b, c
d = d.ravel()

This code lays out the data so that ravel() does not need to make a copy, and it is indeed much faster than the original code on my machine:

In [1]: a = numpy.arange(0, 30000, 3)
In [2]: b = numpy.arange(1, 30000, 3)
In [3]: c = numpy.arange(2, 30000, 3)
In [4]: def f(a, b, c):
   ...:     d = numpy.empty((len(a), 3), dtype=a.dtype)
   ...:     d[:, 0], d[:, 1], d[:, 2] = a, b, c
   ...:     return d.ravel()
   ...: 
In [5]: def g(a, b, c):
   ...:     return numpy.vstack((a, b, c)).T.ravel()
   ...: 
In [6]: %timeit f(a, b, c)
10000 loops, best of 3: 34.4 us per loop
In [7]: %timeit g(a, b, c)
10000 loops, best of 3: 177 us per loop
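The same single-copy idea can be extended to any number of equal-length arrays. The helper below is a hypothetical sketch, not part of NumPy: the name `interleave` and the use of np.result_type to pick a common dtype are assumptions made for illustration.

```python
import numpy as np

def interleave(*arrays):
    """Interleave equal-length 1-D arrays element by element (hypothetical helper)."""
    n = len(arrays[0])
    if any(len(arr) != n for arr in arrays):
        raise ValueError("all arrays must have the same length")
    # One column per input array; each row then holds one output group.
    d = np.empty((n, len(arrays)), dtype=np.result_type(*arrays))
    for j, arr in enumerate(arrays):
        d[:, j] = arr  # the only copy of the data
    # d is C-contiguous, so ravel() can return a view instead of copying.
    return d.ravel()

a = np.array([0, 3, 6, 9, 12])
b = np.array([1, 4, 7, 10, 13])
c = np.array([2, 5, 8, 11, 14])

out = interleave(a, b, c)
print(out)
print(out.base is not None)  # a view keeps a reference to the array it came from
```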