Splitting an array into N chunks with NumPy

Question

108

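There is an existing question about splitting a list into evenly sized chunks. Is there a more efficient way to do this for huge arrays using NumPy?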

- Eiyrioü von Kauyf

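Should we interpret the input to this question as a native Python array or a numpy ndarray? The first sentence seems to imply the former. The second sentence implies it is asking for a comparison of the former with the latter. Two dimensions only, presumably. When we say "efficiently... for huge arrays", do we care more about scalability for asymptotically large N, regardless of whether it is slower for small N? - smci

7 Answers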

26

Some examples of using array_split, split, hsplit and vsplit:

In [9]: a = np.random.randint(0,10,[4,4])

In [10]: a
Out[10]: 
array([[2, 2, 7, 1],
       [5, 0, 3, 1],
       [2, 9, 8, 8],
       [5, 7, 7, 6]])

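Some examples of using array_split:
If you pass an array or list as the second argument, you are basically giving the indices before which to "cut" the array.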

# split rows into 0|1 2|3
In [4]: np.array_split(a, [1,3])
Out[4]:
[array([[2, 2, 7, 1]]),
 array([[5, 0, 3, 1],
       [2, 9, 8, 8]]),
 array([[5, 7, 7, 6]])]

# split columns into 0| 1 2 3
In [5]: np.array_split(a, [1], axis=1)
Out[5]:
[array([[2],
       [5],
       [2],
       [5]]),
 array([[2, 7, 1],
       [0, 3, 1],
       [9, 8, 8],
       [7, 7, 6]])]

If the second argument is an integer, it specifies the number of equal chunks:

In [6]: np.array_split(a, 2, axis=1)
Out[6]: 
[array([[2, 2],
       [5, 0],
       [2, 9],
       [5, 7]]),
 array([[7, 1],
       [3, 1],
       [8, 8],
       [7, 6]])]

split works the same way, but raises an exception if an equal split is not possible.
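
The difference can be sketched as follows. The array `x` here is hypothetical example data whose length (7) is not divisible by the requested number of chunks (3):

```python
import numpy as np

x = np.arange(7)  # hypothetical example data; 7 is not divisible by 3

# array_split tolerates an uneven division: the leading chunks get the extra items
parts = np.array_split(x, 3)
sizes = [len(p) for p in parts]
print(sizes)  # [3, 2, 2]

# split insists on an exact division and raises ValueError otherwise
try:
    np.split(x, 3)
    split_failed = False
except ValueError:
    split_failed = True
print(split_failed)  # True
```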

Besides array_split, you can use the shortcuts vsplit and hsplit.
vsplit and hsplit are pretty much self-explanatory:

In [11]: np.vsplit(a, 2)
Out[11]: 
[array([[2, 2, 7, 1],
       [5, 0, 3, 1]]),
 array([[2, 9, 8, 8],
       [5, 7, 7, 6]])]

In [12]: np.hsplit(a, 2)
Out[12]: 
[array([[2, 2],
       [5, 0],
       [2, 9],
       [5, 7]]),
 array([[7, 1],
       [3, 1],
       [8, 8],
       [7, 6]])]
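
As a rough check of how the shortcuts relate to split, here is a small sketch. It assumes a 2-D input and uses a deterministic array rather than the random one above: vsplit corresponds to split along axis 0, and hsplit to split along axis 1.

```python
import numpy as np

a = np.arange(16).reshape(4, 4)  # deterministic stand-in for the random array above

v_parts = np.vsplit(a, 2)
h_parts = np.hsplit(a, 2)

# For 2-D input, vsplit matches split(axis=0) and hsplit matches split(axis=1)
same_v = all(np.array_equal(p, q)
             for p, q in zip(v_parts, np.split(a, 2, axis=0)))
same_h = all(np.array_equal(p, q)
             for p, q in zip(h_parts, np.split(a, 2, axis=1)))
print(same_v, same_h)  # True True
```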

- tzelleke

10

I believe you are looking for numpy.split, or possibly numpy.array_split if the number of sections does not need to divide the size of the array evenly.

- mgilson

10

Not really an answer, but a long comment on the other (correct) answers, with nice code formatting. If you try the following, you will see that what you get are views of the original array, not copies, unlike the accepted answer in the question you linked. Be aware of the possible side effects!

>>> import numpy as np
>>> x = np.arange(9.0)
>>> a,b,c = np.split(x, 3)
>>> a
array([0., 1., 2.])
>>> a[1] = 8
>>> a
array([0., 8., 2.])
>>> x
array([0., 8., 2., 3., 4., 5., 6., 7., 8.])
>>> def chunks(l, n):
...     """ Yield successive n-sized chunks from l.
...     """
...     for i in range(0, len(l), n):
...         yield l[i:i+n]
... 
>>> l = list(range(9))
>>> a,b,c = chunks(l, 3)
>>> a
[0, 1, 2]
>>> a[1] = 8
>>> a
[0, 8, 2]
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8]

- Jaime

2

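np.array_split tries to split the array "evenly". For example, if x.shape is 10 and the chunk size is 3 (so 4 sections are requested), you get splits of shape [3, 3, 2, 2] rather than [3, 3, 3, 1]. One workaround is to use spaced indices, as in the snippet below.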

import math
import numpy as np


def split_evenly(x, chunk_size, axis=0):
    return np.array_split(x, math.ceil(x.shape[axis] / chunk_size), axis=axis)


def split_reminder(x, chunk_size, axis=0):
    indices = np.arange(chunk_size, x.shape[axis], chunk_size)
    return np.array_split(x, indices, axis)


x = np.arange(10)
chunk_size = 3
print([i.shape[0] for i in split_evenly(x, chunk_size, 0)])
print([i.shape[0] for i in split_reminder(x, chunk_size, 0)])
# [3, 3, 2, 2]
# [3, 3, 3, 1]

- Jackiexiao

0

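This can be done with numpy's as_strided. I assume that if the chunk size is not a factor of the total number of rows, the remaining rows in the last batch are padded with zeros.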

import numpy as np
from numpy.lib.stride_tricks import as_strided

def batch_data(test, chunk_count):
  # The byte strides below assume test is a C-contiguous 2-D array
  m,n = test.shape
  S = test.itemsize
  if not chunk_count:
    chunk_count = 1
  batch_size = m//chunk_count
  # Batches which can be covered fully
  test_batches = as_strided(test, shape=(chunk_count, batch_size, n), strides=(batch_size*n*S,n*S,S)).copy()
  covered = chunk_count*batch_size
  if covered < m:
    # Pad the leftover rows with zeros to form one extra batch
    # (this only works when the leftover has at most batch_size rows)
    rest = test[covered:,:]
    rm, rn = rest.shape
    mismatch = batch_size - rm
    last_batch = np.vstack((rest,np.zeros((mismatch,rn)))).reshape(1,-1,n)
    return np.vstack((test_batches,last_batch))
  return test_batches

This is based on my answer at https://dev59.com/9Yfca4cB1Zd3GeqPodlu#68238815.
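
For comparison, the same zero-padding idea can be sketched without stride tricks, using np.pad followed by reshape. This is a swapped-in alternative, not the code above: the helper name `pad_and_batch` is hypothetical, and it takes a batch size (rows per batch) rather than a chunk count.

```python
import numpy as np

def pad_and_batch(arr, batch_size):
    """Hypothetical helper: zero-pad the rows of a 2-D array, then reshape into batches."""
    m, n = arr.shape
    n_batches = -(-m // batch_size)        # ceiling division
    pad_rows = n_batches * batch_size - m  # rows of zeros needed at the end
    padded = np.pad(arr, ((0, pad_rows), (0, 0)), mode="constant")
    return padded.reshape(n_batches, batch_size, n)

data = np.arange(10).reshape(5, 2)  # 5 rows, not divisible by 2
batches = pad_and_batch(data, 2)
print(batches.shape)  # (3, 2, 2)
print(batches[-1])    # last batch: the row [8, 9] followed by a row of zeros
```

Because np.pad allocates a new array, the result is a copy and does not alias the input.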

- MSS

0

How about this? Here you can split the array using whatever length you want.

a = np.random.randint(0,10,[4,4])

a
Out[27]: 
array([[1, 5, 8, 7],
       [3, 2, 4, 0],
       [7, 7, 6, 2],
       [7, 4, 3, 0]])

a[0:2,:]
Out[28]: 
array([[1, 5, 8, 7],
       [3, 2, 4, 0]])

a[2:4,:]
Out[29]: 
array([[7, 7, 6, 2],
       [7, 4, 3, 0]])
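
The manual slices above can be generalized to an arbitrary chunk length with a list comprehension. This is a sketch only: the helper name `row_chunks` is hypothetical, and as with basic slicing, the chunks are views of the original array.

```python
import numpy as np

def row_chunks(arr, chunk_len):
    """Hypothetical helper: slice arr into consecutive blocks of chunk_len rows."""
    return [arr[i:i + chunk_len] for i in range(0, arr.shape[0], chunk_len)]

a = np.arange(20).reshape(5, 4)
chunks = row_chunks(a, 2)
shapes = [c.shape for c in chunks]
print(shapes)  # [(2, 4), (2, 4), (1, 4)]
```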

- Nilani Algiriyage

- Prashant Kumar · Accepted Answer

Try numpy.array_split.

From the documentation:

>>> x = np.arange(8.0)
>>> np.array_split(x, 3)
[array([0., 1., 2.]), array([3., 4., 5.]), array([6., 7.])]

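It is identical to numpy.split, but does not raise an exception if the groups are not of equal length.

If the number of chunks is greater than len(array), you get empty arrays in the result. To deal with that, if your split array is saved in a, you can remove the empty arrays with: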

[x for x in a if x.size > 0]

If you want, just save the result back into a.
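
A short sketch of that case, with hypothetical example data (3 elements split into 5 chunks):

```python
import numpy as np

x = np.arange(3)
a = np.array_split(x, 5)  # more chunks than elements

sizes_before = [p.size for p in a]
print(sizes_before)  # [1, 1, 1, 0, 0]

# Drop the empty arrays and save the result back into a
a = [p for p in a if p.size > 0]
print(len(a))  # 3
```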