Matrix multiplication using HDF5


I'm trying to multiply two large matrices using hdf5 (pytables) under a memory constraint, but the numpy.dot function raises an error:

ValueError: array is too big

Maybe I need to do the matrix multiplication in blocks myself, or is there another Python function similar to numpy.dot?

import numpy as np
import time
import tables
import cProfile
import numexpr as ne

n_row=10000
n_col=100
n_batch=10

rows = n_row
cols = n_col
batches = n_batch

atom = tables.UInt8Atom()  #?
filters = tables.Filters(complevel=9, complib='blosc') # tune parameters

fileName_a = r'C:\carray_a.h5'  # raw string so the backslash is not treated as an escape
shape_a = (rows*batches, cols)  # predefined size

h5f_a = tables.open_file(fileName_a, 'w')
ca_a = h5f_a.create_carray(h5f_a.root, 'carray', atom, shape_a, filters=filters)

for i in range(batches):
    data = np.random.rand(rows,cols)
    ca_a[i*rows:(i+1)*rows]= data[:]
#h5f_a.close()


rows = n_col
cols = n_row
batches = n_batch

fileName_b = r'C:\carray_b.h5'  # raw string so the backslash is not treated as an escape
shape_b = (rows, cols*batches)  # predefined size

h5f_b = tables.open_file(fileName_b, 'w')
ca_b = h5f_b.create_carray(h5f_b.root, 'carray', atom, shape_b, filters=filters)

#need to batch by cols
sz = rows // batches  # integer division; rows/batches is a float on Python 3
for i in range(batches):
    data = np.random.rand(sz, cols*batches)
    ca_b[i*sz:(i+1)*sz]= data[:]
#h5f_b.close()

rows = n_batch*n_row
cols = n_batch*n_row

fileName_c = r'C:\carray_c.h5'  # raw string so the backslash is not treated as an escape
shape_c = (rows, cols)  # predefined size

h5f_c = tables.open_file(fileName_c, 'w')
ca_c = h5f_c.create_carray(h5f_c.root, 'carray', atom, shape_c, filters=filters)


a= h5f_a.root.carray#[:]
b= h5f_b.root.carray#[:]
c= h5f_c.root.carray

t0= time.time()
c = np.dot(a, b)  # ValueError: array is too big, when the arrays are large
                  # (also note: this rebinds c, rather than writing into the CArray)
print (time.time()-t0)

Update: here is the code. Using hdf5, it runs faster.

import numpy as np
import tables
import time

sz= 100 #chunk size
n_row=10000 #m
n_col=1000 #n

#for arbitrary size
A=np.random.rand(n_row,n_col)
B=np.random.rand(n_col,n_row)
# A=np.random.randint(5, size=(n_row,n_col))
# B=np.random.randint(5, size=(n_col,n_row))

#using numpy array
#C= np.zeros((n_row,n_row))

#using hdf5
fileName_C = 'CArray_C.h5'
atom = tables.Float32Atom()
shape = (A.shape[0], B.shape[1])
Nchunk = 128  # ?
chunkshape = (Nchunk, Nchunk)
chunk_multiple = 1
block_size = chunk_multiple * Nchunk
h5f_C = tables.open_file(fileName_C, 'w')
C = h5f_C.create_carray(h5f_C.root, 'CArray', atom, shape, chunkshape=chunkshape)

sz= block_size

t0= time.time()
for i in range(0, A.shape[0], sz):
    for j in range(0, B.shape[1], sz):
        for k in range(0, A.shape[1], sz):
            C[i:i+sz,j:j+sz] += np.dot(A[i:i+sz,k:k+sz],B[k:k+sz,j:j+sz])
print (time.time()-t0)

t0= time.time()
res= np.dot(A,B)
print (time.time()-t0)

print (np.allclose(C[:], res))  # C is float32, so compare with a tolerance rather than ==

h5f_C.close()
1 Answer


I don't know of an np.dot that works without loading the arrays into memory. I think blocking would work pretty well. Create an output array (called "c" below) as a pytables CArray and fill it in blocks. The chunkshape you choose at creation time should match your blocking scheme. For example:

atom = tables.Float32Atom() # you have UInt8Atom() above.  do you mean that?
shape = (a.shape[0], b.shape[1])

# you can vary block_size and chunkshape independently, but I would
# aim to have block_size an integer multiple of chunkshape
# your mileage may vary and depends on the array size and how you'll
# access it in the future.

Nchunk = 128  # ?
chunkshape = (Nchunk, Nchunk)
chunk_multiple = 1
block_size = chunk_multiple * Nchunk
c = h5f.create_carray(h5.root, 'c', atom, shape, chunkshape=chunkshape)

for i_start in range(0, a.shape[0], block_size):
    for j_start in range(0, b.shape[1], block_size):
        for k_start in range(0, a.shape[1], block_size):
            c[i_start:i_start + block_size, j_start:j_start + block_size] += \
                    np.dot(a[i_start:i_start + block_size, k_start:k_start + block_size],
                           b[k_start:k_start + block_size, j_start:j_start + block_size])
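To check that the triple-loop blocking really reproduces np.dot, here is a minimal in-memory sketch: plain NumPy arrays stand in for the pytables CArrays, and the sizes are illustrative only (deliberately not multiples of block_size, so the edge slices get exercised too):

```python
import numpy as np

# Small stand-ins for the on-disk arrays; the blocking logic is identical.
rng = np.random.default_rng(0)
a = rng.random((300, 200))
b = rng.random((200, 250))
block_size = 128  # not a divisor of the dimensions; slicing clips at the edges

c = np.zeros((a.shape[0], b.shape[1]))
for i in range(0, a.shape[0], block_size):
    for j in range(0, b.shape[1], block_size):
        for k in range(0, a.shape[1], block_size):
            c[i:i+block_size, j:j+block_size] += np.dot(
                a[i:i+block_size, k:k+block_size],
                b[k:k+block_size, j:j+block_size])

print(np.allclose(c, np.dot(a, b)))  # True
```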

Does the block size need to depend on some setting of the HDF5 file? - mrgloom
I mean chunkshape. It's an argument you can pass when creating an hdf5 array. It is "the shape of the data chunk to be read or written in a single HDF5 I/O operation". - Greg Whittier
Thanks, it worked. See the update. Can you explain why it's faster than numpy.dot? - mrgloom
On my 2 GB RAM machine the hdf5 blocked version is slower, 210 s versus 70 s (for a single np.dot call). C is (10000, 10000) and takes 760 MB of memory, so memory swapping is a significant part of the np.dot time. - hpaulj
hdf5 stores the matrix on disk in chunks. To touch the disk as little as possible, you want to minimize the number of times each chunk is retrieved. Calling numpy.dot may pull more chunks from disk than necessary, because the underlying BLAS routines are blocked for the processor cache sizes, which don't match hdf5's chunk size. Doing the blocking explicitly ensures you only go to disk when you need to. It's like another level of cache (except on disk). @hpaulj are you sure the matrix is big enough for that to matter? - Greg Whittier
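The point of this last comment, that blocks aligned to hdf5's chunk grid minimize chunk fetches, can be made concrete with a little arithmetic. This is a sketch; chunks_touched is a hypothetical helper for counting, not a pytables API:

```python
# Count the (c, c) chunks overlapped when reading an s x s block starting at
# (r0, c0): along each axis the block spans chunk indices r0//c .. (r0+s-1)//c.
def chunks_touched(r0, c0, s, c=128):
    rows = (r0 + s - 1) // c - r0 // c + 1
    cols = (c0 + s - 1) // c - c0 // c + 1
    return rows * cols

# A block aligned to the chunk grid touches exactly one chunk...
print(chunks_touched(0, 0, 128))    # 1
# ...while a misaligned block of the same size can touch four,
# quadrupling the I/O for the same amount of useful data.
print(chunks_touched(64, 64, 128))  # 4
```

This is why the answer suggests making block_size an integer multiple of the chunkshape: every read then lands on whole chunks.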
