Numba - How to fill a 2D array in parallel

Question



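I have a function that operates on a 2D float64 matrix of shape (x, y). The basic idea: for every combination of two rows (y choose 2; in the code these are the columns raw_data[:, r1] and raw_data[:, r2]), count the number of positive values after subtracting one from the other (row1 - row2). In a 2D int64 matrix of shape (y, y), store that count at index [row1, row2] if it meets a certain threshold, and at index [row2, row1] if it is below it.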
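A minimal plain-Python sketch of the semantics described above, which can serve as a reference when checking faster versions. The name `reference_pairs` and the `threshold` parameter are hypothetical; like the code below, it compares columns `r1` and `r2` of `raw_data` and defaults to the example threshold of 5.

```python
import numpy as np

def reference_pairs(raw_data, threshold=5):
    """Hypothetical reference: for every column pair r1 < r2, count rows where
    raw_data[i, r1] - raw_data[i, r2] > 0. Store the count in the upper
    triangle if it reaches the threshold, otherwise in the lower triangle."""
    m = raw_data.shape[1]
    result = np.full((m, m), -1, dtype=np.int64)
    for r1 in range(m):
        for r2 in range(r1 + 1, m):
            num_pos = 0
            for i in range(raw_data.shape[0]):
                if raw_data[i, r1] - raw_data[i, r2] > 0:
                    num_pos += 1
            if num_pos >= threshold:
                result[r1, r2] = num_pos
            else:
                result[r2, r1] = num_pos
    return result

# Tiny example with threshold=2 so a 3-row input can reach it
x = np.array([[3.0, 1.0, 2.0],
              [5.0, 4.0, 6.0],
              [0.0, 2.0, 1.0]])
print(reference_pairs(x, threshold=2).tolist())
# expected: [[-1, 2, -1], [-1, -1, -1], [1, 1, -1]]
```

Columns 0 and 1 differ positively in two rows, so that count lands in the upper triangle; the other two pairs only reach 1 and land in the lower triangle.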

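I have implemented this function and decorated it with @njit(parallel=False), which works fine. However, @njit(parallel=True) does not seem to give any speedup. To speed the whole thing up, I looked at @guvectorize, which also works. However, I can't figure out how to use @guvectorize with target='parallel' in this case.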

I have looked at numba guvectorize target='parallel' slower than target='cpu', where the solution was to use @vectorize instead, but I can't transfer that solution to my problem, so I'm asking for help now :)

Basic jitted and guvectorized implementations

import numpy as np
from numba import jit, guvectorize, prange
import timeit

@jit(parallel=False)
def check_pairs_sg(raw_data):
    # 2D array to be filled
    result = np.full((len(raw_data), len(raw_data)), -1)

    # Iterate over all possible gene combinations
    for r1 in range(0, len(raw_data)):
        for r2 in range(r1+1, len(raw_data)):
            diff = np.subtract(raw_data[:, r1], raw_data[:, r2])

            num_pos = len(np.where(diff > 0)[0])

            # Arbitrary check to illustrate
            if num_pos >= 5:
                result[r1,r2] = num_pos
            else:
                result[r2,r1] = num_pos

    return result

@jit(parallel=True)
def check_pairs_multi(raw_data):
    # 2D array to be filled
    result = np.full((len(raw_data), len(raw_data)), -1)

    # Iterate over all possible gene combinations
    for r1 in range(0, len(raw_data)):
        for r2 in prange(r1+1, len(raw_data)):
            diff = np.subtract(raw_data[:, r1], raw_data[:, r2])

            num_pos = len(np.where(diff > 0)[0])

            # Arbitrary check to illustrate
            if num_pos >= 5:
                result[r1,r2] = num_pos
            else:
                result[r2,r1] = num_pos

    return result

@guvectorize(["void(float64[:,:], int64[:,:])"],
             "(n,m)->(m,m)", target='cpu')
def check_pairs_guvec_sg(raw_data, result):
    for r1 in range(0, len(result)):
        for r2 in range(r1+1, len(result)):
            diff = np.subtract(raw_data[:, r1], raw_data[:, r2])

            num_pos = len(np.where(diff > 0)[0])

            # Arbitrary check to illustrate
            if num_pos >= 5:
                result[r1,r2] = num_pos
            else:
                result[r2,r1] = num_pos

@guvectorize(["void(float64[:,:], int64[:,:])"],
             "(n,m)->(m,m)", target='parallel')
def check_pairs_guvec_multi(raw_data, result):
    for r1 in range(0, len(result)):
        for r2 in range(r1+1, len(result)):
            diff = np.subtract(raw_data[:, r1], raw_data[:, r2])

            num_pos = len(np.where(diff > 0)[0])

            # Arbitrary check to illustrate
            if num_pos >= 5:
                result[r1,r2] = num_pos
            else:
                result[r2,r1] = num_pos

if __name__=="__main__":
    np.random.seed(404)
    a = np.random.random((512,512)).astype(np.float64)
    # Explicit int64 so the array matches the guvectorize signature on every platform
    res = np.full((len(a), len(a)), -1, dtype=np.int64)

Timing them with:

%timeit check_pairs_sg(a)
%timeit check_pairs_multi(a)
%timeit check_pairs_guvec_sg(a, res)
%timeit check_pairs_guvec_multi(a, res)

gives:

614 ms ± 2.54 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
507 ms ± 6.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
622 ms ± 3.88 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
671 ms ± 4.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I can't figure out how to implement this with @vectorize, or with a properly parallel @guvectorize that fills the resulting 2D array concurrently.

I guess this is a first step before trying to move it further onto the GPU.

Any help is much appreciated.

- iR0Nic

It's not clear to me how to run this to try it, since you have two if __name__ == '__main__' guards. - roganjosh

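I could use itertools to generate the combination tuples one after another so that there is only a single loop, but how would that help? PS: I removed the second main section. - iR0Nic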
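To illustrate the itertools idea: `itertools.combinations` yields the same pairs, in the same order, as the nested `r1`/`r2` loops, and `np.triu_indices` gives the same pairs as two index arrays. A flat pair list makes every iteration cost the same, whereas in the nested loops iteration `r1` handles `m - 1 - r1` pairs, which matters once the outer loop is split across threads.

```python
import itertools
import numpy as np

m = 6  # number of columns, kept small for illustration

# Flat sequence of (r1, r2) pairs with r1 < r2, same order as the nested loops
pairs = list(itertools.combinations(range(m), 2))

# NumPy equivalent: two index arrays of length m*(m-1)//2
r1, r2 = np.triu_indices(m, k=1)

# In the nested loops, outer iteration r1 handles m - 1 - r1 pairs
work_per_r1 = [m - 1 - i for i in range(m)]

print(len(pairs), pairs[:3])   # 15 [(0, 1), (0, 2), (0, 3)]
print(work_per_r1)             # [5, 4, 3, 2, 1, 0]
```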


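In fact, I believe most of this can be vectorized, dropping some of the for loops and if checks. Jumping to numba and GPUs before making sure a numpy approach is in place is overlooking the problem. - roganjosh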
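A sketch of the loop-free NumPy approach suggested here, assuming the function compares columns as the question's code does (`check_pairs_numpy` is a hypothetical name). It trades memory for vectorization: the boolean intermediate has `n*m*m` elements (128 MiB for the 512x512 example), and every ordered pair is compared, about twice the work of the loops.

```python
import numpy as np

def check_pairs_numpy(raw_data, threshold=5):
    """Hypothetical loop-free version of the question's function."""
    m = raw_data.shape[1]
    # (n, m, 1) > (n, 1, m) -> (n, m, m) booleans; summing over rows gives
    # counts[r1, r2] = number of rows where column r1 exceeds column r2
    counts = (raw_data[:, :, None] > raw_data[:, None, :]).sum(axis=0)
    result = np.full((m, m), -1, dtype=np.int64)
    r1, r2 = np.triu_indices(m, k=1)
    c = counts[r1, r2]
    keep = c >= threshold
    result[r1[keep], r2[keep]] = c[keep]      # upper triangle: count >= threshold
    result[r2[~keep], r1[~keep]] = c[~keep]   # lower triangle: count < threshold
    return result
```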

I didn't set a seed. It still gives the same answer without a seed, so the result is deterministic and not affected by the random input. - roganjosh

test_a = np.random.random((10,10)).astype(np.float64)
res_a = check_pairs_sg(test_a)
test_b = np.random.random((10,10)).astype(np.float64)
res_b = check_pairs_sg(test_b)
print(np.allclose(test_a, test_b))  # False
print(np.allclose(res_a, res_b))  # False

Sorry, I don't follow. The algorithm is deterministic: the same input gives the same output, and different random input gives different output. - iR0Nic


1 Answer

Page content provided by Stack Overflow.

- max9111 · Accepted Answer

When writing Numba code, think in terms of other compiled languages

For example, think about how you would implement a more or less exact equivalent of the following lines in C++:

diff = np.subtract(raw_data[:, r1], raw_data[:, r2])
num_pos = len(np.where(diff > 0)[0])

Pseudocode

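Allocate an array diff and fill it in a loop over i with raw_data[i * size_dim_1 + r1] - raw_data[i * size_dim_1 + r2]
Allocate a boolean array b_arr, loop over the whole array diff and check whether diff[i] > 0
Loop over the boolean array, collect the indices where b_arr == True and store them in a vector via vector::push_back()
Check the size of the vector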
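A plain-Python transcription of the four pseudocode steps (the function names are hypothetical), next to the single-pass counter that the optimized code below uses. Both return the same count; the first builds three temporaries along the way.

```python
import numpy as np

def count_pos_with_temporaries(raw_data, r1, r2):
    """Mirrors the pseudocode: three temporaries before the final count."""
    n = raw_data.shape[0]
    diff = [raw_data[i, r1] - raw_data[i, r2] for i in range(n)]  # diff array
    b_arr = [d > 0 for d in diff]                                  # boolean array
    indices = [i for i in range(n) if b_arr[i]]                    # index vector
    return len(indices)                                            # its size

def count_pos_fused(raw_data, r1, r2):
    """Same result in one pass, with no temporaries."""
    num_pos = 0
    for i in range(raw_data.shape[0]):
        if raw_data[i, r1] > raw_data[i, r2]:
            num_pos += 1
    return num_pos

a = np.random.default_rng(404).random((100, 8))
print(count_pos_with_temporaries(a, 2, 5), count_pos_fused(a, 2, 5))  # same number twice
```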

The main problems in your code are:

Creating temporary arrays for simple operations
Non-contiguous memory access
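The memory-layout point can be checked directly with NumPy flags and strides. In a C-ordered array, a column such as `a[:, 3]` is a strided view, while the same values in a transposed contiguous copy sit next to each other:

```python
import numpy as np

a = np.random.default_rng(404).random((512, 512))

col = a[:, 3]                       # column view of a C-ordered array
print(col.flags["C_CONTIGUOUS"])    # False
print(col.strides)                  # (4096,): 512 float64 values between elements

t = np.ascontiguousarray(a.T)       # transposed copy: columns become rows
row = t[3]
print(row.flags["C_CONTIGUOUS"])    # True
print(row.strides)                  # (8,): neighbouring elements are adjacent
```

Both `col` and `row` hold the same values; only their layout in memory differs.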

Optimizing the code

Removing the temporary arrays and simplifying

import numpy as np
import numba as nb

@nb.njit(parallel=False)
def check_pairs_simp(raw_data):
    # 2D array to be filled: one row and one column per column of raw_data
    result = np.full((raw_data.shape[1], raw_data.shape[1]), -1)

    # Iterate over all possible gene combinations
    for r1 in range(0, raw_data.shape[1]):
        for r2 in range(r1+1, raw_data.shape[1]):
            # Count positive differences directly, without temporary arrays
            num_pos = 0
            for i in range(raw_data.shape[0]):
                if raw_data[i, r1] > raw_data[i, r2]:
                    num_pos += 1

            # Arbitrary check to illustrate
            if num_pos >= 5:
                result[r1, r2] = num_pos
            else:
                result[r2, r1] = num_pos

    return result

Removing the temporary arrays, simplifying, and contiguous memory access

@nb.njit(parallel=False)
def check_pairs_simp_rev(raw_data_in):
    # Create a transposed copy, not just a view, so each input column is contiguous
    raw_data = np.ascontiguousarray(raw_data_in.T)

    # 2D array to be filled: one row and one column per column of raw_data_in
    result = np.full((raw_data.shape[0], raw_data.shape[0]), -1)

    # Iterate over all possible gene combinations
    for r1 in range(0, raw_data.shape[0]):
        for r2 in range(r1+1, raw_data.shape[0]):
            # Inner loop now walks contiguous memory
            num_pos = 0
            for i in range(raw_data.shape[1]):
                if raw_data[r1, i] > raw_data[r2, i]:
                    num_pos += 1

            # Arbitrary check to illustrate
            if num_pos >= 5:
                result[r1, r2] = num_pos
            else:
                result[r2, r1] = num_pos

    return result

Removing the temporary arrays and simplifying + contiguous memory access + parallelization

@nb.njit(parallel=True, fastmath=True)
def check_pairs_simp_rev_p(raw_data_in):
    # Create a transposed copy, not just a view, so each input column is contiguous
    raw_data = np.ascontiguousarray(raw_data_in.T)

    # 2D array to be filled: one row and one column per column of raw_data_in
    result = np.full((raw_data.shape[0], raw_data.shape[0]), -1)

    # Iterate over all possible gene combinations; the outer loop runs in parallel.
    # Each r1 writes only cells [r1, r2] and [r2, r1] with r2 > r1, so no two
    # iterations write the same cell.
    for r1 in nb.prange(0, raw_data.shape[0]):
        for r2 in range(r1+1, raw_data.shape[0]):
            num_pos = 0
            for i in range(raw_data.shape[1]):
                if raw_data[r1, i] > raw_data[r2, i]:
                    num_pos += 1

            # Arbitrary check to illustrate
            if num_pos >= 5:
                result[r1, r2] = num_pos
            else:
                result[r2, r1] = num_pos

    return result

Timings

%timeit check_pairs_sg(a)
488 ms ± 8.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit check_pairs_simp(a)
186 ms ± 3.83 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit check_pairs_simp_rev(a)
12.1 ms ± 226 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit check_pairs_simp_rev_p(a)
5.43 ms ± 49.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
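A back-of-the-envelope model of the load imbalance in `check_pairs_simp_rev_p`, assuming a hypothetical 4 threads that each receive one contiguous, equal-sized block of `r1` values (a simplification of how Numba schedules `prange`; the answer does not say how many threads were used):

```python
m = 512       # number of columns in the benchmark input
threads = 4   # hypothetical thread count

# Pairs handled by outer iteration r1 of the triangular loop
work = [m - 1 - r1 for r1 in range(m)]
total = sum(work)                      # m*(m-1)/2

# Assume equal-sized contiguous blocks of r1 values, one per thread
chunk = m // threads
shares = [sum(work[t * chunk:(t + 1) * chunk]) for t in range(threads)]

print(total, shares)            # 130816 [57280, 40896, 24512, 8128]
print(round(total / max(shares), 2))   # 2.28
```

Under this model the busiest thread handles 57280 of 130816 pairs (about 44%), which would cap the speedup over a single thread near 2.3x no matter how fast each thread runs.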