How do I vectorize a for loop in numpy/scipy?

Question

python optimization numpy scipy vectorization

5

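I'm trying to vectorize a for loop inside a class method. The loop iterates over a set of points and, depending on whether a variable called "self.condition_met" is true, calls a pair of functions on each point and appends the result to a list. Each point is an element of a list of vectors, i.e. a data structure that looks like array([[1,2,3], [4,5,6], ...]). Here is the problematic function: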

class myClass:
    def my_inefficient_method(self):
        final_vector = []
        # Assume 'all_points', 'my_vector' and 'my_other_vector' are defined numpy arrays
        for point in all_points:
            # self.condition_met never changes inside the loop
            if not self.condition_met:
                a = self.my_func1(point, my_vector)
                b = self.my_func2(point, my_other_vector)
            else:
                a = self.my_func3(point, my_vector)
                b = self.my_func4(point, my_other_vector)
            c = a + b
            final_vector.append(c)
        # Choose random element from resulting vector 'final_vector'

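self.condition_met is set before my_inefficient_method is called, so checking it on every iteration seems unnecessary, but I'm not sure how to write this better. Since there are no destructive operations here, it seems the whole thing could be rewritten as a vectorized operation. Is that possible? Any ideas?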

- user248237

3 Answers

2

Can you rewrite the my_funcx functions so that they are vectorized? If so, you can do this:

class myClass:
    def my_efficient_method(self):
        # Assume 'all_points', 'my_vector' and 'my_other_vector' are defined numpy arrays
        # The condition is checked once, and each function receives the whole array
        if not self.condition_met:
            a = self.my_func1(all_points, my_vector)
            b = self.my_func2(all_points, my_other_vector)
        else:
            a = self.my_func3(all_points, my_vector)
            b = self.my_func4(all_points, my_other_vector)
        final_vector = a + b
        # Choose random element from resulting vector 'final_vector'
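
To make the idea concrete, here is a minimal sketch of what vectorized versions of the four functions might look like. The question never shows the bodies of my_func1 through my_func4, so the ones below are invented purely for illustration, as is the class name PointProcessor; the arrays are passed as arguments only to keep the example self-contained.

```python
import numpy as np


class PointProcessor:
    """Hypothetical stand-in for the asker's class."""

    def __init__(self, condition_met):
        self.condition_met = condition_met

    # The four function bodies are assumptions made for this sketch.
    # Each takes all points at once, shape (m, n), and returns shape (m,).
    def my_func1(self, points, vec):
        return points @ vec                           # row-wise dot product

    def my_func2(self, points, vec):
        return np.sum((points - vec) ** 2, axis=1)    # squared distance per row

    def my_func3(self, points, vec):
        return np.max(points - vec, axis=1)           # largest offset per row

    def my_func4(self, points, vec):
        return np.sum(np.abs(points - vec), axis=1)   # L1 distance per row

    def my_efficient_method(self, all_points, my_vector, my_other_vector):
        # Pick the pair of functions once instead of once per point
        if not self.condition_met:
            f, g = self.my_func1, self.my_func2
        else:
            f, g = self.my_func3, self.my_func4
        return f(all_points, my_vector) + g(all_points, my_other_vector)


all_points = np.array([[1, 2, 3], [4, 5, 6]])
final_vector = PointProcessor(False).my_efficient_method(
    all_points, np.array([1, 0, 1]), np.array([0, 1, 0])
)
# Choose a random element from the result
rng = np.random.default_rng()
print(final_vector, rng.choice(final_vector))
```

Whether this is possible for the real functions depends on whether their bodies can be expressed with whole-array operations such as broadcasting and axis reductions.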

- mtrw

0

It's best to do what mtrw suggests, but if you're unsure how to vectorize the functions, you can try numpy.vectorize on the my_funcs.

http://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html
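
As a rough sketch of how that could look: the per-point function below is hypothetical (the question does not show the real ones), and the `signature` argument of numpy.vectorize is used so that each call receives one whole point and the whole vector rather than individual scalars.

```python
import numpy as np


def my_func(point, vec):
    # Hypothetical per-point function: one 1-D point and one 1-D vector in,
    # one scalar out
    return float(np.dot(point, vec))


# "(n),(n)->()" declares two 1-D inputs of equal length and a scalar output,
# so np.vectorize loops over the rows of all_points
my_func_vec = np.vectorize(my_func, signature="(n),(n)->()")

all_points = np.array([[1, 2, 3], [4, 5, 6]])
my_vector = np.array([1, 0, 1])

result = my_func_vec(all_points, my_vector)
print(result)  # one value per row of all_points
```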

- Casey W. Stark

2

The "vectorize" function is provided primarily for convenience, not for performance. The implementation is essentially a for loop. - endolith
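
A small sketch of that point, using a made-up scalar function: numpy.vectorize gives the same answer as an explicit Python loop, while the native array expression computes it without calling a Python function once per element.

```python
import numpy as np


def scalar_func(x):
    # Hypothetical scalar function used only for this comparison
    return x * x + 1


xs = np.arange(5)

looped = np.array([scalar_func(x) for x in xs])   # explicit Python loop
vectorized = np.vectorize(scalar_func)(xs)        # convenience wrapper, still loops
native = xs * xs + 1                              # whole-array arithmetic

print(np.array_equal(looped, vectorized), np.array_equal(vectorized, native))
```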

- doug · Accepted Answer

This takes only a couple of lines of code in NumPy (the rest just creates a data set, a couple of functions, and the setup).

import numpy as NP

# create two functions
fnx1 = lambda x : x**2
fnx2 = lambda x : NP.sum(fnx1(x))

# create some data: 8 data points (rows) with 5 values each
M = NP.random.randint(10, 99, 40).reshape(8, 5)

# create an index array based on condition satisfaction
# (is the sum of that row/data point even or odd); axis 1 sums across each row
ndx = NP.where( NP.sum(M, 1) % 2 == 0 )[0]

# only those data points that satisfy the condition (even row sum)
# are passed to one function then another, and the result of applying both
# functions to each data point is stored in an array
res = NP.apply_along_axis( fnx2, 1, M[ndx] )

print(res)
# prints a 1-D array with one value per selected row
# (the values vary from run to run because the data is random)

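From your description, I gathered the following flow:

1. Check whether the condition (a boolean) is "True"
2. Call the pair of functions on the data points (rows) that satisfy the condition
3. Append the result of each pair of calls to a list ("res" in the code above)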
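
The same three steps can also be sketched with a boolean mask instead of an index array. This is a variation on the code above rather than the answer's exact method, and it uses a small fixed array (an assumption made so the output is reproducible).

```python
import numpy as np

M = np.array([[1, 2, 3],    # row sum 6 (even)
              [1, 1, 1],    # row sum 3 (odd)
              [2, 2, 2]])   # row sum 6 (even)

# Step 1: evaluate the condition for every row at once
mask = M.sum(axis=1) % 2 == 0

# Step 2: apply the pair of functions (square, then sum) to the selected rows
# Step 3: the results land in one array, one entry per selected row
res = (M[mask] ** 2).sum(axis=1)

print(res)  # [14 12]
```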