使用numpy数组优化Python函数

Question

使用numpy数组优化Python函数

pythonfunctionloopsoptimizationnumpy

3

我试图优化我写的Python脚本已经两天了。使用了几个分析工具（如cProfile，line_profiler等），我将issue缩小到以下函数。

df是一个拥有3列和1000000+行的numpy数组（数据类型为float）。使用line_profiler，我发现每当它需要访问numpy数组时，该函数花费大部分时间。

下面两行代码会占用大部分时间:

full_length += head + df[rnd_truck, 2]

full_weight += df[rnd_truck,1]

其次是这两行：

full_length = df[rnd_truck,2]

full_weight = df[rnd_truck,1]

就我所见，瓶颈是函数尝试从numpy数组中获取数字时的访问时间。

当我以MonteCarlo(df, 15., 1000.)形式运行该函数时，在i7 3.40GhZ 64位Windows机器上运行1,000,000次函数调用需时37秒。在我的应用程序中，我需要运行10亿次才能保证收敛，这将使执行时间超过一小时。我尝试使用operator.add方法来替换求和行，但完全没有帮助。看起来我必须找出一种更快的方法来访问这个numpy数组。

欢迎任何想法！

def MonteCarlo(df,head,span):
    # Pick initial truck
    rnd_truck = np.random.randint(0,len(df))
    full_length = df[rnd_truck,2]
    full_weight = df[rnd_truck,1]

    # Loop using other random truck until the bridge is full
    while 1:
        rnd_truck = np.random.randint(0,len(df))
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        else:
            full_weight += df[rnd_truck,1]

    # Return average weight per feet on the bridge
    return(full_weight/span)

以下是我正在使用的df numpy数组的一部分：

In [31] df
Out[31]: 
array([[  12. ,  220.4,  108.4],
       [  11. ,  220.4,  106.2],
       [  11. ,  220.3,  113.6],
       ..., 
       [   4. ,   13.9,   36.8],
       [   3. ,   13.7,   33.9],
       [   3. ,   13.7,   10.7]])

- marillion

输入代表什么，它们的尺寸是多少？ - user2357112

3

如果我理解正确的话，这个函数中没有一个向量化操作。你可以通过一次性随机选取大量卡车，并让NumPy向量化长度和重量的加法来节省很多时间。 - user2357112

@user2357112 我不知道需要多少卡车才能通过“if full_length > span”检查来打破循环。卡车平均长度为60英尺，而“span”参数是桥梁长度。我可以做出一个有根据的猜测，因为我认为“while”循环不会被调用超过20次。您认为一次选择20辆卡车，还是每个周期中的每辆卡车都会有所不同？ - marillion

如果循环在常见情况下不会运行超过20次，我不确定它能帮助多少。但是有很大的可能性它会有所帮助。你也可以尝试并行运行1000000个模拟。 - user2357112

如果这是一个三元组列表，为什么不只获取一次三元组，然后访问其中的其他信息呢？这样只需要一次访问大列表。trip = df[rnd_truck] - Jiminion

显示剩余5条评论

4个回答

2

使用Cython编译函数可大幅提高运行时间。

在一个名为"funcs.pyx"的文件中，我有以下代码：

cimport cython
import numpy as np
cimport numpy as np


def MonteCarlo(np.ndarray[np.float_t, ndim=2] df, float head, float span):
    # Pick initial truck
    cdef int rnd_truck = np.random.randint(0,len(df))
    cdef float full_length = df[rnd_truck,2]
    cdef float full_weight = df[rnd_truck,1]
    # Loop using other random truck until the bridge is full
    while 1:
        rnd_truck = np.random.randint(0,len(df))
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        else:
            full_weight += df[rnd_truck,1]
    # Return average weight per feet on the bridge
    return(full_weight/span)

除了变量前面的类型声明外，一切都是相同的。

这是我用来测试的文件:

import numpy as np
import pyximport
pyximport.install(reload_support=True, setup_args={'include_dirs':[np.get_include()]})
import funcs

def MonteCarlo(df,head,span):
    # Pick initial truck
    rnd_truck = np.random.randint(0,len(df))
    full_length = df[rnd_truck,2]
    full_weight = df[rnd_truck,1]
    # Loop using other random truck until the bridge is full
    while 1:
        rnd_truck = np.random.randint(0,len(df))
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        else:
            full_weight += df[rnd_truck,1]
    # Return average weight per feet on the bridge
    return(full_weight/span)

df = np.random.rand(1000000,3)
reload(funcs)
%timeit [funcs.MonteCarlo(df, 15, 1000) for i in range(10000)]
%timeit [MonteCarlo(df, 15, 1000) for i in range(10000)]

我只运行了10000次，但即使如此，也有巨大的改进。

16:42:30: In [31]: %timeit [funcs.MonteCarlo(df, 15, 1000) for i in range(10000)]
10 loops, best of 3: 131 ms per loop

16:42:37: In [32]: %timeit [MonteCarlo(df, 15, 1000) for i in range(10000)]
1 loops, best of 3: 1.75 s per loop

- kith

2

今日免费次数已满, 请开通会员/明日再来

from multiprocessing import Pool

def RunVMC(n):
    return MonteCarlo_cy(df,head,span)


pool=Pool(processes=4)

%timeit [MonteCarlo_cy(df,15,1000) for x in range(1000000)]
1 loops, best of 3: 3.89 s per loop

#Pool @ 4
%timeit out=pool.map(RunVMC,xrange(1000000))
1 loops, best of 3: 0.973 s per loop

#Pool @ 8
%timeit out=pool.map(RunVMC,xrange(1000000))
1 loops, best of 3: 568 ms per loop

- Daniel

是的，这绝对应该并行运行。请注意，这种方法会为每个工作进程创建（相当大的）df数组的副本。使用Cython，杀死需要GIL的RNG代码的小部分，正如我在我的答案评论中提到的那样，然后使用prange更具有内存效率。（另外，值得注意的是，您的RunVMC函数似乎没有比仅调用MonteCarlo_cy更好的目的....） - Danica

我更新了函数，使用df、head和span作为全局变量，使其达到应有的效果。在使用map传递额外参数时，我所见过的最好方法是使用itertools，但这似乎总是比它值得的麻烦多了。 - Daniel

@Ophion 这是我无法解决的唯一部分。在我的家用机器（运行OS X）上，多进程代码可以正常工作，并且使速度提高了100％。在我的工作机器（运行Win 7）上，当我输入pool = Pool（processes = 4）语句时，在内核窗口中出现“Import Error：No module named montecarlo_v5b”错误（而不是在ipython GUI中），这个错误会无限期地继续下去。这是Windows的怪癖吗？montecarlo_v5b是我的py文件的名称。 - marillion

@marillion 对不起，我没有Windows的经验，我知道与Linux相比有一些小问题-通过谷歌搜索可能会得到答案。 - Daniel

据我所知，我必须在代码中的某个地方使用 if __name__ == '__main__': 语句，但不确定具体在哪里。嗯，我已经花了太多时间来优化这个函数，还有很多工作要做，所以最好在另一个项目中学习如何在 Windows 中使用 multiprocessing 模块 :) 但是正如我所说，它在 OS X（或任何支持分叉的操作系统）中运行得非常好！ - marillion

0

你可以尝试切换到另一种 Python 变体。Jython 比 Python 快一些，在某些情况下 PyPy 速度更快。两者都可以尝试一下。

- zmbq

我认为这两个都不支持NumPy。（虽然我不确定OP是否真的利用了NumPy。） - user2357112

PyPy支持Numpy的一个子集，被称为NumPyPy。 - zmbq

哦，他们开始做那个了吗？我还以为他们只是在筹集资金雇人呢。看起来它还不够成熟，我不想使用它，但一旦完成，在 PyPy 中支持 NumPy 将会很棒。 - user2357112

我本来打算尝试PyPy，但是我没有管理员权限在我使用的电脑上安装它。我只能使用标准的Python解释器 :) - marillion

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Danica · Accepted Answer

正如其他人指出的那样，这并不是矢量化的，因此您的速度真正取决于Python解释器的速度。Cython 可以帮助您在最小更改的情况下大大提高速度：

>>> %timeit MonteCarlo(df, 5, 1000)
10000 loops, best of 3: 48 us per loop

>>> %timeit MonteCarlo_cy(df, 5, 1000)
100000 loops, best of 3: 3.67 us per loop

MonteCarlo_cy指的是（在IPython笔记本中，在%load_ext cythonmagic之后）：

%%cython
import numpy as np
cimport numpy as np

def MonteCarlo_cy(double[:, ::1] df, double head, double span):
    # Pick initial truck
    cdef long n = df.shape[0]
    cdef long rnd_truck = np.random.randint(0, n)
    cdef double full_weight = df[rnd_truck, 1]
    cdef double full_length = df[rnd_truck, 2]

    # Loop using other random truck until the bridge is full
    while True:
        rnd_truck = np.random.randint(0, n)
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        else:
            full_weight += df[rnd_truck, 1]

    # Return average weight per feet on the bridge
    return full_weight / span