How to profile Cython functions line by line

Question

python profiling cython

38

I often struggle to find bottlenecks in my Cython code. How can I profile Cython functions line by line?

- Till Hoffmann

Does the Cython debugger let you pause execution? If so, you can do this. - Mike Dunlavey

3 Answers

10

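While @Till's answer shows how to profile Cython code using the setup.py approach, this answer is about ad hoc profiling in an IPython/Jupyter notebook. It is more or less a "translation" of the Cython documentation to IPython/Jupyter.

%prun magic:

To use the %prun magic, it is enough to set Cython's compiler directive profile to True (the example here is taken from the Cython documentation):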

%%cython
# cython: profile=True

def recip_square(i):
    return 1. / i ** 2

def approx_pi(n=10000000):
    val = 0.
    for k in range(1, n + 1):
        val += recip_square(k)
    return (6 * val) ** .5

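Using the global directive (i.e. # cython: profile=True) is better than modifying the global Cython state, because changing the directive causes the extension to be recompiled. That does not happen when only the global Cython state is modified: the old cached version, compiled with the old global state, is simply reloaded and reused. Now,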

%prun -s cumulative approx_pi(1000000)

yields:

        1000005 function calls in 1.860 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.860    1.860 {built-in method builtins.exec}
        1    0.000    0.000    1.860    1.860 <string>:1(<module>)
        1    0.000    0.000    1.860    1.860 {_cython_magic_404d18ea6452e5ffa4c993f6a6e15b22.approx_pi}
        1    0.612    0.612    1.860    1.860 _cython_magic_404d18ea6452e5ffa4c993f6a6e15b22.pyx:7(approx_pi)
  1000000    1.248    0.000    1.248    0.000 _cython_magic_404d18ea6452e5ffa4c993f6a6e15b22.pyx:4(recip_square)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
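
Outside a notebook, a similar report can be produced with the standard-library cProfile and pstats modules, which is what %prun builds on. The following is only a minimal sketch: it uses pure-Python stand-ins for the two functions so that it runs without a compiler, and the helper name profile_call is made up for this illustration. With a real Cython extension you would import approx_pi from the module built with profile=True instead.

```python
import cProfile
import io
import pstats


# Pure-Python stand-ins for the Cython functions above (assumption: in real
# use these would be imported from an extension compiled with profile=True).
def recip_square(i):
    return 1.0 / i ** 2


def approx_pi(n=10000000):
    val = 0.0
    for k in range(1, n + 1):
        val += recip_square(k)
    return (6 * val) ** 0.5


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report text).

    Hypothetical helper for illustration; it sorts by cumulative time,
    like `%prun -s cumulative`.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


if __name__ == "__main__":
    value, report = profile_call(approx_pi, 100000)
    print(value)
    print(report)
```

The report has the same columns as the %prun output above (ncalls, tottime, percall, cumtime), so the two can be read the same way.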

%lprun magic:

To use the line profiler (i.e. the %lprun magic), the Cython module has to be compiled with different directives:

%%cython
# cython: linetrace=True
# cython: binding=True
# distutils: define_macros=CYTHON_TRACE_NOGIL=1
...

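linetrace=True triggers the creation of tracing code in the generated C code and implies profile=True, so profile does not have to be set in addition. Without binding=True, line_profiler does not have the necessary code information. CYTHON_TRACE_NOGIL=1 is needed so that line tracing is actually activated when the C code is compiled (rather than being thrown away by the C preprocessor). CYTHON_TRACE=1 can be used instead if nogil blocks should not be profiled line by line.

It can now be used as follows, passing each function that should be line-profiled via the -f option (use %lprun? for information about the available options):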

%load_ext line_profiler
%lprun -f approx_pi -f recip_square approx_pi(1000000)

which yields:

Timer unit: 1e-06 s

Total time: 1.9098 s
File: /XXXX.pyx
Function: recip_square at line 5

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     5                                           def recip_square(i):
     6   1000000    1909802.0      1.9    100.0      return 1. / i ** 2

Total time: 6.54676 s
File: /XXXX.pyx
Function: approx_pi at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           def approx_pi(n=10000000):
     9         1          3.0      3.0      0.0      val = 0.
    10   1000001    1155778.0      1.2     17.7      for k in range(1, n + 1):
    11   1000000    5390972.0      5.4     82.3          val += recip_square(k)
    12         1          9.0      9.0      0.0      return (6 * val) ** .5

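line_profiler has a minor hiccup with cpdef functions: it does not detect the function body correctly. This SO post shows a possible workaround.

Be aware that profiling, and line profiling even more so, changes the execution time and its distribution compared to a "normal" run. Here we see that the same function needs different amounts of time depending on the type of profiling: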

Method (N=10^6):        Running Time:       Build with:
%timeit                 1 second
%prun                   2 seconds           profile=True
%lprun                  6.5 seconds         linetrace=True,binding=True,CYTHON_TRACE_NOGIL=1
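
The overhead can be checked directly by timing the same call with and without a profiler attached. This is only a rough sketch using pure-Python stand-ins and the standard-library cProfile (so it shows the %prun-style overhead, not the line-tracing one). The helper names time_plain and time_profiled are made up for this illustration, and the absolute numbers will differ from the table above.

```python
import cProfile
import time


# Pure-Python stand-ins for the Cython functions (assumption for this sketch).
def recip_square(i):
    return 1.0 / i ** 2


def approx_pi(n):
    val = 0.0
    for k in range(1, n + 1):
        val += recip_square(k)
    return (6 * val) ** 0.5


def time_plain(n):
    """Wall-clock seconds for approx_pi(n) without any profiler."""
    start = time.perf_counter()
    approx_pi(n)
    return time.perf_counter() - start


def time_profiled(n):
    """Wall-clock seconds for approx_pi(n) while cProfile is recording."""
    profiler = cProfile.Profile()
    start = time.perf_counter()
    profiler.runcall(approx_pi, n)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 200000
    print(f"plain:    {time_plain(n):.3f} s")
    print(f"profiled: {time_profiled(n):.3f} s")
```

Because every call to recip_square is recorded, the profiled run is typically noticeably slower, which is why timings taken under a profiler should only be compared with each other and not with %timeit results.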

- ead

7

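Although I wouldn't really call it profiling, you can analyze your Cython code by running cython with the -a (annotate) option. This creates a web page in which the main bottlenecks are highlighted. For instance, when I forget to declare some variables:

After correctly declaring them (cdef double dudz, dvdz):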

- Bart

9

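Not adding type annotations to variables does slow the code down. However, the -a option gives no information about actual run time; it only tells you whether Python calls are being made. - Till Hoffmann

But in my case, when moving Python code to Cython, things like forgetting to declare variables are usually what makes the code slow, and this is a quick and simple way to test for such problems. That is why I called it "not really _profiling_"; it is just a simple first check/analysis of the code. - Bart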

- Till Hoffmann · Accepted Answer

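Robert Bradshaw helped me get Robert Kern's line_profiler tool working for cdef functions, and I thought I'd share the results on Stack Overflow.

In short, set up a regular .pyx file and build script, and add the following before the call to cythonize.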

# Thanks to @tryptofame for proposing an updated snippet
from Cython.Compiler.Options import get_directive_defaults
directive_defaults = get_directive_defaults()

directive_defaults['linetrace'] = True
directive_defaults['binding'] = True

Furthermore, you need to define the C macro CYTHON_TRACE=1 by modifying your extensions setup:

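from setuptools import Extension

extensions = [
    # CYTHON_TRACE=1 enables the line-tracing hooks in the generated C code
    Extension("test", ["test.pyx"], define_macros=[('CYTHON_TRACE', '1')])
]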

A working example using the %%cython magic in an IPython notebook is available here: http://nbviewer.ipython.org/gist/tillahoffmann/296501acea231cbdf5e7