Speeding up a connected-components algorithm with Cython

6

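First off, I am using Python [2.7.2], Numpy [1.6.2rc1], Cython [0.16] and the GCC [MinGW] compiler on a Windows XP machine.

I need a 3D connected-components algorithm to process 3D binary data (i.e. 1s and 0s) stored in a Numpy array. Unfortunately I could not find any existing code, so I adapted the code found here to work with 3D arrays. Everything works, but speed matters when processing large data sets, so I tried Cython.

So far, Cython has improved the speed: Cython: 0.339 s, Python: 0.635 s

Using cProfile, the most time-consuming line in the pure Python version is: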

new_region = min(filter(lambda i: i > 0, array_region[xMin:xMax,yMin:yMax,zMin:zMax].ravel()))

Question: what is the correct way to "cythonize" the following lines?

new_region = min(filter(lambda i: i > 0, array_region[xMin:xMax,yMin:yMax,zMin:zMax].ravel()))
for x,y,z in zip(ind[0],ind[1],ind[2]):

Any help would be appreciated, and hopefully this work can help others too.


Pure Python version [*.py]:

import numpy as np

def find_regions_3D(Array):
    x_dim=np.size(Array,0)
    y_dim=np.size(Array,1)
    z_dim=np.size(Array,2)
    regions = {}
    array_region = np.zeros((x_dim,y_dim,z_dim),)
    equivalences = {}
    n_regions = 0
    #first pass. find regions.
    ind=np.where(Array==1)
    for x,y,z in zip(ind[0],ind[1],ind[2]):

        # get the region number from all surrounding cells including diagonals (27) or create a new region
        xMin=max(x-1,0)
        xMax=min(x+1,x_dim-1)
        yMin=max(y-1,0)
        yMax=min(y+1,y_dim-1)
        zMin=max(z-1,0)
        zMax=min(z+1,z_dim-1)

        max_region=array_region[xMin:xMax+1,yMin:yMax+1,zMin:zMax+1].max()

        if max_region > 0:
            #a neighbour already has a region, new region is the smallest > 0
            new_region = min(filter(lambda i: i > 0, array_region[xMin:xMax+1,yMin:yMax+1,zMin:zMax+1].ravel()))
            #update equivalences
            if max_region > new_region:
                if max_region in equivalences:
                    equivalences[max_region].add(new_region)
                else:
                    equivalences[max_region] = set((new_region, ))
        else:
            n_regions += 1
            new_region = n_regions

        array_region[x,y,z] = new_region


    #Scan Array again, assigning all equivalent regions the same region value.
    for x,y,z in zip(ind[0],ind[1],ind[2]):
        r = array_region[x,y,z]
        while r in equivalences:
            r= min(equivalences[r])
        array_region[x,y,z]=r

    #return list(regions.itervalues())
    return array_region

Pure Python speedups:

#Original line:
new_region = min(filter(lambda i: i > 0, array_region[xMin:xMax+1,yMin:yMax+1,zMin:zMax+1].ravel()))

#ver A:
sub = array_region[xMin:xMax+1,yMin:yMax+1,zMin:zMax+1]
new_region = min(sub[sub>0])

#ver B:
new_region = min( i for i in array_region[xMin:xMax+1,yMin:yMax+1,zMin:zMax+1].ravel() if i>0)

#ver C:
sub=array_region[xMin:xMax+1,yMin:yMax+1,zMin:zMax+1]
nlist=np.where(sub>0)
minList=[]
#i,j,k index into sub, so the outer loop's x,y,z are not overwritten
for i,j,k in zip(nlist[0],nlist[1],nlist[2]):
    minList.append(sub[i,j,k])
new_region=min(minList)

Timing results (O = original line):
O: 0.0220445
A: 0.0002161
B: 0.0173195
C: 0.0002560
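
A rough way to reproduce this kind of comparison is sketched below. It assumes a random 3x3x3 block stands in for the neighbourhood slice; the `sub` array, the function names and the repeat count are made up for illustration and are not the setup behind the numbers above, so absolute timings will differ.

```python
import timeit

import numpy as np

# Hypothetical stand-in for array_region[xMin:xMax+1, yMin:yMax+1, zMin:zMax+1].
rng = np.random.RandomState(0)
sub = rng.randint(0, 5, size=(3, 3, 3))
sub[0, 0, 0] = 3  # guarantee at least one positive label


def original(block):
    # Original line: Python-level filter + lambda over every element.
    return min(filter(lambda i: i > 0, block.ravel()))


def ver_a(block):
    # Version A: boolean-mask indexing selects the positive labels first.
    return min(block[block > 0])


def ver_b(block):
    # Version B: generator expression instead of filter + lambda.
    return min(i for i in block.ravel() if i > 0)


variants = (original, ver_a, ver_b)
results = {f.__name__: f(sub) for f in variants}
timings = {f.__name__: timeit.timeit(lambda f=f: f(sub), number=1000)
           for f in variants}

for f in variants:
    print(f.__name__, results[f.__name__], timings[f.__name__])
```

All three variants should return the same label; only the time spent differs.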


Cython version [*.pyx]:

import numpy as np
cimport numpy as np

DTYPE = np.int
ctypedef np.int_t DTYPE_t

cdef inline int int_max(int a, int b): return a if a >= b else b
cdef inline int int_min(int a, int b): return a if a <= b else b

def find_regions_3D(np.ndarray Array not None):
    cdef int x_dim=np.size(Array,0)
    cdef int y_dim=np.size(Array,1)
    cdef int z_dim=np.size(Array,2)
    regions = {}
    cdef np.ndarray array_region = np.zeros((x_dim,y_dim,z_dim),dtype=DTYPE)
    equivalences = {}
    cdef int n_regions = 0
    #first pass. find regions.
    ind=np.where(Array==1)
    cdef int xMin, xMax, yMin, yMax, zMin, zMax, max_region, new_region, x, y, z
    for x,y,z in zip(ind[0],ind[1],ind[2]):

        # get the region number from all surrounding cells including diagonals (27) or create a new region
        xMin=int_max(x-1,0)
        xMax=int_min(x+1,x_dim-1)+1
        yMin=int_max(y-1,0)
        yMax=int_min(y+1,y_dim-1)+1
        zMin=int_max(z-1,0)
        zMax=int_min(z+1,z_dim-1)+1

        max_region=array_region[xMin:xMax,yMin:yMax,zMin:zMax].max()

        if max_region > 0:
            #a neighbour already has a region, new region is the smallest > 0
            new_region = min(filter(lambda i: i > 0, array_region[xMin:xMax,yMin:yMax,zMin:zMax].ravel()))
            #update equivalences
            if max_region > new_region:
                if max_region in equivalences:
                    equivalences[max_region].add(new_region)
                else:
                    equivalences[max_region] = set((new_region, ))
        else:
            n_regions += 1
            new_region = n_regions

        array_region[x,y,z] = new_region


    #Scan Array again, assigning all equivalent regions the same region value.
    cdef int r
    for x,y,z in zip(ind[0],ind[1],ind[2]):
        r = array_region[x,y,z]
        while r in equivalences:
            r= min(equivalences[r])
        array_region[x,y,z]=r

    #return list(regions.itervalues())
    return array_region

Cython speedup:

Using:

cdef np.ndarray region = np.zeros((3,3,3),dtype=DTYPE)
...
        region=array_region[xMin:xMax,yMin:yMax,zMin:zMax]
        new_region=np.min(region[region>0])

Time: 0.170 s, versus the original 0.339 s.


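Results

After taking the many helpful comments and answers into account, my current algorithm runs at:
Cython: 0.0219
Python: 0.4309

Cython gives roughly a 20x speedup over pure Python.

Current Cython code: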

import numpy as np
import cython
cimport numpy as np
cimport cython

from libcpp.map cimport map

DTYPE = np.int
ctypedef np.int_t DTYPE_t

cdef inline int int_max(int a, int b): return a if a >= b else b
cdef inline int int_min(int a, int b): return a if a <= b else b

@cython.boundscheck(False)
def find_regions_3D(np.ndarray[DTYPE_t,ndim=3] Array not None):
    cdef unsigned int x_dim=np.size(Array,0),y_dim=np.size(Array,1),z_dim=np.size(Array,2)
    regions = {}
    cdef np.ndarray[DTYPE_t,ndim=3] array_region = np.zeros((x_dim,y_dim,z_dim),dtype=DTYPE)
    cdef np.ndarray region = np.zeros((3,3,3),dtype=DTYPE)
    cdef map[int,int] equivalences
    cdef unsigned int n_regions = 0

    #first pass. find regions.
    ind=np.where(Array==1)
    cdef np.ndarray[DTYPE_t,ndim=1] ind_x = ind[0], ind_y = ind[1], ind_z = ind[2]
    cells=range(len(ind_x))
    cdef unsigned int xMin, xMax, yMin, yMax, zMin, zMax, max_region, new_region, x, y, z, i, xi, yi, zi, val
    for i in cells:

        x=ind_x[i]
        y=ind_y[i]
        z=ind_z[i]

        # get the region number from all surrounding cells including diagonals (27) or create a new region
        xMin=int_max(x-1,0)
        xMax=int_min(x+1,x_dim-1)+1
        yMin=int_max(y-1,0)
        yMax=int_min(y+1,y_dim-1)+1
        zMin=int_max(z-1,0)
        zMax=int_min(z+1,z_dim-1)+1

        max_region = 0
        new_region = 2000000000 # huge number
        for xi in range(xMin, xMax):
            for yi in range(yMin, yMax):
                for zi in range(zMin, zMax):
                    val = array_region[xi,yi,zi]
                    if val > max_region: # val is the new maximum
                        max_region = val

                    if 0 < val < new_region: # val is the new minimum
                        new_region = val

        if max_region > 0:
            if max_region > new_region:
                if equivalences.count(max_region) == 0 or new_region < equivalences[max_region]:
                    equivalences[max_region] = new_region
        else:
            n_regions += 1
            new_region = n_regions

        array_region[x,y,z] = new_region


    #Scan Array again, assigning all equivalent regions the same region value.
    cdef int r
    for i in cells:
        x=ind_x[i]
        y=ind_y[i]
        z=ind_z[i]

        r = array_region[x,y,z]
        while equivalences.count(r) > 0:
            r= equivalences[r]
        array_region[x,y,z]=r

    return array_region

Setup file [setup.py]:

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
import numpy

setup(
    cmdclass = {'build_ext': build_ext},
    ext_modules = [Extension("ConnectComp", ["ConnectedComponents.pyx"],
                             include_dirs =[numpy.get_include()],
                             language="c++",
                             )]
)

Build command:

python setup.py build_ext --inplace

1
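Have you considered using networkx or graph-tool for this? Both provide connected-components algorithms that are well tested for correctness, and networkx is very easy to install. - Jeff Tratner
1
I would expect the Python version to be (slightly) faster if you use: new_region = min( i for i in array_region[xMin:xMax,yMin:yMax,zMin:zMax].ravel() if i>0) - mgilson
1
Also, where possible you should try the more efficient indexing approach, which works by typing the arrays you index into. (http://docs.cython.org/src/tutorial/numpy.html#efficient-indexing) - gotgenes
1
I have not read your code carefully, so I may be wrong, but I believe scipy's ndimage.label does what you want (I have not tested it against your code, but it should be quite efficient). Note that you have to import it explicitly: from scipy import ndimage; ndimage.label(your_data, connectivity_struct); afterwards you can apply other built-in functions (such as finding bounding rectangles, centre of mass, etc.). - eldad-a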
3 answers

7

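As @gotgenes pointed out, you should be using cython -a <file> and trying to reduce the amount of yellow you see. The more yellow a line is, the worse the generated C code.

Things I found that reduce the amount of yellow: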

  1. This looks like a situation where there will never be any out of bounds array access, as long as the input Array has 3 dimensions, so one can turn off bounds checking:

    cimport cython
    
    @cython.boundscheck(False)
    def find_regions_3D(...):
    
  2. Give the compiler more information for efficient indexing, i.e. whenever you cdef an ndarray give as much information as you can:

     def find_regions_3D(np.ndarray[DTYPE_t,ndim=3] Array not None):
         [...]
         cdef np.ndarray[DTYPE_t,ndim=3] array_region = ...
         [etc.]
    
  3. Give the compiler more information about positive/negative-ness. I.e. if you know a certain variable is always going to be positive, cdef it as unsigned int rather than int, as this means that Cython can eliminate any negative-indexing checks.

  4. Unpack the ind tuple immediately, i.e.

    ind = np.where(Array==1)
    cdef np.ndarray[DTYPE_t,ndim=1] ind_x = ind[0], ind_y = ind[1], ind_z = ind[2]
    
  5. Avoid using the for x,y,z in zip(..[0],..[1],..[2]) construct. In both cases, replace it with

    cdef int i
    for i in range(len(ind_x)):
        x = ind_x[i]
        y = ind_y[i]
        z = ind_z[i]
    
  6. Avoid doing the fancy indexing/slicing. And especially avoid doing it twice! And avoid using filter! I.e. replace

    max_region=array_region[xMin:xMax,yMin:yMax,zMin:zMax].max()
    if max_region > 0:
        new_region = min(filter(lambda i: i > 0, array_region[xMin:xMax,yMin:yMax,zMin:zMax].ravel()))
        if max_region > new_region:
            if max_region in equivalences:
                equivalences[max_region].add(new_region)
            else:
                equivalences[max_region] = set((new_region, ))
    

    with the more verbose

    max_region = 0
    new_region = 2000000000 # "infinity"
    for xi in range(xMin, xMax):
        for yi in range(yMin, yMax):
            for zi in range(zMin, zMax):
                val = array_region[xi,yi,zi]
                if val > max_region: # val is the new maximum
                    max_region = val
    
                if 0 < val < new_region: # val is the new minimum
                    new_region = val
    
    if max_region > 0:
       if max_region > new_region:
           if max_region in equivalences:
               equivalences[max_region].add(new_region)
           else:
               equivalences[max_region] = set((new_region, ))
    else:
       n_regions += 1
       new_region = n_regions
    

    This doesn't look so nice, but the triple loop compiles down to about 10 or so lines of C, while the compiled version of the original is hundreds of lines long and has a lot of Python object manipulation.

    (Obviously you must cdef all the variables you use, especially xi, yi, zi and val in this code.)

  7. You don't need to store all the equivalences, since the only thing you do with the set is find the minimum element. So if you instead have equivalences mapping int to int, you can replace

    if max_region in equivalences:
        equivalences[max_region].add(new_region)
    else:
        equivalences[max_region] = set((new_region, ))
    
    [...]
    
    while r in equivalences:
        r = min(equivalences[r])
    

    with

    if max_region not in equivalences or new_region < equivalences[max_region]:
        equivalences[max_region] = new_region
    
    [...]
    
    while r in equivalences:
        r = equivalences[r]
    
  8. The last thing to do after all that would be to not use any Python objects at all, specifically, don't use a dictionary for equivalences. This is now easy, since it is mapping int to int, so one could use from libcpp.map cimport map and then cdef map[int,int] equivalences, and replace .. not in equivalences with equivalences.count(..) == 0 and .. in equivalences with equivalences.count(..) > 0. (Note that it will then require a C++ compiler.)
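
Before adding any Cython types, the logic of steps 6 and 7 can be checked in plain Python. The following is only a sketch: the helper names `scan_neighbourhood`, `record_equivalence` and `resolve` are hypothetical, the test array is made up, and `None` is used in place of the 2000000000 sentinel purely for readability.

```python
import numpy as np


def scan_neighbourhood(array_region, x, y, z):
    """Return (max_region, new_region) for the block around (x, y, z).

    new_region is the smallest positive label found, or None if the
    neighbourhood holds no label yet.
    """
    x_dim, y_dim, z_dim = array_region.shape
    max_region = 0
    new_region = None
    for xi in range(max(x - 1, 0), min(x + 1, x_dim - 1) + 1):
        for yi in range(max(y - 1, 0), min(y + 1, y_dim - 1) + 1):
            for zi in range(max(z - 1, 0), min(z + 1, z_dim - 1) + 1):
                val = int(array_region[xi, yi, zi])
                if val > max_region:  # val is the new maximum
                    max_region = val
                if val > 0 and (new_region is None or val < new_region):
                    new_region = val  # val is the new positive minimum
    return max_region, new_region


def record_equivalence(equivalences, max_region, new_region):
    # Step 7: keep only the smallest label seen for each larger label.
    if max_region not in equivalences or new_region < equivalences[max_region]:
        equivalences[max_region] = new_region


def resolve(equivalences, r):
    # Follow the int -> int chain until a label with no smaller alias.
    while r in equivalences:
        r = equivalences[r]
    return r


# Made-up neighbourhood holding labels 4 and 2 around the centre voxel.
labels = np.zeros((3, 3, 3), dtype=int)
labels[0, 0, 0] = 4
labels[2, 2, 2] = 2
max_region, new_region = scan_neighbourhood(labels, 1, 1, 1)

equivalences = {}
if max_region > new_region:
    record_equivalence(equivalences, max_region, new_region)
equivalences[7] = 4  # made-up earlier entry, to show chain following

print(max_region, new_region, resolve(equivalences, 7))
```

Because every stored value is smaller than its key, the chain in `resolve` always terminates.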


1
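Thank you for all the suggestions! I really appreciate it. I will try to incorporate them all. - Onlyjus
@Onlyjus, it is best to accept the answer only once you have tried everything and it works :) ... someone else may well come up with a better answer! - huon
I am still working on it, but after following most of your suggestions I am already 20x faster than pure Python. - Onlyjus
@Onlyjus, my guess is that it only removes a few "if" statements, so the speed difference is undetectable. - huon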

3
I believe scipy's ndimage.label does what you need (I have not tested it against your code, but it should be quite efficient). Note that you have to import it explicitly:
from scipy import ndimage 
ndimage.label(your_data, connectivity_struct)

You can then apply other built-in functions (such as finding bounding rectangles, centre of mass, etc.).
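
A minimal sketch of what that call could look like for the 27-cell neighbourhood used in the question. The `data` array is made up, and `connectivity_struct` above is taken here to be the full 3x3x3 structuring element from ndimage.generate_binary_structure, which is an assumption about the intended connectivity rather than something stated in the question.

```python
import numpy as np
from scipy import ndimage

# Small made-up volume with two separate blobs.
data = np.zeros((4, 4, 4), dtype=int)
data[0, 0, 0] = 1
data[1, 1, 1] = 1  # touches (0, 0, 0) only through a corner
data[3, 3, 3] = 1  # isolated voxel

# Full 3x3x3 structuring element: faces, edges and corners all count
# as neighbours, i.e. the diagonal-inclusive neighbourhood.
structure = ndimage.generate_binary_structure(3, 3)

labels, n_labels = ndimage.label(data, structure=structure)
print(n_labels)
print(labels[0, 0, 0], labels[1, 1, 1], labels[3, 3, 3])
```

Without the `structure` argument, ndimage.label only joins voxels that share a face, so the two corner-touching voxels above would be labelled separately.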


0

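When optimizing with Cython, you want to make sure that your loops use mostly native C data types rather than Python objects, since the latter add overhead. The best way to find such places is to look at the generated C code and look for lines that are turned into many Py* function calls. These spots can usually be optimized by using cdef variables instead of Python objects.

For example, in your code I suspect that the loops using zip create many Python objects, and that iterating with an int index used to fetch the elements of ind[0] etc. would be faster. But do look at the generated C code and see what seems to make too many Python function calls.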


1
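I would recommend simply running cython -a <pyxfile> and inspecting the generated HTML file first, to see where Cython thinks a lot of Python objects are used, before looking at the C code. Maybe that is what you meant? - gotgenes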
