Speeding up a vectorized eye-tracking algorithm in numpy

Question

algorithm performance opencv numpy eye-tracking

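I'm trying to implement Fabian Timm's eye-tracking algorithm [http://www.inb.uni-luebeck.de/publikationen/pdfs/TiBa11b.pdf] (which I found here: [http://thume.ca/projects/2012/11/04/simple-accurate-eye-center-tracking-in-opencv/]), and I'm running into some difficulties. I think my implementation is reasonably well vectorized, but it's still not fast enough to run in real time, and the pupil it detects isn't as accurate as I'd like. This is my first time using numpy, so I'm not sure what I'm doing wrong.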
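For reference, this is the objective from the Timm and Barth paper as I transcribe it: c is a candidate centre, x_i the position of pixel i, g_i the image gradient at x_i scaled to unit length, w_c the value of the inverted, smoothed image at c, and N the number of pixels. The estimated centre is the c that maximizes the mean squared dot product between the unit displacement d_i and the gradient g_i:

```latex
\mathbf{c}^{*} = \arg\max_{\mathbf{c}} \left\{ \frac{1}{N} \sum_{i=1}^{N} w_{\mathbf{c}} \left( \mathbf{d}_i^{\top} \mathbf{g}_i \right)^{2} \right\},
\qquad
\mathbf{d}_i = \frac{\mathbf{x}_i - \mathbf{c}}{\lVert \mathbf{x}_i - \mathbf{c} \rVert_2},
\qquad
\lVert \mathbf{g}_i \rVert_2 = 1
```

Note that d_i has separate x and y components, each of which depends on both the candidate c and the pixel x_i.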

import cv2
import numpy as np

def find_pupil(eye):
    eye_len = np.arange(eye.shape[0])
    xx,yy = np.meshgrid(eye_len,eye_len) #coordinates
    XX,YY = np.meshgrid(xx.ravel(),yy.ravel()) #all distance vectors
    Dx,Dy = [YY-XX, YY-XX] #y2-y1, x2-x1 -- simpler this way because YY = XXT
    Dlen = np.sqrt(Dx**2+Dy**2)
    Dx,Dy = [Dx/Dlen, Dy/Dlen] #normalized

    Gx,Gy = np.gradient(eye)
    Gmagn = np.sqrt(Gx**2+Gy**2)

    Gx,Gy = [Gx/Gmagn,Gy/Gmagn] #normalized
    GX,GY = np.meshgrid(Gx.ravel(),Gy.ravel())

    X = (GX*Dx+GY*Dy)**2
    eye = cv2.bitwise_not(cv2.GaussianBlur(eye,(5,5),0.005*eye.shape[1])) #inverting and blurring eye for use as w
    eyem = np.repeat(eye.ravel()[np.newaxis,:],eye.size,0)
    C = (np.nansum(eyem*X, axis=0)/eye.size).reshape(eye.shape)

    return np.unravel_index(C.argmax(), C.shape)
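On the accuracy side, one thing stands out in find_pupil: Dx and Dy are both set to YY-XX, so every displacement vector has identical x and y components. Also, np.gradient returns the derivative along axis 0 (rows, i.e. y) first, so Gx and Gy hold swapped derivatives relative to their names. Below is a minimal sketch of the same objective computed with separate x and y displacements. It is not the paper's or thume.ca's code; the name find_pupil_sketch, the optional weight argument (meant for the inverted, blurred eye) and the choice to skip zero-gradient pixels instead of relying on nansum are my own assumptions.

```python
import numpy as np


def find_pupil_sketch(eye, weight=None):
    """Hypothetical vectorized sketch of the Timm & Barth objective.

    eye: 2-D grayscale patch. weight: optional array of the same shape
    (e.g. the inverted, blurred eye), multiplied in per candidate centre.
    Returns (row, col) of the best candidate.
    """
    eye = np.asarray(eye, dtype=np.float64)
    h, w = eye.shape

    # np.gradient returns d/d(axis 0) = d/dy first, then d/d(axis 1) = d/dx
    gy, gx = np.gradient(eye)
    mag = np.hypot(gx, gy)
    mask = mag > 0  # assumption: ignore flat pixels instead of producing NaNs
    gx = gx[mask] / mag[mask]
    gy = gy[mask] / mag[mask]
    ys, xs = np.nonzero(mask)  # pixel positions x_i of the kept gradients

    # Every pixel is a candidate centre c; column vectors so they broadcast
    cy, cx = np.mgrid[0:h, 0:w]
    cy = cy.reshape(-1, 1)
    cx = cx.reshape(-1, 1)

    # Displacement x_i - c, with separate x and y components:
    # shape (h*w candidates, K gradient pixels)
    dx = xs[None, :] - cx
    dy = ys[None, :] - cy
    dlen = np.hypot(dx, dy)
    dlen[dlen == 0] = 1.0  # c == x_i: the displacement is the zero vector

    # Squared dot product of unit displacement and unit gradient
    dot = (dx * gx + dy * gy) / dlen
    score = (dot ** 2).sum(axis=1).reshape(h, w) / max(len(xs), 1)

    if weight is not None:
        score = score * weight
    return np.unravel_index(score.argmax(), score.shape)
```

It still builds (h*w) × K arrays, K being the number of pixels with nonzero gradient, so memory grows the same way as in the versions in this thread. On a 48×48 white patch with a black disk of radius 6 centred at row 20, column 25, it should return a point at or next to (20, 25).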

And the rest of the code:

from math import floor

def find_eyes(face):
    left_x, left_y = [int(floor(0.5 * face.shape[0])), int(floor(0.2 * face.shape[1]))]
    right_x, right_y = [int(floor(0.1 * face.shape[0])), int(floor(0.2 * face.shape[1]))]
    area = int(floor(0.2 * face.shape[0]))
    left_eye = (left_x, left_y, area, area)
    right_eye = (right_x, right_y, area, area)

    return [left_eye,right_eye]



faceCascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
video_capture = cv2.VideoCapture(0)

while True:
    # Capture frame-by-frame
    ret, frame = video_capture.read()
    if not ret:  # no frame from the camera (e.g. disconnected); stop the loop
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    faces = faceCascade.detectMultiScale(
        gray,
        scaleFactor=1.1,
        minNeighbors=5,
        minSize=(30, 30),
        flags=cv2.CASCADE_SCALE_IMAGE
    )

    # Draw a rectangle around the faces
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)
        roi_gray = gray[y:y+h, x:x+w]
        roi_color = frame[y:y+h, x:x+w]
        eyes = find_eyes(roi_gray)
        for (ex,ey,ew,eh) in eyes:
            eye_gray = roi_gray[ey:ey+eh,ex:ex+ew]
            eye_color = roi_color[ey:ey+eh,ex:ex+ew]
            cv2.rectangle(roi_color,(ex,ey),(ex+ew,ey+eh),(255,0,0),2)
            # np.unravel_index gives (row, col) = (y, x); OpenCV points are (x, y)
            py, px = find_pupil(eye_gray)
            px, py = int(px), int(py)
            cv2.rectangle(eye_color,(px,py),(px+1,py+1),(255,0,0),2)

    # Display the resulting frame
    cv2.imshow('Video', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# When everything is done, release the capture
video_capture.release()
cv2.destroyAllWindows()

- Ben

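The first step is always to profile your code (e.g. with line_profiler). Find the lines where it spends most of its time, then focus on optimizing those. It would be much easier to help if you turned your code into an MCVE: without the input data needed to run it, we can't really assess accuracy or performance. - ali_m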
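line_profiler reports time per line (it is typically run through its kernprof script on functions decorated with @profile). If installing it is not an option, the standard library's cProfile gives a coarser, per-function breakdown, which is often enough to see which NumPy calls dominate. A minimal sketch; profile_call is a hypothetical helper name, not part of either library:

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile; return (result, text report).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return result, out.getvalue()
```

For example, result, report = profile_call(find_pupil, eye_gray) followed by print(report) shows where a single call spends its time.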

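We also have no context for how this function is used. If it is called repeatedly (e.g. in a loop), you may be needlessly recomputing many local variables that don't change from one call to the next. But without some sample input data, we can't easily tell what changes and what doesn't. - ali_m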
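To illustrate that point: the unit displacement vectors from every candidate centre to every pixel depend only on the size of the eye patch, not on its contents, so while the patch size stays fixed from frame to frame they can be computed once and reused. A minimal sketch using functools.lru_cache; the helper name unit_displacements is hypothetical. Each returned array is (h*w) × (h*w) float64, i.e. 128 MiB for a 64×64 patch, so this trades memory for time.

```python
from functools import lru_cache

import numpy as np


@lru_cache(maxsize=8)
def unit_displacements(h, w):
    """Unit vectors from every candidate centre to every pixel of an h x w patch.

    Returns (dx, dy), each of shape (h*w, h*w): row = candidate centre,
    column = pixel, both in row-major order. Cached per (h, w).
    """
    ys, xs = np.mgrid[0:h, 0:w]
    xs = xs.ravel().astype(np.float64)
    ys = ys.ravel().astype(np.float64)
    dx = xs[None, :] - xs[:, None]  # x_i - c_x
    dy = ys[None, :] - ys[:, None]  # y_i - c_y
    dlen = np.hypot(dx, dy)
    dlen[dlen == 0] = 1.0  # c == x_i: keep the zero vector
    dx /= dlen
    dy /= dlen
    # Read-only, so a caller cannot corrupt the cached arrays in place
    dx.flags.writeable = False
    dy.flags.writeable = False
    return dx, dy
```

A caller would fetch dx, dy = unit_displacements(*eye.shape) on each frame; after the first call with a given size, this is a cache hit rather than a recomputation.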

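Thanks, Ali. I'll profile my code. However, it's hard for me to make the code self-contained without copying the rest of it. All I can say is that face is a square image of a face captured from a webcam, and x, y, w, h are the top-left corner and the size of a square around one eye. Can I link to a GitHub repository? - Ben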

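That's not ideal, but a link to a GitHub repository is better than nothing. Using random input data, as in Divakar's answer, gives some performance numbers, but of course it tells us nothing about accuracy. - ali_m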
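Building on that: random noise exercises speed but has no ground truth. A synthetic patch with a dark disk at a known position, a crude stand-in for a pupil, gives both a timing input and a way to measure error. The helper names and the sizes, grey levels and noise level below are arbitrary assumptions, not values from the thread:

```python
import numpy as np


def make_synthetic_eye(size, center, radius, noise=0.0, seed=None):
    """Bright square uint8 patch with a dark disk at `center` (row, col)."""
    rng = np.random.default_rng(seed)
    rows, cols = np.mgrid[0:size, 0:size]
    img = np.full((size, size), 200.0)
    inside = (rows - center[0]) ** 2 + (cols - center[1]) ** 2 <= radius ** 2
    img[inside] = 40.0
    if noise > 0:
        img += rng.normal(0.0, noise, img.shape)  # Gaussian sensor-like noise
    return np.clip(img, 0, 255).astype(np.uint8)


def center_error(found, truth):
    """Euclidean distance in pixels between a detected and the true centre."""
    return float(np.hypot(found[0] - truth[0], found[1] - truth[1]))
```

For example, eye = make_synthetic_eye(64, (30, 22), 8, noise=5.0, seed=0) followed by center_error(find_pupil(eye), (30, 22)) reports how far a detector lands from the known centre.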

I've added the rest of my code. There isn't much of it anyway. - Ben

1 Answer

- Divakar · Accepted Answer

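You can avoid many of the operations that create copies of replicated elements, and instead do the math directly by introducing singleton dimensions that let NumPy broadcast. That gives two benefits: computing on the fly saves workspace memory and improves performance. Also, at the end, the nansum computation can be replaced with a simpler version. With that in mind, here's a modified approach -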
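Both ideas in that paragraph can be checked on toy data. A column minus a row (via a singleton axis) produces the same pairwise-difference table as meshgrid, without building the two full grids first. And because the weight in find_pupil depends only on the column being summed into, multiplying after nansum gives the same result as repeating the weight into a full matrix first. A small self-check with random data (the sizes are arbitrary):

```python
import numpy as np

n = 5
idx = np.arange(n)

# Explicit copies: build two full (n, n) grids, then subtract
XX, YY = np.meshgrid(idx, idx)
diff_copies = YY - XX

# Broadcasting: (n, 1) column minus (n,) row gives (n, n) directly
diff_broadcast = idx[:, None] - idx
assert np.array_equal(diff_copies, diff_broadcast)

# Weight pulled out of the sum: sum_i w[j] * X[i, j] == w[j] * sum_i X[i, j]
rng = np.random.default_rng(0)
X = rng.random((n * n, n * n))
w = rng.random(n * n)
repeated = np.repeat(w[np.newaxis, :], w.size, 0)  # what find_pupil builds
lhs = np.nansum(repeated * X, axis=0)
rhs = np.nansum(X, axis=0) * w  # what find_pupil_v2 does instead
assert np.allclose(lhs, rhs)
```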

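import cv2
import numpy as np

def find_pupil_v2(face, x, y, w, h):
    eye = face[x:x+w,y:y+h]
    eye_len = np.arange(eye.shape[0])

    N = eye_len.size**2
    # Pairwise coordinate differences via broadcasting: (n,1) - (n,) -> (n,n)
    eye_len_diff = eye_len[:,None] - eye_len
    # sqrt(Dx**2 + Dy**2) with Dx == Dy, as in the question's code
    Dlen = np.sqrt(2*((eye_len_diff)**2))
    Dxy0 = eye_len_diff/Dlen

    Gx0,Gy0 = np.gradient(eye)
    Gmagn = np.sqrt(Gx0**2+Gy0**2)
    Gx,Gy = [Gx0/Gmagn,Gy0/Gmagn] #normalized

    # Singleton axes let these products broadcast instead of using meshgrid/repeat
    B0 = Gy[:,:,None]*Dxy0[:,None,:]
    C0 = Gx[:,None,:]*Dxy0
    X = ((C0.transpose(1,0,2)[:,None,:,:]+B0[:,:,None,:]).reshape(N,N))**2

    eye1 = cv2.bitwise_not(cv2.GaussianBlur(eye,(5,5),0.005*eye.shape[1]))
    # The weight depends only on the column, so it is applied after the sum
    C = (np.nansum(X,0)*eye1.ravel()/eye1.size).reshape(eye1.shape)

    return np.unravel_index(C.argmax(), C.shape)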

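There was still a "repeat" step left in Dxy. It should be possible to avoid it and feed Dxy0 directly into the steps that use Dxy to get X, but I hadn't fully worked it out. Now everything has been converted to a "broadcasting"-based approach!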
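Even fully broadcast, X is an N × N array with N = n² for an n × n eye patch, so its memory (and the work of filling it) grows as n⁴; the 4-D sum before the reshape is the same size. A quick back-of-the-envelope helper, assuming float64 elements (pairwise_bytes is a hypothetical name):

```python
def pairwise_bytes(n, itemsize=8):
    """Bytes for one (n*n) x (n*n) array, like X in find_pupil_v2.

    n is the side of the square eye patch; itemsize=8 assumes float64.
    """
    num_pixels = n * n
    return num_pixels * num_pixels * itemsize


for n in (32, 64, 128):
    print(f"{n}x{n} patch: {pairwise_bytes(n) / 2**20:.0f} MiB")
```

This prints 8, 128 and 2048 MiB respectively. For the 64 × 64 patch used in the timing test, that is 128 MiB per array, and doubling the patch side multiplies it by 16.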

Runtime test and output verification -

In [539]: # Inputs with random elements
     ...: face = np.random.randint(0,10,(256,256)).astype('uint8')
     ...: x = 40
     ...: y = 60
     ...: w = 64
     ...: h = 64
     ...: 

In [540]: find_pupil(face,x,y,w,h)
Out[540]: (32, 63)

In [541]: find_pupil_v2(face,x,y,w,h)
Out[541]: (32, 63)

In [542]: %timeit find_pupil(face,x,y,w,h)
1 loops, best of 3: 4.15 s per loop

In [543]: %timeit find_pupil_v2(face,x,y,w,h)
1 loops, best of 3: 529 ms per loop

Looks like we're close to an 8x speedup!