Multiple 1D interpolations in Python

Question



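I am simulating traps in a CCD array. I am currently using NumPy and SciPy, and I have been able to vectorize most of the calls, which has given me some speed-up.

At the moment, the bottleneck in my code is an inner loop that has to retrieve numbers from a large number of different interpolations. This particular step takes up about 97% of the computing time.

Here is a simple example of my problem: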

import numpy as np
from scipy.interpolate import interp1d

# the CCD array containing values from 0-100
array = np.random.random(200)*100

# a number of traps at different positions in the CCD array 
n_traps = 100
trap_positions = np.random.randint(0,200,n_traps)

# xvalues for the interpolations
xval = [0,10,100]
# each trap has y values corresponding to the x values 
trap_yvals = [np.random.random(3)*100 for _ in range(n_traps)]
# The xval-to-yval interpolation is made for each trap
yval_interps = [interp1d(xval,yval) for yval in trap_yvals]

# moving the trap positions down over the array
for i in range(len(array)):
    # calculating new trap position
    new_trap_pos = trap_positions+i
    # omitting traps that are outside array
    trap_inside_array = new_trap_pos < len(array)
    # finding the array_vals (corresponding to the xvalues in the interpolations)
    array_vals = array[new_trap_pos[trap_inside_array]]

    # retrieving the interpolated y-values (this is the bottleneck)
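    # trap_inside_array is a boolean mask, so convert it to the integer
    # indices of the surviving traps before looking up their interpolators
    inside_idx = np.flatnonzero(trap_inside_array)
    yvals = np.array([yval_interps[idx](array_vals[t])
                      for t, idx in enumerate(inside_idx)])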

    # some more operations using yvals

Is there a way this can be optimized, for example using Cython or something similar?

- Skottfelt


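Use InterpolatedUnivariateSpline instead of interp1d for a speed-up of several times. See the link here: https://dev59.com/ioDaa4cB1Zd3GeqP_xqC#34289911 - M.T

Another significant improvement, if possible, is to pass an array as the argument to interp1d/InterpolatedUnivariateSpline instead of looping over single values. - M.T

@M.T: Thanks for the tip about using InterpolatedUnivariateSpline for a speed-up. The reason I loop over single values is that each value has to be taken from a different interpolation, and I have not found another way around that. - Skottfelt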

1 Answer


- Skottfelt · Accepted Answer

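I have been thinking about this for a while and I believe I have found a pretty good solution that I would like to share, even though it means answering my own question.

First, it dawned on me that instead of using one of the scipy.interpolate functions, I could simply interpolate linearly between the two x values that bracket the requested value. This can be done with this small function.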

from bisect import bisect_left

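def two_value_interpolation(x,y,val):
    # x must be sorted and val must lie within [x[0], x[-1]]
    # index of the first x value that is >= val
    index = bisect_left(x,val)
    # width of the bracketing interval and offset of val inside it
    _xrange = x[index] - x[index-1]
    xdiff = val - x[index-1]
    # fractional position of val within the interval (0 to 1)
    modolo = xdiff/_xrange
    ydiff = y[index] - y[index-1]
    return y[index-1] + modolo*ydiff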
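
As a quick sanity check, the function above can be compared with np.interp, which performs the same piecewise-linear interpolation. This is only a minimal sketch: the sample y values and test points are made up for illustration, and the comparison against np.interp is my addition rather than part of the original timings.

```python
from bisect import bisect_left

import numpy as np


def two_value_interpolation(x, y, val):
    # Same logic as the function in the answer, repeated so this runs on its own
    index = bisect_left(x, val)
    _xrange = x[index] - x[index-1]
    xdiff = val - x[index-1]
    modolo = xdiff/_xrange
    ydiff = y[index] - y[index-1]
    return y[index-1] + modolo*ydiff


# Illustrative data (assumed): same x grid as in the question, arbitrary y values
xval = [0, 10, 100]
yval = [5.0, 25.0, 70.0]

for val in (2.5, 10, 55.0):
    got = two_value_interpolation(xval, yval, val)
    expected = np.interp(val, xval, yval)
    # Both approaches should agree for values inside the x range
    assert np.isclose(got, expected)
    print(val, got)
```

Note that a value above the last x entry raises an IndexError here, whereas np.interp would clamp to the last y value.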

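This gave me a bit of a speed-up, but I wanted to see if I could do even better, so I ported the function to Cython and added the loop over all the traps, so that I no longer have to do that in the Python code: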

# cython: boundscheck=False
# cython: wraparound=False
# cython: cdivision=True

import numpy as np
cimport numpy as np

def two_value_interpolation_c(np.ndarray[np.float64_t] x, 
                                 np.ndarray[np.float64_t, ndim=2] y,
                                 np.ndarray[np.float64_t] val_array):
    cdef unsigned int index, trap
    cdef unsigned int ntraps=val_array.size
    cdef long double val, _xrange, xdiff, modolo, ydiff
    cdef np.ndarray y_interp = np.zeros(ntraps, dtype=np.float64)

    for trap in range(ntraps):
        index = 0
        val = val_array[trap]
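        # advance to the first x value above val; the bound check keeps
        # index valid when val equals the last x value (boundscheck is off)
        while index < x.shape[0] - 1 and x[index] <= val: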
            index += 1

        _xrange = x[index] - x[index-1]
        xdiff = val - x[index-1]
        modolo = xdiff/_xrange
        ydiff = y[trap,index] - y[trap,index-1]
        y_interp[trap] = y[trap,index-1] + modolo*ydiff
    return y_interp
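
For readers who cannot compile Cython, the same per-trap lookup can also be sketched in pure NumPy. This is not the method used in this answer and it is not included in the timings; the function name two_value_interpolation_np is hypothetical, and the sketch assumes a shared sorted x grid and one row of y values per trap, as in the Cython version.

```python
import numpy as np


def two_value_interpolation_np(x, y, val_array):
    """Hypothetical NumPy counterpart of the Cython loop.

    x: shape (n_x,), sorted; y: shape (n_traps, n_x); val_array: shape (n_traps,)
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    val_array = np.asarray(val_array, dtype=np.float64)

    # First index with x[index] > val, like the while loop in the Cython code;
    # clipping keeps both index-1 and index inside the array
    index = np.clip(np.searchsorted(x, val_array, side="right"), 1, x.size - 1)

    # Fractional position of each value inside its bracketing interval
    frac = (val_array - x[index - 1]) / (x[index] - x[index - 1])

    # Pick the bracketing y values from each trap's own row
    rows = np.arange(val_array.size)
    y_lo = y[rows, index - 1]
    y_hi = y[rows, index]
    return y_lo + frac * (y_hi - y_lo)


# Illustrative data (assumed): 5 traps on the x grid from the question
rng = np.random.default_rng(0)
x = np.array([0.0, 10.0, 100.0])
y = rng.random((5, 3)) * 100
vals = rng.random(5) * 100

result = two_value_interpolation_np(x, y, vals)
# Reference: one np.interp call per trap
expected = np.array([np.interp(v, x, y[t]) for t, v in enumerate(vals)])
assert np.allclose(result, expected)
```

Because of the clipping, values outside the x range are extrapolated linearly from the nearest interval rather than rejected. How this compares in speed with the Cython version would have to be measured.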

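I did some timings of the different methods (using somewhat larger arrays and more traps than indicated in the original question):

Using the original method, i.e. interp1d: (best of 3) 15.1 s

Using InterpolatedUnivariateSpline (k=1) instead of interp1d, as suggested by @M.T: (best of 3) 7.25 s

Using the two_value_interpolation function: (best of 3) 1.34 s

Using the Cython implementation two_value_interpolation_c: (best of 3) 0.113 s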
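
A "best of 3" figure like the ones above can be obtained with the standard timeit module. The sketch below is only an illustration of that measuring pattern, not the benchmark actually used: the data, the helper run_all and the number of passes are assumptions, and the resulting times will differ from machine to machine.

```python
import timeit
from bisect import bisect_left


def two_value_interpolation(x, y, val):
    # Same logic as the pure-Python function in the answer
    index = bisect_left(x, val)
    _xrange = x[index] - x[index-1]
    xdiff = val - x[index-1]
    modolo = xdiff/_xrange
    ydiff = y[index] - y[index-1]
    return y[index-1] + modolo*ydiff


# Illustrative data (assumed): values strictly inside the x range
xval = [0, 10, 100]
yval = [5.0, 25.0, 70.0]
vals = [0.1 * i for i in range(1, 1000)]


def run_all():
    # Hypothetical workload: one interpolation per value
    return [two_value_interpolation(xval, yval, v) for v in vals]


# Repeat the measurement 3 times (10 passes each) and keep the fastest run
timings = timeit.repeat(run_all, repeat=3, number=10)
best = min(timings)
print(f"(best of 3) {best:.4f} s for 10 passes")
```

Taking the minimum rather than the mean reduces the influence of other processes that happen to run during one of the repeats.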