Interpolate NaN values in a numpy array

77

Is there a quick way of replacing all NaN values in a numpy array with (say) the linearly interpolated values?

For example,

[1 1 1 nan nan 2 2 nan 0]

would be converted into

[1 1 1 1.3 1.6 2 2  1  0]

8
Sorry to bump an old thread, but I think it is worth it. A simpler way is to use pandas and numpy: pd.DataFrame([1, 3, 4, np.nan, 6]).interpolate().values.ravel().tolist() - Francisco Zamora-Martínez
7
I find pd.Series([1, 3, 4, np.nan, 6]).interpolate().values.tolist() more concise. - Alfe
As of pandas 1.2.4, pd.Series([1, 3, 4, np.nan, 6]).interpolate().tolist() is shorter still. - Shadi
13 Answers

1

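As an earlier comment suggested, the best approach is to use a peer-reviewed implementation. The pandas library has an interpolation method for 1-D data that fills np.nan values in a Series or DataFrame:

pandas.Series.interpolate
pandas.DataFrame.interpolate

The documentation is concise and worth reading carefully. My implementation: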

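import pandas as pd

# `magnitudes` (a 1-D np.ndarray) and `interpolation_method` (e.g. "akima",
# which requires SciPy) are assumed to be defined earlier.
magnitudes_series = pd.Series(magnitudes)    # Convert np.array to pd.Series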
magnitudes_series.interpolate(
    # I used "akima" because the second derivative of my data has frequent drops to 0
    method=interpolation_method,

    # Interpolate from both sides of the sequence, up to you (made sense for my data)
    limit_direction="both",

    # Interpolate only np.nan sequences that have number sequences at the ends of the respective np.nan sequences
    limit_area="inside",

    inplace=True,
)

# I chose to remove np.nan at the tails of data sequence
magnitudes_series.dropna(inplace=True)

result_in_numpy_array = magnitudes_series.values
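
For readers who want something they can run as is, here is a minimal sketch of the same pandas approach on the array from the question. It assumes plain linear interpolation rather than "akima", and the names `values`, `filled` and `edges` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question; NaN marks the missing samples.
values = np.array([1, 1, 1, np.nan, np.nan, 2, 2, np.nan, 0], dtype=float)

# method="linear" treats the samples as equally spaced.
# limit_area="inside" fills only NaNs that have valid values on both sides.
filled = (
    pd.Series(values)
    .interpolate(method="linear", limit_area="inside")
    .to_numpy()
)
print(filled)

# Leading and trailing NaNs are left alone when limit_area="inside".
edges = (
    pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])
    .interpolate(method="linear", limit_area="inside")
    .to_numpy()
)
print(edges)
```

Whether to drop the remaining NaNs at the tails (as above with dropna) or to extrapolate them depends on the data.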

1

I think importing scipy is overkill. Here is a simple approach that uses numpy and keeps the same conventions as np.interp.

import numpy as np

def interp_nans(x:[float], left=None, right=None, period=None)->[float]:
    """
    e.g. [1 1 1 nan nan 2 2 nan 0] -> [1 1 1 1.3 1.6 2 2  1  0]
    """
    # Indices and values of the finite (non-NaN, non-inf) samples
    xp = [i for i, yi in enumerate(x) if np.isfinite(yi)]
    fp = [yi for i, yi in enumerate(x) if np.isfinite(yi)]
    # Evaluate the piecewise-linear interpolant at every index
    return list(np.interp(x=list(range(len(x))), xp=xp, fp=fp,left=left,right=right,period=period))
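
The same idea can probably be written without Python-level loops by using boolean masks. The sketch below is one possible variant, not the answer's code: `interp_nans_vectorized` is a hypothetical name, and it tests with np.isnan rather than np.isfinite, so infinities are kept as data points.

```python
import numpy as np


def interp_nans_vectorized(y):
    """Hypothetical helper: return a copy of y with NaNs linearly interpolated."""
    y = np.asarray(y, dtype=float)
    out = y.copy()
    nans = np.isnan(y)
    if nans.all():
        # np.interp needs at least one sample point, so return the input as is.
        return out
    idx = np.arange(len(y))
    # Evaluate the interpolant only at the NaN positions.
    out[nans] = np.interp(idx[nans], idx[~nans], y[~nans])
    return out


filled = interp_nans_vectorized([1, 1, 1, np.nan, np.nan, 2, 2, np.nan, 0])
print(filled)

# With np.interp's defaults, NaNs at the ends take the nearest valid value.
ends = interp_nans_vectorized([np.nan, 2, np.nan, 4, np.nan])
print(ends)
```

Note that np.interp does not extrapolate a trend at the ends; it repeats the first or last valid value unless left and right are given.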

0

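Interpolation and extrapolation with padding keywords

The following solution uses np.interp to interpolate nan values in an array wherever finite values exist on both sides. nan values at the boundaries are handled by np.pad, using modes such as constant or reflect.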


    import numpy as np
    import matplotlib.pyplot as plt
    
    
    def extrainterpolate_nans_1d(
            arr, kws_pad=({'mode': 'edge'}, {'mode': 'edge'})
            ):
        """Interpolates and extrapolates nan values.
    
        Interpolation is linear, compare np.interp(..).
        Extrapolation works with pad keywords, compare np.pad(..).
    
        Parameters
        ----------
        arr : np.ndarray, shape (N,)
            Array to replace nans in.
        kws_pad : dict or (dict, dict)
            kwargs for np.pad on left and right side
    
        Returns
        -------
        np.ndarray, shape (N,)
            Copy of arr with nans interpolated and extrapolated.
    
        See Also
        --------
        https://numpy.org/doc/stable/reference/generated/numpy.interp.html
        https://numpy.org/doc/stable/reference/generated/numpy.pad.html
        https://dev59.com/lXRC5IYBdhLWcg3wCMg6#43821453
        """
        assert arr.ndim == 1
        if isinstance(kws_pad, dict):
            kws_pad_left = kws_pad
            kws_pad_right = kws_pad
        else:
            assert len(kws_pad) == 2
            assert isinstance(kws_pad[0], dict)
            assert isinstance(kws_pad[1], dict)
            kws_pad_left = kws_pad[0]
            kws_pad_right = kws_pad[1]
    
        arr_ip = arr.copy()
    
        # interpolation
        inds = np.arange(len(arr_ip))
        nan_msk = np.isnan(arr_ip)
        arr_ip[nan_msk] = np.interp(inds[nan_msk], inds[~nan_msk], arr[~nan_msk])
    
        # determine pad range
        i0 = next(
            (ids for ids, val in np.ndenumerate(arr) if not np.isnan(val)), 0)[0]
        i1 = next(
            (ids for ids, val in np.ndenumerate(arr[::-1]) if not np.isnan(val)), 0)[0]
        i1 = len(arr) - i1
        # print('pad in range [0:{:}] and [{:}:{:}]'.format(i0, i1, len(arr)))
    
        # pad
        arr_pad = np.pad(
            arr_ip[i0:], pad_width=[(i0, 0)], **kws_pad_left)
        arr_pad = np.pad(
            arr_pad[:i1], pad_width=[(0, len(arr) - i1)], **kws_pad_right)
    
        return arr_pad
    
    
    # setup data
    ys = np.arange(30, dtype=float)**2/20
    ys[:5] = np.nan
    ys[20:] = 20
    ys[28:] = np.nan
    ys[[7, 13, 14, 18, 22]] = np.nan
    
    
    ys_ie0 = extrainterpolate_nans_1d(ys)
    kws_pad_sym = {'mode': 'symmetric'}
    kws_pad_const7 = {'mode': 'constant', 'constant_values':7.}
    ys_ie1 = extrainterpolate_nans_1d(ys, kws_pad=(kws_pad_sym, kws_pad_const7))
    ys_ie2 = extrainterpolate_nans_1d(ys, kws_pad=(kws_pad_const7, kws_pad_sym))
    
    fig, ax = plt.subplots()
    
    
    ax.scatter(np.arange(len(ys)), ys, s=15**2, label='ys')
    ax.scatter(np.arange(len(ys)), ys_ie0, s=8**2, label='ys_ie0, left_pad edge, right_pad edge')
    ax.scatter(np.arange(len(ys)), ys_ie1, s=6**2, label='ys_ie1, left_pad symmetric, right_pad 7')
    ax.scatter(np.arange(len(ys)), ys_ie2, s=4**2, label='ys_ie2, left_pad 7, right_pad symmetric')
    ax.legend()
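
To make the padding step more concrete, here is a small sketch of what a few np.pad modes do to a short array. It is only an illustration of np.pad itself; the array `core` and the other variable names are assumptions and not part of the answer above.

```python
import numpy as np

# Stand-in for the interpolated part of the data (no NaNs left inside).
core = np.array([1.0, 2.0, 4.0])

# 'edge' repeats the outermost value: [1, 1, 1, 2, 4]
left_edge = np.pad(core, pad_width=(2, 0), mode="edge")

# 'constant' fills with a fixed value: [1, 2, 4, 7, 7]
right_const = np.pad(core, pad_width=(0, 2), mode="constant", constant_values=7.0)

# 'symmetric' mirrors the data including the edge value: [2, 1, 1, 2, 4]
left_sym = np.pad(core, pad_width=(2, 0), mode="symmetric")

# 'reflect' mirrors the data without repeating the edge value: [4, 2, 1, 2, 4]
left_refl = np.pad(core, pad_width=(2, 0), mode="reflect")

for name, arr in [
    ("edge", left_edge),
    ("constant", right_const),
    ("symmetric", left_sym),
    ("reflect", left_refl),
]:
    print(name, arr)
```

In the function above, the pad width on each side is the number of leading or trailing nan values, so the output keeps the input's length.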
