有没有一种快速的方法可以将numpy数组中所有的NaN值替换为(比如)线性插值后的值?
例如,
[1 1 1 nan nan 2 2 nan 0]
会被转换为
[1 1 1 1.3 1.6 2 2 1 0]
有没有一种快速的方法可以将numpy数组中所有的NaN值替换为(比如)线性插值后的值?
例如,
[1 1 1 nan nan 2 2 nan 0]
会被转换为
[1 1 1 1.3 1.6 2 2 1 0]
正如早期评论所建议的那样,最好的方法是使用经过同行评审的实现。Pandas库有一个用于1d数据的插值方法,可以在Series
或DataFrame
中插值np.nan
值:
pandas.Series.interpolate 或 pandas.DataFrame.interpolate
文档非常简洁,建议仔细阅读!我的实现:
import pandas as pd
magnitudes_series = pd.Series(magnitudes) # Convert np.array to pd.Series
magnitudes_series.interpolate(
# I used "akima" because the second derivative of my data has frequent drops to 0
method=interpolation_method,
# Interpolate from both sides of the sequence, up to you (made sense for my data)
limit_direction="both",
# Interpolate only np.nan sequences that have number sequences at the ends of the respective np.nan sequences
limit_area="inside",
inplace=True,
)
# I chose to remove np.nan at the tails of data sequence
magnitudes_series.dropna(inplace=True)
result_in_numpy_array = magnitudes_series.values
我认为导入scipy有些过头了。这里有一种简单的方法,使用numpy并保持与np.interp相同的约定。
def interp_nans(x:[float],left=None, right=None, period=None)->[float]:
"""
e.g. [1 1 1 nan nan 2 2 nan 0] -> [1 1 1 1.3 1.6 2 2 1 0]
"""
xp = [i for i, yi in enumerate(x) if np.isfinite(yi)]
fp = [yi for i, yi in enumerate(x) if np.isfinite(yi)]
return list(np.interp(x=list(range(len(x))), xp=xp, fp=fp,left=left,right=right,period=period))
以下解决方案通过np.interp
对数组中的nan值进行插值,如果两侧都存在有限值。 np.pad
使用像constant
或reflect
这样的模式处理边界处的nan值。
import numpy as np
import matplotlib.pyplot as plt
def extrainterpolate_nans_1d(
arr, kws_pad=({'mode': 'edge'}, {'mode': 'edge'})
):
"""Interpolates and extrapolates nan values.
Interpolation is linear, compare np.interp(..).
Extrapolation works with pad keywords, compare np.pad(..).
Parameters
----------
arr : np.ndarray, shape (N,)
Array to replace nans in.
kws_pad : dict or (dict, dict)
kwargs for np.pad on left and right side
Returns
-------
bool
Description of return value
See Also
--------
https://numpy.org/doc/stable/reference/generated/numpy.interp.html
https://numpy.org/doc/stable/reference/generated/numpy.pad.html
https://dev59.com/lXRC5IYBdhLWcg3wCMg6#43821453
"""
assert arr.ndim == 1
if isinstance(kws_pad, dict):
kws_pad_left = kws_pad
kws_pad_right = kws_pad
else:
assert len(kws_pad) == 2
assert isinstance(kws_pad[0], dict)
assert isinstance(kws_pad[1], dict)
kws_pad_left = kws_pad[0]
kws_pad_right = kws_pad[1]
arr_ip = arr.copy()
# interpolation
inds = np.arange(len(arr_ip))
nan_msk = np.isnan(arr_ip)
arr_ip[nan_msk] = np.interp(inds[nan_msk], inds[~nan_msk], arr[~nan_msk])
# detemine pad range
i0 = next(
(ids for ids, val in np.ndenumerate(arr) if not np.isnan(val)), 0)[0]
i1 = next(
(ids for ids, val in np.ndenumerate(arr[::-1]) if not np.isnan(val)), 0)[0]
i1 = len(arr) - i1
# print('pad in range [0:{:}] and [{:}:{:}]'.format(i0, i1, len(arr)))
# pad
arr_pad = np.pad(
arr_ip[i0:], pad_width=[(i0, 0)], **kws_pad_left)
arr_pad = np.pad(
arr_pad[:i1], pad_width=[(0, len(arr) - i1)], **kws_pad_right)
return arr_pad
# setup data
ys = np.arange(30, dtype=float)**2/20
ys[:5] = np.nan
ys[20:] = 20
ys[28:] = np.nan
ys[[7, 13, 14, 18, 22]] = np.nan
ys_ie0 = extrainterpolate_nans_1d(ys)
kws_pad_sym = {'mode': 'symmetric'}
kws_pad_const7 = {'mode': 'constant', 'constant_values':7.}
ys_ie1 = extrainterpolate_nans_1d(ys, kws_pad=(kws_pad_sym, kws_pad_const7))
ys_ie2 = extrainterpolate_nans_1d(ys, kws_pad=(kws_pad_const7, kws_pad_sym))
fig, ax = plt.subplots()
ax.scatter(np.arange(len(ys)), ys, s=15**2, label='ys')
ax.scatter(np.arange(len(ys)), ys_ie0, s=8**2, label='ys_ie0, left_pad edge, right_pad edge')
ax.scatter(np.arange(len(ys)), ys_ie1, s=6**2, label='ys_ie1, left_pad symmetric, right_pad 7')
ax.scatter(np.arange(len(ys)), ys_ie2, s=4**2, label='ys_ie2, left_pad 7, right_pad symmetric')
ax.legend()
pd.DataFrame([1, 3, 4, np.nan, 6]).interpolate().values.ravel().tolist()
- Francisco Zamora-Martínezpd.Series([1, 3, 4, np.nan, 6]).interpolate().values.tolist()
更加简洁。 - Alfepd.Series([1, 3, 4, np.nan, 6]).interpolate().tolist()
更短。 - Shadi