How to find the distance to the next non-NaN value in a numpy array

3
Consider the following array:
import numpy as np

arr = np.array(
    [
        [10, np.nan],
        [20, np.nan],
        [np.nan, 50],
        [15, 20],
        [np.nan, 30],
        [np.nan, np.nan],
        [10, np.nan],
    ]
)

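For every cell in each column of arr, I need to find the distance to the next non-NaN value below it in that column. That is, the expected result should look like this: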
expected = np.array(
    [
        [1, 2],
        [2, 1],
        [1, 1],
        [3, 1],
        [2, np.nan],
        [1, np.nan],
        [np.nan, np.nan]
    ]
)
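To make the rule behind the expected output concrete, here is a minimal brute-force sketch. It assumes "next" means the nearest non-NaN value strictly below the current row, which is what the expected array implies. The function name `next_valid_distance_naive` is hypothetical, and the loop is only meant as a readable reference, not a fast solution.

```python
import numpy as np

def next_valid_distance_naive(arr):
    """Distance (in rows) from each cell to the next non-NaN value below it.

    Hypothetical reference helper: scans each column from the bottom up and
    remembers the row index of the most recently seen non-NaN value.
    """
    n_rows, n_cols = arr.shape
    out = np.full(arr.shape, np.nan)
    for col in range(n_cols):
        next_valid = None  # row of the nearest non-NaN value below the current row
        for row in range(n_rows - 1, -1, -1):
            if next_valid is not None:
                out[row, col] = next_valid - row
            # Update only after writing, so a cell never counts itself.
            if not np.isnan(arr[row, col]):
                next_valid = row
    return out

arr = np.array(
    [
        [10, np.nan],
        [20, np.nan],
        [np.nan, 50],
        [15, 20],
        [np.nan, 30],
        [np.nan, np.nan],
        [10, np.nan],
    ]
)
result = next_valid_distance_naive(arr)
```

Cells with no non-NaN value below them stay NaN, which covers the last row and the tail of the second column.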
2 Answers

1
Using pandas, you can compute a reversed cumcount and combine it with a mask and a shift:
import pandas as pd

out = (pd.DataFrame(arr).notna()[::-1]
         .apply(lambda s: s.groupby(s.cumsum()).cumcount().add(1)
                           .where(s.cummax()).shift()[::-1])
         .to_numpy()
      )

Output:

array([[ 1.,  2.],
       [ 2.,  1.],
       [ 1.,  1.],
       [ 3.,  1.],
       [ 2., nan],
       [ 1., nan],
       [nan, nan]])
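The chained expression is dense, so here is the same sequence of steps unrolled for the first column only. This is a sketch for illustration: the intermediate names (`col`, `s`, `groups`, `count`, `masked`, `result`) are not part of the answer.

```python
import numpy as np
import pandas as pd

# First column of arr from the question.
col = pd.Series([10, 20, np.nan, 15, np.nan, np.nan, 10])

# True where a value exists, with the rows reversed so we walk bottom-up.
s = col.notna()[::-1]

# Each non-NaN value starts a new group; the NaNs that follow it join that group.
groups = s.cumsum()

# 1-based position of each row inside its group.
count = s.groupby(groups).cumcount().add(1)

# Rows visited before the first non-NaN value have no target, so mask them.
masked = count.where(s.cummax())

# Shift by one row so a cell measures the distance to the value below it,
# not to itself, then restore the original row order.
result = masked.shift()[::-1]
```

The `where(s.cummax())` step has no effect on this column because its last row holds a value. It matters for the second column, which ends in NaN rows.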

0
You can improve performance by combining a binary search with a few numpy functions.
box = []
for num in range(arr.shape[-1]):
    temp=arr[:, num]
    # this section gets the non-nan positions
    bools = ~np.isnan(temp)
    bools = bools.nonzero()[0]
    # this section gets positions of all indices 
    # with respect to the non-nan positions
    # note the use of side='right' to get the closest non-nan position
    positions = np.arange(temp.size)
    bool_positions = bools.searchsorted(positions, side='right')
    # out of bound positions are replaced with nan
    filtered=bool_positions!=bools.size
    blanks=np.empty(temp.size, dtype=float)
    blanks[~filtered]=np.nan
    trimmed=bool_positions[filtered]
    indexer = positions[filtered]
    # subtract position of next non-nan from actual position
    blanks[indexer] = bools[trimmed] - indexer
    box.append(blanks)

np.column_stack(box)
array([[ 1.,  2.],
       [ 2.,  1.],
       [ 1.,  1.],
       [ 3.,  1.],
       [ 2., nan],
       [ 1., nan],
       [nan, nan]])
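To see why `side='right'` is the key detail, here is a compact single-column sketch of the same idea. The variable names (`valid`, `slot`, `has_next`, `dist`) are chosen for illustration and differ from the answer's code.

```python
import numpy as np

# First column of arr from the question.
col = np.array([10, 20, np.nan, 15, np.nan, np.nan, 10])

# Sorted row indices that hold a value: [0, 1, 3, 6].
valid = np.flatnonzero(~np.isnan(col))
positions = np.arange(col.size)

# side='right' returns, for each row, the slot of the first valid index that
# is strictly greater than that row, so a row never matches itself.
slot = valid.searchsorted(positions, side='right')

# A slot equal to valid.size means there is no valid row below.
has_next = slot < valid.size

dist = np.full(col.size, np.nan)
dist[has_next] = valid[slot[has_next]] - positions[has_next]
```

With `side='left'`, a row that holds a value would find itself and report a distance of 0, which is not what the question asks for.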
