Find nearest value in numpy array

Question

485

How do I find the nearest value in a numpy array? Example:

np.find_nearest(array, value)

- Fookatchu

20 Answers

- Soumen · Answer 1

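All the answers are useful for gathering information to write efficient code. However, I have written a small Python script to optimize for various cases. The best case is when the provided array is sorted. If you are searching for the index of the point nearest to a single specified value, the bisect module is the most time efficient. When searching for the indices corresponding to a whole array of values, numpy searchsorted is the most efficient.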

import numpy as np
import bisect
xarr = np.random.rand(int(1e7))

srt_ind = xarr.argsort()
xar = xarr.copy()[srt_ind]
xlist = xar.tolist()
bisect.bisect_left(xlist, 0.3)

In [63]: %time bisect.bisect_left(xlist, 0.3)
CPU times: user 0 ms, sys: 0 ms, total: 0 ms
Wall time: 22.2 µs

np.searchsorted(xar, 0.3, side="left")

In [64]: %time np.searchsorted(xar, 0.3, side="left")
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 98.9 µs

randpts = np.random.rand(1000)
np.searchsorted(xar, randpts, side="left")

%time np.searchsorted(xar, randpts, side="left")
CPU times: user 4 ms, sys: 0 ms, total: 4 ms
Wall time: 1.2 ms

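If we follow the multiplicative rule, 1000 separate numpy calls should take ~100 ms (1000 × 98.9 µs), so the single vectorized call at 1.2 ms is about 83 times faster.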
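Note that both bisect_left and searchsorted return an insertion point, not the nearest element itself. A minimal sketch of how the insertion point might be turned into a nearest index, assuming a sorted 1-D array with at least two elements (the helper name nearest_index_sorted is made up for this illustration and is not part of numpy):

```python
import numpy as np

def nearest_index_sorted(sorted_arr, value):
    """Index of the element of a sorted 1-D array closest to value.

    Hypothetical helper; assumes len(sorted_arr) >= 2.
    """
    sorted_arr = np.asarray(sorted_arr)
    # Insertion point: sorted_arr[pos - 1] < value <= sorted_arr[pos]
    pos = np.searchsorted(sorted_arr, value, side="left")
    # Clamp so that both pos - 1 and pos are valid indices.
    pos = np.clip(pos, 1, len(sorted_arr) - 1)
    left = sorted_arr[pos - 1]
    right = sorted_arr[pos]
    # Keep the left neighbour when it is at least as close (ties go left).
    return np.where(value - left <= right - value, pos - 1, pos)

xar_small = np.array([0.1, 0.25, 0.4, 0.9])
idx = nearest_index_sorted(xar_small, 0.3)
print(idx, xar_small[idx])
```

The same function accepts an array of query values, since searchsorted, clip and where all broadcast.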

- Ishan Tomar · Answer 2

I think the most pythonic way would be:

num = 65  # input number
array = np.random.random(10) * 100  # given array
nearest_idx = np.where(abs(array - num) == abs(array - num).min())[0]  # index of the element of array nearest to num
nearest_val = array[abs(array - num) == abs(array - num).min()]  # the element of array nearest to num

This is the basic code. You can use it as a function if you want.
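As a rough sketch of what such a function could look like (the name nearest_with_ties is invented here, and a 1-D input is assumed), the distance can be computed once and reused:

```python
import numpy as np

def nearest_with_ties(array, num):
    """Return (indices, values) of every element of a 1-D array tied for nearest to num."""
    array = np.asarray(array)
    dist = np.abs(array - num)  # computed once and reused below
    idx = np.flatnonzero(dist == dist.min())
    return idx, array[idx]

idx, vals = nearest_with_ties([10, 60, 70, 90], 65)
print(idx, vals)
```

Unlike argmin, this keeps all elements that are equally close, which is the behaviour of the np.where version above.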

- Gusev Slava · Answer 3

Maybe helpful for ndarrays:

def find_nearest(X, value):
    return X[np.unravel_index(np.argmin(np.abs(X - value)), X.shape)]

- Zhanwen Chen · Answer 4

Here is a vectorized version of unutbu's answer:

import numpy as np
import matplotlib.pyplot as plt

def find_nearest(array, values):
    array = np.asarray(array)

    # the last dim must be 1 to broadcast in (array - values) below.
    values = np.expand_dims(values, axis=-1) 

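    # cast to float so that unsigned dtypes such as uint8 do not wrap around when subtracting
    indices = np.abs(array.astype(float) - values).argmin(axis=-1)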

    return array[indices]


image = plt.imread('example_3_band_image.jpg')

print(image.shape) # should be (nrows, ncols, 3)

quantiles = np.linspace(0, 255, num=2 ** 2, dtype=np.uint8)

quantiled_image = find_nearest(quantiles, image)

print(quantiled_image.shape) # should be (nrows, ncols, 3)
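To try the same idea without an image file, here is a small self-contained sketch. The levels and pixels arrays are made-up stand-ins for the quantiles and the image, and plain signed integers are used so the subtraction cannot overflow:

```python
import numpy as np

def find_nearest(array, values):
    array = np.asarray(array)
    # Add a trailing axis so values broadcasts against every entry of array.
    values = np.expand_dims(values, axis=-1)
    indices = np.abs(array - values).argmin(axis=-1)
    return array[indices]

levels = np.array([0, 85, 170, 255])        # hypothetical quantization levels
pixels = np.array([[10, 100], [200, 250]])  # hypothetical 2x2 "image"
snapped = find_nearest(levels, pixels)
print(snapped)
```

The result has the same shape as pixels, with each entry replaced by its nearest level.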

- boof · Answer 5

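Here is a version that works with 2D arrays, using scipy's cdist function if the user has it, and a simpler distance calculation if not. By default, the output is the index closest to the value you input, but you can change that with the output keyword to one of 'index', 'value', or 'both', where 'value' outputs array[index] and 'both' outputs index, array[index]. For very large arrays, you may need to use kind='euclidean', as the default scipy cdist function may run out of memory. This is maybe not the absolute fastest solution, but it is quite close.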

def find_nearest_2d(array, value, kind='cdist', output='index'):
    # 'array' must be a 2D array
    # 'value' must be a 1D array with 2 elements
    # 'kind' defines what method to use to calculate the distances. Can choose one
    #    of 'cdist' (default) or 'euclidean'. Choose 'euclidean' for very large
    #    arrays. Otherwise, cdist is much faster.
    # 'output' defines what the output should be. Can be 'index' (default) to return
    #    the index of the array that is closest to the value, 'value' to return the
    #    value that is closest, or 'both' to return index,value
    import numpy as np
    if kind == 'cdist':
        try: from scipy.spatial.distance import cdist
        except ImportError:
            print("Warning (find_nearest_2d): Could not import cdist. Reverting to simpler distance calculation")
            kind = 'euclidean'
    index = np.where(np.all(np.asarray(array) == value, axis=1))[0] # Rows equal to value; empty if the value isn't in the array
    if index.size == 0:
        if kind == 'cdist': index = np.argmin(cdist([value],array)[0])
        elif kind == 'euclidean': index = np.argmin(np.sum((np.array(array)-np.array(value))**2.,axis=1))
        else: raise ValueError("Keyword 'kind' must be one of 'cdist' or 'euclidean'")
    if output == 'index': return index
    elif output == 'value': return array[index]
    elif output == 'both': return index,array[index]
    else: raise ValueError("Keyword 'output' must be one of 'index', 'value', or 'both'")
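For reference, the 'euclidean' branch boils down to an argmin over squared distances. A minimal sketch with made-up data (points and query are illustrative names, not part of the function above):

```python
import numpy as np

points = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]])  # one 2-element point per row
query = np.array([2.5, 3.0])

# Squared Euclidean distance from each row to the query; the square root
# is unnecessary because it does not change which row is the minimum.
sq_dist = np.sum((points - query) ** 2, axis=1)
index = int(np.argmin(sq_dist))
print(index, points[index])
```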

- Eduardo S. Pereira · Answer 6

For a 2D array, to determine the i, j position of the nearest element:

import numpy as np
def find_nearest(a, a0):
    idx = (np.abs(a - a0)).argmin()
    w = a.shape[1]
    i = idx // w
    j = idx - i * w
    return a[i,j], i, j
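The manual row/column arithmetic can also be left to np.unravel_index, which works for any number of dimensions. A sketch along those lines (find_nearest_nd is a name chosen for this example):

```python
import numpy as np

def find_nearest_nd(a, a0):
    """Return (value, index tuple) of the element of a nearest to a0."""
    a = np.asarray(a)
    flat_idx = np.abs(a - a0).argmin()         # index into the flattened array
    idx = np.unravel_index(flat_idx, a.shape)  # convert to one index per axis
    return a[idx], idx

a = np.array([[60, 200, 30], [3, 30, 50], [20, 1, -50]])
value, idx = find_nearest_nd(a, 0)
print(value, idx)
```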

- Muhammad Yasirroni · Answer 7

For those searching for multiple nearest values, here is a modification of the accepted answer:

import numpy as np
def find_nearest(array, value, k):
    array = np.asarray(array)
    idx = np.argsort(abs(array - value))[:k]
    return array[idx]

See: https://dev59.com/tGAf5IYBdhLWcg3w9Wiu#66937734
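When k is much smaller than the array, a full argsort does more work than needed. One possible variant, assuming a 1-D array and k >= 1, uses np.argpartition to select the k closest entries and then sorts only those (find_k_nearest is an illustrative name):

```python
import numpy as np

def find_k_nearest(array, value, k):
    array = np.asarray(array)
    dist = np.abs(array - value)
    k = min(k, array.size)
    # argpartition places the k smallest distances first, in arbitrary order ...
    part = np.argpartition(dist, k - 1)[:k]
    # ... so only those k entries need to be sorted by distance.
    idx = part[np.argsort(dist[part])]
    return array[idx]

nearest_two = find_k_nearest([1, 8, 3, 10, 4], 5, 2)
print(nearest_two)
```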

- kareem mohamed · Answer 8

import numpy as np
def find_nearest(array, value):
    array = np.array(array)
    z=np.abs(array-value)
    y= np.where(z == z.min())
    m=np.array(y)
    x=m[0,0]
    y=m[1,0]
    near_value=array[x,y]

    return near_value

array =np.array([[60,200,30],[3,30,50],[20,1,-50],[20,-500,11]])
print(array)
value = 0
print(find_nearest(array, value))

- denis · Answer 9

This one handles any number of queries, using numpy.searchsorted, so once the input arrays are sorted it is very fast. It also works on regular grids in 2d, 3d and so on:

#!/usr/bin/env python3
# keywords: nearest-neighbor regular-grid python numpy searchsorted Voronoi

import numpy as np

#...............................................................................
class Near_rgrid( object ):
    """ nearest neighbors on a Manhattan aka regular grid
    1d:
    near = Near_rgrid( x: sorted 1d array )
    nearix = near.query( q: 1d ) -> indices of the points x_i nearest each q_i
        x[nearix[0]] is the nearest to q[0]
        x[nearix[1]] is the nearest to q[1] ...
        nearpoints = x[nearix] is near q
    If A is an array of e.g. colors at x[0] x[1] ...,
    A[nearix] are the values near q[0] q[1] ...
    Query points < x[0] snap to x[0], similarly > x[-1].

    2d: on a Manhattan aka regular grid,
        streets running east-west at y_i, avenues north-south at x_j,
    near = Near_rgrid( y, x: sorted 1d arrays, e.g. latitude longitude )
    I, J = near.query( q: nq × 2 array, columns qy qx )
    -> nq × 2 indices of the gridpoints y_i x_j nearest each query point
        gridpoints = np.column_stack(( y[I], x[J] ))  # e.g. street corners
        diff = gridpoints - querypoints
        distances = norm( diff, axis=1, ord= )
    Values of an array A defined at the gridpoints y_i x_j nearest q: A[I,J]

    3d: Near_rgrid( z, y, x: 1d axis arrays ) .query( q: nq × 3 array )

    See Howitworks below, and the plot Voronoi-random-regular-grid.
    """

    def __init__( self, *axes: "1d arrays" ):
        axarrays = []
        for ax in axes:
            axarray = np.asarray( ax ).squeeze()
            assert axarray.ndim == 1, "each axis should be 1d, not %s " % (
                    str( axarray.shape ))
            axarrays += [axarray]
        self.midpoints = [_midpoints( ax ) for ax in axarrays]
        self.axes = axarrays
        self.ndim = len(axes)

    def query( self, queries: "nq × dim points" ) -> "nq × dim indices":
        """ -> the indices of the nearest points in the grid """
        queries = np.asarray( queries ).squeeze()  # or list x y z ?
        if self.ndim == 1:
            assert queries.ndim <= 1, queries.shape
            return np.searchsorted( self.midpoints[0], queries )  # scalar, 0d ?
        queries = np.atleast_2d( queries )
        assert queries.shape[1] == self.ndim, [
                queries.shape, self.ndim]
        return [np.searchsorted( mid, q )  # parallel: k axes, k processors
                for mid, q in zip( self.midpoints, queries.T )]

    def snaptogrid( self, queries: "nq × dim points" ):
        """ -> the nearest points in the grid, 2d [[y_j x_i] ...] """
        ix = self.query( queries )
        if self.ndim == 1:
            return self.axes[0][ix]
        else:
            axix = [ax[j] for ax, j in zip( self.axes, ix )]
            return np.array( axix )


def _midpoints( points: "array-like 1d, *must be sorted*" ) -> "1d":
    points = np.asarray( points ).squeeze()
    assert points.ndim == 1, points.shape
    diffs = np.diff( points )
    assert np.nanmin( diffs ) > 0, "the input array must be sorted, not %s " % (
            points.round( 2 ))
    return (points[:-1] + points[1:]) / 2  # floats

#...............................................................................
Howitworks = \
"""
How Near_rgrid works in 1d:
Consider the midpoints halfway between fenceposts | | |
The interval [left midpoint .. | .. right midpoint] is what's nearest each post --

    |   |       |                     |   points
    | . |   .   |          .          |   midpoints
      ^^^^^^               .            nearest points[1]
            ^^^^^^^^^^^^^^^             nearest points[2]  etc.

2d:
    I, J = Near_rgrid( y, x ).query( q )
    I = nearest in `y`
    J = nearest in `x` independently / in parallel.
    The points nearest [yi xj] in a regular grid (its Voronoi cell)
    form a rectangle [left mid x .. right mid x] × [left mid y .. right mid y]
    (in any norm ?)
    See the plot Voronoi-random-regular-grid.

Notes
-----
If a query point is exactly halfway between two data points,
e.g. on a grid of ints, the lines (x + 1/2) U (y + 1/2),
which "nearest" you get is implementation-dependent, unpredictable.

"""

Murky = \
""" NaNs in points, in queries ?
"""

__version__ = "2021-10-25 oct  denis-bz-py"
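The 1d case can be condensed into a few lines, which may make the midpoint trick easier to see. This is only a sketch with made-up numbers, not a replacement for the class above:

```python
import numpy as np

x = np.array([0.0, 1.0, 4.0, 10.0])   # sorted grid points ("fenceposts")
midpoints = (x[:-1] + x[1:]) / 2      # [0.5, 2.5, 7.0]

q = np.array([-3.0, 0.6, 2.4, 6.0, 8.0, 50.0])  # query points
# The number of midpoints below each query is the index of its nearest grid point.
nearix = np.searchsorted(midpoints, q)
print(nearix)     # indices into x
print(x[nearix])  # the nearest grid points themselves
```

Queries below x[0] or above x[-1] snap to the first or last grid point, as described in the docstring.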

- MatteoLacki · Answer 10

Here is a version for sorted inputs that finds, for each element of A, the index of the closest element in B.

from math import inf

import numba
import numpy as np
import numpy.typing as npt


@numba.njit
def get_indices_of_closest_questioned_points(
    interogators: npt.NDArray,
    questioned: npt.NDArray,
) -> npt.NDArray:
    """For each element in `interogators` get the index of the closest element in set `questioned`.
    """
    res = np.empty(shape=interogators.shape, dtype=np.uint32)
    N = len(interogators)
    M = len(questioned)
    n = m = 0
    closest_left_to_x = -inf
    while n < N and m < M:
        x = interogators[n]
        y = questioned[m]
        if y < x:
            closest_left_to_x = y
            m += 1
        else:
            res[n] = m - (x - closest_left_to_x <= y - x)  # <= so that ties go to the left element
            n += 1
    while n < N:
        res[n] = M - 1
        n += 1
    return res

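Sorting is a heavily optimized operation that runs in O(n log n) or O(n), depending on the input and the algorithm used.

The code above is clearly O(n) as well, and numba makes it run fast, at numpy speeds.

Here is an example usage: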

In [12]: get_indices_of_closest_questioned_points(np.array([0,5,10]), np.array([-1,2,6,8,9,10]))
Out[12]: array([0, 2, 5], dtype=uint32)

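The result is 0 2 5 because -1 is closest to 0 and is the 0th element of the second array, 6 is closest to 5 and is the 2nd element of the second array, and so on.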

If the inputs are [0] and [-1,1], the first of the equally close elements, -1, is returned.
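If the second array is not sorted to begin with, one option is to sort it, search, and map the results back to the original positions. A pure-numpy sketch of that idea, assuming at least two candidates (nearest_indices_unsorted is an illustrative name, and this uses searchsorted rather than the merge loop above):

```python
import numpy as np

def nearest_indices_unsorted(queries, candidates):
    """For each query, the index into the original candidates of its nearest element."""
    queries = np.asarray(queries)
    candidates = np.asarray(candidates)
    order = np.argsort(candidates)  # O(m log m) sort of the candidates
    sorted_c = candidates[order]
    # Insertion points, clamped so that pos - 1 and pos are both valid.
    pos = np.clip(np.searchsorted(sorted_c, queries), 1, len(sorted_c) - 1)
    left, right = sorted_c[pos - 1], sorted_c[pos]
    # Ties go to the left (smaller) candidate.
    pos_nearest = np.where(queries - left <= right - queries, pos - 1, pos)
    return order[pos_nearest]       # translate back to original positions

cands = np.array([9, -1, 6, 10, 2, 8])
found = nearest_indices_unsorted([0, 5, 10], cands)
print(found, cands[found])
```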

All the best!