NumPy: finding the indices of elements in another array

Question


6

I have an array/set of unique positive integers, i.e.

>>> unique = np.unique(np.random.choice(100, 4, replace=False))

and an array containing several elements sampled from that array, e.g.

>>> A = np.random.choice(unique, 100)

I want to map each value of the array A to the position at which it occurs in unique.

The best solution I have found so far uses a mapping array:

>>> table = np.zeros(unique.max()+1, unique.dtype)
>>> table[unique] = np.arange(unique.size)

This assigns each element its index in unique, so A can later be mapped through advanced indexing:

>>> table[A]
array([2, 2, 3, 3, 3, 3, 1, 1, 1, 0, 2, 0, 1, 0, 2, 1, 0, 0, 2, 3, 0, 0, 0,
       0, 3, 3, 2, 1, 0, 0, 0, 2, 1, 0, 3, 0, 1, 3, 0, 1, 2, 3, 3, 3, 3, 1,
       3, 0, 1, 2, 0, 0, 2, 3, 1, 0, 3, 2, 3, 3, 3, 1, 1, 2, 0, 0, 2, 0, 2,
       3, 1, 1, 3, 3, 2, 1, 2, 0, 2, 1, 0, 1, 2, 0, 2, 0, 1, 3, 0, 2, 0, 1,
       3, 2, 2, 1, 3, 0, 3, 3], dtype=int32)

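This already gives the correct result. However, if the numbers in unique are very sparse and large, this approach means creating a very large table array just to store a few numbers for later mapping.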
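To make the memory concern concrete, here is a minimal sketch of the table approach applied to sparse values. The arrays are hypothetical, chosen only for illustration: three values are stored, yet the table needs one slot for every integer up to the largest of them.

```python
import numpy as np

# Hypothetical sparse values: three entries, the largest being one million.
unique = np.array([5, 1_000, 1_000_000], dtype=np.int64)
A = np.array([1_000, 5, 1_000_000, 5], dtype=np.int64)

# Table approach from the question: one slot per integer in [0, unique.max()].
table = np.zeros(unique.max() + 1, unique.dtype)
table[unique] = np.arange(unique.size)

B = table[A]          # positions of A's values in unique
print(B)
print(table.size)     # number of slots allocated
print(table.nbytes)   # bytes reserved for just three useful entries
```

The table size depends on unique.max(), not on how many values are actually stored.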

Is there a better solution?

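Note: A and unique are sample arrays, not the real ones. So the question is not how to generate the positional indices, but how to efficiently map the elements of A to their indices in unique. The pseudocode I would like to speed up in NumPy is: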

B = np.zeros_like(A)
for i in range(A.size):
    B[i] = unique.index(A[i])

(Assume that unique is a list in the pseudocode above.)
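For reference, the pseudocode can be written as a runnable (but slow) baseline against which faster variants can be checked. This is only a sketch: the function name index_loop and the sample arrays are made up for illustration, and A is assumed to be one-dimensional.

```python
import numpy as np

def index_loop(unique, A):
    """Slow reference: position of each element of 1-D A within unique."""
    unique_list = np.asarray(unique).tolist()   # list.index needs a Python list
    A = np.asarray(A)
    B = np.zeros(A.shape, dtype=np.intp)
    for i, value in enumerate(A.tolist()):
        # list.index raises ValueError if value is not in unique_list
        B[i] = unique_list.index(value)
    return B

# Hypothetical sample data
unique = np.array([4, 17, 23, 91])
A = np.array([23, 4, 91, 91, 17])
print(index_loop(unique, A))  # [2 0 3 3 1]
```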

- Imanol Luengo

3 Answers

2

You can use a standard Python dict together with np.vectorize.

inds = {e:i for i, e in enumerate(unique)}
B = np.vectorize(inds.get)(A)

- hilberts_drinking_problem

Interesting approach, although I would have to test how np.vectorize performs on large matrices. - Imanol Luengo

np.vectorize loops at the Python level, so there is no need to run that test... it is just syntactic sugar. - Eelco Hoogendoorn
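As a rough illustration of that comment, the dict lookup can be written both with np.vectorize and as an explicit Python-level loop. The sample arrays are hypothetical. Both variants call the dict once per element, so neither should be expected to behave like a compiled NumPy operation.

```python
import numpy as np

# Hypothetical sample data
unique = np.array([7, 19, 42, 88])
A = np.array([42, 7, 7, 88, 19, 42])

# Value -> position lookup, built once from unique
inds = {e: i for i, e in enumerate(unique.tolist())}

# Variant from the answer: np.vectorize wraps the per-element dict lookup
via_vectorize = np.vectorize(inds.get)(A)

# The same work spelled out as an explicit Python-level loop
via_loop = np.fromiter((inds[x] for x in A.tolist()), dtype=np.intp, count=A.size)

print(via_vectorize)
print(via_loop)
```

Note that inds[x] raises KeyError for a value missing from unique, whereas inds.get would return None for it.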

2

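The numpy_indexed package contains a vectorized equivalent of list.index. It does not require memory proportional to the largest element, only memory proportional to the input itself: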

import numpy_indexed as npi
npi.indices(unique, A)

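Note that it also works for arbitrary dtypes and dimensions. In addition, the array being queried does not need to be unique; the first index encountered is returned, just as with a list.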
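For readers who prefer not to add a dependency, similar first-occurrence behaviour can be sketched with plain NumPy for one-dimensional numeric arrays. This is not how numpy_indexed is implemented; it is a swapped-in technique (a stable argsort combined with searchsorted), and the name first_indices is made up for illustration. It assumes a non-empty haystack.

```python
import numpy as np

def first_indices(haystack, needles):
    """Index of the first occurrence of each needle in a 1-D, non-empty haystack."""
    haystack = np.asarray(haystack)
    needles = np.asarray(needles)
    # Stable sort keeps equal values in original order, so the first
    # occurrence of each value comes first within its run.
    sorter = np.argsort(haystack, kind="stable")
    # Leftmost insertion point in the sorted view of haystack.
    pos = np.searchsorted(haystack, needles, sorter=sorter)
    # Needles larger than every element yield pos == haystack.size; clip them
    # so they can be indexed and then rejected by the check below.
    pos = np.minimum(pos, haystack.size - 1)
    idx = sorter[pos]
    if not np.array_equal(haystack[idx], needles):
        raise KeyError("some needles are not present in haystack")
    return idx

# Hypothetical sample data: unsorted haystack with a duplicate value
print(first_indices([30, 10, 20, 10], [10, 20, 30, 10]))  # [1 2 0 1]
```

Memory use here scales with the sizes of the two inputs rather than with the largest value.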

- Eelco Hoogendoorn


- Bi Rico · Accepted Answer

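If unique is fairly dense, the table approach described in your question is the best option, but unique.searchsorted(A) should produce the same result and does not require unique to be dense. If anyone is trying to do this kind of thing with floats, which have limited precision, an approach designed for that case may be worth considering.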
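A minimal sketch of the searchsorted suggestion follows. It relies on unique being sorted, which np.unique guarantees, and on every value of A being present in unique. The float helper nearest_indices is an assumption of this sketch (nearest neighbour within an absolute tolerance atol) and is not necessarily what the answer had in mind; all sample values are hypothetical.

```python
import numpy as np

# Integer case: sparse, large values, no big lookup table needed.
unique = np.array([3, 50_000, 7_000_000_000], dtype=np.int64)  # sorted
A = np.array([50_000, 3, 7_000_000_000, 3], dtype=np.int64)

B = unique.searchsorted(A)
# Sanity check: only valid when every value of A occurs in unique.
assert np.array_equal(unique[B], A)
print(B)  # [1 0 2 0]

def nearest_indices(sorted_vals, queries, atol=1e-9):
    """Index of the closest entry of sorted_vals for each query (floats)."""
    sorted_vals = np.asarray(sorted_vals, dtype=float)
    queries = np.asarray(queries, dtype=float)
    pos = np.searchsorted(sorted_vals, queries)
    # Candidate neighbours on either side of the insertion point.
    left = np.clip(pos - 1, 0, sorted_vals.size - 1)
    right = np.clip(pos, 0, sorted_vals.size - 1)
    pick_right = np.abs(sorted_vals[right] - queries) < np.abs(sorted_vals[left] - queries)
    idx = np.where(pick_right, right, left)
    if np.any(np.abs(sorted_vals[idx] - queries) > atol):
        raise KeyError("some queries have no match within atol")
    return idx

vals = np.array([0.1, 0.2, 0.3])
q = np.array([0.1 + 0.2, 0.1, 0.2])   # 0.1 + 0.2 is slightly above 0.3
print(np.searchsorted(vals, q))       # [3 0 1]: first index is out of range
print(nearest_indices(vals, q))       # [2 0 1]
```

The float example shows why an exact searchsorted lookup can miss: rounding pushes 0.1 + 0.2 just past 0.3, so the insertion point falls outside the array.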