Updating an array using another NumPy array

Question

9

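A seemingly simple question: I have an array with two columns, the first holding IDs and the second holding counts. I'd like to update it with another, similar array so that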

import numpy as np

a = np.array([[1, 2],
              [2, 2],
              [3, 1],
              [4, 5]])

b = np.array([[2, 2],
              [3, 1],
              [4, 0],
              [5, 3]])

a.update(b)  # ????
>>> np.array([[1, 2],
              [2, 4],
              [3, 2],
              [4, 5],
              [5, 3]])

Is there a way to do this with indexing/slicing, rather than simply iterating over each row?

- triphook

Are those ID columns already sorted? - Divakar

3 Answers

3

>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> dif=np.setdiff1d(col,a[:,0])
>>> val=b[np.in1d(b[:,0],dif)]
>>> result=np.concatenate((a,val))
>>> result
array([[1, 2],
       [2, 2],
       [3, 1],
       [4, 5],
       [5, 3]])

Note that if you want the result sorted by ID, you can use np.lexsort:

result[np.lexsort((result[:,0],result[:,0]))]

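Explanation (note that this appends the rows of b whose IDs are missing from a; counts for IDs present in both arrays stay as they are in a):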

First, you can find the unique IDs with:

>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> col
array([1, 2, 3, 4, 5])

Then find the IDs that are in the full set but not in a:

>>> dif=np.setdiff1d(col,a[:,0])
>>> dif
array([5])

Then look up the rows of b whose IDs are in dif:

>>> val=b[np.in1d(b[:,0],dif)]
>>> val
array([[5, 3]])

Finally, concatenate those rows onto the array a:

>>> np.concatenate((a,val))

Consider another example, this time one where sorting matters:

>>> a = np.array([[1, 2],
...               [2, 2],
...               [3, 1],
...               [7, 5]])
>>> 
>>> b = np.array([[2, 2],
...               [3, 1],
...               [4, 0],
...               [5, 3]])
>>> 
>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> dif=np.setdiff1d(col,a[:,0])
>>> val=b[np.in1d(b[:,0],dif)]

>>> result=np.concatenate((a,val))
>>> result[np.lexsort((result[:,0],result[:,0]))]
array([[1, 2],
       [2, 2],
       [3, 1],
       [4, 0],
       [5, 3],
       [7, 5]])
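
The steps above can be collected into one function. This is only a sketch: the name append_missing_rows and the sort flag are made up for illustration, np.isin is used in place of np.in1d (NumPy's documentation recommends it for new code), and a stable np.argsort on the ID column stands in for the np.lexsort call. Like the code above, it does not sum counts for IDs that occur in both arrays.

```python
import numpy as np

def append_missing_rows(a, b, sort=False):
    """Append the rows of b whose IDs (column 0) do not occur in a.

    Hypothetical helper, not part of NumPy.
    """
    # IDs present in b but absent from a
    new_ids = np.setdiff1d(b[:, 0], a[:, 0])
    # Rows of b that carry those IDs
    new_rows = b[np.isin(b[:, 0], new_ids)]
    result = np.concatenate((a, new_rows))
    if sort:
        # Stable sort on the ID column
        result = result[np.argsort(result[:, 0], kind="stable")]
    return result

a = np.array([[1, 2], [2, 2], [3, 1], [7, 5]])
b = np.array([[2, 2], [3, 1], [4, 0], [5, 3]])
merged = append_missing_rows(a, b, sort=True)
print(merged)
```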

- Mazdak

1

This is an old question, but here is a solution using pandas (it generalizes to aggregation functions other than sum). Sorting also happens automatically:

import pandas as pd
import numpy as np

a = np.array([[1, 2],
              [2, 2],
              [3, 1],
              [4, 5]])

b = np.array([[2, 2],
              [3, 1],
              [4, 0],
              [5, 3]])

print((pd.DataFrame(a[:, 1], index=a[:, 0])
        .add(pd.DataFrame(b[:, 1], index=b[:, 0]), fill_value=0)
        .astype(int))
        .reset_index()
        .to_numpy())

Output:

[[1 2]
 [2 4]
 [3 2]
 [4 5]
 [5 3]]
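
To illustrate the remark about other aggregation functions, here is one possible sketch that stacks both arrays and groups by ID. It uses groupby rather than the DataFrame.add call above, and the function name merge_counts, its how parameter, and the column labels "id" and "count" are placeholders chosen for this example.

```python
import numpy as np
import pandas as pd

def merge_counts(a, b, how="sum"):
    """Combine two (id, count) arrays, aggregating counts per ID.

    Hypothetical helper; `how` is passed straight to pandas' agg().
    """
    # Stack the rows of both arrays and label the two columns
    frame = pd.DataFrame(np.concatenate((a, b)), columns=["id", "count"])
    # groupby sorts the group keys by default, so IDs come out ordered
    return frame.groupby("id")["count"].agg(how).reset_index().to_numpy()

a = np.array([[1, 2], [2, 2], [3, 1], [4, 5]])
b = np.array([[2, 2], [3, 1], [4, 0], [5, 3]])

summed = merge_counts(a, b)              # per-ID sum
largest = merge_counts(a, b, how="max")  # per-ID maximum instead
print(summed)
print(largest)
```

Because everything goes through groupby, repeated IDs inside a single input array are aggregated as well.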

- Tranbi

- Divakar · Accepted Answer

General case

Approach #1: You can use np.add.at to perform this kind of ID-based addition, like so -

# First column of output array as the union of first columns of a,b              
out_id = np.union1d(a[:,0],b[:,0])

# Initialize second column of output array
out_count = np.zeros_like(out_id)

# Find indices where the first columns of a,b are placed in out_id
_,a_idx = np.where(a[:,None,0]==out_id)
_,b_idx = np.where(b[:,None,0]==out_id)
    
# Place second column of a into out_id & add in second column of b
out_count[a_idx] = a[:,1]
np.add.at(out_count, b_idx,b[:,1])

# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))

To find a_idx and b_idx, a probably faster alternative is np.searchsorted, like so -

a_idx = np.searchsorted(out_id, a[:,0], side='left')
b_idx = np.searchsorted(out_id, b[:,0], side='left')

Sample input and output:

In [538]: a
Out[538]: 
array([[1, 2],
       [4, 2],
       [3, 1],
       [5, 5]])

In [539]: b
Out[539]: 
array([[3, 7],
       [1, 1],
       [4, 0],
       [2, 3],
       [6, 2]])

In [540]: out
Out[540]: 
array([[1, 3],
       [2, 3],
       [3, 8],
       [4, 2],
       [5, 5],
       [6, 2]])
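
For reference, Approach #1 with the np.searchsorted variant can be wrapped up as a self-contained sketch. The function name add_counts is invented here, and as a small deviation from the snippet above it uses np.add.at for a as well, on the assumption that a might contain repeated IDs that should also be accumulated.

```python
import numpy as np

def add_counts(a, b):
    """Sum the counts (column 1) of a and b per ID (column 0).

    Hypothetical helper built from the steps of Approach #1.
    """
    # Sorted union of all IDs forms the first output column
    out_id = np.union1d(a[:, 0], b[:, 0])
    out_count = np.zeros(out_id.shape, dtype=np.result_type(a, b))
    # out_id is sorted and contains every ID, so searchsorted
    # returns the exact position of each ID
    a_idx = np.searchsorted(out_id, a[:, 0])
    b_idx = np.searchsorted(out_id, b[:, 0])
    # np.add.at accumulates correctly even when an index repeats
    np.add.at(out_count, a_idx, a[:, 1])
    np.add.at(out_count, b_idx, b[:, 1])
    return np.column_stack((out_id, out_count))

a = np.array([[1, 2], [4, 2], [3, 1], [5, 5]])
b = np.array([[3, 7], [1, 1], [4, 0], [2, 3], [6, 2]])
out = add_counts(a, b)
print(out)
```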

Approach #2: You can use np.bincount to perform the same ID-based accumulation -

# First column of output array as the union of first columns of a,b  
out_id = np.union1d(a[:,0],b[:,0])

# Get all IDs and all counts as single arrays
id_arr = np.concatenate((a[:,0],b[:,0]))
count_arr = np.concatenate((a[:,1],b[:,1]))

# Get binned summations (IDs must be non-negative integers;
# with weights, bincount returns a float array)
summed_vals = np.bincount(id_arr,count_arr)

# Get mask of valid bins
mask = np.in1d(np.arange(np.max(out_id)+1),out_id)

# Mask valid summed bins for final counts array output
out_count = summed_vals[mask]

# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))
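
A variation on Approach #2, sketched under the assumption that IDs may be negative or very large: np.unique with return_inverse=True first maps each ID to a compact index, so np.bincount never has to allocate one bin per integer up to the largest ID. The name add_counts_bincount is made up for this example, and the cast back to the input dtype assumes integer counts small enough to be represented exactly as floats.

```python
import numpy as np

def add_counts_bincount(a, b):
    """Per-ID sum of counts via np.unique + np.bincount.

    Hypothetical helper; a variant of Approach #2, not the original code.
    """
    ids = np.concatenate((a[:, 0], b[:, 0]))
    counts = np.concatenate((a[:, 1], b[:, 1]))
    # out_id: sorted unique IDs; inverse: position of each ID in out_id
    out_id, inverse = np.unique(ids, return_inverse=True)
    # bincount with weights returns floats, so cast back afterwards
    sums = np.bincount(inverse, weights=counts)
    out_count = sums.astype(counts.dtype)
    return np.column_stack((out_id, out_count))

a = np.array([[1, 2], [4, 2], [3, 1], [5, 5]])
b = np.array([[3, 7], [1, 1], [4, 0], [2, 3], [6, 2]])
out = add_counts_bincount(a, b)
print(out)

# Negative and large IDs work too, since only compact indices are binned
sparse = add_counts_bincount(np.array([[-3, 2], [1000000, 1]]),
                             np.array([[1000000, 4], [7, 5]]))
print(sparse)
```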

Specific case

If the ID columns of a and b are already sorted, it gets easier, because we can use masks from np.in1d to index into the output ID array created with np.union1d, like so -

# First column of output array as the union of first columns of a,b  
out_id = np.union1d(a[:,0],b[:,0])

# Masks of first columns of a and b matches in the output ID array
mask1 = np.in1d(out_id,a[:,0])
mask2 = np.in1d(out_id,b[:,0])

# Initialize second column of output array
out_count = np.zeros_like(out_id)

# Place second column of a into out_id & add in second column of b
out_count[mask1] = a[:,1]
np.add.at(out_count, np.where(mask2)[0],b[:,1])

# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))

Sample run -

In [552]: a
Out[552]: 
array([[1, 2],
       [2, 2],
       [3, 1],
       [4, 5],
       [8, 5]])

In [553]: b
Out[553]: 
array([[2, 2],
       [3, 1],
       [4, 0],
       [5, 3],
       [6, 2],
       [8, 2]])

In [554]: out
Out[554]: 
array([[1, 2],
       [2, 4],
       [3, 2],
       [4, 5],
       [5, 3],
       [6, 2],
       [8, 7]])
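
The sorted-case code relies on each ID column being sorted and free of duplicates; otherwise the masked assignment puts counts in the wrong rows. Below is a hedged sketch that makes that precondition explicit. The function name add_counts_sorted and the ValueError check are additions for illustration, and np.isin is used as the newer spelling of np.in1d.

```python
import numpy as np

def add_counts_sorted(a, b):
    """Per-ID sum of counts, assuming sorted, duplicate-free ID columns.

    Hypothetical helper based on the sorted-case snippet above.
    """
    for arr in (a, b):
        # Strictly increasing means sorted with no repeated IDs
        if np.any(np.diff(arr[:, 0]) <= 0):
            raise ValueError("ID column must be strictly increasing")
    out_id = np.union1d(a[:, 0], b[:, 0])
    out_count = np.zeros_like(out_id)
    # The masks select the rows of out_id that come from a and from b;
    # with sorted unique IDs they line up with the rows of a and b
    out_count[np.isin(out_id, a[:, 0])] = a[:, 1]
    out_count[np.isin(out_id, b[:, 0])] += b[:, 1]
    return np.column_stack((out_id, out_count))

a = np.array([[1, 2], [2, 2], [3, 1], [4, 5], [8, 5]])
b = np.array([[2, 2], [3, 1], [4, 0], [5, 3], [6, 2], [8, 2]])
out = add_counts_sorted(a, b)
print(out)
```

Since every ID appears at most once per array, a plain masked += is enough here and np.add.at is not needed.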