Efficiently reordering a 2D NumPy array

4

Suppose I have a 2D NumPy array:

x = np.random.rand(100, 100000)

I retrieve the indices that sort each column (i.e., each column is sorted independently and the indices are returned):

idx = np.argsort(x, axis=0) 

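Then, for each column, I need the values at indices = [10, 20, 30, 40, 50] to occupy the first 5 rows of that column, followed by the rest of the sorted values (not the indices!).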

A naive approach might be:

indices = np.array([10, 20, 30, 40, 50])
out = np.empty(x.shape, dtype=x.dtype)

for col in range(x.shape[1]):
    # For each column, fill the first few rows with the values at `indices`
    out[:indices.shape[0], col] = x[indices, col]  # Note that we want the values, not the indices

    # Then fill the rest of the rows in this column with the remaining sorted values excluding `indices`
    n = indices.shape[0]
    for row in range(x.shape[0]):
        if idx[row, col] not in indices:
            out[n, col] = x[idx[row, col], col]  # Again, note that we want the value, not the index
            n += 1
4 Answers

1

Approach #1

Here is one approach, based on a previous post, that does not need idx:

xc = x.copy()
xc[indices] = (xc.min()-np.arange(len(indices),0,-1))[:,None]
out = np.take_along_axis(x,xc.argsort(0),axis=0)
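As a rough sanity check of the sentinel idea, here is a sketch on a small array. The 8x3 shape and indices = [2, 5] are illustrative assumptions, not values from the question. The rows at indices are overwritten with values below the global minimum, so argsort places them first and in the given order.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((8, 3))            # small stand-in for the (100, 100000) array
indices = np.array([2, 5])        # hypothetical rows to pin at the top

xc = x.copy()
# Sentinels strictly below the minimum, increasing with position in `indices`
xc[indices] = (xc.min() - np.arange(len(indices), 0, -1))[:, None]
out = np.take_along_axis(x, xc.argsort(0), axis=0)

# Reference result: pinned rows first, then the remaining rows sorted per column
rest = np.sort(np.delete(x, indices, axis=0), axis=0)
expected = np.vstack((x[indices], rest))
print(np.array_equal(out, expected))
```

Note that this relies on the subtraction producing values that are distinct and still below the minimum, which may not hold for integer arrays near the lower bound of their dtype.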

Approach #2

Another one, with np.isin masking, that uses idx:

mask = np.isin(idx, indices)
p2 = np.take_along_axis(x,idx.T[~mask.T].reshape(x.shape[1],-1).T,axis=0)
out = np.vstack((x[indices],p2))
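The reshape in this approach works because every column of idx is a permutation of the row numbers, so each column masks out exactly len(indices) entries and all columns keep the same count. A sketch on a hypothetical 6x4 array (shape and indices chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random((6, 4))
indices = np.array([1, 4])
idx = np.argsort(x, axis=0)

mask = np.isin(idx, indices)
# Each column of idx is a permutation of 0..5, so each column masks exactly 2 entries
kept_per_column = (~mask).sum(axis=0)

# Boolean indexing on the transposes walks column by column, keeping sorted order
kept = idx.T[~mask.T].reshape(x.shape[1], -1).T
p2 = np.take_along_axis(x, kept, axis=0)
out = np.vstack((x[indices], p2))
```

Here kept has shape (4, 4): one column per column of x, holding the sorted row numbers that are not in indices.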

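Approach #2 - Alternative: If you are repeatedly writing into out to change everything except those indices, array assignment might be the one for you: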

n = len(indices)
out[:n] = x[indices]

mask = np.isin(idx, indices)
lower = np.take_along_axis(x,idx.T[~mask.T].reshape(x.shape[1],-1).T,axis=0)
out[n:] = lower
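If this has to run many times on arrays of the same shape, one possible pattern is to allocate out once and write into it on each iteration. The helper name fill_reordered, the array sizes, and the loop are assumptions made for illustration; they are not part of the answer.

```python
import numpy as np

def fill_reordered(x, indices, out):
    """Write x[indices] on top and the remaining sorted values below, into `out`."""
    n = len(indices)
    idx = np.argsort(x, axis=0)
    mask = np.isin(idx, indices)
    out[:n] = x[indices]
    out[n:] = np.take_along_axis(
        x, idx.T[~mask.T].reshape(x.shape[1], -1).T, axis=0
    )
    return out

rng = np.random.default_rng(2)
indices = np.array([1, 3])
out = np.empty((7, 5))             # allocated once, reused below

for _ in range(3):
    x = rng.random((7, 5))         # new data of the same shape each time
    fill_reordered(x, indices, out)
```

Only the output buffer is reused here; argsort, the mask, and take_along_axis still create temporary arrays on every call.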

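Suppose I need to do this many times (rather than once), but out has the same size in every iteration. Would it be "better" or more efficient to create an out array of the appropriate size once and then copy x[indices] and p2 into it? That way I could avoid the expensive memory allocation. - slaw
@slaw I think I see what you mean. See whether Approach #2 - Alternative works for you. I will edit Approach #1 accordingly. - Divakar
Yes, I think this will work, and it avoids np.vstack! I suppose that, technically, we could just do out[n:] = np.take_along_axis(...) and perhaps reuse mask across iterations. I will read up on take_along_axis to understand what is going on. - slaw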

0

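This should help you get started by eliminating the innermost loop and the if condition. To begin with, you can pass x[:, col] as the input parameter x.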

def custom_ordering(x, idx, indices):
    # `x` and `idx` are a single column, e.g. x[:, col] and idx[:, col]
    # First get only the desired values at the top
    out = x[indices]

    # drop the entries of `idx` whose values are in `indices`
    # (np.delete would remove by position, not by value)
    idx2 = idx[~np.isin(idx, indices)]

    # select the `idx2` entries (already in sorted order) and concatenate
    out = np.concatenate((out, x[idx2]), axis=0)

    return out

0

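I did this with a smaller array and fewer indices so that I could easily check the results, but it should carry over to your case. I think this solution is fairly efficient because everything is done in place.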

import numpy as np
x = np.random.randint(10, size=(12,3)) 
indices = np.array([5,7,9])

# Swap top 3 rows with the rows 5,7,9 and vice versa
x[:len(indices)], x[indices] = x[indices], x[:len(indices)].copy()
# Sort the wanted portion of array
x[len(indices):].sort(axis=0) 

Here is the output:

>>> import numpy as np
>>> x = np.random.randint(10, size=(10,3))
>>> indices = np.array([5,7,9])
>>> x
array([[7, 1, 8],
       [7, 4, 6],
       [6, 5, 2],
       [6, 8, 4],
       [2, 0, 2],
       [3, 0, 4],  # 5th row
       [4, 7, 4],
       [3, 1, 1],  # 7th row
       [3, 5, 3],
       [0, 5, 9]]) # 9th row

>>> # We want top of array to be
>>> x[indices]
array([[3, 0, 4],
       [3, 1, 1],
       [0, 5, 9]])

>>> # Swap top 3 rows with the rows 5,7,9 and vice versa
>>> x[:len(indices)], x[indices] = x[indices], x[:len(indices)].copy()
>>> # Assert that rows have been swapped correctly
>>> x
array([[3, 0, 4],  #
       [3, 1, 1],  # Top of array looks like above
       [0, 5, 9],  #
       [6, 8, 4],
       [2, 0, 2],
       [7, 1, 8],  # Previous top row
       [4, 7, 4],
       [7, 4, 6],  # Previous second row
       [3, 5, 3],
       [6, 5, 2]]) # Previous third row

>>> # Sort the wanted portion of array
>>> x[len(indices):].sort(axis=0)
>>> x
array([[3, 0, 4], #
       [3, 1, 1], # Top is the same, below is sorted
       [0, 5, 9], #
       [2, 0, 2],
       [3, 1, 2],
       [4, 4, 3],
       [6, 5, 4],
       [6, 5, 4],
       [7, 7, 6],
       [7, 8, 8]])

Edit: This version should handle the case where any element of indices is less than len(indices).

import numpy as np
x = np.random.randint(10, size=(12,3)) 
indices = np.array([1,2,4])

tmp = x[indices]

# Here I just assume that there aren't any values less or equal to -1. If you use 
# float, you can use -np.inf, but there is no such equivalent for ints (which I 
# use in my example).
x[indices] = -1

# The -1 will create dummy rows that will get sorted to be on top of the array,
# which can switch with tmp later
x.sort(axis=0) 
# After sorting, the dummy rows occupy the first len(indices) rows
x[:len(indices)] = tmp
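For float data, the -np.inf sentinel mentioned in the comments above can be sketched as follows. The array size and the `original` copy are assumptions added only so the result can be checked. The saved rows are written back to the first len(indices) rows, which is where the sentinel rows land after sorting.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.random((12, 3))
original = x.copy()                # kept only to check the result
indices = np.array([1, 2, 4])

tmp = x[indices]                   # advanced indexing returns a copy
x[indices] = -np.inf               # sentinel rows sort to the top
x.sort(axis=0)                     # in-place, column-wise
x[:len(indices)] = tmp             # sentinels occupy the first rows after sorting
```

This assumes the data contains no -np.inf or NaN values of its own.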

1
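Oh, interesting! I really like that this can be done in place, with no extra steps and no mental gymnastics. The simple swap before sorting is very elegant. - slaw
Minor detail: this approach will not work if you need a guaranteed stable sort. - Paul Panzer
@PaulPanzer When you say "stable sort", do you mean the case where there are ties? I think that only matters for argsort? - slaw
Oops, unfortunately I am running into a race condition or FIFO situation in the swap line. I think it would be safer to do the swap through an intermediate array. - slaw
You could use advanced indexing in the example above, e.g. x[0], x[1] = x[[1]], x[[0]], which is essentially x[0], x[1] = x[1].copy(), x[0].copy(). That said, I think it is safer if my swap line looks like this: x[:len(indices)], x[indices] = x[indices], x[:len(indices)].copy(). Here x[indices] uses advanced indexing and therefore copies implicitly, but x[:len(indices)] is basic indexing and does not copy. I will edit this into my answer. - Naphat Amundsen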
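The copy-versus-view distinction discussed in the last comment can be checked with np.shares_memory. A minimal sketch, using a made-up 4x3 array:

```python
import numpy as np

x = np.arange(12).reshape(4, 3)
indices = np.array([2, 3])

view = x[:len(indices)]            # basic slicing: a view onto x
picked = x[indices]                # advanced indexing: a new array

print(np.shares_memory(x, view))    # True
print(np.shares_memory(x, picked))  # False
```

Because the slice is a view, writing to x[indices] first would change what the slice sees, which is why the swap line copies the slice explicitly.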

0
Here is my solution to the problem:
rem_indices = [i for i in range(x.shape[0]) if i not in indices]  # get all remaining indices
xs = np.take_along_axis(x, idx, axis=0)  # the sorted array
out = np.empty(x.shape)

out[:indices.size, :] = xs[indices, :]  # insert specific values at the beginning
out[indices.size:, :] = xs[rem_indices, :]  # insert the remaining values after the previous

Please let me know whether I understood your question correctly.

