Efficiently rearranging a 2D NumPy array

Question

4

Let's say I have a 2D NumPy array:

x = np.random.rand(100, 100000)

I then retrieve the indices that sort it column-wise (i.e., each column is sorted independently and the indices are returned):

idx = np.argsort(x, axis=0)

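Then, for each column, I need the values at indices [10, 20, 30, 40, 50] to come first, occupying the first 5 rows of that column, followed by the rest of the sorted values (not the indices!).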

A naive approach might be:

indices = np.array([10, 20, 30, 40, 50])
out = np.empty(x.shape, dtype=x.dtype)  # `out` holds values from `x`, so reuse its dtype

for col in range(x.shape[1]):
    # For each column, fill the first few rows with the values at `indices`
    out[:indices.shape[0], col] = x[indices, col]  # Note that we want the values, not the indices

    # Then fill the rest of the rows in this column with the remaining sorted values excluding `indices`
    n = indices.shape[0]
    for row in range(x.shape[0]):  # walk the whole sorted order of this column
        if idx[row, col] not in indices:
            out[n, col] = x[idx[row, col], col]  # Again, note that we want the value, not the index
            n += 1
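
To make the expected result concrete, here is a minimal sketch on a tiny hypothetical array (5 rows, 2 columns, and `indices = [1, 3]` instead of the sizes above). The helper name `reorder_reference` is an assumption made for illustration and does not come from the thread; it still loops over columns and is only meant as a reference to check faster versions against.

```python
import numpy as np

def reorder_reference(x, indices):
    """Per column: values at `indices` first, then the remaining values in ascending order."""
    idx = np.argsort(x, axis=0)
    out = np.empty_like(x)
    n = len(indices)
    for col in range(x.shape[1]):
        out[:n, col] = x[indices, col]
        # Keep only the sorted positions whose row index is not in `indices`
        keep = ~np.isin(idx[:, col], indices)
        out[n:, col] = x[idx[keep, col], col]
    return out

x = np.array([[0.5, 0.1],
              [0.2, 0.9],
              [0.8, 0.4],
              [0.1, 0.7],
              [0.9, 0.3]])
indices = np.array([1, 3])

out = reorder_reference(x, indices)
print(out)
```

In the first column, rows 1 and 3 (0.2 and 0.1) come first in the order given by `indices`, and the remaining values follow in ascending order.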

- slaw

4 Answers

0

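This should help you get started by eliminating the innermost loop and the if condition. To start off, you can pass x[:, col] as the input parameter x (and the matching idx[:, col] as idx).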

def custom_ordering(x, idx, indices):
    # `x` is a single column of the data and `idx` is that column's argsort (both 1-D)
    # First get only the desired values at the top
    out = x[indices]

    # drop the entries of `idx` whose value is in `indices`
    # (np.delete would remove by position, not by value)
    idx2 = idx[~np.isin(idx, indices)]

    # select the `idx2` entries and concatenate
    out = np.concatenate((out, x[idx2]), axis=0)

    return out

- varagrawal

0

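I did this with a smaller array and fewer indices so that I could easily sanity-check the results, but it should carry over to your use case. I think this solution is reasonably efficient, since everything is done in place.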

import numpy as np
x = np.random.randint(10, size=(12,3)) 
indices = np.array([5,7,9])

# Swap top 3 rows with the rows 5,7,9 and vice versa
x[:len(indices)], x[indices] = x[indices], x[:len(indices)].copy()
# Sort the wanted portion of array
x[len(indices):].sort(axis=0)

Here is the output:

>>> import numpy as np
>>> x = np.random.randint(10, size=(10,3))
>>> indices = np.array([5,7,9])
>>> x
array([[7, 1, 8],
       [7, 4, 6],
       [6, 5, 2],
       [6, 8, 4],
       [2, 0, 2],
       [3, 0, 4],  # 5th row
       [4, 7, 4],
       [3, 1, 1],  # 7th row
       [3, 5, 3],
       [0, 5, 9]]) # 9th row

>>> # We want top of array to be
>>> x[indices]
array([[3, 0, 4],
       [3, 1, 1],
       [0, 5, 9]])

>>> # Swap top 3 rows with the rows 5,7,9 and vice versa
>>> x[:len(indices)], x[indices] = x[indices], x[:len(indices)].copy()
>>> # Assert that rows have been swapped correctly
>>> x
array([[3, 0, 4],  #
       [3, 1, 1],  # Top of array looks like above
       [0, 5, 9],  #
       [6, 8, 4],
       [2, 0, 2],
       [7, 1, 8],  # Previous top row
       [4, 7, 4],
       [7, 4, 6],  # Previous second row
       [3, 5, 3],
       [6, 5, 2]]) # Previous third row

>>> # Sort the wanted portion of array
>>> x[len(indices):].sort(axis=0)
>>> x
array([[3, 0, 4], #
       [3, 1, 1], # Top is the same, below is sorted
       [0, 5, 9], #
       [2, 0, 2],
       [3, 1, 2],
       [4, 4, 3],
       [6, 5, 4],
       [6, 5, 4],
       [7, 7, 6],
       [7, 8, 8]])

EDIT: This version should handle the case where any element of indices is less than len(indices).

import numpy as np
x = np.random.randint(10, size=(12,3)) 
indices = np.array([1,2,4])

tmp = x[indices]

# Here I just assume that there aren't any values less or equal to -1. If you use 
# float, you can use -np.inf, but there is no such equivalent for ints (which I 
# use in my example).
x[indices] = -1

# The -1 will create dummy rows that will get sorted to be on top of the array,
# which can switch with tmp later
x.sort(axis=0) 
x[:len(indices)] = tmp  # after sorting, the dummy rows occupy the first len(indices) rows
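
For float data, the comment above suggests `-np.inf` as the sentinel. A minimal sketch of that variant follows, assuming a small random float array with a fixed seed (the array size, the seed, and the name `expected_rest` are illustrative and not from the answer). After the in-place sort, the sentinel rows sit in the first `len(indices)` rows of every column, so that is where the saved rows are written back.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed, only for reproducibility
x = rng.random((8, 3))                   # small float array for illustration
indices = np.array([1, 2, 4])

# Reference for the lower part: every other row, sorted column-wise
expected_rest = np.sort(np.delete(x, indices, axis=0), axis=0)

tmp = x[indices]                         # advanced indexing returns a copy
x[indices] = -np.inf                     # sentinel smaller than any finite float
x.sort(axis=0)                           # sentinels end up in the top rows of each column
x[:len(indices)] = tmp                   # overwrite the sentinel rows with the saved rows
```

Because `-np.inf` compares below every finite value, this does not rely on knowing the minimum of the data, unlike the `-1` placeholder used for integers.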

- Naphat Amundsen

1

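Oh, interesting! I really like that this can be done in place, with no extra steps and no mental gymnastics. Doing a simple swap before the sort is very elegant. - slaw

A small detail: if you need a guaranteed stable sort, this approach will not work. - Paul Panzer

@PaulPanzer When you say "stable sort", do you mean the case where there are ties? I thought that only mattered for argsort? - slaw

Oops, unfortunately I ran into a race condition or FIFO situation in the swap line. I think it would be safer to do the swap through an intermediate array. - slaw

In the example above you could use advanced indexing, e.g. x[0], x[1] = x[[1]], x[[0]], which is essentially x[0], x[1] = x[1].copy(), x[0].copy(). That said, I think it is safer if my swap line looks like this: x[:len(indices)], x[indices] = x[indices], x[:len(indices)].copy(). x[indices] uses advanced indexing and therefore copies implicitly, but x[:len(indices)] is basic indexing and does not copy. I will edit this into my answer. - Naphat Amundsen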
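
The view-versus-copy distinction in the comment above can be checked directly with `np.shares_memory`. The following is a minimal sketch on a hypothetical 6x2 integer array; the variable names `basic`, `fancy`, and `top` are illustrative. It also spells out the swap through an explicit intermediate copy, which is one way to read the "intermediate array" suggestion.

```python
import numpy as np

x = np.arange(12).reshape(6, 2)
indices = np.array([3, 4, 5])
n = len(indices)

basic = x[:n]        # basic slicing: a view onto x's memory
fancy = x[indices]   # advanced (integer-array) indexing: a fresh copy

basic_is_view = np.shares_memory(x, basic)
fancy_is_view = np.shares_memory(x, fancy)
print(basic_is_view, fancy_is_view)

# Swap through an explicit temporary so no step reads rows that were already overwritten
top = x[:n].copy()
x[:n] = x[indices]
x[indices] = top
print(x)
```

This sketch assumes that no entry of `indices` is smaller than `len(indices)`; when the two row sets overlap, a plain swap is no longer well defined.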

0

Here is my solution to the problem:

rem_indices = [i for i in range(x.shape[0]) if i not in indices]  # get all remaining indices
xs = np.take_along_axis(x, idx, axis=0)  # the sorted array
out = np.empty(x.shape)

out[:indices.size, :] = xs[indices, :]  # insert specific values at the beginning
out[indices.size:, :] = xs[rem_indices, :]  # insert the remaining values after the previous

Please let me know if I have understood your question correctly.

- amzon-ex

- Divakar · Accepted Answer

Approach #1

Here is one, based on the previous post, that does not need idx:

xc = x.copy()
xc[indices] = (xc.min()-np.arange(len(indices),0,-1))[:,None]
out = np.take_along_axis(x,xc.argsort(0),axis=0)
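
As a rough illustration of why this works: the rows at `indices` are overwritten, in a copy, with increasing keys that are all smaller than the global minimum, so a single column-wise argsort puts those rows first in the order given by `indices`, followed by everything else in ascending order. The sketch below uses a tiny hypothetical array, and the intermediate names `keys` and `order` are added only for readability.

```python
import numpy as np

x = np.array([[0.5, 0.1],
              [0.2, 0.9],
              [0.8, 0.4],
              [0.1, 0.7],
              [0.9, 0.3]])
indices = np.array([3, 1])   # deliberately not ascending, to show the order is kept

xc = x.copy()
# One key per wanted row: min-2, min-1 (increasing, all below the smallest value)
keys = xc.min() - np.arange(len(indices), 0, -1)
xc[indices] = keys[:, None]          # broadcast each key across its whole row

order = xc.argsort(0)                # wanted rows first, then the rest ascending
out = np.take_along_axis(x, order, axis=0)
print(out)
```

Note that this assumes subtracting from the minimum does not overflow or lose precision, which could be a concern for integer dtypes near their lower limit.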

Approach #2

Another one, with np.isin masking, that uses idx:

mask = np.isin(idx, indices)
p2 = np.take_along_axis(x,idx.T[~mask.T].reshape(x.shape[1],-1).T,axis=0)
out = np.vstack((x[indices],p2))
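
The transposes in the masking step are easy to misread, so here is a small sketch that splits it up on a tiny hypothetical array (the intermediate name `rest` is an addition for illustration). Boolean indexing flattens in row-major order, so transposing first keeps each column's surviving indices contiguous; every column loses exactly `len(indices)` entries, which is what makes the reshape valid.

```python
import numpy as np

x = np.array([[0.5, 0.1],
              [0.2, 0.9],
              [0.8, 0.4],
              [0.1, 0.7],
              [0.9, 0.3]])
indices = np.array([1, 3])
idx = np.argsort(x, axis=0)

mask = np.isin(idx, indices)                      # True where a sorted position points at a wanted row
# idx.T[~mask.T] is 1-D: column 0's remaining indices, then column 1's, and so on
rest = idx.T[~mask.T].reshape(x.shape[1], -1).T   # shape: (n_rows - len(indices), n_cols)

p2 = np.take_along_axis(x, rest, axis=0)          # remaining values, ascending per column
out = np.vstack((x[indices], p2))
print(out)
```

This relies on `indices` containing no duplicates, since otherwise the columns would not all lose the same number of entries.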

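Approach #2, alternative: if you keep writing into out to change everything except the rows taken from those indices, array assignment into a preallocated out might suit you: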

n = len(indices)
out[:n] = x[indices]

mask = np.isin(idx, indices)
lower = np.take_along_axis(x,idx.T[~mask.T].reshape(x.shape[1],-1).T,axis=0)
out[n:] = lower