How to flatten a numpy array but still keep the index of each value's position?

Question

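I have some 2D numpy arrays (matrices), and for each one I'd like to convert it into a vector containing the array's values and a vector containing each value's row/column index.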

For example, I might have an array like this:

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])

I basically want the values

[3, 1, 4, 1, 5, 9, 2, 6, 5]

and their positions

[[0,0], [0,1], [0,2], [1,0], [1,1], [1,2], [2,0], [2,1], [2,2]]

My end goal is to put these as columns in a pandas DataFrame, like this:

V | x | y
--+---+---
3 | 0 | 0
1 | 0 | 1
4 | 0 | 2
1 | 1 | 0
5 | 1 | 1
9 | 1 | 2
2 | 2 | 0
6 | 2 | 1
5 | 2 | 2

where V is the value, x is the row position (index), and y is the column position (index).

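I think I could hack something together, but I'm looking for an efficient approach rather than fumbling around. For example, I know I can use something like x.reshape(x.size, 1) to get the values, and I could try to build the index columns from x.shape, but it seems like there should be a better way.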

- Ellis Valentiner

I think reshape runs in constant time, and to build the indices you only need a single for loop. - Saeid

A single for loop, you mean? I know reshape is already as efficient as it gets. - Ellis Valentiner
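As a minimal sketch of the route the question hints at (take the values with a reshape, derive the index columns from x.shape), np.unravel_index maps each flat position back to its (row, column) pair. This assumes the 3×3 example array and the V/x/y column names from the question:

```python
import numpy as np
import pandas as pd

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])

# ravel() flattens in row-major (C) order, the same order used below.
values = x.ravel()

# unravel_index turns each flat position 0..size-1 back into a
# (row, column) pair for an array of shape x.shape.
rows, cols = np.unravel_index(np.arange(x.size), x.shape)

df = pd.DataFrame({"V": values, "x": rows, "y": cols})
print(df)
```

Because ravel() and unravel_index both use row-major order by default, the values and indices line up without any explicit loop.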

10 Answers

You can also let pandas do the work for you, since you'll be using it in a DataFrame anyway:

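import numpy as np
import pandas as pd

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])
df=pd.DataFrame(x)
#unstack moves the column labels into the outer level of the index, then
#reset_index turns both index levels into ordinary columns
#(level_0 = original column label, level_1 = original row label)
df=df.unstack().reset_index()
df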

   level_0  level_1  0
0        0        0  3
1        0        1  1
2        0        2  2
3        1        0  1
4        1        1  5
5        1        2  6
6        2        0  4
7        2        1  9
8        2        2  5

#level_0 holds the column labels and level_1 the row labels,
#so name them y and x respectively, then move V to the front
df.columns=['y','x','V']
df = df[['V','x','y']]
df

   V  x  y
0  3  0  0
1  1  1  0
2  2  2  0
3  1  0  1
4  5  1  1
5  6  2  1
6  4  0  2
7  9  1  2
8  5  2  2

- khammel
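The unstack() route lists the elements column by column. If you want them in the question's row-by-row order, one possible variant (a sketch, not part of the original answer) is to name the axes first and call stack() instead, which makes the row label the outer index level:

```python
import numpy as np
import pandas as pd

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])

# Name the axes up front: the row axis becomes 'x', the column axis 'y'.
named = pd.DataFrame(x).rename_axis(index="x", columns="y")

# stack() moves the column labels into an inner index level, giving a
# Series indexed by (x, y) in row-major order; reset_index turns both
# index levels into ordinary columns and names the values 'V'.
df = named.stack().reset_index(name="V")[["V", "x", "y"]]
print(df)
```

Naming the axes before reshaping means no column renaming is needed afterwards, so the row/column labels cannot be swapped by accident.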

What? You can do that?! - Ellis Valentiner

This is pure magic! Thank you so much for showing us this! - Arthur Khazbs

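The np.ndindex class is designed for exactly this and makes it easy. It is about as efficient as the np.meshgrid approach in the accepted answer, but needs less code: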

indices = np.array(list(np.ndindex(x.shape)))

For the DataFrame, do the following:

df = pd.DataFrame({'V': x.flatten(), 'x': indices[:, 0], 'y': indices[:, 1]})

If you don't need a DataFrame, just use list(np.ndindex(x.shape)).

Note: don't confuse x (the array at hand) with 'x' (the name of the second column).

I know this question was posted a long time ago, but I'm adding this in case it helps anyone, since I didn't see np.ndindex mentioned.

- Miguel Capllonch

np.ndindex is very slow. Comparing list(np.ndindex(shape)) with np.array(np.where(np.ones(shape))).T (pretty simple code, right?), the "dumb" solution was about 5x faster for a (4, 4, 4) shape and about 10x faster for (10, 10, 10), and you can guess it gets faster still for large ndarrays! - Christian O'Reilly
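A small sketch checking that the two expressions in this comment produce the same coordinate table; np.indices is a third option added here for comparison and is not mentioned in the thread. No timings are claimed:

```python
import numpy as np

shape = (4, 4, 4)

# Index tuples generated by np.ndindex, in row-major order.
via_ndindex = np.array(list(np.ndindex(shape)))

# The comment's alternative: every element of np.ones(shape) is nonzero,
# so np.where returns the coordinates of all elements (one array per
# axis, row-major order); .T gives one row per element.
via_where = np.array(np.where(np.ones(shape))).T

# np.indices builds the coordinate grids directly; reshaping to
# (ndim, size) and transposing yields the same table without a mask.
via_indices = np.indices(shape).reshape(len(shape), -1).T

print(np.array_equal(via_ndindex, via_where),
      np.array_equal(via_ndindex, via_indices))
```

Both vectorized forms avoid a Python-level loop over every index tuple, which is the likely reason they outpace list(np.ndindex(...)).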

Another way:

import numpy as np

arr = np.array([[3, 1, 4],
                [1, 5, 9],
                [2, 6, 5]])

# build out rows array: each row index repeated across its row
x = np.arange(arr.shape[0]).reshape(arr.shape[0],1).repeat(arr.shape[1],axis=1)
# build out columns array: each column index repeated down its column
# (reshape to (1, arr.shape[1]) so non-square arrays work too)
y = np.arange(arr.shape[1]).reshape(1,arr.shape[1]).repeat(arr.shape[0],axis=0)

# combine into table
table = np.vstack((arr.reshape(arr.size),x.reshape(arr.size),y.reshape(arr.size))).T
print(table)

- lemonhead

With x, y = np.mgrid[0:arr.shape[0], 0:arr.shape[1]] you get the same result as with the np.arange(...) calls. Upvoted for the creative use of vstack. - Mike de Klerk
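A sketch confirming the comment's point; the non-square (2, 3) example is an assumption chosen so that mixing up shape[0] and shape[1] would show:

```python
import numpy as np

arr = np.array([[3, 1, 4],
                [1, 5, 9]])  # non-square on purpose: shape (2, 3)

n_rows, n_cols = arr.shape

# Row and column index grids built with arange/reshape/repeat, as in the answer.
x = np.arange(n_rows).reshape(n_rows, 1).repeat(n_cols, axis=1)
y = np.arange(n_cols).reshape(1, n_cols).repeat(n_rows, axis=0)

# The same grids from np.mgrid; the second slice must use shape[1].
gx, gy = np.mgrid[0:n_rows, 0:n_cols]

table = np.vstack((arr.ravel(), gx.ravel(), gy.ravel())).T
print(table)
```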

Update, November 2020 (tested with pandas v1.1.3 and numpy v1.19):

With np.meshgrid and .reshape(-1) this should be a piece of cake.

x = np.array([[3, 1, 4],
              [1, 5, 9]])

x_coor, y_coor = np.meshgrid(range(x.shape[1]), range(x.shape[0]))    
df = pd.DataFrame({"V": x.reshape(-1), "x": x_coor.reshape(-1), "y": y_coor.reshape(-1)})

For the 2D case you don't even need meshgrid: just use np.tile along the column axis and np.repeat along the row axis.

df = pd.DataFrame({
    "V": x.reshape(-1),
    "x": np.tile(np.arange(x.shape[1]), x.shape[0]),
    "y": np.repeat(np.arange(x.shape[0]), x.shape[1])
})

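The example data was trimmed to shape=(2, 3) so the two axes are easier to tell apart. Note that in this answer x is the column index and y is the row index, the reverse of the convention in the question; swap them if you need x to be the row.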

- Bill Huang
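To see why np.tile gives the column index and np.repeat the row index, here is a minimal illustration for the (2, 3) shape used in this answer:

```python
import numpy as np

n_rows, n_cols = 2, 3  # the (2, 3) shape used in this answer

# np.tile repeats the whole sequence: the column index cycles 0, 1, 2, 0, 1, 2.
col_idx = np.tile(np.arange(n_cols), n_rows)

# np.repeat repeats each element: the row index stays fixed across a row.
row_idx = np.repeat(np.arange(n_rows), n_cols)

print(col_idx)  # [0 1 2 0 1 2]
print(row_idx)  # [0 0 0 1 1 1]
```

These are exactly the flattened meshgrid outputs from the first variant, without materializing the 2D grids.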

You can simply use a loop.

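x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])
values = []
coordinates = []
data_frame = []
# v walks the rows, h the columns (Python 3: range instead of xrange)
for v in range(len(x)):
    for h in range(len(x[v])):
        values.append(x[v][h])
        coordinates.append((v, h))          # (row, column), as in the question
        data_frame.append((x[v][h], v, h))  # one (V, x, y) tuple per element
        print('%s | %s | %s' % (x[v][h], v, h))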

- Cyrbil
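To finish the loop approach, the collected (value, row, column) tuples can go straight into pandas. A minimal sketch, assuming the V/x/y column names from the question:

```python
import numpy as np
import pandas as pd

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])

# Collect one (value, row, column) tuple per element with plain loops.
rows = []
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        rows.append((x[i, j], i, j))

# pandas accepts a list of tuples directly; each tuple becomes one row.
df = pd.DataFrame(rows, columns=["V", "x", "y"])
print(df)
```

This is easy to read but runs in Python-level loops, so the vectorized answers will be faster on large arrays.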

You can try using itertools for this.

import itertools
import numpy as np
import pandas as pd

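def convert2dataframe(array):
    a, b = array.shape
    # all (row, column) pairs in row-major order, split into two tuples
    x, y = zip(*list(itertools.product(range(a), range(b))))
    # ravel() uses the same row-major order, so values and indices line up
    df = pd.DataFrame(data={'V':array.ravel(), 'x':x, 'y':y})
    return df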

This works for 2D arrays of any shape, not just square matrices.

- lakshayg
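convert2dataframe unpacks exactly two dimensions. A hypothetical N-D generalisation, sketched here with made-up column names axis0, axis1, ..., passes one range per axis to itertools.product instead:

```python
import itertools

import numpy as np
import pandas as pd


def convert_nd_to_dataframe(array):
    """Hypothetical N-D variant of convert2dataframe (not from the answer)."""
    # itertools.product over one range per axis yields index tuples in
    # row-major order, matching array.ravel().
    indices = list(itertools.product(*(range(n) for n in array.shape)))
    data = {"V": array.ravel()}
    # One column per axis, named axis0, axis1, ... (naming is an assumption).
    for axis, column in enumerate(zip(*indices)):
        data[f"axis{axis}"] = column
    return pd.DataFrame(data)


df = convert_nd_to_dataframe(np.arange(12).reshape(2, 3, 2))
print(df.head())
```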

Like @miguel-capllonch, I'd suggest np.ndindex, which lets you create the desired output like this:

np.array([(v, *i) for (i, v) in zip(np.ndindex(x.shape), x.ravel())])

For the integer example array this gives:

array([[3, 0, 0],
       [1, 0, 1],
       [4, 0, 2],
       [1, 1, 0],
       [5, 1, 1],
       [9, 1, 2],
       [2, 2, 0],
       [6, 2, 1],
       [5, 2, 2]])

Or, with NumPy calls only (note that this version puts the indices first and the value in the last column):

np.hstack((list(np.ndindex(x.shape)), x.reshape((-1, 1))))

- Peter H.

This is essentially x.ravel() concatenated with the Cartesian product of the row and column indices:

n, m = x.shape  # rows, columns (3, 3 for the example)
np.c_[x.ravel(), np.c_[np.repeat(np.r_[:n], m), np.tile(np.r_[:m], n)]]

Output:

array([[3, 0, 0],
       [1, 0, 1],
       [4, 0, 2],
       [1, 1, 0],
       [5, 1, 1],
       [9, 1, 2],
       [2, 2, 0],
       [6, 2, 1],
       [5, 2, 2]])

- Kevin

I'm reviving this question because I think I have a different answer that is easier to understand. Here's how I do it:

xn = np.zeros((np.size(x), np.ndim(x)+1), dtype=np.float32)
row = 0
for ind, data in np.ndenumerate(x):
    xn[row, 0] = data
    xn[row, 1:] = np.asarray(ind)
    row += 1

xn then contains:

[[ 3.  0.  0.]
 [ 1.  0.  1.]
 [ 4.  0.  2.]
 [ 1.  1.  0.]
 [ 5.  1.  1.]
 [ 9.  1.  2.]
 [ 2.  2.  0.]
 [ 6.  2.  1.]
 [ 5.  2.  2.]]

- yildirimyigit
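np.ndenumerate can also feed a DataFrame directly, which avoids preallocating a float32 buffer and keeps the array's integer dtype. A minimal sketch, assuming the question's V/x/y column names:

```python
import numpy as np
import pandas as pd

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])

# np.ndenumerate yields (index_tuple, value) pairs in row-major order.
# Building plain tuples keeps the integer dtype of x.
records = [(value, *index) for index, value in np.ndenumerate(x)]
df = pd.DataFrame(records, columns=["V", "x", "y"])
print(df)
```

Unpacking the index tuple with *index means the same line works for arrays of any dimension; only the column names list would need to grow.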

Page content provided by Stack Overflow.

- rjonnal · Accepted Answer

I don't know whether it's the most efficient, but numpy.meshgrid is designed for exactly this:

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])
XX,YY = np.meshgrid(np.arange(x.shape[1]),np.arange(x.shape[0]))
table = np.vstack((x.ravel(),XX.ravel(),YY.ravel())).T
print(table)

This produces:

[[3 0 0]
 [1 1 0]
 [4 2 0]
 [1 0 1]
 [5 1 1]
 [9 2 1]
 [2 0 2]
 [6 1 2]
 [5 2 2]]

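I think df = pandas.DataFrame(table) will give you the data frame you want. Note that XX holds the column index and YY the row index, so the second column of table is the column position; swap XX and YY in the vstack call to get the question's V | x | y order.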
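If you want the row index before the column index, as in the question, np.meshgrid's indexing="ij" option returns the grids in matrix order. A sketch assuming the question's V/x/y column names:

```python
import numpy as np
import pandas as pd

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])

# With indexing="ij" the first output varies along axis 0, so it holds the
# row index and the second holds the column index (in the answer's call,
# XX holds the column index and YY the row index).
rows, cols = np.meshgrid(np.arange(x.shape[0]), np.arange(x.shape[1]), indexing="ij")

table = np.vstack((x.ravel(), rows.ravel(), cols.ravel())).T
df = pd.DataFrame(table, columns=["V", "x", "y"])
print(df)
```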