Convert a dictionary with known indices to a multidimensional array

Question

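I have a dictionary with keys of the form {(k,i): value, ...}. I now want to convert this dictionary into a 2-D array in which the element at position [k,i] is the dictionary value stored under the key (k,i). The rows may not all have the same length (e.g., row k = 4 may go up to index i = 60, while row k = 24 only goes up to index i = 31). Because of this asymmetry, it is fine to set all the extra entries in a given row to 0 so that the result is a rectangular matrix.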

- Mathews24

Sample input/output? - Wayne Werner

2 Answers

There is a dictionary-of-keys (DOK) sparse format that can be built from such a dict.

Starting with Divakar's sample dictionary d:

In [1189]: d={(1, 4): 120, (2, 2): 72, (2, 3): 100, (5, 2): 88}

Create an empty sparse matrix of the right shape and dtype:

In [1190]: M=sparse.dok_matrix((6,5),dtype=int)
In [1191]: M
Out[1191]: 
<6x5 sparse matrix of type '<class 'numpy.int32'>'
    with 0 stored elements in Dictionary Of Keys format>

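Add the d values with a dict update. This works because this particular sparse format is a dict subclass. Be warned, though, that the trick is not documented (at least not that I know of). Newer SciPy releases block it: there `dok_matrix.update` raises `NotImplementedError`, so the entries have to be assigned one at a time (`M[k, i] = v`) instead.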

In [1192]: M.update(d)
In [1193]: M
Out[1193]: 
<6x5 sparse matrix of type '<class 'numpy.int32'>'
    with 4 stored elements in Dictionary Of Keys format>
In [1194]: M.A    # convert M to numpy array (handy display trick)
Out[1194]: 
array([[  0,   0,   0,   0,   0],
       [  0,   0,   0,   0, 120],
       [  0,   0,  72, 100,   0],
       [  0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0],
       [  0,   0,  88,   0,   0]])
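Because newer SciPy releases reject `M.update(d)`, here is a version-independent sketch that fills the same DOK matrix through ordinary item assignment. It reuses the sample dict `d`; a Python loop is fine for a handful of entries but will be slower for very large dicts.

```python
import numpy as np
from scipy import sparse

d = {(1, 4): 120, (2, 2): 72, (2, 3): 100, (5, 2): 88}

# Empty DOK matrix with the target shape and dtype
M = sparse.dok_matrix((6, 5), dtype=int)

# Assign each entry through the public indexing API instead of dict.update
for (k, i), value in d.items():
    M[k, i] = value

# .toarray() is the documented method; the .A shorthand is deprecated in recent SciPy
dense = M.toarray()
print(dense)
```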

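M can be converted to other sparse formats such as coo and csr. In fact, sparse performs conversions like these on its own, depending on how the matrix is used (display, calculation, etc.).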

In [1196]: print(M)
  (2, 3)    100
  (5, 2)    88
  (1, 4)    120
  (2, 2)    72
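A short sketch of those explicit conversions, assuming the same `d` as above. `tocoo()` and `tocsr()` are standard SciPy sparse methods; the dict comparison at the end is only a sanity check added here, not part of the original answer.

```python
from scipy import sparse

d = {(1, 4): 120, (2, 2): 72, (2, 3): 100, (5, 2): 88}
M = sparse.dok_matrix((6, 5), dtype=int)
for key, value in d.items():
    M[key] = value

# Explicit conversions to other sparse formats
C = M.tocoo()   # COO exposes parallel row/col/data arrays
R = M.tocsr()   # CSR suits arithmetic and row slicing

# Rebuild the {(row, col): value} mapping from the COO triples
recovered = {(int(r), int(c)): int(v) for r, c, v in zip(C.row, C.col, C.data)}
print(recovered == d)  # True
```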

- hpaulj

- Divakar · Accepted Answer

Here's one approach -

import numpy as np

# Get keys (as indices for output) and values as arrays.
# list() is needed on Python 3, where d.keys()/d.values() return views
idx = np.array(list(d.keys()))
vals = np.array(list(d.values()))

# Get dimensions of output array based on max extents of indices
dims = idx.max(0)+1

# Setup output array and assign values into it indexed by those indices
out = np.zeros(dims,dtype=vals.dtype)
out[idx[:,0],idx[:,1]] = vals

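We could also use a sparse matrix to get the final output, for example one in coordinate (COO) format. This is memory-efficient as long as the result is kept as a sparse matrix. So the last step could be replaced by the following -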

from scipy.sparse import coo_matrix

out = coo_matrix((vals, (idx[:,0], idx[:,1])), dims).toarray()
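The memory saving only materialises if the final `.toarray()` is skipped. A sketch that keeps the COO result sparse, assuming the same sample `d`; the name `S` is illustrative, and converting to CSR is just one reasonable choice when the matrix will be used for arithmetic or row access.

```python
import numpy as np
from scipy.sparse import coo_matrix

d = {(1, 4): 120, (2, 2): 72, (2, 3): 100, (5, 2): 88}

# Keys become an (n, 2) index array; values a 1-D array in matching order
idx = np.array(list(d.keys()))
vals = np.array(list(d.values()))
dims = tuple(int(n) for n in idx.max(axis=0) + 1)

# Keep the result sparse: only the stored entries are held in memory
S = coo_matrix((vals, (idx[:, 0], idx[:, 1])), shape=dims).tocsr()
print(S.shape, S.nnz)  # (6, 5) 4

# Densify only when a full ndarray is really needed
out = S.toarray()
```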

Sample run -

In [70]: d
Out[70]: {(1, 4): 120, (2, 2): 72, (2, 3): 100, (5, 2): 88}

In [71]: out
Out[71]: 
array([[  0,   0,   0,   0,   0],
       [  0,   0,   0,   0, 120],
       [  0,   0,  72, 100,   0],
       [  0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0],
       [  0,   0,  88,   0,   0]])
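To connect this back to the ragged rows in the question, here is a sketch with hypothetical placeholder values: row 4 filled up to i = 60 and row 24 up to i = 31. The column count comes from the longest row, and everything past a shorter row's last index stays 0.

```python
import numpy as np

# Hypothetical data mimicking the question: row 4 runs to i = 60,
# row 24 runs to i = 31; the values 1.0 and 2.0 are arbitrary placeholders
d = {}
for i in range(61):
    d[(4, i)] = 1.0
for i in range(32):
    d[(24, i)] = 2.0

idx = np.array(list(d.keys()))
vals = np.array(list(d.values()))
dims = idx.max(axis=0) + 1          # longest row fixes the column count

out = np.zeros(dims, dtype=vals.dtype)
out[idx[:, 0], idx[:, 1]] = vals

print(out.shape)                    # (25, 61)
print(out[24, 32:].sum())           # 0.0, padding past row 24's last index
```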

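To make this work for ndarrays with any number of dimensions, we can use linear indexing and assign the values into the output array with np.put. So in our first approach, just replace the last step (assigning the values) with the following -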

np.put(out,np.ravel_multi_index(idx.T,dims),vals)

Sample run:

In [106]: d
Out[106]: {(1,0,0): 99, (1,0,4): 120, (2,0,2): 72, (2,1,3): 100, (3,0,2): 88}

In [107]: out
Out[107]: 
array([[[  0,   0,   0,   0,   0],
        [  0,   0,   0,   0,   0]],

       [[ 99,   0,   0,   0, 120],
        [  0,   0,   0,   0,   0]],

       [[  0,   0,  72,   0,   0],
        [  0,   0,   0, 100,   0]],

       [[  0,   0,  88,   0,   0],
        [  0,   0,   0,   0,   0]]])
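The N-d variant can be packaged as a small helper. `dict_to_ndarray` and its `fill` parameter are hypothetical names introduced here, not NumPy APIs; the sketch assumes all keys are equal-length tuples of non-negative ints. It also checks that indexing with `tuple(keys.T)` gives the same result as the `np.put` route.

```python
import numpy as np

def dict_to_ndarray(d, fill=0):
    """Hypothetical helper: turn {index_tuple: value} into a padded ndarray.

    Assumes every key is a tuple of non-negative ints of the same length.
    """
    idx = np.array(list(d.keys()))          # shape (n_entries, ndim)
    vals = np.array(list(d.values()))
    dims = tuple(idx.max(axis=0) + 1)       # smallest shape holding every key
    out = np.full(dims, fill, dtype=vals.dtype)
    # Convert each key to a flat (linear) index, then scatter with np.put
    np.put(out, np.ravel_multi_index(idx.T, dims), vals)
    return out

d = {(1, 0, 0): 99, (1, 0, 4): 120, (2, 0, 2): 72, (2, 1, 3): 100, (3, 0, 2): 88}
out = dict_to_ndarray(d)
print(out.shape)  # (4, 2, 5)

# Same scatter via fancy indexing with one index array per dimension
keys = np.array(list(d.keys()))
alt = np.zeros(out.shape, dtype=out.dtype)
alt[tuple(keys.T)] = np.array(list(d.values()))
print(np.array_equal(alt, out))  # True
```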