Converting a MultiIndex into a row-wise multi-dimensional NumPy array

Question

Suppose I have a MultiIndex DataFrame similar to the example in the MultiIndex documentation.

>>> df 
               0   1   2   3
first second                
bar   one      0   1   2   3
      two      4   5   6   7
baz   one      8   9  10  11
      two     12  13  14  15
foo   one     16  17  18  19
      two     20  21  22  23
qux   one     24  25  26  27
      two     28  29  30  31

I would like to generate a NumPy array from this DataFrame with a three-dimensional structure, like

>>> desired_arr
array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]],

       [[16, 20],
        [17, 21],
        [18, 22],
        [19, 23]],

       [[24, 28],
        [25, 29],
        [26, 30],
        [27, 31]]])

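How can I do this?

Hopefully it is clear what is going on here: I am effectively unstacking the DataFrame on level 1 ('second'), and then trying to turn each top-level group in the resulting column MultiIndex into its own 2D array.

I can get halfway there with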

>>> df.unstack(1)
         0       1       2       3    
second one two one two one two one two
first                                 
bar      0   4   1   5   2   6   3   7
baz      8  12   9  13  10  14  11  15
foo     16  20  17  21  18  22  19  23
qux     24  28  25  29  26  30  27  31

but I am struggling to turn each column group into a 2D array and join them together, other than doing so explicitly with a loop and lists.
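For reference, the explicit loop-and-list version could look roughly like the sketch below. The question does not show it, so the names `blocks` and `loop_arr` and the use of `groupby` are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
ind = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.arange(8 * 4).reshape((8, 4)), index=ind)

# One 2D block per value of the 'first' level. Each group has shape
# (n_second, n_columns); the transpose turns it into (n_columns, n_second).
blocks = [group.to_numpy().T for _, group in df.groupby(level='first')]

# Join the per-group blocks along a new leading axis.
loop_arr = np.stack(blocks)
print(loop_arr.shape)  # (4, 4, 2)
```

This gives the desired layout, but it does the work in Python rather than in NumPy, which is what the question is trying to avoid.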

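I feel like there should be some way to specify the shape of the desired NumPy array up front, fill it with np.nan, and then fill in the values from my DataFrame using a specific iteration order, but I have not managed to work it out.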
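A minimal sketch of that pre-allocation idea, assuming the integer label positions in `df.index.codes` are used to place the rows instead of an explicit iteration order. The names `filled`, `first_codes` and `second_codes` are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
ind = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.arange(8 * 4).reshape((8, 4)), index=ind)

n_first, n_second = (len(level) for level in df.index.levels)

# Pre-allocate (first, column, second), filled with NaN (this forces a float dtype).
filled = np.full((n_first, df.shape[1], n_second), np.nan)

# index.codes holds, for every row, the position of its label within each level.
first_codes, second_codes = df.index.codes

# Advanced indexing on axes 0 and 2: row k of df is written to
# filled[first_codes[k], :, second_codes[k]].
filled[first_codes, :, second_codes] = df.to_numpy()
print(filled.shape)  # (4, 4, 2)
```

Because the rows are placed by label position, any (first, second) combination that is absent from the index would simply stay NaN, and the row order of the DataFrame does not matter.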

Code to generate the sample DataFrame:

import numpy as np
import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
ind = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.arange(8*4).reshape((8, 4)), index=ind)

- Eric Hansen

2 Answers

To complement @divakar's answer, here is a generalization to more dimensions:

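# sort the rows by index
A = df.sort_index()

# fill missing index combinations with 0 by moving each level
# to the columns and back
for idx in A.index.names:
    A = A.unstack(idx).fillna(0).stack(1)

# build a tuple with the right dimensions, one per index level
# (reshaping to these sizes alone assumes A has a single value column)
reshape_size = tuple(len(x) for x in A.index.levels)

# reshape
arr = np.reshape(A.values, reshape_size).swapaxes(0, 1)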

- Vianney Morain
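As an alternative sketch of the same goal (a different technique from the unstack/stack loop above, not that answer's method): reindex against the full product of the index levels, then reshape with one axis per level and the columns as the last axis. The helper name `frame_to_ndarray` and the sample data are hypothetical, missing combinations become NaN rather than 0, and a unique index is assumed.

```python
import numpy as np
import pandas as pd

def frame_to_ndarray(df):
    """Hypothetical helper: one axis per index level, columns as the last axis."""
    index = df.index.remove_unused_levels()
    # Every combination of level values, in sorted level order.
    full = pd.MultiIndex.from_product(index.levels, names=index.names)
    # Rows absent from df are inserted as NaN.
    dense = df.reindex(full)
    shape = tuple(len(level) for level in index.levels) + (df.shape[1],)
    return dense.to_numpy().reshape(shape)

# Three index levels with three of the eight combinations missing.
idx = pd.MultiIndex.from_tuples(
    [('a', 'x', 1), ('a', 'x', 2), ('a', 'y', 1), ('b', 'x', 1), ('b', 'y', 2)],
    names=['l0', 'l1', 'l2'],
)
sparse_df = pd.DataFrame({'v': [10, 20, 30, 40, 50], 'w': [1, 2, 3, 4, 5]}, index=idx)

arr = frame_to_ndarray(sparse_df)
print(arr.shape)  # (2, 2, 2, 2)
```

If a different axis order is wanted, for example the columns in the middle as in the question, `np.moveaxis` can be applied to the result.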

- Divakar · Accepted Answer

Some reshape and swapaxes trickery:

df.values.reshape(4,2,-1).swapaxes(1,2)

Generalizing this to:

m,n = len(df.index.levels[0]), len(df.index.levels[1])
arr = df.values.reshape(m,n,-1).swapaxes(1,2)

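Basically, this splits the first axis (length 8) into two axes of lengths 4 and 2, which gives a 3D array of shape (4, 2, 4), and then swaps the last two axes, i.e. pushes the length-2 axis to the back as the last one.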
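The two steps can be followed through their intermediate shapes. This is only an illustrative sketch: the names `step1`, `step2` and `row` are not from the answer, and it assumes the rows are sorted and cover every (first, second) combination, since the reshape relies purely on row order.

```python
import numpy as np
import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
ind = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.arange(8 * 4).reshape((8, 4)), index=ind)

m, n = len(df.index.levels[0]), len(df.index.levels[1])

step1 = df.to_numpy().reshape(m, n, -1)  # axes: (first, second, column)
step2 = step1.swapaxes(1, 2)             # axes: (first, column, second)
print(step1.shape, step2.shape)          # (4, 2, 4) (4, 4, 2)

# step2[i, :, j] is the DataFrame row labelled (i-th 'first', j-th 'second').
row = df.loc[('baz', 'two')].to_numpy()
assert (step2[1, :, 1] == row).all()
```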

Sample output:

In [35]: df.values.reshape(4,2,-1).swapaxes(1,2)
Out[35]: 
array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]],

       [[16, 20],
        [17, 21],
        [18, 22],
        [19, 23]],

       [[24, 28],
        [25, 29],
        [26, 30],
        [27, 31]]])
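The same array can also be reached from the df.unstack(1) frame in the question, because its columns are already ordered as (column, second). The sketch below is a cross-check under that assumption; the names `unstacked`, `via_unstack` and `via_swapaxes` are illustrative.

```python
import numpy as np
import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
ind = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.arange(8 * 4).reshape((8, 4)), index=ind)

# Rows: 'first'; columns: (original column, 'second'), with 'second' varying fastest.
unstacked = df.unstack(1)

# Split the flat column axis back into (column, second).
via_unstack = unstacked.to_numpy().reshape(len(unstacked), df.shape[1], -1)

via_swapaxes = df.to_numpy().reshape(4, 2, -1).swapaxes(1, 2)
assert np.array_equal(via_unstack, via_swapaxes)
```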