How do I split one column into multiple columns by a given number of rows?

Question

3

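I have a long single-column DataFrame, like this:

Column A
Cell 1
Cell 2
Cell 3
Cell 4
Cell 5
Cell 6
Cell 7
Cell 8

I want to split Column A into chunks of a given number of rows and put each chunk into a new column. For example, with 2 rows per column:

Column A	Column B	Column C	Column D
Cell 1	Cell 3	Cell 5	Cell 7
Cell 2	Cell 4	Cell 6	Cell 8

In other words, split the long column into newly added columns by a given number of rows.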

- Tom

2

Show what you have tried so far as a [mcve]. Read [ask]. - padaleiana

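Welcome to Stack Overflow. Please take the [tour] to learn how Stack Overflow works, and read [ask] on how to improve the quality of your question. Then [edit] your question to include more details. It may also help to look at [mre] and at how to make good reproducible pandas examples. - imxitiz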

3 Answers

1

You can divide the rows into groups and pivot those groups into columns.

Short version

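# df holds 8 rows ('Cell 0' to 'Cell 7') in 'Column A'; imports are in the detailed steps
number_rows = 2
# group number (1, 1, 2, 2, ...), used as the new column label
df['cols'] = np.ceil(df['Column A'].expanding().count()/number_rows)
# position within the group (0, 1, 0, 1, ...), used as the new row label
df.index = pd.Series(range(len(df))) % number_rows
df = df.pivot(columns='cols', values='Column A')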

cols     1.0     2.0     3.0     4.0
0     Cell 0  Cell 2  Cell 4  Cell 6
1     Cell 1  Cell 3  Cell 5  Cell 7

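If the length of your DataFrame is not a multiple of the number of rows per column (number_rows), the code still works and fills the gaps with missing values (np.nan).

Detailed steps

Create demo data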

import pandas as pd
import numpy as np
df = pd.DataFrame({'Column A': [f'Cell {i}' for i in range(4)]})

  Column A
0   Cell 0
1   Cell 1
2   Cell 2
3   Cell 3

Create helper columns for the future columns and rows

number_rows = 3 # 3 instead of 2 to illustrate filling with missing values
df['cols'] = np.ceil(df['Column A'].expanding().count()/number_rows)
df['cols'] = 'Column '+df['cols'].astype(int).astype(str)
df['rows'] = pd.Series(range(len(df))) % number_rows

  Column A      cols  rows
0   Cell 0  Column 1     0
1   Cell 1  Column 1     1
2   Cell 2  Column 1     2
3   Cell 3  Column 2     0

Pivot the table

df = df.pivot(columns='cols', index='rows', values='Column A')

cols Column 1 Column 2
rows                  
0      Cell 0   Cell 3
1      Cell 1      NaN
2      Cell 2      NaN

You can remove the column and index names with:

df.columns.name = df.index.name = None
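
For reuse, the steps above can be wrapped in a function. This is only a sketch under a few assumptions: the name split_by_pivot and its parameters are made up for illustration, and the group and row labels are computed with integer arithmetic on the row position (// and %) instead of expanding().count(), so the column's dtype and the DataFrame's index do not matter.

```python
import numpy as np
import pandas as pd


def split_by_pivot(df, column, number_rows):
    """Split one column into several columns of `number_rows` rows each.

    Hypothetical helper, not part of pandas.
    """
    position = np.arange(len(df))              # 0, 1, 2, ... whatever the index is
    helper = pd.DataFrame({
        "value": df[column].to_numpy(),
        "rows": position % number_rows,        # row inside the chunk
        "cols": position // number_rows + 1,   # chunk number, starting at 1
    })
    wide = helper.pivot(index="rows", columns="cols", values="value")
    # Rename after pivoting so the integer labels decide the column order
    wide.columns = [f"Column {c}" for c in wide.columns]
    wide.index.name = None
    return wide


demo = pd.DataFrame({"Column A": [f"Cell {i}" for i in range(4)]})
result = split_by_pivot(demo, "Column A", 3)
print(result)
```

Renaming the columns only after the pivot is a deliberate choice here: pivot sorts the column labels, and string labels such as 'Column 10' would sort before 'Column 2' once there are ten or more columns.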

- Benjamin Ziepert

1

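You should create a new column that holds the target column name for each value, and change the index accordingly. Finally, you can use df.pivot() to reshape your DataFrame into the new format.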

n_rows = 2
df['new_col'] = "Column "
# a new label starts whenever the position within the chunk wraps back to 0 (works for any n_rows)
df['new_col']=df['new_col']+pd.Series(df.index%n_rows).eq(0).cumsum().astype(str)
df.index=df.index%n_rows
print(df.pivot(columns='new_col', values='Column A'))

new_col Column 1 Column 2 Column 3 Column 4
0         Cell 1   Cell 3   Cell 5   Cell 7
1         Cell 2   Cell 4   Cell 6   Cell 8
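
A quick way to sanity-check the column labels is to build them for a chunk size other than 2 before pivoting. The sketch below assumes a default RangeIndex and made-up demo data (Cell 1 to Cell 8); the names pos, labels and out are mine. It starts a new label wherever the position within the chunk is 0.

```python
import pandas as pd

n_rows = 3
df = pd.DataFrame({"Column A": [f"Cell {i}" for i in range(1, 9)]})

# Position of each row inside its chunk: 0, 1, 2, 0, 1, 2, 0, 1
pos = pd.Series(df.index % n_rows)

# A chunk starts wherever the position is 0, so the running count of
# those starts numbers the chunks: 1, 1, 1, 2, 2, 2, 3, 3
labels = "Column " + pos.eq(0).cumsum().astype(str)

out = df.assign(new_col=labels)      # aligned on the default RangeIndex
out.index = pos.to_numpy()           # row labels 0..n_rows-1, repeated per chunk
out = out.pivot(columns="new_col", values="Column A")
print(out)
```

With 8 values and n_rows = 3, the last column is only partly filled and its remaining cell is NaN.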

- ali bakhtiari

- mozway · Accepted Answer

You can reshape the underlying numpy array in Fortran order (rows, then columns):

import numpy as np
import pandas as pd
from string import ascii_uppercase

N = 2

out = (pd.DataFrame(df['Column A'].to_numpy().reshape(N, -1, order='F'))
        # the line below is optional, just to have the column names
         .rename(columns=dict(enumerate(ascii_uppercase))).add_prefix('Column ')
      )

Output:

  Column A Column B Column C Column D
0   Cell 1   Cell 3   Cell 5   Cell 7
1   Cell 2   Cell 4   Cell 6   Cell 8
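
To see what order='F' changes, here is a small sketch on a plain numpy array. The demo values are assumed to match the question's Cell 1 to Cell 8, and the variable names are only for illustration.

```python
import numpy as np

values = np.array([f"Cell {i}" for i in range(1, 9)])

# Fortran (column-major) order fills each column top to bottom first
by_column = values.reshape(2, -1, order="F")

# The default C (row-major) order fills each row left to right first
by_row = values.reshape(2, -1)

print(by_column)
print(by_row)
```

Only by_column keeps consecutive cells underneath each other, which is the layout asked for in the question. It is the same array as values.reshape(-1, 2).T.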

If len(df) is not a multiple of N, you can add a reindex step to pad the column with NaN:

N = 3

out = (pd.DataFrame(df['Column A'].reindex(range(int(np.ceil(len(df)/N)*N)))
                                  .to_numpy().reshape(N, -1, order='F'))
         .rename(columns=dict(enumerate(ascii_uppercase))).add_prefix('Column ')
      )

Output:

  Column A Column B Column C
0   Cell 1   Cell 4   Cell 7
1   Cell 2   Cell 5   Cell 8
2   Cell 3   Cell 6      NaN
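
The reindex step above relies on the DataFrame having a default integer index. If that is not guaranteed, one possible variant pads the array directly. This is a sketch only: split_by_reshape is a hypothetical helper name, and the numbered column names are my own choice rather than the lettered ones used above.

```python
import numpy as np
import pandas as pd


def split_by_reshape(series, n):
    """Return a DataFrame with `n` rows, filling columns top to bottom.

    Hypothetical helper, not part of pandas.
    """
    values = series.to_numpy(dtype=object)
    n_cols = -(-len(values) // n)                 # ceiling division
    padded = np.full(n * n_cols, np.nan, dtype=object)
    padded[:len(values)] = values                 # leftover slots stay NaN
    wide = pd.DataFrame(padded.reshape(n, n_cols, order="F"))
    wide.columns = [f"Column {i + 1}" for i in range(n_cols)]
    return wide


# Demo with a non-default index to show that the index is ignored
demo = pd.Series([f"Cell {i}" for i in range(1, 9)], index=list("abcdefgh"))
out = split_by_reshape(demo, 3)
print(out)
```

Because the padding works on positions rather than index labels, the same call handles lengths that are exact multiples of n as well as those that are not.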