Stack the values of multiple columns in a Pandas DataFrame into a single column

Question

Suppose we have the following DataFrame:

  key.0 key.1 key.2  topic
1   abc   def   ghi      8
2   xab   xcd   xef      9

How can I combine the values of all the key.* columns into a single "key" column, keeping each value associated with the topic value of its row? This is the result I want:

   topic  key
1      8  abc
2      8  def
3      8  ghi
4      9  xab
5      9  xcd
6      9  xef

Note that the number of key.N columns is not fixed; it depends on some external variable N.

- borice
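Because the number of key.N columns varies, it helps not to hard-code their names. A minimal sketch, assuming the question's sample data (with the 1-based index shown there) and assuming the columns are always named `key.` followed by digits, selects them with a regular expression on the column labels:

```python
import pandas as pd

# Sample data from the question (1-based index, as displayed there)
df = pd.DataFrame(
    {
        "key.0": ["abc", "xab"],
        "key.1": ["def", "xcd"],
        "key.2": ["ghi", "xef"],
        "topic": [8, 9],
    },
    index=[1, 2],
)

# Match every column named "key." followed by digits, whatever N happens to be
key_cols = df.filter(regex=r"^key\.\d+$").columns.tolist()
print(key_cols)  # ['key.0', 'key.1', 'key.2']
```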

3 Answers

OK, since a question has been marked as a duplicate of this one, I'll answer here.

Use the wide_to_long function.

pd.wide_to_long(df, ['key'], i='topic', j='age', sep='.').reset_index().drop(columns='age')
Out[123]: 
   topic  key
0      8  abc
1      9  xab
2      8  def
3      9  xcd
4      8  ghi
5      9  xef

- BENY
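wide_to_long needs `sep='.'` here because the column names put a dot between the stub `key` and the numeric suffix, and `i='topic'` only works if each topic value identifies a single row. Its output interleaves the two topics; a minimal sketch, assuming the question's sample data, sorts on `topic` and the suffix column (called `age`, as in the answer) before dropping it, to get the grouping asked for in the question:

```python
import pandas as pd

df = pd.DataFrame(
    {"key.0": ["abc", "xab"], "key.1": ["def", "xcd"], "key.2": ["ghi", "xef"], "topic": [8, 9]},
    index=[1, 2],
)

# stub "key" + sep "." + numeric suffix; the suffix ends up in the "age" index level
long_df = pd.wide_to_long(df, stubnames="key", i="topic", j="age", sep=".")

# Sort by topic, then by the numeric suffix, so each topic's keys stay together
result = (
    long_df.reset_index()
    .sort_values(["topic", "age"])
    .drop(columns="age")
    .reset_index(drop=True)
)
print(result)
```

Since the suffix is parsed as a number, sorting on `age` keeps `key.10` after `key.2` should N grow past 9.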

After trying various approaches, I found the following to be more or less intuitive, provided you understand the magic of stack:

# keep topic as index, stack other columns 'against' it
stacked = df.set_index('topic').stack()
# turn the Series back into a DataFrame, naming the values column 'key'
df = stacked.reset_index(name='key')
# drop the 'source' level (key.*)
df.drop('level_1', axis=1, inplace=True)

The resulting DataFrame is as requested:

   topic  key
0      8  abc
1      8  def
2      8  ghi
3      9  xab
4      9  xcd
5      9  xef

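To fully understand what is going on, you may want to print the intermediate results. If you don't mind ending up with more columns than needed, the key steps are set_index('topic'), stack() and reset_index(name='key').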

- miraculixx
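Following the suggestion to print the intermediate results, a minimal sketch of the same set_index / stack / reset_index pipeline, assuming the question's sample data, with each step kept in its own variable:

```python
import pandas as pd

df = pd.DataFrame(
    {"key.0": ["abc", "xab"], "key.1": ["def", "xcd"], "key.2": ["ghi", "xef"], "topic": [8, 9]},
    index=[1, 2],
)

# Step 1: make topic the row index so it is not stacked itself
indexed = df.set_index("topic")

# Step 2: stack the remaining columns into one Series indexed by (topic, column name)
stacked = indexed.stack()
print(stacked)

# Step 3: back to a DataFrame; the unnamed inner index level becomes "level_1"
flat = stacked.reset_index(name="key")
print(flat)

# Step 4: drop the helper column that records which key.N each value came from
result = flat.drop(columns="level_1")
print(result)
```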

I can't seem to find any documentation on the name parameter of reset_index; could you explain how it works? - ilyas patanam

It's Series.reset_index(). - miraculixx
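For Series.reset_index(), `name` sets the label of the column that holds the Series values. A small sketch with hypothetical example data (an unnamed Series with a named index) shows the difference:

```python
import pandas as pd

# Hypothetical example data: unnamed Series, index level named "letter"
s = pd.Series([10, 20], index=pd.Index(["a", "b"], name="letter"))

# Without name=, an unnamed Series' values land in a column labelled 0
default_cols = s.reset_index().columns.tolist()
print(default_cols)  # ['letter', 0]

# With name=, you choose the label of the values column
named_cols = s.reset_index(name="value").columns.tolist()
print(named_cols)  # ['letter', 'value']
```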

Web content provided by Stack Overflow.

- Alexander · Accepted Answer

You can melt your DataFrame:

>>> keys = [c for c in df if c.startswith('key.')]
>>> pd.melt(df, id_vars='topic', value_vars=keys, value_name='key')

   topic variable  key
0      8    key.0  abc
1      9    key.0  xab
2      8    key.1  def
3      9    key.1  xcd
4      8    key.2  ghi
5      9    key.2  xef

It also tells you which key.* column each value came from.

As of v0.20, melt is also a first-class method of pd.DataFrame:

>>> df.melt('topic', value_name='key').drop(columns='variable')

   topic  key
0      8  abc
1      9  xab
2      8  def
3      9  xcd
4      8  ghi
5      9  xef
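Because melt emits all key.0 rows, then all key.1 rows, and so on, its output is grouped by column rather than by topic. A minimal sketch, assuming the question's sample data, reproduces the row order from the question with a stable sort on `topic` (`kind="stable"` is a regular sort_values option), which keeps the key.N order inside each topic:

```python
import pandas as pd

df = pd.DataFrame(
    {"key.0": ["abc", "xab"], "key.1": ["def", "xcd"], "key.2": ["ghi", "xef"], "topic": [8, 9]},
    index=[1, 2],
)

keys = [c for c in df if c.startswith("key.")]
long_df = df.melt(id_vars="topic", value_vars=keys, value_name="key")

# A stable sort on topic preserves melt's key.0, key.1, ... order within each topic
result = long_df.sort_values("topic", kind="stable").reset_index(drop=True)
print(result)
```

Passing `value_vars=keys` explicitly also keeps any other non-key columns out of the melted result.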