Converting multiple columns of a DataFrame into two columns with Pandas

4

I'm working in Python with Pandas :D

Currently, I have a DataFrame structured like this:

 0   1   2   3   4   5   6   7   8   9   10   11   12   13   14   15   16   17   18
ex0 ex1 ex2 ex3 ex4 ex5 ex6 ex7 ex8 ex9 ex10 ex11 ex12 ex13 ex14 ex15 ex16 ex17 ex18 

My goal is to make the DataFrame look like this:

 0   1   2   3   4   5   6   7   8   category   amount
ex0 ex1 ex2 ex3 ex4 ex5 ex6 ex7 ex8     9        ex9
ex0 ex1 ex2 ex3 ex4 ex5 ex6 ex7 ex8     10       ex10
ex0 ex1 ex2 ex3 ex4 ex5 ex6 ex7 ex8     11       ex11
ex0 ex1 ex2 ex3 ex4 ex5 ex6 ex7 ex8     12       ex12
ex0 ex1 ex2 ex3 ex4 ex5 ex6 ex7 ex8     13       ex13
ex0 ex1 ex2 ex3 ex4 ex5 ex6 ex7 ex8     14       ex14
ex0 ex1 ex2 ex3 ex4 ex5 ex6 ex7 ex8     15       ex15
ex0 ex1 ex2 ex3 ex4 ex5 ex6 ex7 ex8     16       ex16
ex0 ex1 ex2 ex3 ex4 ex5 ex6 ex7 ex8     17       ex17
ex0 ex1 ex2 ex3 ex4 ex5 ex6 ex7 ex8     18       ex18

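Basically, I want to turn the last 10 column headers (9 through 18) and their values into rows of their own, placed in 2 new columns, while keeping the first 9 columns (0 through 8) on every row. I know this means the data in those columns will be duplicated.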

I've seen other Stack Overflow answers use the following code on smaller DataFrames, but it didn't work for me:

df.melt(['Type', 'Class'], var_name='Date', value_name='Value')

(df.set_index(['Type', 'Class'])
   .stack()
   .rename_axis(['Type', 'Class', 'Date'])
   .reset_index(name='Value')
)

Any help would be greatly appreciated!

Thanks.

3 Answers

4

Use melt; you were close.

df.melt(id_vars=df.columns[:9], var_name='category', value_name='amount')

Out[469]:
     0    1    2    3    4    5    6    7    8 category amount
0  ex0  ex1  ex2  ex3  ex4  ex5  ex6  ex7  ex8        9    ex9
1  ex0  ex1  ex2  ex3  ex4  ex5  ex6  ex7  ex8       10   ex10
2  ex0  ex1  ex2  ex3  ex4  ex5  ex6  ex7  ex8       11   ex11
3  ex0  ex1  ex2  ex3  ex4  ex5  ex6  ex7  ex8       12   ex12
4  ex0  ex1  ex2  ex3  ex4  ex5  ex6  ex7  ex8       13   ex13
5  ex0  ex1  ex2  ex3  ex4  ex5  ex6  ex7  ex8       14   ex14
6  ex0  ex1  ex2  ex3  ex4  ex5  ex6  ex7  ex8       15   ex15
7  ex0  ex1  ex2  ex3  ex4  ex5  ex6  ex7  ex8       16   ex16
8  ex0  ex1  ex2  ex3  ex4  ex5  ex6  ex7  ex8       17   ex17
9  ex0  ex1  ex2  ex3  ex4  ex5  ex6  ex7  ex8       18   ex18
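
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The one-row frame is an assumption rebuilt from the question (integer column labels 0 to 18, placeholder values "ex0" to "ex18"); the melt call itself is the one shown above, with id_vars passed as a plain list.

```python
import pandas as pd

# Assumed input: one row, integer column labels 0..18, values "ex0".."ex18"
df = pd.DataFrame([[f"ex{i}" for i in range(19)]], columns=range(19))

# Keep the first 9 columns as identifiers; every remaining column becomes
# one (category, amount) row per original row
long_df = df.melt(
    id_vars=list(df.columns[:9]),
    var_name="category",
    value_name="amount",
)
print(long_df)
```

Because id_vars is chosen by position (df.columns[:9]), the call does not depend on whether the labels are integers or strings.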

1
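Hello! I realized that my earlier code was correct, but my column names were wrong, so I misread the problem. I'm a bit embarrassed. Thanks anyway for confirming my thinking! - V1cst3r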
1
@V1cst3r: No need to be embarrassed, and you're welcome. Every programmer runs into the same problem, myself included. Cheers :) - Andy L.
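
The comments do not say what exactly was wrong with the column names. One common case, assumed here purely for illustration, is that the labels are strings such as "0" while id_vars is given as integers. A small sketch of how that surfaces and how positional selection sidesteps it:

```python
import pandas as pd

# Assumed input: labels are the strings "0".."18" (for example, read from a CSV header)
df = pd.DataFrame(
    [[f"ex{i}" for i in range(19)]],
    columns=[str(i) for i in range(19)],
)

try:
    # Integer id_vars do not match the string labels
    df.melt(id_vars=list(range(9)), var_name="category", value_name="amount")
    error = None
except KeyError as exc:
    error = exc  # melt reports id_vars that are not column labels

# Selecting the id columns by position works whatever the label type is
long_df = df.melt(
    id_vars=list(df.columns[:9]),
    var_name="category",
    value_name="amount",
)
```

Printing df.columns before calling melt is a quick way to check which kind of labels the frame actually has.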

1

Just melt it

print(df.melt([i for i in df.columns if int(i) < 9], var_name="category", value_name="amount"))

     0    1    2    3    4    5    6    7    8 category amount
0  ex0  ex1  ex2  ex3  ex4  ex5  ex6  ex7  ex8        9    ex9
1  ex0  ex1  ex2  ex3  ex4  ex5  ex6  ex7  ex8       10   ex10
2  ex0  ex1  ex2  ex3  ex4  ex5  ex6  ex7  ex8       11   ex11
3  ex0  ex1  ex2  ex3  ex4  ex5  ex6  ex7  ex8       12   ex12
4  ex0  ex1  ex2  ex3  ex4  ex5  ex6  ex7  ex8       13   ex13
5  ex0  ex1  ex2  ex3  ex4  ex5  ex6  ex7  ex8       14   ex14
6  ex0  ex1  ex2  ex3  ex4  ex5  ex6  ex7  ex8       15   ex15
7  ex0  ex1  ex2  ex3  ex4  ex5  ex6  ex7  ex8       16   ex16
8  ex0  ex1  ex2  ex3  ex4  ex5  ex6  ex7  ex8       17   ex17
9  ex0  ex1  ex2  ex3  ex4  ex5  ex6  ex7  ex8       18   ex18

Thanks for your answer! I'm going to accept Andy's answer, though, since it does the same job without a for loop. Cheers! - V1cst3r

0

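First, I would create a second DataFrame (under a different name) without the last two columns, so that the dimensions are right. After declaring an empty df of the right size (minus the last two columns), you can fill it by running this command in a loop: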

# DataFrame.set_value was removed in pandas 1.0; .at is the label-based scalar setter
dataFrame.at[index, col] = value

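Alternatively, you can build lists holding only the data you want in the leading columns, declare the new DataFrame from a dictionary of those lists, and use that.

Then I would run this to copy the other two columns over.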

cats = list(df1.columns[-10:])       # labels of the last 10 columns
row1_section = df1.iloc[0, -10:]     # first row's values in those columns
df2['category'] = cats
df2['amount'] = row1_section.tolist()
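
To make the manual approach concrete, here is a rough end-to-end sketch. The input frame df1 and the helper name n_value_cols are assumptions added for illustration, and df2 is built with pd.concat rather than cell by cell, which is a swapped-in shortcut and not what the answer above describes.

```python
import pandas as pd

# Assumed input: one row, integer column labels 0..18, values "ex0".."ex18"
df1 = pd.DataFrame([[f"ex{i}" for i in range(19)]], columns=range(19))

n_value_cols = 10  # hypothetical name: how many trailing columns to unpivot

cats = list(df1.columns[-n_value_cols:])              # labels 9..18
row1_section = df1.iloc[0, -n_value_cols:].tolist()   # "ex9".."ex18"

# Repeat the first row's leading columns once per category
leading = df1.iloc[[0], :-n_value_cols]
df2 = pd.concat([leading] * n_value_cols, ignore_index=True)

df2["category"] = cats
df2["amount"] = row1_section
print(df2)
```

This only handles the first row of df1; a frame with several rows would need a loop over rows, which is the work melt does in a single call.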
