Pandas: merge columns with the same name

Question


3

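I have the following dataframe:

Timestamp	participant	level	gold	participant	level	gold
1	1	100	6000	2	76	4200
2	1	150	5000	2	120	3700

I am trying to reshape the dataframe so that all columns with the same name are stacked underneath each other, while keeping the Timestamp column:

Timestamp	participant	level	gold
1	1	100	6000
2	1	150	5000
1	2	76	4200
2	2	120	3700

To be clear, the example above is only a small sample. The real dataframe has many identically named columns and many more rows, so the solution needs to account for that.

Thanks!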

- Johannes_Sathre

2 Answers

0

Hope this helps:

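# Index.duplicated() marks the second and later occurrences of each column name
df1 = pd.concat([df.iloc[:, 0], df.loc[:, df.columns.duplicated()]], axis=1)
df2 = df.loc[:, ~df.columns.duplicated()]
# Stack the two blocks vertically (only works when each name occurs at most twice)
df = pd.concat([df2, df1], ignore_index=True)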

- David

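Thank you very much! This might work, but the challenge is that there are many columns and rows, so the solution needs to be more general. - Johannes_Sathre

I changed the solution; it should be better now. - David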


- jezrael · Accepted Answer

The idea is to number the repeated column names with GroupBy.cumcount so that each (name, counter) pair is unique, and then reshape with DataFrame.stack:

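df = df.set_index('Timestamp')
s = df.columns.to_series()

# Number each repeat of a column name: 0 for the first occurrence, 1 for the second, ...
df.columns = [df.columns, s.groupby(s).cumcount()]

# Move the counter level into the rows, then drop it and restore Timestamp as a column
df = df.stack().reset_index(level=1, drop=True).reset_index()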
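
For reference, here is a minimal self-contained sketch of the same cumcount-and-stack idea applied to the sample data from the question. The variable names (`wide`, `occurrence`, `result`) are illustrative, and the final column selection and sort are an addition that only reproduces the row order shown in the question; they are not part of the answer above.

```python
import pandas as pd

# Sample frame from the question, with duplicate column names.
df = pd.DataFrame(
    [[1, 1, 100, 6000, 2, 76, 4200],
     [2, 1, 150, 5000, 2, 120, 3700]],
    columns=["Timestamp", "participant", "level", "gold",
             "participant", "level", "gold"],
)

wide = df.set_index("Timestamp")
names = wide.columns.to_series()

# 0 for the first occurrence of each name, 1 for the second, and so on.
occurrence = names.groupby(names).cumcount()
# occurrence.tolist() is [0, 0, 0, 1, 1, 1]

# Two-level columns: (name, occurrence) pairs are now unique.
wide.columns = pd.MultiIndex.from_arrays([wide.columns, occurrence])

# Stack the occurrence level into the rows, drop it, restore Timestamp.
result = wide.stack().reset_index(level=1, drop=True).reset_index()

# Optional: fix column order and sort rows to match the layout in the question.
result = result[["Timestamp", "participant", "level", "gold"]].sort_values(
    ["participant", "Timestamp"], ignore_index=True
)
print(result)
```

Because the counter comes from cumcount, the same code should handle any number of repeated column groups, not just two.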

If the column names are not duplicated but instead have a `.` and a number appended:

print (df)
   Timestamp  participant  level  gold  participant.1  level.1  gold.1
0          1            1    100  6000              2       76    4200
1          2            1    150  5000              2      120    3700

df = df.set_index('Timestamp')

df.columns = pd.MultiIndex.from_frame(df.columns.str.split('.', expand=True)
                                        .to_frame().fillna('0'))

df = df.stack().reset_index(level=1, drop=True).reset_index()
print (df)
0  Timestamp  gold  level  participant
0          1  6000    100            1
1          1  4200     76            2
2          2  5000    150            1
3          2  3700    120            2
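
Suffixed names like `participant.1` are what `pandas.read_csv` produces when the header row contains duplicates. The sketch below reproduces that situation from an in-memory CSV and splits the names with plain Python string handling rather than `str.split(..., expand=True)`. It assumes the only `.` in a column name is the one introducing the numeric suffix; the variable names are illustrative.

```python
import io

import pandas as pd

csv_text = """Timestamp,participant,level,gold,participant,level,gold
1,1,100,6000,2,76,4200
2,1,150,5000,2,120,3700
"""

# read_csv renames repeated headers to participant.1, level.1, gold.1
raw = pd.read_csv(io.StringIO(csv_text))

wide = raw.set_index("Timestamp")

# Split each name into (base name, occurrence); no suffix means occurrence 0.
base_names, occurrences = [], []
for name in wide.columns:
    base, _, suffix = name.partition(".")
    base_names.append(base)
    occurrences.append(int(suffix) if suffix else 0)

wide.columns = pd.MultiIndex.from_arrays([base_names, occurrences])

# Same reshape as before: stack the occurrence level, drop it, restore Timestamp.
result = wide.stack().reset_index(level=1, drop=True).reset_index()
result = result[["Timestamp", "participant", "level", "gold"]].sort_values(
    ["participant", "Timestamp"], ignore_index=True
)
print(result)
```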