Pandas groupby: create a new column for each value

Question

3

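Hopefully the title is clear. I would only add that each key can be assumed to have the same number of values. Searching for the title online turned up "Split pandas dataframe based on groupby", but it does not solve my problem. Here is an example. Input: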

pd.DataFrame(data={'a':['foo','foo','foo','bar','bar','bar'],'b':[1,2,3,4,5,6]})

Output:

pd.DataFrame(data={'a':['foo','bar'],'b':[1,4],'c':[2,5],'d':[3,6]})

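Intuitively, this would be a groupby without an aggregation function, or with an aggregation function that collects each key's values into a list.

Obviously this can be done "by hand" with for loops and the like, but for loops are computationally expensive on large datasets.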

- user9548409

2 Answers

1

Here is another approach, using groupby.apply and, if the column names matter, string.ascii_lowercase:

import pandas as pd
from string import ascii_lowercase

df = pd.DataFrame(data={'a':['foo','foo','foo','bar','bar','bar'],'b':[1,2,3,4,5,6]})

# Groupby 'a'
g = df.groupby('a')['b'].apply(list)

# Construct new DataFrame from g
new_df = pd.DataFrame(g.values.tolist(), index=g.index).reset_index()

# Fix column names
new_df.columns = list(ascii_lowercase[:new_df.shape[1]])

print(new_df)

     a  b  c  d
0  bar  4  5  6
1  foo  1  2  3
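
Note that the result above lists bar before foo, because groupby sorts the keys by default, whereas the output requested in the question has foo first. A minimal sketch of one way to keep the keys in order of first appearance, assuming that order matters, is to pass sort=False to groupby (the name ordered_df is only illustrative):

```python
import pandas as pd
from string import ascii_lowercase

df = pd.DataFrame(data={'a': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
                        'b': [1, 2, 3, 4, 5, 6]})

# sort=False keeps the groups in order of first appearance (foo, then bar)
g = df.groupby('a', sort=False)['b'].apply(list)

# One row per key, one column per collected value
ordered_df = pd.DataFrame(g.tolist(), index=g.index).reset_index()

# Rename the columns to a, b, c, d
ordered_df.columns = list(ascii_lowercase[:ordered_df.shape[1]])

print(ordered_df)
```

Like the code above, this relies on every key having the same number of values, as stated in the question.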

- Chris Adams

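Thanks, this works too. I tried the approach from the other answer first, but after running into a sorting problem ("ValueError: cannot label index with a null key") I tried this one as well, and it produced the same error. I would appreciate help resolving this. Googling has not helped so far. - user9548409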

- jezrael · Accepted Answer

Use GroupBy.cumcount to build a Series (or column) g, then reshape with DataFrame.set_index + Series.unstack or with DataFrame.pivot, and finally clean up the result with DataFrame.add_prefix, DataFrame.rename_axis and DataFrame.reset_index.

g = df1.groupby('a').cumcount()
df = (df1.set_index(['a', g])['b']
         .unstack()
         .add_prefix('new_')
         .reset_index()
         .rename_axis(None, axis=1))
print (df)
     a  new_0  new_1  new_2
0  bar      4      5      6
1  foo      1      2      3

Or:

df1['g'] = df1.groupby('a').cumcount()
df = df1.pivot(index='a', columns='g', values='b').add_prefix('new_').reset_index().rename_axis(None, axis=1)
print (df)
     a  new_0  new_1  new_2
0  bar      4      5      6
1  foo      1      2      3
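
For reference, here is a self-contained sketch of the cumcount + pivot idea, assuming df1 is the input DataFrame from the question. It prints the intermediate counter so the reshaping step is easier to follow, and it passes keyword arguments to pivot, which recent pandas versions require. The name wide is only illustrative.

```python
import pandas as pd

# Assumption: df1 is the input frame given in the question
df1 = pd.DataFrame(data={'a': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
                         'b': [1, 2, 3, 4, 5, 6]})

# Position of each row within its group: 0, 1, 2, 0, 1, 2
g = df1.groupby('a').cumcount()
print(g.tolist())

# The counter becomes the column labels, the keys in 'a' become the rows
wide = (df1.assign(g=g)
           .pivot(index='a', columns='g', values='b')
           .add_prefix('new_')
           .reset_index()
           .rename_axis(None, axis=1))
print(wide)
```

Using assign here, rather than df1['g'] = ..., leaves df1 itself unchanged.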