Create new columns in pandas based on row values

Question

I have a Pandas DataFrame that looks like this:

     id             name  total  cubierto  no_cubierto  escuela_id  nivel_id 
0   1        direccion      1         1            0   420000707         1   
1   2  frente_a_alunos      4         4            0   420000707         1   
2   3            apoyo      2         2            0   420000707         1   
3   4        direccion      2         2            0   840477414         2   
4   5  frente_a_alunos      8         8            0   840477414         2   
5   6            apoyo      4         3            1   840477414         2   
6   7        direccion      7         7            0   918751515         3   
7   8            apoyo     37        37            0   918751515         3   
8   9        direccion      1         1            0   993683216         1   
9  10  frente_a_alunos      7         7            0   993683216         1
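
For anyone who wants to reproduce this, the sample frame above could be rebuilt roughly as follows. This is only a sketch: the values are copied by hand from the table, and the variable name `df` is an assumption.

```python
import pandas as pd

# Sample data copied from the table above (variable name `df` is an assumption).
df = pd.DataFrame({
    "id": range(1, 11),
    "name": ["direccion", "frente_a_alunos", "apoyo",
             "direccion", "frente_a_alunos", "apoyo",
             "direccion", "apoyo",
             "direccion", "frente_a_alunos"],
    "total": [1, 4, 2, 2, 8, 4, 7, 37, 1, 7],
    "cubierto": [1, 4, 2, 2, 8, 3, 7, 37, 1, 7],
    "no_cubierto": [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    "escuela_id": [420000707] * 3 + [840477414] * 3
                  + [918751515] * 2 + [993683216] * 2,
    "nivel_id": [1] * 3 + [2] * 3 + [3] * 2 + [1] * 2,
})
print(df)
```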

The "name" column has 3 unique values:

 - direccion
 - frente_a_alunos
 - apoyo

I need a new DataFrame, grouped by "escuela_id" and "nivel_id", with the following columns:

 - direccion_total
 - direccion_cubierto
 - frente_a_alunos_total
 - frente_a_alunos_cubierto
 - apoyo_total
 - apoyo_cubierto
 - escuela_id
 - nivel_id

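The values should come from the "total" and "cubierto" columns. I don't need the "no_cubierto" column. Can this be done with pandas functions? I'm stuck and can't find a solution.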

For the example above, the output should look like this:

   escuela_id  nivel_id  apoyo_cubierto  apoyo_total  direccion_total  direccion_cubierto  frente_a_alunos_total  frente_a_alunos_cubierto
0   420000707         1               2            2                1                   1                      4                         4
1   840477414         2               3            4                2                   2                      8                         8
2   918751515         3              37           37                7                   7                     ..                        ..
3   993683216         1              ..           ..                1                   1                      7                         7

- fb__

Pandas provides the groupby() function for this. - leopardxpreload
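
As a rough sketch of what a groupby()-based approach might look like (not the commenter's own code): group on the two key columns plus "name", aggregate, then use `unstack` to move "name" into the columns. The sample frame is trimmed to the columns actually used, and summing is an assumption about how duplicates should be combined.

```python
import pandas as pd

# Sample data from the question, trimmed to the columns used here.
df = pd.DataFrame({
    "name": ["direccion", "frente_a_alunos", "apoyo",
             "direccion", "frente_a_alunos", "apoyo",
             "direccion", "apoyo",
             "direccion", "frente_a_alunos"],
    "total": [1, 4, 2, 2, 8, 4, 7, 37, 1, 7],
    "cubierto": [1, 4, 2, 2, 8, 3, 7, 37, 1, 7],
    "escuela_id": [420000707] * 3 + [840477414] * 3
                  + [918751515] * 2 + [993683216] * 2,
    "nivel_id": [1] * 3 + [2] * 3 + [3] * 2 + [1] * 2,
})

# Aggregate per (escuela_id, nivel_id, name), then pivot "name" into columns.
# Combinations that never occur in the data come out as NaN.
wide = (
    df.groupby(["escuela_id", "nivel_id", "name"])[["total", "cubierto"]]
      .sum()
      .unstack("name")
)
print(wide)
```

The result has a two-level column index of (value, name) pairs, so it would still need flattening to get single-level column names.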

Show your code and we can give you some feedback and guidance. - Umar.H

1 Answer

- NYC Coder · Accepted Answer

You need pivot_table here:

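# df is the DataFrame from the question; each 'name' value becomes its own column
df = df.pivot_table(index=['escuela_id', 'nivel_id'], columns='name', values=['total', 'cubierto']).reset_index()
# Flatten the two-level columns; the key columns have an empty second level, hence the trailing '_'
df.columns = ['_'.join(col).strip() for col in df.columns.values]
print(df)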

Output:

   escuela_id_  nivel_id_  cubierto_apoyo  cubierto_direccion  cubierto_frente_a_alunos  total_apoyo  total_direccion  total_frente_a_alunos
0    420000707          1             2.0                 1.0                       4.0          2.0              1.0                    4.0
1    840477414          2             3.0                 2.0                       8.0          4.0              2.0                    8.0
2    918751515          3            37.0                 7.0                       NaN         37.0              7.0                    NaN
3    993683216          1             NaN                 1.0                       7.0          NaN              1.0                    7.0
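
The output above names the columns value-first (total_apoyo) and leaves a trailing underscore on the key columns, whereas the question asked for name-first columns such as apoyo_total. A possible variant, offered as a sketch rather than as part of the original answer, flattens the columns before resetting the index. Two assumptions are made here: `aggfunc="sum"` replaces pivot_table's default of mean (both give the same numbers for this sample, since each combination occurs once), and the nullable `Int64` dtype is used so missing combinations show as `<NA>` instead of turning the counts into floats.

```python
import pandas as pd

# Sample data from the question, trimmed to the columns used here.
df = pd.DataFrame({
    "name": ["direccion", "frente_a_alunos", "apoyo",
             "direccion", "frente_a_alunos", "apoyo",
             "direccion", "apoyo",
             "direccion", "frente_a_alunos"],
    "total": [1, 4, 2, 2, 8, 4, 7, 37, 1, 7],
    "cubierto": [1, 4, 2, 2, 8, 3, 7, 37, 1, 7],
    "escuela_id": [420000707] * 3 + [840477414] * 3
                  + [918751515] * 2 + [993683216] * 2,
    "nivel_id": [1] * 3 + [2] * 3 + [3] * 2 + [1] * 2,
})

wide = df.pivot_table(
    index=["escuela_id", "nivel_id"],
    columns="name",
    values=["total", "cubierto"],
    aggfunc="sum",  # assumption: duplicates, if any, should be summed
)
# Columns are (value, name) pairs; rename each to "<name>_<value>".
wide.columns = [f"{name}_{value}" for value, name in wide.columns]
# Nullable integers keep the counts integral despite missing combinations.
wide = wide.astype("Int64").reset_index()
print(wide)
```

Flattening before `reset_index()` means the key columns never pass through the join, so they keep their plain names.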