Description
How can I use Pandas groupby to group some columns but not others?
Current progress
import pandas as pd

# Sample data: one row per (Geo_ID, A_Code) cost entry
table_D = pd.DataFrame({
    'Geo_ID': [1, 1, 1, 1, 2, 3, 4, 4, 5],
    'A_Code': [12, 12, 12, 65, 65, 65, 65, 98, 98],
    'A_Cost': [2, 9, 1, 10, 6, 7, 7, 6, 2],
}, columns=['Geo_ID', 'A_Code', 'A_Cost'])
# One-hot encode A_Code into A_Code_12, A_Code_65, A_Code_98
table_D_dummies = pd.get_dummies(data=table_D, columns=["A_Code"])
# Sum every remaining column within each Geo_ID
table_D_dummies_grouped = table_D_dummies.groupby(by=["Geo_ID"]).sum()
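Problem
As shown below, the table correctly aggregates cost by Geo_ID. Unfortunately, it is also aggregating the A_Code columns: for Geo_ID 1, A_Code_12 comes out as 3 because the three indicator rows are summed.
A_Code_12, A_Code_65, and A_Code_98 should each be combined separately. Also, the real data set has more than 100 A_Codes.
Data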
table_D
+--------+--------+--------+
| Geo_ID | A_Code | A_Cost |
+--------+--------+--------+
| 1 | 12 | 2 |
| 1 | 12 | 9 |
| 1 | 12 | 1 |
| 1 | 65 | 10 |
| 2 | 65 | 6 |
| 3 | 65 | 7 |
| 4 | 65 | 7 |
| 4 | 98 | 6 |
| 5 | 98 | 2 |
+--------+--------+--------+
table_D_dummies
+---+--------+--------+-----------+-----------+-----------+
| | Geo_ID | A_Cost | A_Code_12 | A_Code_65 | A_Code_98 |
+---+--------+--------+-----------+-----------+-----------+
| 0 | 1 | 2 | 1 | 0 | 0 |
| 1 | 1 | 9 | 1 | 0 | 0 |
| 2 | 1 | 1 | 1 | 0 | 0 |
| 3 | 1 | 10 | 0 | 1 | 0 |
| 4 | 2 | 6 | 0 | 1 | 0 |
| 5 | 3 | 7 | 0 | 1 | 0 |
| 6 | 4 | 7 | 0 | 1 | 0 |
| 7 | 4 | 6 | 0 | 0 | 1 |
| 8 | 5 | 2 | 0 | 0 | 1 |
+---+--------+--------+-----------+-----------+-----------+
table_D_dummies_grouped
+--------+--------+-----------+-----------+-----------+
| Geo_ID | A_Cost | A_Code_12 | A_Code_65 | A_Code_98 |
+--------+--------+-----------+-----------+-----------+
| 1 | 22 | 3 | 1 | 0 |
| 2 | 6 | 0 | 1 | 0 |
| 3 | 7 | 0 | 1 | 0 |
| 4 | 13 | 0 | 1 | 1 |
| 5 | 2 | 0 | 0 | 1 |
+--------+--------+-----------+-----------+-----------+
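The question leaves the desired output open, so the following is only a sketch of two possible readings, not a confirmed answer. The first keeps A_Cost summed per Geo_ID but collapses each indicator column with max, so it stays 0/1 instead of becoming a count. The second assumes the goal is the cost per A_Code within each Geo_ID and uses pivot_table. The names code_cols, agg_spec, grouped, and cost_by_code are introduced here for illustration, and dtype=int is an added assumption so the indicator columns are integers regardless of the pandas version. Building the aggregation spec from the column names avoids listing 100+ codes by hand.

```python
import pandas as pd

table_D = pd.DataFrame({
    "Geo_ID": [1, 1, 1, 1, 2, 3, 4, 4, 5],
    "A_Code": [12, 12, 12, 65, 65, 65, 65, 98, 98],
    "A_Cost": [2, 9, 1, 10, 6, 7, 7, 6, 2],
})

# Reading 1: sum the cost, but keep each A_Code_* column as a 0/1 flag.
# dtype=int is an assumption made here so the flags are integers, not booleans.
dummies = pd.get_dummies(table_D, columns=["A_Code"], dtype=int)

# Build the per-column aggregation from the column names,
# so it scales to any number of A_Code values.
code_cols = [c for c in dummies.columns if c.startswith("A_Code_")]
agg_spec = {"A_Cost": "sum", **{c: "max" for c in code_cols}}
grouped = dummies.groupby("Geo_ID").agg(agg_spec)
print(grouped)

# Reading 2: total cost per A_Code within each Geo_ID (0 where a code is absent).
cost_by_code = table_D.pivot_table(
    index="Geo_ID", columns="A_Code", values="A_Cost",
    aggfunc="sum", fill_value=0,
)
print(cost_by_code)
```

Under the first reading, Geo_ID 1 has A_Cost 22 and A_Code_12 equal to 1 rather than 3. Under the second, Geo_ID 1 shows a cost of 12 for code 12 and 10 for code 65.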