Pandas: normalizing a DataFrame

Question


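I have some flattened input data that I would like to normalize by splitting it into tables. Can I do this with pandas, i.e. by reading the flattened data into a DataFrame instance and then applying some functions to obtain the resulting DataFrame instances?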

Example:

The data is stored on disk as a CSV file like this:

ItemId   ClientId   PriceQuoted  ItemDescription
1        1          10           scroll of Sneak
1        2          12           scroll of Sneak
1        3          13           scroll of Sneak
2        2          2500         scroll of Invisible
2        4          2200         scroll of Invisible

I would like to create two DataFrames:

ItemId   ItemDescription
1        scroll of Sneak
2        scroll of Invisible

and

ItemId   ClientId   PriceQuoted
1        1          10
1        2          12
1        3          13
2        2          2500
2        4          2200

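If pandas has a good solution only for the simplest case (where normalization results in two tables with a one-to-many relationship, as in the example above), that may be enough for my current needs. In the future, however, I may need a more general solution.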

- max

1 Answer


- Wouter Overmeire · Accepted Answer

In [29]: df = pandas.read_csv('foo1.csv', sep=r'\s{2,}', engine='python')

In [30]: df
Out[30]:
   ItemId  ClientId  PriceQuoted      ItemDescription
0       1         1           10      scroll of Sneak
1       1         2           12      scroll of Sneak
2       1         3           13      scroll of Sneak
3       2         2         2500  scroll of Invisible
4       2         4         2200  scroll of Invisible

In [31]: df1 = df[['ItemId', 'ItemDescription']].drop_duplicates().set_index('ItemId')

In [32]: df1
Out[32]:
            ItemDescription
ItemId
1           scroll of Sneak
2       scroll of Invisible

In [33]: df2 = df[['ItemId', 'ClientId', 'PriceQuoted']]

In [34]: df2
Out[34]:
   ItemId  ClientId  PriceQuoted
0       1         1           10
1       1         2           12
2       1         3           13
3       2         2         2500
4       2         4         2200
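
For the more general case raised in the question, the two steps above can be wrapped in a small function. The following is only a sketch under stated assumptions: `split_one_to_many` is a hypothetical helper name (not a pandas API), the sample data is inlined through `io.StringIO` instead of being read from `foo1.csv`, and the uniqueness check is an added safeguard that the session above does not perform.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
RAW = """\
ItemId   ClientId   PriceQuoted  ItemDescription
1        1          10           scroll of Sneak
1        2          12           scroll of Sneak
1        3          13           scroll of Sneak
2        2          2500         scroll of Invisible
2        4          2200         scroll of Invisible
"""


def split_one_to_many(df, key, dependent_cols):
    """Hypothetical helper: split df into a lookup table and a detail table.

    The lookup table holds `key` plus the columns that depend only on it;
    the detail table keeps every other column, including `key`.
    """
    lookup = df[[key] + dependent_cols].drop_duplicates()
    # If a key still appears twice after deduplication, the dependent
    # columns are not determined by the key and the split would lose data.
    if lookup[key].duplicated().any():
        raise ValueError(f"{dependent_cols} are not determined by {key!r}")
    detail = df.drop(columns=dependent_cols)
    return lookup.set_index(key), detail


# Columns are separated by runs of two or more whitespace characters;
# a regex separator requires the Python parsing engine.
df = pd.read_csv(io.StringIO(RAW), sep=r"\s{2,}", engine="python")

items, quotes = split_one_to_many(df, "ItemId", ["ItemDescription"])

# Joining the detail table back to the lookup table restores the flat view.
rebuilt = quotes.join(items, on="ItemId")
```

Here `items` plays the role of `df1` and `quotes` the role of `df2`. Calling the helper again on `quotes` with a different key would split off further tables, provided each group of dependent columns really is determined by its key.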