Flatten a list of dictionaries

Question

4

I have a sample pandas DataFrame as follows:

import pandas as pd

df = pd.DataFrame({
    'Country': ['India', 'China', 'Nepal'],
    'Habitat': [
        [{'city1': 'Ind1', 'city2': 'Ind2'}, {'town1': 'IndT1', 'town2': 'IndT2'}],
        [{'city1': 'Chi1', 'city2': 'Chi2'}, {'town1': 'ChiT1', 'town2': 'ChiT2'}],
        [{'city1': 'Nep1', 'city2': 'Nep2'}, {'town1': 'NepT1', 'town2': 'NepT2'}],
    ],
    'num': [1, 2, 3],
})

df

    Country                                                           Habitat   num
0   India   [{'city1':'Ind1','city2':'Ind2'},{'town1':'IndT1','town2':'IndT2'}] 1
1   China   [{'city1':'Chi1','city2':'Chi2'},{'town1':'ChiT1','town2':'ChiT2'}] 2
2   Nepal   [{'city1':'Nep1','city2':'Nep2'},{'town1':'NepT1','town2':'NepT2'}] 3

I need to flatten it into this format:

result_df = pd.DataFrame({
    'Country': ['India', 'China', 'Nepal'],
    'Habitat.city1': ['Ind1', 'Chi1', 'Nep1'],
    'Habitat.city2': ['Ind2', 'Chi2', 'Nep2'],
    'Habitat.town1': ['IndT1', 'ChiT1', 'NepT1'],
    'Habitat.town2': ['IndT2', 'ChiT2', 'NepT2'],
    'num': [1, 2, 3],
})

result_df

    Country Habitat.city1   Habitat.city2   Habitat.town1   Habitat.town2   num
    India       Ind1            Ind2            IndT1           IndT2       1
    China       Chi1            Chi2            ChiT1           ChiT2       2
    Nepal       Nep1            Nep2            NepT1           NepT2       3

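I have tried pd.json_normalize(df.explode('Habitat')['Habitat']), but it creates new rows that I don't need.

My observation: some form of groupby and transpose could build on pd.json_normalize(df.explode('Habitat')['Habitat']) to solve my problem, but so far I have not found a suitable way to do it.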

- Charizard_knows_to_code

1

Was this dataframe created from a source JSON? Could we flatten it before creating the dataframe? - Scott Boston

1

The source is a JSON file. But the file can get very large, so iterating over it line by line could be very expensive. - Charizard_knows_to_code

Processing the JSON file before converting it to pandas can improve performance. - sammywemmy
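
As a rough illustration of the preprocessing suggested in the comments above, the sketch below merges each record's Habitat list into prefixed top-level keys before any dataframe is built. The inline JSON string and the helper name flatten_record are hypothetical; the real file layout is not shown in the question, so the record shape is assumed to mirror the sample dataframe.

```python
import json

# Hypothetical stand-in for the source file; the layout is assumed
# to mirror the sample dataframe from the question.
raw = """[
  {"Country": "India",
   "Habitat": [{"city1": "Ind1", "city2": "Ind2"}, {"town1": "IndT1", "town2": "IndT2"}],
   "num": 1},
  {"Country": "China",
   "Habitat": [{"city1": "Chi1", "city2": "Chi2"}, {"town1": "ChiT1", "town2": "ChiT2"}],
   "num": 2}
]"""


def flatten_record(record, key="Habitat"):
    """Return a copy of record with the list of dicts under `key`
    merged into top-level entries named '<key>.<name>'."""
    flat = {k: v for k, v in record.items() if k != key}
    for part in record.get(key, []):
        for name, value in part.items():
            flat[f"{key}.{name}"] = value
    return flat


flat_records = [flatten_record(r) for r in json.loads(raw)]
print(flat_records[0])
```

Passing flat_records to pd.DataFrame should then give the flattened columns directly. Whether this beats flattening inside pandas depends on the data and would need to be measured.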

2 Answers

3

In Python 3.9+, you can use dictionary union, as follows:

import pandas as pd
from operator import or_
from itertools import starmap

# starmap unpacks each row's list into or_(a, b), so each row must hold exactly two dicts
flat = pd.DataFrame(starmap(or_, df['Habitat']), df.index).add_prefix('Habitat.')
res = pd.concat([df.drop(labels=['Habitat'], axis=1), flat], axis=1)
print(res)

Output

  Country  num Habitat.city1 Habitat.city2 Habitat.town1 Habitat.town2
0   India    1          Ind1          Ind2         IndT1         IndT2
1   China    2          Chi1          Chi2         ChiT1         ChiT2
2   Nepal    3          Nep1          Nep2         NepT1         NepT2

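The or_ function calls the object's underlying | implementation. According to the documentation:

Return the bitwise or of a and b.

For dictionaries, | is the union operation. For more on how to merge two dictionaries, see this answer.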
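
A minimal sketch of what or_ does with two dictionaries, using one row of the sample data. The 'old' and 'new' values in the second half are made up purely to show how duplicate keys are resolved.

```python
from operator import or_

cities = {'city1': 'Ind1', 'city2': 'Ind2'}
towns = {'town1': 'IndT1', 'town2': 'IndT2'}

# or_(a, b) is the function form of a | b (dict union needs Python 3.9+).
merged = or_(cities, towns)
print(merged)

# Illustrative values: on duplicate keys the right-hand operand wins.
override = {'city1': 'old'} | {'city1': 'new'}
print(override)
```

The union builds a new dictionary, so cities and towns are left unchanged.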

Another solution, which works for any number of dictionaries, is to use functools.reduce:

import pandas as pd
from operator import or_
from functools import reduce, partial

merge = partial(reduce, or_)

flat = pd.DataFrame(map(merge, df['Habitat']), df.index).add_prefix('Habitat.')
res = pd.concat([df.drop(labels=['Habitat'], axis=1), flat], axis=1)

For more information, see partial and reduce.
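
To see why reduce generalizes where a plain or_ call does not, here is a small sketch with a row of three dictionaries. The third dictionary ('village1') is hypothetical and does not appear in the question's data.

```python
from functools import reduce
from operator import or_

# Hypothetical row with three dicts instead of two.
row = [{'city1': 'Ind1'}, {'town1': 'IndT1'}, {'village1': 'IndV1'}]

# reduce folds the list pairwise: (row[0] | row[1]) | row[2]
merged = reduce(or_, row)
print(merged)

# or_ takes exactly two arguments, so unpacking three dicts into it fails.
try:
    or_(*row)
    failed = False
except TypeError:
    failed = True
print('or_(*row) raised TypeError:', failed)

# An initial value keeps reduce from raising on an empty list.
empty = reduce(or_, [], {})
print(empty)
```

If some rows might hold an empty list, passing {} as the initial value to reduce is one way to avoid the TypeError that reduce raises on an empty sequence.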

- Dani Mesejo

- Shubham Sharma · Accepted Answer

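Let us use ChainMap to merge the list of dictionaries in each row, then create a new dataframe and join it with the original dataframe.

import pandas as pd
from itertools import starmap
from collections import ChainMap

# starmap calls ChainMap(*dicts), giving one merged mapping per row
h = pd.DataFrame(starmap(ChainMap, df['Habitat']), df.index)
df.join(h.add_prefix('Habitat.'))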

  Country                                                                     Habitat  num Habitat.city1 Habitat.city2 Habitat.town1 Habitat.town2
0   India  [{'city1': 'Ind1', 'city2': 'Ind2'}, {'town1': 'IndT1', 'town2': 'IndT2'}]    1          Ind1          Ind2         IndT1         IndT2
1   China  [{'city1': 'Chi1', 'city2': 'Chi2'}, {'town1': 'ChiT1', 'town2': 'ChiT2'}]    2          Chi1          Chi2         ChiT1         ChiT2
2   Nepal  [{'city1': 'Nep1', 'city2': 'Nep2'}, {'town1': 'NepT1', 'town2': 'NepT2'}]    3          Nep1          Nep2         NepT1         NepT2
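
One difference between the approaches is worth keeping in mind if the dictionaries in a row can share keys. The sketch below uses a made-up 'shared' key, which does not occur in the question's data, to compare how ChainMap and the | union resolve a collision.

```python
from collections import ChainMap

# The 'shared' key is hypothetical, added only to create a collision.
first = {'city1': 'Ind1', 'shared': 'from_first'}
second = {'town1': 'IndT1', 'shared': 'from_second'}

# ChainMap looks keys up left to right, so the first mapping wins.
chained = dict(ChainMap(first, second))

# Dict union (Python 3.9+) lets the right-hand operand win.
unioned = first | second

print(chained['shared'])
print(unioned['shared'])
```

With disjoint keys, as in the sample data, both approaches give the same values. Note also that df.join keeps the original Habitat column, as the output above shows; calling df.drop(columns='Habitat') before the join would match the layout requested in the question.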