Convert a dictionary to a pandas DataFrame

3

My data looks like this:

{u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}', u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'}

I want to convert it to a pandas DataFrame. But when I try:
df = pd.DataFrame(response.items())

I get a DataFrame with two columns: the first holds the keys and the second holds the value for each key:

                            0                       1 
0  "57e01311817bc367c030b390"   {"ad_since": 2016, "indoor_swimming_pool": "No...
1  "57e01311817bc367c030b3a8"   {"ad_since": 2012, "indoor_swimming_pool": "No... 

How can I get a separate column for each inner key ("ad_since", "indoor_swimming_pool", and so on), either keeping the first column or using the ID as the index?

Try the read_json function: http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.read_json.html - Richard Rublev
Did you test pd.DataFrame(response.items()) with your sample data? It does not work for me. - jezrael
@jezrael Thanks for the comment, I have edited my post. - mitsi
@RichardRublev I tried it, but I get the error TypeError: Expected String or Unicode - mitsi
@mitsi - Thanks. I think two records were fine, but now there is only one: the second row of the DataFrame is missing. Can you add some valid JSON or a list of JSON? - jezrael
@jezrael Both rows are complete now. - mitsi
2 Answers

2
You need to convert the column from type str to dict with .apply(literal_eval) or .apply(json.loads), and then use DataFrame.from_records:
import pandas as pd
from ast import literal_eval

response = {u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}', 
           u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'}

df = pd.DataFrame.from_dict(response, orient='index')

print (type(df.iloc[0,0]))
<class 'str'>

df.iloc[:,0] = df.iloc[:,0].apply(literal_eval)

print (pd.DataFrame.from_records(df.iloc[:,0].values.tolist(), index=df.index))
                            ad_since handicapped_access indoor_swimming_pool  \
"57e01311817bc367c030b3a8"      2012                Yes                   No   
"57e01311817bc367c030b390"      2016                Yes                   No   

                           seaside  
"57e01311817bc367c030b3a8"      No  
"57e01311817bc367c030b390"      No  

import pandas as pd
import json

response = {u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}', 
           u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'}


df = pd.DataFrame.from_dict(response, orient='index')
df.iloc[:,0] = df.iloc[:,0].apply(json.loads)


print (pd.DataFrame.from_records(df.iloc[:,0].values.tolist(), index=df.index))
                            ad_since handicapped_access indoor_swimming_pool  \
"57e01311817bc367c030b3a8"      2012                Yes                   No   
"57e01311817bc367c030b390"      2016                Yes                   No   

                           seaside  
"57e01311817bc367c030b3a8"      No  
"57e01311817bc367c030b390"      No  
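
The json.loads approach above can also be condensed into a single step. The following is only a sketch, not the answerer's code: it assumes every value is a valid JSON object, and it additionally runs json.loads on the keys to drop their embedded double quotes, since the question asks for the ID as the index. The index name "id" is a made-up label.

```python
import json
import pandas as pd

response = {
    '"57e01311817bc367c030b390"': '{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}',
    '"57e01311817bc367c030b3a8"': '{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}',
}

# Parse both the key (a JSON-encoded string) and the value (a JSON object).
# Assumption: every key and value is valid JSON.
parsed = {json.loads(k): json.loads(v) for k, v in response.items()}

# orient='index' turns each outer key into a row label
# and each inner key into a column.
df = pd.DataFrame.from_dict(parsed, orient='index')
df.index.name = "id"  # hypothetical label, not part of the original data

print(df)
```

If the quotes around the IDs should be kept, use k instead of json.loads(k) in the comprehension.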

With the first approach (literal_eval) on the whole dataset I got ValueError: malformed string, probably caused by special characters. The second approach (json.loads) works perfectly, thanks. - mitsi
Glad I could help. - jezrael
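
One plausible cause of the ValueError mentioned in the comment above is that the full dataset contains JSON-only literals such as true, false or null, which are not Python literals. The full data is not shown in the thread, so this is an assumption; the record below is invented purely to illustrate the difference between the two parsers.

```python
import json
from ast import literal_eval

# Hypothetical record (not from the original data) with JSON-only literals.
raw = '{"ad_since": 2016, "seaside": false, "rating": null}'

try:
    literal_eval(raw)
    literal_eval_ok = True
except ValueError as exc:
    # literal_eval only accepts Python literals; `false` and `null`
    # are parsed as plain names and rejected.
    literal_eval_ok = False
    print("literal_eval failed:", exc)

# json.loads understands JSON literals and maps them to False / None.
parsed = json.loads(raw)
print(parsed)
```

Because the strings in the question are JSON rather than Python source, json.loads is the more natural choice here.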

1
Since the values are strings, you can use the json module and a list comprehension:
In [20]: d =     {u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}', u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'}

In [21]: import json

In [22]: pd.DataFrame(dict([(k, [json.loads(e)[k] for e in d.values()]) for k in json.loads(list(d.values())[0])]), index=list(d.keys()))
Out[22]: 
                            ad_since handicapped_access indoor_swimming_pool  \
"57e01311817bc367c030b390"      2016                Yes                   No   
"57e01311817bc367c030b3a8"      2012                Yes                   No   

                           seaside  
"57e01311817bc367c030b390"      No  
"57e01311817bc367c030b3a8"      No  
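
The comprehension above takes its column names from the first record, so it assumes every record has the same keys. A minimal sketch that tolerates missing keys is to parse each value and pass the list of dicts to pd.DataFrame directly. The IDs and records below are invented for illustration, not taken from the question.

```python
import json
import pandas as pd

# Hypothetical data: the second record lacks the "seaside" key.
d = {
    '"id_a"': '{"ad_since": 2016, "seaside": "No"}',
    '"id_b"': '{"ad_since": 2012}',
}

# One dict per record, in the same order as the keys of d.
records = [json.loads(v) for v in d.values()]

# pandas builds the columns from all keys it sees
# and fills missing entries with NaN.
df = pd.DataFrame(records, index=list(d.keys()))
print(df)
```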
