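I'm new to Python and I'm trying to convert a nested JSON file to CSV, but I'm stuck. My approach is to load the JSON first, convert it with json_normalize so that the output is nicely formatted, and then use pandas to write the normalized part to CSV.
My sample JSON: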
[{
"_id": {
"id": "123"
},
"device": {
"browser": "Safari",
"category": "d",
"os": "Mac"
},
"exID": {
"$oid": "123"
},
"extreme": false,
"geo": {
"city": "London",
"country": "United Kingdom",
"countryCode": "UK",
"ip": "00.000.000.0"
},
"viewed": {
"$date": "2011-02-12"
},
"attributes": [{
"name": "gender",
"numeric": 0,
"value": 0
}, {
"name": "email",
"value": false
}],
"change": [{
"id": {
"$id": "1231"
},
"seen": [{
"$date": "2011-02-12"
}]
}]
}, {
"_id": {
"id": "456"
},
"device": {
"browser": "Chrome 47",
"category": "d",
"os": "Windows"
},
"exID": {
"$oid": "345"
},
"extreme": false,
"geo": {
"city": "Berlin",
"country": "Germany",
"countryCode": "DE",
"ip": "00.000.000.0"
},
"viewed": {
"$date": "2011-05-12"
},
"attributes": [{
"name": "gender",
"numeric": 1,
"value": 1
}, {
"name": "email",
"value": true
}],
"change": [{
"id": {
"$id": "1231"
},
"seen": [{
"$date": "2011-02-12"
}]
}]
}]
Here is my code (it leaves out the nested parts):
import json
# pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize
from pandas import json_normalize

def loading_file():
    # File path
    file_path = ''  # file path here
    # Loading json file (the with block closes the file afterwards)
    with open(file_path) as json_data:
        return json.load(json_data)

# Storing available keys
def data_keys(data):
    keys = {}
    for i in data:
        for k in i.keys():
            keys[k] = 1
    keys = keys.keys()
    # Excluding nested arrays from keys - hard coded -> IMPROVE
    new_keys = [x for x in keys if
                x != 'attributes' and
                x != 'change']
    return new_keys

# Excluding nested arrays from json dictionary
def new_data(data, keys):
    new_data = []
    for i in range(0, len(data)):
        x = {k: v for (k, v) in data[i].items() if k in keys}
        new_data.append(x)
    return new_data

def csv_out(data):
    data.to_csv('out.csv', encoding='utf-8')

def main():
    data_file = loading_file()
    keys = data_keys(data_file)
    table = new_data(data_file, keys)
    csv_out(json_normalize(table))

main()
My current output looks roughly like this:
| _id.id | device.browser | device.category | device.os | ... | viewed.$date |
|--------|----------------|-----------------|-----------|------|--------------|
| 123 | Safari | d | Mac | ... | 2011-02-12 |
| 456 | Chrome 47 | d | Windows | ... | 2011-05-12 |
| | | | | | |
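The hard-coded exclusion of attributes and change in data_keys could probably be avoided by looking at the value types instead of the key names. Below is a minimal sketch of that idea, not a definitive implementation. The helper name split_keys and the shortened sample records are hypothetical, and it assumes that "nested array" simply means "the value is a Python list".

```python
def split_keys(records):
    """Return (flat_keys, list_keys) for a list of dict records.

    A key lands in list_keys if its value is a list in at least one record.
    """
    flat_keys, list_keys = [], []
    for record in records:
        for key, value in record.items():
            target = list_keys if isinstance(value, list) else flat_keys
            if key not in target:
                target.append(key)
    # A key that is a list anywhere is treated as nested everywhere.
    flat_keys = [k for k in flat_keys if k not in list_keys]
    return flat_keys, list_keys


# Shortened, hypothetical records in the shape of the sample JSON.
sample = [
    {"_id": {"id": "123"}, "extreme": False,
     "attributes": [{"name": "gender", "numeric": 0, "value": 0}]},
    {"_id": {"id": "456"}, "extreme": False,
     "change": [{"id": {"$id": "1231"}}]},
]

flat_keys, list_keys = split_keys(sample)
print(flat_keys)
print(list_keys)
```

With this, flat_keys could take the place of the hand-written list comprehension, and list_keys tells you which columns still need flattening.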
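My problem is that I also want the nested arrays in the CSV, so I have to flatten them. I can't figure out how to make this generic, i.e. how to build the table without hard-coding the dictionary keys (numeric, id, name) and values. It has to be generic because the number of keys in attributes and change is not fixed. The output I'm hoping for looks like this: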
| _id.id | device.browser | ... | attributes_gender_numeric | attributes_gender_value | attributes_email_value | change_id | change_seen |
|--------|----------------|-----|---------------------------|-------------------------|------------------------|-----------|-------------|
| 123 | Safari | ... | 0 | 0 | false | 1231 | 2011-02-12 |
| 456 | Chrome 47 | ... | 1 | 1 | true | 1231 | 2011-02-12 |
| | | | | | | | |
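One possible way to get close to this layout is a small recursive flattener. This is only a sketch under stated assumptions: the names flatten and _join are hypothetical, list items that carry a "name" key are labelled with that name, a single-element list contributes no index, and every level is joined with "_". As a result the column names differ slightly from the table above (for example _id_id and change_id_$id instead of _id.id and change_id). It uses the standard csv module rather than pandas so that it is self-contained, and the records are shortened versions of the sample JSON.

```python
import csv
import io


def _join(prefix, part, sep):
    """Join two column-name parts, skipping empty ones."""
    if not prefix:
        return part
    if not part:
        return prefix
    return f"{prefix}{sep}{part}"


def flatten(value, prefix="", sep="_", label_key="name"):
    """Flatten nested dicts and lists into one {column: scalar} dict."""
    row = {}
    if isinstance(value, dict):
        for key, child in value.items():
            row.update(flatten(child, _join(prefix, key, sep), sep, label_key))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            if isinstance(item, dict) and label_key in item:
                # Assumption: label the item by its "name" value, e.g. "gender".
                label = str(item[label_key])
                item = {k: v for k, v in item.items() if k != label_key}
            elif len(value) == 1:
                label = ""  # single-element list: no index in the column name
            else:
                label = str(index)
            row.update(flatten(item, _join(prefix, label, sep), sep, label_key))
    else:
        row[prefix] = value  # scalar leaf
    return row


# Shortened records in the shape of the sample JSON.
records = [
    {"_id": {"id": "123"},
     "attributes": [{"name": "gender", "numeric": 0, "value": 0},
                    {"name": "email", "value": False}],
     "change": [{"id": {"$id": "1231"}, "seen": [{"$date": "2011-02-12"}]}]},
    {"_id": {"id": "456"},
     "attributes": [{"name": "gender", "numeric": 1, "value": 1},
                    {"name": "email", "value": True}],
     "change": [{"id": {"$id": "1231"}, "seen": [{"$date": "2011-02-12"}]}]},
]

rows = [flatten(record) for record in records]

# Union of all column names, in first-seen order, so missing keys become empty cells.
columns = []
for row in rows:
    for name in row:
        if name not in columns:
            columns.append(name)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Because the column list is collected from the data, records with a different number of keys in attributes or change simply add columns, and rows that lack a column get an empty cell. The list of flattened rows could presumably also be passed to pandas.DataFrame and written with to_csv, as in the code above.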
Thank you very much for your support! Any tips on improving the code and making it more efficient would also be greatly appreciated.