Python:将Excel文件转换为JSON格式

6
我将创建一个使用JSON文件来理解模式和响应格式的ML模型。由于我的数据是以Excel格式存在的,所以我在Python中将其转换为JSON。
以下是代码:
import xlrd
from collections import OrderedDict
import simplejson as json
# Open the workbook and select the first worksheet
wb = xlrd.open_workbook('D:\\android\\testdata2.xlsx')
sh = wb.sheet_by_index(0)
# List to hold dictionaries
data_list = []
# Iterate through each row in worksheet and fetch values into dict
for rownum in range(1, sh.nrows):
    data = OrderedDict()
    row_values = sh.row_values(rownum)
    data['pattern'] = row_values[0]
    data['response'] = row_values[1]
    data_list.append(data)
# Serialize the list of dicts to JSON
j = json.dumps(data_list)
# Write to file
with open('data1.json', 'w') as f:
    f.write(j)

我得到的输出是:

[{
    "pattern": "WALLSTENT NON COUVERTE ",
    "response": "ENDOPROTHESE STENT  VASCULAIRE "
}, {
    "pattern": "PRIMEADVANCED SURSCAN MRI ",
    "response": "NEUROSTIMULATEUR NERF VAGUE GAUCHE "
}, {
    "pattern": "AVASTIN  FLACON DE",
    "response": "BEVACIZUMAB"
}, {
    "pattern": "PERJETA SOLUTION A DILUER POUR PERFUSION",
    "response": "BRENTUXIMAB VEDOTIN"
}]

我想要的输出结果是这样的:
{
    "intents": [{
        "pattern": ["WALLSTENT, NON, COUVERTE "],
        "response": ["ENDOPROTHESE STENT  VASCULAIRE] "
    }, {
        "pattern": ["PRIMEADVANCED ,SURSCAN ,MRI"] ,
        "response": ["NEUROSTIMULATEUR NERF VAGUE GAUCHE "]
    }, {
        "pattern": ["AVASTIN , FLACON ,DE"],
        "response": ["BEVACIZUMAB"]
    }, {
        "pattern": ["PERJETA, SOLUTION, A, DILUER, POUR ,PERFUSION"],
        "response": ["BRENTUXIMAB VEDOTIN"]
    }]
}

我该在我的函数中进行哪些修改才能得到我想要的输出结果。

2个回答

2

尝试使用Python中的pyexcel_xlsx库。我已经用它将xlsx转换为json。这是一个简单而甜美的库。与其他Python库相比,速度也很快。

示例代码:

from pyexcel_xlsx import get_data;
import time;
import json;

data = get_data("D:\\android\\testdata2.xlsx")
sheetName = "Table A";

data_list = []
# Iterate through each row and append in above list
for i in range(0, len(data[sheetName])):
    data_list.append({
        'pattern' : data[sheetName][i][0],
        'response' : data[sheetName][i][1]
    })
data_list = {'intents': data_list} # Converting to required object
j = json.dumps(data_list)
# Write to file
with open('data1.json', 'w') as f:
    f.write(j)

2
那应该就可以了:
import xlrd
from collections import OrderedDict
import simplejson as json
# Open the workbook and select the first worksheet
wb = xlrd.open_workbook('D:\\android\\testdata2.xlsx')
sh = wb.sheet_by_index(0)
# List to hold dictionaries
data_list = []
# Iterate through each row in worksheet and fetch values into dict
for rownum in range(1, sh.nrows):
    data = OrderedDict()
    row_values = sh.row_values(rownum)
    data['pattern'] = row_values[0]
    data['response'] = row_values[1]
    data_list.append(data)
data_list = {'intents': data_list} # Added line
# Serialize the list of dicts to JSON
j = json.dumps(data_list)
# Write to file
with open('data1.json', 'w') as f:
    f.write(j)

请注意添加了data_list = {'intents': data_list}

我添加了这行代码,但在运行程序时出现了以下错误:"列表索引必须是整数或切片,而不是字符串"。 - Pavan Rajput
抱歉,我已修复了。 - TheNavigat
请注意,在Python shell中,data_list.append(data)和data_list = {'intents': data_list}之间需要有一个新行。 - TheDevOpsGuru

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接