How to convert single quotes to double quotes in Python to format text as a JSON string

Question


python json regex double-quotes single-quotes

10

I have a file where every line contains text like this (representing a movie cast):

[{'cast_id': 23, 'character': "Roger 'Verbal' Kint", 'credit_id': '52fe4260c3a36847f8019af7', 'gender': 2, 'id': 1979, 'name': 'Kevin Spacey', 'order': 5, 'profile_path': '/x7wF050iuCASefLLG75s2uDPFUu.jpg'}, {'cast_id': 27, 'character': 'Edie's Finneran', 'credit_id': '52fe4260c3a36847f8019b07', 'gender': 1, 'id': 2179, 'name': 'Suzy Amis', 'order': 6, 'profile_path': '/b1pjkncyLuBtMUmqD1MztD2SG80.jpg'}]

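I need to convert it into a valid JSON string, so only the necessary single quotes should become double quotes. For example, the single quotes around the word Verbal must not be converted, and neither should any apostrophe inside the text.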

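I'm using Python 3.x. I need a regular expression that converts only the right single quotes (the ones that delimit strings) into double quotes, so that the whole text becomes a valid JSON string. Any ideas?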
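The example below shows why converting every single quote is not enough. It uses a shortened, hypothetical record in the same format, and the is_valid_json helper is made up for this sketch:

```python
import json


def is_valid_json(text):
    """Return True if json.loads accepts text (hypothetical helper)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# A shortened, hypothetical line in the question's format.
line = """[{'cast_id': 23, 'character': "Roger 'Verbal' Kint"}]"""

# Blanket replacement: every single quote becomes a double quote,
# including the two around Verbal.
naive = line.replace("'", '"')
print(naive)                 # [{"cast_id": 23, "character": "Roger "Verbal" Kint"}]
print(is_valid_json(naive))  # False
```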

- revy

5

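What generated this file? The right approach is to parse it into a list of dicts and then encode that with json.dump. A regular expression is the wrong tool here, because this is not a regular language. - chepner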
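Here is a minimal sketch of the parse-then-encode approach. It assumes each line is a valid Python literal. The comment does not name a parser, so ast.literal_eval is used here as a safe choice. Note that json.dumps returns a string, whereas json.dump writes to a file:

```python
import ast
import json

# A shortened, hypothetical line in the question's format (a Python literal, not JSON).
line = """[{'cast_id': 23, 'character': "Roger 'Verbal' Kint", 'gender': 2}]"""

cast = ast.literal_eval(line)  # parse the Python literal into a list of dicts
as_json = json.dumps(cast)     # re-encode as JSON; json handles all the quoting
print(as_json)  # [{"cast_id": 23, "character": "Roger 'Verbal' Kint", "gender": 2}]
```

No regular expression is involved. The quoting decisions are made by a real parser on the way in and by the json module on the way out.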

1

import json;json.dumps(your_dict) - Amit Tripathi

1

@AmitTripathi It's not a dict yet; it's just a string in a file. - chepner

2

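Your input has a serious problem: the value Edie's Finneran is wrapped in single quotes, and no parser can tell that the apostrophe is not the closing quote. You need to fix whatever produces the file, and at that point you might as well have it output JSON in the first place. - chepner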
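The check below tests this point with two hypothetical one-field lines. Python's own parser (via ast.literal_eval) rejects the unescaped apostrophe and accepts the value only once the apostrophe is escaped. The parses helper is made up for this illustration:

```python
import ast


def parses(text):
    """Return True if text is a valid Python literal (hypothetical helper)."""
    try:
        ast.literal_eval(text)
    except (SyntaxError, ValueError):
        return False
    return True


# The value as it appears in the question: the apostrophe ends the string early.
broken = "[{'character': 'Edie's Finneran'}]"
# The same value with the apostrophe escaped (the backslash is doubled in Python source).
escaped = "[{'character': 'Edie\\'s Finneran'}]"

print(parses(broken))             # False
print(parses(escaped))            # True
print(ast.literal_eval(escaped))  # [{'character': "Edie's Finneran"}]
```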

1

You still haven't answered the question: where does this string come from? Why isn't it JSON-compatible? How many strings like this are there? - user3850


5 Answers

2

If you don't control the JSON data, don't use eval()!

I wrote a simple JSON correction function instead, because it is safer:

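def correctSingleQuoteJSON(s):
    """Convert single-quoted, JSON-like text into valid JSON text."""
    rstr = ""
    escaped = False  # True if the previous character was an unescaped backslash

    for c in s:
        if c == "'" and not escaped:
            c = '"'  # an unescaped single quote delimits a string: make it a double quote

        elif c == "'" and escaped:
            rstr = rstr[:-1]  # \' is not a valid JSON escape: drop the backslash, keep the apostrophe

        elif c == '"' and not escaped:
            c = '\\' + c  # escape bare double quotes (an already-escaped \" is left as-is)

        # a backslash starts an escape sequence unless it is itself escaped (\\)
        escaped = (c == "\\") and not escaped
        rstr += c  # append the corrected character(s)

    return rstr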

You can use the function like this:

import json

singleQuoteJson = "[{'cast_id': 23, 'character': 'Roger \\'Verbal\\' Kint', 'credit_id': '52fe4260c3a36847f8019af7', 'gender': 2, 'id': 1979, 'name': 'Kevin Spacey', 'order': 5, 'profile_path': '/x7wF050iuCASefLLG75s2uDPFUu.jpg'}, {'cast_id': 27, 'character': 'Edie\\'s Finneran', 'credit_id': '52fe4260c3a36847f8019b07', 'gender': 1, 'id': 2179, 'name': 'Suzy Amis', 'order': 6, 'profile_path': '/b1pjkncyLuBtMUmqD1MztD2SG80.jpg'}]"

correctJson = correctSingleQuoteJSON(singleQuoteJson)
print(json.loads(correctJson))

- finnmglas

1

Besides the eval() mentioned in user3850's answer, you can also use ast.literal_eval.

The difference between the two is discussed in the thread "Using python's eval() vs. ast.literal_eval()?".
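The practical difference can be seen with a hypothetical payload. eval() would execute the call, whereas ast.literal_eval only accepts Python literal structures and raises ValueError for anything else. The safe_parse wrapper is a made-up helper for this illustration:

```python
import ast


def safe_parse(text):
    """Parse a Python literal; return None if text is not one (hypothetical helper).

    Note: a literal "None" also returns None, so callers that care should
    catch the exceptions themselves instead.
    """
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None


# eval() would actually run this call; ast.literal_eval refuses because it is a call, not a literal.
payload = "__import__('os').getcwd()"
print(safe_parse(payload))  # None

# Ordinary literals in the question's format still parse.
print(safe_parse("[{'id': 1979, 'name': 'Kevin Spacey'}]"))
# [{'id': 1979, 'name': 'Kevin Spacey'}]
```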

You can also look at the following discussion threads from a Kaggle competition, which involve data similar to what the OP describes:

https://www.kaggle.com/c/tmdb-box-office-prediction/discussion/89313#latest-517927
https://www.kaggle.com/c/tmdb-box-office-prediction/discussion/80045#latest-518338

- Kaushik Acharya

1

Here is code that produces the desired output.

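import ast

def getJson(filepath):
    """Parse each line of the file into Python objects, escaping stray inner quotes first."""
    lines = []
    with open(filepath, 'r') as fr:
        for line in fr:
            if not line.strip():
                continue  # skip blank lines, which literal_eval would reject
            # Split into "key: value" fragments; this assumes no ',' or ':'
            # occurs inside a quoted value.
            line_split = line.split(",")
            set_line_split = []
            for part in line_split:
                i_split = part.split(":")
                i_set_split = []
                for split_i in i_split:
                    # Keep everything up to and including the first quote (the opening quote).
                    set_split_i = ""
                    pos = 0
                    for ch in split_i:
                        set_split_i += ch
                        pos += 1
                        if ch in ('"', "'"):
                            break
                    # Walk the rest backwards: the first quote seen is the closing
                    # quote; every other quote is an inner one and gets a backslash.
                    rev = ""
                    state = False
                    for ch in split_i[pos:][::-1]:
                        if ch in ('"', "'") and not state:
                            rev += ch
                            state = True
                        elif ch in ('"', "'") and state:
                            rev += ch + "\\"
                        else:
                            rev += ch
                    set_split_i += rev[::-1]
                    i_set_split.append(set_split_i)
                set_line_split.append(":".join(i_set_split))
            line_modified = ",".join(set_line_split)
            # The repaired line is now a valid Python literal.
            lines.append(ast.literal_eval(line_modified))
    return lines

lines = getJson('test.txt')
for i in lines:
    print(i)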

- Tilak Putta

8

Sir, you could use '"'.join(str.split("'")) or "\"".join(str.split("'")) instead; it is easier to read. - Shane Abram Mendez
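For reference, the split/join form does the same thing as str.replace. That means it converts every single quote, apostrophes included. Here is a quick check on a hypothetical sample:

```python
# Hypothetical sample line.
s = "[{'name': 'Kevin Spacey'}]"

a = '"'.join(s.split("'"))  # split on every ' and glue the pieces back together with "
b = s.replace("'", '"')     # the same operation via str.replace

print(a == b)  # True
print(a)       # [{"name": "Kevin Spacey"}]
```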

0

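import ast
import json

# row['prod_cat'] is a string holding a Python literal (from the answerer's own data)
json_dat = json.dumps(ast.literal_eval(row['prod_cat']))  # Python literal -> JSON text
dict_dat = json.loads(json_dat)                            # JSON text -> Python objects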

- Ashutosh Dadhich

3

Please add some explanation to your code instead of only posting code. - user67275

Web content provided by Stack Overflow; the original English post is available via the original link.

- user3850 · Accepted Answer

First of all, the sample line you gave cannot be parsed! … 'Edie's Finneran' … is a syntax error no matter what.

Assuming you control the input, you can simply read the file with eval(). (Although in that case one has to wonder why you can't generate valid JSON in the first place…)

>>> f = open('list.txt', 'r')
>>> s = f.read().strip()
>>> l = eval(s)

>>> import pprint
>>> pprint.pprint(l)
[{'cast_id': 23,
  'character': "Roger 'Verbal' Kint",
  ...
  'profile_path': '/b1pjkncyLuBtMUmqD1MztD2SG80.jpg'}]

>>> import json
>>> json.dumps(l)
'[{"cast_id": 23, "character": "Roger \'Verbal\' Kint", "credit_id": "52fe4260c3a36847f8019af7", "gender": 2, "id": 1979, "name": "Kevin Spacey", "order": 5, "profile_path": "/x7wF050iuCASefLLG75s2uDPFUu.jpg"}, {"cast_id": 27, "character": "Edie\'s Finneran", "credit_id": "52fe4260c3a36847f8019b07", "gender": 1, "id": 2179, "name": "Suzy Amis", "order": 6, "profile_path": "/b1pjkncyLuBtMUmqD1MztD2SG80.jpg"}]'

If you don't control the input, this is extremely dangerous, because it exposes you to code injection attacks.
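If the input is not trusted, a line-by-line variant of the same idea can use ast.literal_eval instead of eval(). This is a substitution; the answer itself uses eval(). The minimal sketch below assumes one Python-literal list per line. convert_lines is a hypothetical helper, demonstrated with in-memory files:

```python
import ast
import io
import json


def convert_lines(src, dst):
    """Read one Python-literal value per line from src, write one JSON document per line to dst.

    ast.literal_eval is used instead of eval(), so a line that is not a plain
    literal raises ValueError/SyntaxError instead of being executed.
    """
    for line in src:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        dst.write(json.dumps(ast.literal_eval(line)) + "\n")


# In-memory stand-ins; real file objects from open() work the same way.
src = io.StringIO("[{'id': 1979, 'character': \"Roger 'Verbal' Kint\"}]\n")
dst = io.StringIO()
convert_lines(src, dst)
print(dst.getvalue(), end="")  # [{"id": 1979, "character": "Roger 'Verbal' Kint"}]
```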

Again: the best solution is to generate valid JSON in the first place.
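The question does not say how the file was produced. Assuming it was written with str() of a list of dicts, the fix on the producing side is to write json.dumps(...) instead. Here is a minimal comparison with hypothetical data:

```python
import json

# Hypothetical data of the kind the file holds.
cast = [{'cast_id': 23, 'character': "Roger 'Verbal' Kint", 'name': 'Kevin Spacey'}]

# str() gives Python's repr, which has the same non-JSON shape as the question's lines.
print(str(cast))
# [{'cast_id': 23, 'character': "Roger 'Verbal' Kint", 'name': 'Kevin Spacey'}]

# json.dumps() gives valid JSON; the apostrophes inside the value need no escaping.
print(json.dumps(cast))
# [{"cast_id": 23, "character": "Roger 'Verbal' Kint", "name": "Kevin Spacey"}]
```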