How do I convert a dictionary-like string to a dictionary?

7

I have a dictionary-like string such as the following:

str = "Access AR1:\n\tTargets: \n\t\tManagement Name:csw_1\n\t\tObject Name:csw_obj_1\n\t\tdetails:103\n\t\tManagement Name:csw_123\n\t\tObject Name:csw_obj_134\n\t\tdetails:123\n\tSources: \n\t\tIP:10.20.30.40\n\t\tSubnet Mask:255.255.255.255\nAccess AR2:\n\tTargets: \n\t\tManagement Name:csw_2\n\t\tObject Name:csw_obj_2\n\t\tdetails:110\n\t\tManagement Name:csw_431\n\t\tObject Name:csw_obj_21\n\t\tdetails:134\n\tSources: \n\t\tIP:10.20.10.10\n\t\tSubnet Mask:255.255.255.192"

The format continues in the same way. Printed, it looks like this:

Access AR1:
    Targets: 
            Management Name:csw_1
            Object Name:csw_obj_1
            details:103
            Management Name:csw_123
            Object Name:csw_obj_134
            details:123
    Sources: 
            IP:10.20.30.40
            Subnet Mask:255.255.255.255
Access AR2:
    Targets: 
            Management Name:csw_2
            Object Name:csw_obj_2
            details:110
            Management Name:csw_431
            Object Name:csw_obj_21
            details:134
    Sources: 
            IP:10.20.10.10
            Subnet Mask:255.255.255.192

This needs to be converted to:

str = {"Access AR1": { "Targets": [{"Management Name":"csw_1", "Object Name":"csw_obj_1", "details":"103"}, {"Management Name":"csw_123", "Object Name":"csw_obj_134", "details":"123"}],
                      "Sources": {"IP":"10.20.30.40", "Subnet Mask": "255.255.255.255"}
                    },
      "Access AR2": { "Targets": [{"Management Name":"csw_2", "Object Name":"csw_obj_2", "details":"110"}, {"Management Name":"csw_431", "Object Name":"csw_obj_21", "details":"134"}],
                      "Sources": {"IP":"10.20.10.10", "Subnet Mask": "255.255.255.192"}
                    }
      }

I tried ast.literal_eval and eval, but got errors.


3
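Please share the code you intend to use to convert the string to a dictionary, along with the error message. - Diesan Romero
@DiesanRomero I used ast.literal_eval(str), but it raises a syntax error because the given string is missing spaces and quotes. - nonuser
To use ast.literal_eval, the string needs to be wrapped in brackets. That function evaluates the string as a Python expression. If you want to convert that string to a dictionary, I can give you some options. By the way, where does this string come from, and why does it use this format? - Diesan Romero
"Sources": "IP": "10.20.30.40" is not a dictionary. Please correct that line, and then I can give you code whose output matches the input. - Bahae El Hmimdi
@BahaeElHmimdi That was a mistake; it has now been edited. - nonuser
@DiesanRomero I am also puzzled about who designed this API response. Apart from that, I need to access each value, so I need to turn it into a dictionary to get at them. - nonuser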
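For reference, a minimal sketch of why ast.literal_eval rejects this input: it only accepts text that is already valid Python literal syntax. The two short strings below are illustrative stand-ins, not the full API response.

```python
import ast

# A string that is already a valid Python dict literal parses fine.
literal_text = '{"IP": "10.20.30.40", "Subnet Mask": "255.255.255.255"}'
parsed = ast.literal_eval(literal_text)
print(parsed["IP"])

# The API response is indentation-based text, not Python syntax,
# so parsing fails before anything is evaluated.
raw_text = "Access AR1:\n\tSources: \n\t\tIP:10.20.30.40"
try:
    ast.literal_eval(raw_text)
    failed = False
except SyntaxError as exc:
    failed = True
    print("SyntaxError:", exc.msg)
```

So the string has to be parsed by some other means (YAML-style loading or a hand-written parser) rather than evaluated.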
3 Answers

8
Your string is very close to YAML: replace each \t with one or more spaces and it can be loaded as YAML.
First:
pip install pyyaml

Then this code works:

import yaml
import pprint

str = "Access AR1:\n\tTargets: \n\t\tManagement Name:csw_1\n\t\tObject Name:csw_obj_1\n\t\tdetails:103\n\tSources: \n\t\tIP:10.20.30.40\n\t\tSubnet Mask:255.255.255.255\nAccess AR2:\n\tTargets: \n\t\tManagement Name:csw_2\n\t\tObject Name:csw_obj_2\n\t\tdetails:110\n\tSources: \n\t\tIP:10.20.10.10\n\t\tSubnet Mask:255.255.255.192"

str1 = str.replace( '\t', '    ' )

# safe_load: recent PyYAML releases require an explicit Loader for yaml.load
res = yaml.safe_load(str1)

pprint.pprint( res )

Output:

{'Access AR1': {'Sources': 'IP:10.20.30.40 Subnet Mask:255.255.255.255',
                'Targets': 'Management Name:csw_1 Object Name:csw_obj_1 '
                           'details:103'},
 'Access AR2': {'Sources': 'IP:10.20.10.10 Subnet Mask:255.255.255.192',
                'Targets': 'Management Name:csw_2 Object Name:csw_obj_2 '
                           'details:110'}}

Or, if you really want to handle it as a string:

str = repr(res)

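Update

I just noticed that, for example, 'Management Name:csw_1' is not detected as a key-value pair. To fix this, use the regular expression function re.sub() to split the key and the value onto separate lines: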

import yaml
import pprint
import re

str = "Access AR1:\n\tTargets: \n\t\tManagement Name:csw_1\n\t\tObject Name:csw_obj_1\n\t\tdetails:103\n\tSources: \n\t\tIP:10.20.30.40\n\t\tSubnet Mask:255.255.255.255\nAccess AR2:\n\tTargets: \n\t\tManagement Name:csw_2\n\t\tObject Name:csw_obj_2\n\t\tdetails:110\n\tSources: \n\t\tIP:10.20.10.10\n\t\tSubnet Mask:255.255.255.192"

# replace \t with four-space indent
str1 = str.replace( '\t', '    ' )

# further tweak to split sub-keys like " Management Name:csw_1" onto separate lines
str1 = re.sub(r"^(\s+)(.*?\S:)(\S.*)", r"\1\2\n\1    \3",str1,flags=re.MULTILINE )
   
# safe_load: recent PyYAML releases require an explicit Loader for yaml.load
res = yaml.safe_load(str1)

pprint.pprint( res )

This is the adjusted string:

Access AR1:
    Targets:
        Management Name:
            csw_1
        Object Name:
            csw_obj_1
        details:
            103
    Sources:
        IP:
            10.20.30.40
        Subnet Mask:
            255.255.255.255
Access AR2:
    Targets:
        Management Name:
            csw_2
        Object Name:
            csw_obj_2
        details:
            110
    Sources:
        IP:
            10.20.10.10
        Subnet Mask:
            255.255.255.192

Result:

{'Access AR1': {'Sources': {'IP': '10.20.30.40',
                            'Subnet Mask': '255.255.255.255'},
                'Targets': {'Management Name': 'csw_1',
                            'Object Name': 'csw_obj_1',
                            'details': 103}},
 'Access AR2': {'Sources': {'IP': '10.20.10.10',
                            'Subnet Mask': '255.255.255.192'},
                'Targets': {'Management Name': 'csw_2',
                            'Object Name': 'csw_obj_2',
                            'details': 110}}}
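As a variation on the re.sub() tweak above, here is a sketch that inserts a space after the first colon instead of moving the value to its own line, since YAML only treats a colon as a key separator when whitespace follows it. The helper name space_after_colon and the shortened sample are assumptions made for illustration; only the string rewrite is shown, so the block runs without PyYAML installed.

```python
import re

def space_after_colon(text: str) -> str:
    """Turn 'key:value' lines into 'key: value' and tabs into four spaces."""
    text = text.replace("\t", "    ")
    # Match from the start of each line up to the first colon, but only
    # when the colon is directly followed by a non-space character.
    return re.sub(r"^([ ]*[^:\n]+):(?=\S)", r"\1: ", text, flags=re.MULTILINE)

sample = "Access AR1:\n\tSources: \n\t\tIP:10.20.30.40\n\t\tSubnet Mask:255.255.255.255"
print(space_after_colon(sample))
```

Lines such as "Access AR1:" and "Sources: " are left alone because their colon is followed by a newline or a space. The rewritten text could then be passed to yaml.safe_load in place of str1.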

1
I don't know why, but I had been trying yaml.safe_load(StringIO(s.replace('\t', '\s\s\s\s'))) for a long time without getting it to work. str.replace( '\t', ' ' ) seems to work nicely. - Epsi95
1
I tried hard to regex-replace the string into a dictionary string (adding quotes and brackets), but did not succeed. - Lei Yang
1
The code and data above work on my machine with Python 3.9.5 on Win10. - DisappointedByUnaccountableMod
1
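@nonuser YAML depends strictly on indentation, so check the number of tabs. - Lei Yang
@barny It works now, but the data is still not well formatted: for example, "Subnet Mask" gets split, so that "Subnet" becomes the value of the previous key and only "Mask" becomes the key of the corresponding value. - nonuser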

2

Hello, here is the solution you need.

strr = "Access AR1:\n\tTargets: \n\t\tManagement Name:csw_1\n\t\tObject Name:csw_obj_1\n\t\tdetails:103\n\tSources: \n\t\tIP:10.20.30.40\n\t\tSubnet Mask:255.255.255.255\nAccess AR2:\n\tTargets: \n\t\tManagement Name:csw_2\n\t\tObject Name:csw_obj_2\n\t\tdetails:110\n\tSources: \n\t\tIP:10.20.10.10\n\t\tSubnet Mask:255.255.255.192"
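# Flatten each top-level block onto one line: "+" marks a section, "-" marks a key:value line.
# Note: this assumes neither "+" nor "-" occurs in the data itself.
nvstr = strr.replace("\n\t\t", "-").replace("\n\t", "+")
nvdd = {}
for u in nvstr.split("\n"):
    dts = u.split("+")
    top_key = dts[0].rstrip(": ")  # "Access AR1:" -> "Access AR1"
    nvdd[top_key] = {}
    for el in dts[1:]:
        dts1 = el.split("-")
        section = dts1[0].rstrip(": ")  # "Targets: " -> "Targets"
        nvdd[top_key][section] = {}
        for el1 in dts1[1:]:
            # raises ValueError if a value itself contains ":"
            k, v = el1.split(":")
            nvdd[top_key][section][k] = v

print(nvdd)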

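Thanks. k,v=el1.split(":") raises some errors, but the solution proposed above is good enough. - nonuser
This approach may work, but it is not scalable. What if the input has more than 3 levels? - Lei Yang
Do you want a scalable solution? - Bahae El Hmimdi
Scalability is an aspect of software design, right? I am not the original poster, though. - Lei Yang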
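On the question of depth, here is a rough sketch of a parser that tracks indentation with a stack, so it is not tied to three levels. The function name parse_indented is hypothetical, and the sketch assumes one tab per nesting level and well-formed input. It splits on the first colon only, so values that contain ":" are kept intact.

```python
def parse_indented(text: str) -> dict:
    """Parse tab-indented 'key:value' lines into nested dicts (any depth)."""
    root: dict = {}
    # stack[i] is the dict that receives keys found at indentation depth i
    stack = [root]
    for line in text.split("\n"):
        if not line.strip():
            continue  # skip blank lines
        depth = len(line) - len(line.lstrip("\t"))
        key, _, value = line.strip().partition(":")
        key, value = key.strip(), value.strip()
        # drop dicts that belong to deeper sections that have ended
        del stack[depth + 1:]
        if value:
            stack[-1][key] = value  # leaf: plain key/value pair
        else:
            child: dict = {}  # section header: open a nested dict
            stack[-1][key] = child
            stack.append(child)
    return root

sample = "Access AR1:\n\tTargets: \n\t\tManagement Name:csw_1\n\tSources: \n\t\tIP:10.20.30.40"
result = parse_indented(sample)
print(result)
```

All values stay strings here. Repeated keys inside one section still overwrite each other, because each section is a plain dict.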

1
You can use recursion to convert the input to a dictionary, without adjusting the string beforehand:
import re
s = "Access AR1:\n\tTargets: \n\t\tManagement Name:csw_1\n\t\tObject Name:csw_obj_1\n\t\tdetails:103\n\tSources: \n\t\tIP:10.20.30.40\n\t\tSubnet Mask:255.255.255.255\nAccess AR2:\n\tTargets: \n\t\tManagement Name:csw_2\n\t\tObject Name:csw_obj_2\n\t\tdetails:110\n\tSources: \n\t\tIP:10.20.10.10\n\t\tSubnet Mask:255.255.255.192"
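def to_dict(d):
    # d is a list of rows; each row is [*leading_tabs, text]
    k, v, r = None, [], {}
    for *b, a in d:
        if not b:
            # a row with no leading tab starts a new key: store the previous one first
            if k is not None:
                if not v:
                    # the right-hand side runs first, so j is bound before r[j[0]] is evaluated
                    r[j[0]] = (j := re.split(r':\s*', k))[-1]
                else:
                    r[re.split(r':\s*', k)[0]] = to_dict(v)
            k, v = a, []
        else:
            # deeper row: strip one tab level and keep it as a child of k
            v.append([*b[1:], a])
    # store the final pending key
    if k is not None:
        if not v:
            r[j[0]] = (j := re.split(r':\s*', k))[-1]
        else:
            r[re.split(r':\s*', k)[0]] = to_dict(v)
    return r

import json
# split each line into its leading tabs plus the remaining text
new_s = [re.findall(r'\t|[^\t]+$', i) for i in s.split('\n')]
print(json.dumps(to_dict(new_s), indent=4))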

Output:

{
    "Access AR1": {
        "Targets": {
            "Management Name": "csw_1",
            "Object Name": "csw_obj_1",
            "details": "103"
        },
        "Sources": {
            "IP": "10.20.30.40",
            "Subnet Mask": "255.255.255.255"
        }
    },
    "Access AR2": {
        "Targets": {
            "Management Name": "csw_2",
            "Object Name": "csw_obj_2",
            "details": "110"
        },
        "Sources": {
            "IP": "10.20.10.10",
            "Subnet Mask": "255.255.255.192"
        }
    }
}

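I believe your code works, but personally I find it hard to understand; I would like some comments explaining what is going on. Better variable names would help too; for example, d conveys nothing useful, and neither do a, b and v. - DisappointedByUnaccountableMod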
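One gap worth noting: the answers use a shortened sample with a single target per section, while the input in the question repeats "Management Name", "Object Name" and "details" under Targets and asks for a list of dicts. A plain dict can hold each key only once, so the repeated entries need to be grouped. The sketch below shows one possible grouping rule, assuming a new record begins whenever a key repeats; the helper name group_records is hypothetical.

```python
def group_records(pairs):
    """Split (key, value) pairs into records; a repeated key starts a new record."""
    records = []
    current = {}
    for key, value in pairs:
        if key in current:
            # this key was already seen, so the previous record is complete
            records.append(current)
            current = {}
        current[key] = value
    if current:
        records.append(current)
    return records

target_lines = [
    "Management Name:csw_1", "Object Name:csw_obj_1", "details:103",
    "Management Name:csw_123", "Object Name:csw_obj_134", "details:123",
]
# split on the first colon only, so values may contain ":"
pairs = [tuple(line.split(":", 1)) for line in target_lines]
targets = group_records(pairs)
print(targets)
```

A parser could apply this to the lines under Targets and keep a plain dict for sections such as Sources, where no key repeats.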

Content provided by Stack Overflow.