How to parse/read a YAML file into a Python object?

126
How do I parse/read a YAML file into a Python object?
For example, this is a YAML file:

Person:
  name: XYZ

And this is a Python class:

class Person(yaml.YAMLObject):
  yaml_tag = 'Person'

  def __init__(self, name):
    self.name = name

By the way, I am using PyYAML.


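I recommend using python-box. - evolved
3
"How can I parse a YAML file in Python?" is definitely not the same question as "How to parse/read a YAML file into a Python object?". General parsing and parsing into an object-oriented structure are two different things. I am voting to reopen; just look at how many upvotes the answers here have... - Wolfgang Fahl
4 Answers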

219
If your YAML file looks like this:

# tree format
treeroot:
    branch1:
        name: Node 1
        branch1-1:
            name: Node 1-1
    branch2:
        name: Node 2
        branch2-1:
            name: Node 2-1

And you have installed PyYAML like this:

pip install PyYAML

Then the Python code looks like this:

import yaml
with open('tree.yaml') as f:
    # use safe_load instead of load
    dataMap = yaml.safe_load(f)

The variable dataMap now holds a dictionary with the tree data. If you print dataMap with PrettyPrint (the pprint module), you get something like this:

{
    'treeroot': {
        'branch1': {
            'branch1-1': {
                'name': 'Node 1-1'
            },
            'name': 'Node 1'
        },
        'branch2': {
            'branch2-1': {
                'name': 'Node 2-1'
            },
            'name': 'Node 2'
        }
    }
}

So now we know how to get the data into a Python program. Saving the data is just as easy:

with open('newtree.yaml', "w") as f:
    yaml.dump(dataMap, f)

You now have a dictionary, and you need to convert it into a Python object:

class Struct:
    def __init__(self, **entries): 
        self.__dict__.update(entries)

Then you can use:

>>> args = dataMap  # your YAML dictionary
>>> s = Struct(**args)
>>> s
<__main__.Struct object at 0x01D6A738>
>>> s...

See also "Convert a Python dict to an object".
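
Note that Struct(**args) converts only the top level: nested dictionaries stay plain dicts. A minimal sketch of a recursive variant, assuming Python 3 and using the standard library's types.SimpleNamespace in place of the Struct class above (the helper name to_namespace and the sample data are illustrative, not from the answer):

```python
from types import SimpleNamespace


def to_namespace(value):
    """Recursively turn dicts into SimpleNamespace objects, descending into lists."""
    if isinstance(value, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in value.items()})
    if isinstance(value, list):
        return [to_namespace(v) for v in value]
    return value


# Stand-in for the dictionary returned by yaml.safe_load.
data = {
    "treeroot": {
        "branch1": {"name": "Node 1", "children": [{"name": "Node 1-1"}]},
    }
}

tree = to_namespace(data)
print(tree.treeroot.branch1.name)              # Node 1
print(tree.treeroot.branch1.children[0].name)  # Node 1-1
```

Keys that are not valid Python identifiers (such as branch1-1 in the tree above) are still stored, but they can only be read with getattr(obj, "branch1-1"), not with dot syntax.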

For more information, see pyyaml.org.


8
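@personal_cloud, "never" is a bit too absolute. For example, a Python application running in a Docker container might choose not to use a virtual environment, since it is just an extra layer that provides no isolation beyond what the container itself already provides. - Joe Holloway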

19

http://pyyaml.org/wiki/PyYAMLDocumentation

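add_path_resolver(tag, path, kind) adds a path-based implicit tag resolver. A path is a list of keys that form a path to a node in the representation graph. Path elements can be string values, integers, or None. The kind of a node can be str, list, dict, or None.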

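#!/usr/bin/env python
import yaml

class Person(yaml.YAMLObject):
  yaml_tag = '!person'

  def __init__(self, name):
    self.name = name

# Tag the mapping found under the top-level key 'Person' as '!person'.
yaml.add_path_resolver('!person', ['Person'], dict)

# Current PyYAML requires an explicit Loader. yaml.Loader can construct
# arbitrary Python objects, so use it only on trusted input.
data = yaml.load("""
Person:
  name: XYZ
""", Loader=yaml.Loader)

print(data)
# {'Person': <__main__.Person object at 0x7f2b251ceb10>}

print(data['Person'].name)
# XYZ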
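
If editing the YAML file is an option, an explicit tag is an alternative to a path resolver. The following is a minimal sketch under that assumption: the `!Person` tag and the sample document are illustrative and not part of the answer above. Setting `yaml_loader = yaml.SafeLoader` registers the class with the safe loader, so `yaml.safe_load` can construct it without enabling arbitrary object creation.

```python
import yaml


class Person(yaml.YAMLObject):
    yaml_tag = '!Person'            # illustrative tag; must match the tag in the document
    yaml_loader = yaml.SafeLoader   # let yaml.safe_load build this class

    def __init__(self, name):
        self.name = name


# The mapping under 'Person' carries the explicit !Person tag.
document = """
Person: !Person
  name: XYZ
"""

data = yaml.safe_load(document)
print(type(data['Person']).__name__)
print(data['Person'].name)
```

PyYAML builds the instance without calling `__init__`; it creates the object and fills its attributes from the mapping, so any validation placed in `__init__` does not run on load.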

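Note: libraries should never be installed outside a virtual environment. - personal_cloud
Don't forget to run sudo apt-get install libyaml-cpp-dev before pip. - personal_cloud
Be aware that pip install is permanent. https://dev59.com/N3I_5IYBdhLWcg3wDOhC - personal_cloud
@personal_cloud Virtual environments are great, but pip install is not permanent. As an answer to the question you linked points out, there is a pip uninstall command. The original package can then be restored with the package manager. - ederag
6
Virtual environments are great, but pipenv is even cooler. - user2393229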

4

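I wrote an implementation using named tuples, which I think works nicely because it is fairly readable. It also handles nested dictionaries. The parser code is as follows: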

from collections import namedtuple


class Dict2ObjParser:
    def __init__(self, nested_dict):
        self.nested_dict = nested_dict

    def parse(self):
        nested_dict = self.nested_dict
        if (obj_type := type(nested_dict)) is not dict:
            raise TypeError(f"Expected 'dict' but found '{obj_type}'")
        return self._transform_to_named_tuples("root", nested_dict)

    def _transform_to_named_tuples(self, tuple_name, possibly_nested_obj):
        if type(possibly_nested_obj) is dict:
            named_tuple_def = namedtuple(tuple_name, possibly_nested_obj.keys())
            transformed_value = named_tuple_def(
                *[
                    self._transform_to_named_tuples(key, value)
                    for key, value in possibly_nested_obj.items()
                ]
            )
        elif type(possibly_nested_obj) is list:
            transformed_value = [
                self._transform_to_named_tuples(f"{tuple_name}_{i}", possibly_nested_obj[i])
                for i in range(len(possibly_nested_obj))
            ]
        else:
            transformed_value = possibly_nested_obj

        return transformed_value

I tested the basic cases with the following code:

x = Dict2ObjParser({
    "a": {
        "b": 123,
        "c": "Hello, World!"
    },
    "d": [
        1,
        2,
        3
    ],
    "e": [
        {
            "f": "",
            "g": None
        },
        {
            "f": "Foo",
            "g": "Bar"
        },
        {
            "h": "Hi!",
            "i": None
        }
    ],
    "j": 456,
    "k": None
}).parse()

print(x)

It prints the following:

root(a=a(b=123, c='Hello, World!'), d=[1, 2, 3], e=[e_0(f='', g=None), e_1(f='Foo', g='Bar'), e_2(h='Hi!', i=None)], j=456, k=None)

After a little formatting, it looks like this:

root(
    a=a(
        b=123,
        c='Hello, World!'
    ),
    d=[1, 2, 3],
    e=[
        e_0(
            f='',
            g=None
        ),
        e_1(
            f='Foo',
            g='Bar'
        ),
        e_2(
            h='Hi!',
            i=None
        )
    ],
    j=456,
    k=None
)

I can access nested fields just like on any other object:

print(x.a.b)  # Prints: 123
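
One caveat with this approach: namedtuple type names and field names must be valid Python identifiers, so a YAML key such as branch1-1 is rejected. A small sketch of the behavior and of the standard rename=True option, which repairs field names only (the sample keys are illustrative, and wiring this into Dict2ObjParser is left as an assumption):

```python
from collections import namedtuple

keys = ["name", "branch1-1"]  # "branch1-1" is not a valid identifier

try:
    namedtuple("branch1", keys)
except ValueError as exc:
    print("rejected:", exc)

# rename=True replaces invalid field names with positional names (_0, _1, ...).
Branch = namedtuple("branch1", keys, rename=True)
print(Branch._fields)  # ('name', '_1')

node = Branch("Node 1", {"name": "Node 1-1"})
print(node._1)  # {'name': 'Node 1-1'}
```

The renamed fields lose the original key, so for files with such keys a dictionary or a namespace object may be the more practical target.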

In your case, the code would end up looking like this:

import yaml


with open(file_path, "r") as stream:
    nested_dict = yaml.safe_load(stream)
    nested_objt = Dict2ObjParser(nested_dict).parse()
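
If the target is a specific class such as the question's Person rather than generated named tuples, the loaded dictionary can also be unpacked into that class directly. A minimal sketch, assuming Python 3.7+ and a dataclass version of Person (the dataclass and the stand-in dictionary are illustrative; in practice the dictionary would come from yaml.safe_load):

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str


# Stand-in for the result of yaml.safe_load on the question's file.
loaded = {"Person": {"name": "XYZ"}}

# Unpack the mapping under "Person" as keyword arguments.
person = Person(**loaded["Person"])
print(person)  # Person(name='XYZ')
```

Unknown keys in the mapping raise a TypeError here, which can act as a simple schema check.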

I hope this helps!

This is really cool!!! I did not know named tuples could do this. Very nice idea and implementation. - John Henckel

0

Here is a way to test which YAML implementation the user has available in the virtualenv (or on the system), and then define load_yaml_file accordingly:

import json
import subprocess
import sys

load_yaml_file = None

if not load_yaml_file:
    try:
        import yaml

        def load_yaml_file(fn):
            # safe_load avoids constructing arbitrary Python objects
            with open(fn) as f:
                return yaml.safe_load(f)
    except ImportError:
        pass

if not load_yaml_file:
    try:
        # Fall back to Ruby's YAML parser if a ruby executable is available.
        has_ruby = subprocess.call(['ruby', '--version'],
                                   stdout=subprocess.DEVNULL) == 0
    except OSError:
        has_ruby = False
    if has_ruby:
        def load_yaml_file(fn):
            # Pass the file name via ARGV instead of splicing it into the script.
            ruby = "puts YAML.load_file(ARGV[0]).to_json"
            out = subprocess.check_output(
                ['ruby', '-ryaml', '-rjson', '-e', ruby, fn])
            return json.loads(out)

if not load_yaml_file:
    print("""
ERROR: %s requires ruby or python-yaml  to be installed.

apt-get install ruby

  OR

apt-get install python-yaml

  OR

Demonstrate your mastery of Python by using pip.
Please research the latest pip-based install steps for python-yaml.
Usually something like this works:
   apt-get install epel-release
   apt-get install python-pip
   apt-get install libyaml-cpp-dev
   python2.7 /usr/bin/pip install pyyaml
Notes:
Non-base library (yaml) should never be installed outside a virtualenv.
"pip install" is permanent:
  https://dev59.com/N3I_5IYBdhLWcg3wDOhC
Beware when using pip within an aptitude or RPM script.
  Pip might not play by all the rules.
  Your installation may be permanent.
Ruby is 7X faster at loading large YAML files.
pip could ruin your life.
  https://stackoverflow.com/questions/46326059/
  https://dev59.com/nZXfa4cB1Zd3GeqPjLvd
  https://stackoverflow.com/questions/8022240/
Never use PyYaml in numerical applications.
  https://dev59.com/QF0a5IYBdhLWcg3wVHXU
If you are working for a Fortune 500 company, your choices are
1. Ask for either the "ruby" package or the "python-yaml"
package. Asking for Ruby is more likely to get a fast answer.
2. Work in a VM. I highly recommend Vagrant for setting it up.

""" % sys.argv[0])
    sys.exit(4)


# test
print(load_yaml_file(sys.argv[1]))
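
The same "pick whichever backend is available" idea can be sketched without a try/except around the import by asking importlib whether the module exists. This is an illustrative variant, not the answer's method: the function name pick_loader is hypothetical, and the fallback to the json module is an assumption that only helps when the input happens to be JSON-formatted (flow-style) YAML.

```python
import importlib.util
import json


def pick_loader():
    """Return (backend_name, loads_function) for the first available backend."""
    if importlib.util.find_spec("yaml") is not None:
        import yaml
        return "pyyaml", yaml.safe_load
    # Assumption: without PyYAML, only JSON-formatted input can be handled.
    return "json", json.loads


backend, loads = pick_loader()

# This document is valid for both backends.
sample = '{"Person": {"name": "XYZ"}}'
result = loads(sample)
print(backend, result)
```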
