lxml: add namespace to input file

Question

18

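I am parsing an XML file generated by an external program. I would then like to add custom annotations to this file, using my own namespace. My input looks like this: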

<sbml xmlns="http://www.sbml.org/sbml/level2/version4" xmlns:celldesigner="http://www.sbml.org/2001/ns/celldesigner" level="2" version="4">
  <model metaid="untitled" id="untitled">
    <annotation>...</annotation>
    <listOfUnitDefinitions>...</listOfUnitDefinitions>
    <listOfCompartments>...</listOfCompartments>
    <listOfSpecies>
      <species metaid="s1" id="s1" name="GenA" compartment="default" initialAmount="0">
        <annotation>
          <celldesigner:extension>...</celldesigner:extension>
        </annotation>
      </species>
      <species metaid="s2" id="s2" name="s2" compartment="default" initialAmount="0">
        <annotation>
           <celldesigner:extension>...</celldesigner:extension>
        </annotation>
      </species>
    </listOfSpecies>
    <listOfReactions>...</listOfReactions>
  </model>
</sbml>

The issue is that lxml only declares a namespace where it is used, which means the declaration is repeated many times, like so (simplified):

<sbml xmlns="namespace" xmlns:celldesigner="morenamespace" level="2" version="4">
  <listOfSpecies>
    <species>
      <kjw:test xmlns:kjw="http://this.is.some/custom_namespace"/>
      <celldesigner:data>Some important data which must be kept</celldesigner:data>
    </species>
    <species>
      <kjw:test xmlns:kjw="http://this.is.some/custom_namespace"/>
    </species>
    ....
  </listOfSpecies>
</sbml>

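Is it possible to force lxml to write this declaration only once, in a parent element such as sbml or listOfSpecies? Or is there a good reason not to do so? The result I want would be: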

<sbml xmlns="namespace" xmlns:celldesigner="morenamespace" level="2" version="4"  xmlns:kjw="http://this.is.some/custom_namespace">
  <listOfSpecies>
    <species>
      <kjw:test/>
      <celldesigner:data>Some important data which must be kept</celldesigner:data>
    </species>
    <species>
      <kjw:test/>
    </species>
    ....
  </listOfSpecies>
</sbml>

The important point is that the existing data read from the file must be kept, so I cannot just create a new root element (I think?).

EDIT: Code attached below.

def annotateSbml(sbml_input):
  from lxml import etree

  checkSbml(sbml_input) # Makes sure the input is valid sbml/xml.

  ns = "http://this.is.some/custom_namespace"
  etree.register_namespace('kjw', ns)

  sbml_doc = etree.ElementTree()
  root = sbml_doc.parse(sbml_input, etree.XMLParser(remove_blank_text=True))
  nsmap = root.nsmap
  nsmap['sbml'] = nsmap[None] # Makes code more readable, but seems ugly. Any alternatives to this?
  nsmap['kjw'] = ns
  ns = '{' + ns + '}'
  sbmlns = '{' + nsmap['sbml'] + '}'

  for species in root.findall('sbml:model/sbml:listOfSpecies/sbml:species', nsmap):
    species.append(etree.Element(ns + 'test'))

  sbml_doc.write("test.sbml.xml", pretty_print=True, xml_declaration=True)

  return

- kai

@Marcin: done. Any tips? - kai

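@mzjin My input contains everything except the <kjw:test/> tags. The goal is to insert such tags (or similar ones, e.g. kjw:score or kjw:length) into every species in this list. Does that make sense, or should I post the whole file (I thought my original question was long enough already)? - kai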

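@mzjin Sorry, I oversimplified a bit. Yes, it does include the model tag. I used the sbml:model tag together with nsmap['sbml'] = nsmap[None] so that the parser correctly resolves the model's namespace to the root namespace; it did not seem to work otherwise. - kai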

6 Answers

7

I know this is an old question, but it is still valid, and as of lxml 3.5.0 there is probably a better solution:

cleanup_namespaces() accepts a new argument top_nsmap that moves the definitions of the provided prefix-namespace mapping to the top of the tree.

So the namespace mapping can now be moved up with a single call:

nsmap = {'kjw': 'http://this.is.some/custom_namespace'}
etree.cleanup_namespaces(root, top_nsmap=nsmap)
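
For context, here is a minimal end-to-end sketch of how that call might fit into the question's workflow. The abbreviated DOC string and the annotate helper are assumptions made for illustration, not part of the original answer, and top_nsmap requires lxml 3.5.0 or later.

```python
from lxml import etree

NS = "http://this.is.some/custom_namespace"  # namespace URI from the question

# Abbreviated stand-in for the SBML input (assumption: real files are larger).
DOC = b"""<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="untitled">
    <listOfSpecies>
      <species id="s1"/>
      <species id="s2"/>
    </listOfSpecies>
  </model>
</sbml>"""


def annotate(xml_bytes):
    """Append a kjw:test element to every species, then hoist the declaration."""
    root = etree.fromstring(xml_bytes)
    # '{*}' matches the tag in any namespace, so no prefix map is needed here.
    for species in root.iter("{*}species"):
        species.append(etree.Element("{%s}test" % NS, nsmap={"kjw": NS}))
    # Declare kjw on the root and drop the now-redundant copies further down.
    etree.cleanup_namespaces(root, top_nsmap={"kjw": NS})
    return etree.tostring(root, pretty_print=True).decode()


result = annotate(DOC)
print(result)
```

Because the existing root element is modified in place rather than replaced, its attributes and children are left as they were.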

- Michal Čihař

3

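Rather than dealing with the raw XML directly, you could also use LibSBML, a library for manipulating SBML documents with bindings for several languages, including Python. There you would use it like this: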

>>> from libsbml import *
>>> doc = readSBML('Dropbox/SBML Models/BorisEJB.xml')
>>> species = doc.getModel().getSpecies('MAPK')
>>> species.appendAnnotation('<kjw:test xmlns:kjw="http://this.is.some/custom_namespace"/>')
0
>>> species.toSBML()
'<species id="MAPK" compartment="compartment" initialConcentration="280" boundaryCondition="false">\n  <annotation>\n
 <kjw:test xmlns:kjw="http://this.is.some/custom_namespace"/>\n  </annotation>\n</species>'
>>>

- Frank

1

I wrote this function to add a namespace to the root element:

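from lxml import etree

def addns(tree, alias, uri):
    root = tree.getroot()
    nsmap = root.nsmap          # a copy of the root's prefix-to-URI mapping
    nsmap[alias] = uri
    # Build a replacement root that carries the extra declaration.
    new_root = etree.Element(root.tag, attrib=root.attrib, nsmap=nsmap)
    new_root.text = root.text   # keep any leading text inside the root
    new_root[:] = root[:]       # move (not copy) the children across
    return new_root.getroottree()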

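After applying this function you get a new tree, but you can probably swap the tree instance in the single object through which you access the tree... since you have a strong OO design!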

- Kristian Benoit

1

If you temporarily add a namespaced attribute to the root node, that does the trick.

ns = '{http://this.is.some/custom_namespace}'

# add 'kjw:foobar' attribute to root node
root.set(ns+'foobar', 'foobar')

# add kjw namespace elements (or attributes) elsewhere
# ... get child element species ...
species.append(etree.Element(ns + 'test'))

# remove temporary namespaced attribute from root node
del root.attrib[ns+'foobar']
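
The steps above can be put together as a small self-contained sketch. The two-species input and the attribute name placeholder are hypothetical, chosen only for illustration, and register_namespace is added here (as in the question's code) so that lxml uses the kjw prefix instead of a generated one.

```python
from lxml import etree

NS = "http://this.is.some/custom_namespace"
# Register the preferred prefix; otherwise lxml generates one such as ns0.
etree.register_namespace("kjw", NS)

# Hypothetical, heavily abbreviated input.
root = etree.fromstring(
    b'<sbml xmlns="http://www.sbml.org/sbml/level2/version4">'
    b"<listOfSpecies><species/><species/></listOfSpecies></sbml>"
)

# Temporarily set a namespaced attribute so the root declares xmlns:kjw.
# "placeholder" is a made-up attribute name used only for this step.
root.set("{%s}placeholder" % NS, "x")

# Elements appended now reuse the declaration that already exists on the root.
for species in root.iter("{*}species"):
    species.append(etree.Element("{%s}test" % NS))

# Deleting the attribute leaves the namespace declaration on the root.
del root.attrib["{%s}placeholder" % NS]

result = etree.tostring(root).decode()
print(result)
```

This relies on lxml keeping a namespace declaration on an element after the attribute that introduced it is removed, so it is best treated as a workaround rather than a documented guarantee.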

- scanny

0

You could replace the root element to add 'kjw' to its nsmap. The xmlns declaration would then appear only in the root element.

- jfs

- jterrace · Accepted Answer

Modifying the namespace mapping of a node is not possible in lxml. See this open ticket, which lists the feature as a wishlist item.

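It originated from this thread on the lxml mailing list, where replacing the root node is given as a workaround. There are some issues with replacing the root node, though: see the ticket mentioned above.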

For completeness, here is the suggested root-replacement workaround code:

>>> DOC = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" xmlns:celldesigner="http://www.sbml.org/2001/ns/celldesigner" level="2" version="4">
...   <model metaid="untitled" id="untitled">
...     <annotation>...</annotation>
...     <listOfUnitDefinitions>...</listOfUnitDefinitions>
...     <listOfCompartments>...</listOfCompartments>
...     <listOfSpecies>
...       <species metaid="s1" id="s1" name="GenA" compartment="default" initialAmount="0">
...         <annotation>
...           <celldesigner:extension>...</celldesigner:extension>
...         </annotation>
...       </species>
...       <species metaid="s2" id="s2" name="s2" compartment="default" initialAmount="0">
...         <annotation>
...            <celldesigner:extension>...</celldesigner:extension>
...         </annotation>
...       </species>
...     </listOfSpecies>
...     <listOfReactions>...</listOfReactions>
...   </model>
... </sbml>"""
>>> 
>>> from lxml import etree
>>> from io import StringIO
>>> NS = "http://this.is.some/custom_namespace"
>>> tree = etree.ElementTree(element=None, file=StringIO(DOC))
>>> root = tree.getroot()
>>> nsmap = root.nsmap
>>> nsmap['kjw'] = NS
>>> # Note: the root's attributes (level, version) are not copied here;
>>> # pass attrib=root.attrib to etree.Element to keep them.
>>> new_root = etree.Element(root.tag, nsmap=nsmap)
>>> new_root[:] = root[:]
>>> new_root.append(etree.Element('{%s}%s' % (NS, 'test')))
>>> new_root.append(etree.Element('{%s}%s' % (NS, 'test')))

>>> print(etree.tostring(new_root, pretty_print=True).decode())
<sbml xmlns:celldesigner="http://www.sbml.org/2001/ns/celldesigner" xmlns:kjw="http://this.is.some/custom_namespace" xmlns="http://www.sbml.org/sbml/level2/version4"><model metaid="untitled" id="untitled">
    <annotation>...</annotation>
    <listOfUnitDefinitions>...</listOfUnitDefinitions>
    <listOfCompartments>...</listOfCompartments>
    <listOfSpecies>
      <species metaid="s1" id="s1" name="GenA" compartment="default" initialAmount="0">
        <annotation>
          <celldesigner:extension>...</celldesigner:extension>
        </annotation>
      </species>
      <species metaid="s2" id="s2" name="s2" compartment="default" initialAmount="0">
        <annotation>
           <celldesigner:extension>...</celldesigner:extension>
        </annotation>
      </species>
    </listOfSpecies>
    <listOfReactions>...</listOfReactions>
  </model>
<kjw:test/><kjw:test/></sbml>