Extracting data from an XML file (with declaration) using R

Question

3

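I am trying to extract data from an XML file like the one shown below. I need the id of every node whose type is 0, and I have to solve this using R only. At the moment I can get the type with xmlToList("test.xml")[[3]][[1]] and the id with xmlToList("test.xml")[[3]][[4]]. By changing the number 3 to 6, 9, and so on, I can retrieve all the types and ids I need. I am not sure this is the right approach, though, because it depends on positional indices that may change. Could you suggest a simpler way to extract the data from the XML, or a fix for my non-ideal solution? Thanks!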

<?xml version="1.0" encoding="UTF-8"?>
<image name="test1" id="367432589" width="952" height="1024" create_date="Mar 2, 2009" >
  <nodes>
    <node type="16" name="Target532" url="/cgi/im?id=5657" id="5657" x="67" y="45" width="153" height="69">
      <alt>Synthesis1</alt>
      <Appearance TextArea="Rectangle: 550"  Comlex="Boolean: true" />
    </node>
    <node type="0" name="Target1" url="/cgi/im?id=680" id="680" x="193" y="535" width="70" height="70">
      <alt>Object &lt;b&gt;Target1&lt;TestingCond32</alt>
      <Appearance TextArea="Rectangle: 210"  Comlex="Boolean: false" />
    </node>
  </nodes>
  <edges>
    <edge type="-100" id="234523">
      <alt />
      <Appearance Visualization="String: Hexa" HexagonIndex="Integer: 0" />
    </edge>
    <edge type="-100" id="23">
      <alt />
      <Appearance Visualization="String: Hexa" HexagonIndex="Integer: 0" />
    </edge>
  </edges>
</image>

I am new to XML and have only basic knowledge of R. Thanks!

- John Amraph

2

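If you are new to this, I suggest looking at this thread on talkstats.com (link). I asked a lot of beginner questions there, and Bryan Goodrich gave very good advice and guidance. I have been meaning to write a blog post on getting started with scraping... - Tyler Rinker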

1 Answer

- shhhhimhuntingrabbits · Accepted Answer

You can try the following:

library(XML)
xdata <- xmlParse("test.xml")  # parse the file into an internal XML document
xpathSApply(xdata, "//*/node[@type=\"0\"]/@id")
#    id
# "680"

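The XPath expression looks for elements named "node" whose "type" attribute has the value 0, and returns the id attribute of each matching element. Because it selects by element name and attribute value, it does not depend on where the nodes appear in the file.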
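
If both the type and the id of every node are needed, as the question describes, the same name-based selection can be extended. The following is only a sketch, not part of the original answer: it inlines a trimmed copy of the question's XML so that it runs on its own, and the names xml_text, node_set, nodes_df and ids_type0 are made up for illustration.

```r
library(XML)

# Trimmed inline copy of the question's XML (an assumption, so the sketch
# runs without test.xml being present).
xml_text <- '<?xml version="1.0" encoding="UTF-8"?>
<image name="test1" id="367432589">
  <nodes>
    <node type="16" name="Target532" id="5657"><alt>Synthesis1</alt></node>
    <node type="0" name="Target1" id="680"><alt>Object</alt></node>
  </nodes>
  <edges>
    <edge type="-100" id="234523"><alt /></edge>
  </edges>
</image>'

xdata <- xmlParse(xml_text, asText = TRUE)

# Select every <node> element by name, wherever it sits in the document.
node_set <- getNodeSet(xdata, "//node")

# Read the attributes by name instead of by list position.
nodes_df <- data.frame(
  type = sapply(node_set, xmlGetAttr, "type"),
  id   = sapply(node_set, xmlGetAttr, "id"),
  stringsAsFactors = FALSE
)

# Keep only the ids of nodes whose type is "0".
ids_type0 <- nodes_df$id[nodes_df$type == "0"]
print(ids_type0)
```

Attribute values come back as character strings, so the comparison is against "0" rather than the number 0. With the real file, xmlParse("test.xml") would replace the inline text.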