Generating an XML document in R

7
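In a project I am working on, I need to create an XML document automatically based on user input. Using the user input to modify parts of the XML document is not a problem for me, but I am fairly new to creating an XML document from scratch in R.
I would like to know whether the XML or xml2 package can be used in R to generate an XML document like the one below. So far I have explored the newXMLDoc, xml_new_document and xml_new_root functions, but I am not familiar with all the syntax needed to create such an XML file (once finished, it should be saved to a local path).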
<session>
  <modelVersion>1.0.0</modelVersion>
  <products>
    <product>
      <refNo>1</refNo>
      <uri>S1A_IW_GRDH_1SDV_20190818T175529_20190818T175554_028627_033D25_22ED.zip</uri>
      <productReaderPlugin>class org.esa.s1tbx.io.sentinel1.Sentinel1ProductReaderPlugIn</productReaderPlugin>
    </product>
    <product>
      <refNo>2</refNo>
      <uri>S2A_MSIL1C_20190823T061631_N0208_R034_T42TXS_20190823T081730.zip</uri>
      <productReaderPlugin>class org.esa.s2tbx.dataio.s2.ortho.plugins.Sentinel2L1CProduct_Multi_UTM42N_ReaderPlugIn</productReaderPlugin>
    </product>
  </products>
  <views/>
</session>
3 Answers

6

xml2 (on CRAN) is another solution, from the tidyverse (the "Hadleyverse").

library(xml2)
library(tidyverse)

df <- data.frame(number = c(1, 2),
  uri = c('S1A_IW_GRDH_1SDV_20190818T175529_20190818T175554_028627_033D25_22ED.zip', 
    'S2A_MSIL1C_20190823T061631_N0208_R034_T42TXS_20190823T081730.zip'),
  plugin = c('class org.esa.s1tbx.io.sentinel1.Sentinel1ProductReaderPlugIn', 
    'class org.esa.s2tbx.dataio.s2.ortho.plugins.Sentinel2L1CProduct_Multi_UTM42N_ReaderPlugIn'),
  stringsAsFactors = FALSE)

We first create the XML document with its overall structure.
doc <- xml_new_root("session") 
xml_add_child(doc, "modelVersion", "1.0.0")  
xml_add_child(doc, "products") 
xml_add_child(doc, "products") 
xml_add_child(doc, "views")
doc
#> {xml_document}
#> <session>
#> [1] <modelVersion>1.0.0</modelVersion>
#> [2] <products/>
#> [3] <products/>
#> [4] <views/>

We now add the components to each products node.

xml_add_child() is vectorised over a node set, so no loop is needed.

products_nodes <- xml_find_all(doc, "//products")
xml_add_child(products_nodes, "refNo", df$number)
xml_add_child(products_nodes, "uri", df$uri)
xml_add_child(products_nodes, "productReaderPlugin", df$plugin)

Save the XML tree to a file and display its contents.

write_xml(doc, file = "output.xml", options = c("format", "no_declaration"))
cat(paste0(readLines("output.xml"), collapse = "\n"))

These are the contents of the "output.xml" file:

<session>
  <modelVersion>1.0.0</modelVersion>
  <products>
    <refNo>1</refNo>
    <uri>S1A_IW_GRDH_1SDV_20190818T175529_20190818T175554_028627_033D25_22ED.zip</uri>
    <productReaderPlugin>class org.esa.s1tbx.io.sentinel1.Sentinel1ProductReaderPlugIn</productReaderPlugin>
  </products>
  <products>
    <refNo>2</refNo>
    <uri>S2A_MSIL1C_20190823T061631_N0208_R034_T42TXS_20190823T081730.zip</uri>
    <productReaderPlugin>class org.esa.s2tbx.dataio.s2.ortho.plugins.Sentinel2L1CProduct_Multi_UTM42N_ReaderPlugIn</productReaderPlugin>
  </products>
  <views/>
</session>

Created on 2021-05-06 by the reprex package (v0.3.0)
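The output above repeats <products> once per row, whereas the document in the question wraps each row in a <product> element inside a single <products> node. A minimal sketch of one way to get that nesting with xml2 follows. It assumes that xml_add_child() returns the newly added node when called on a node, and it uses short made-up values in place of the real file names and plugin classes. The output path is also just a placeholder.

```r
library(xml2)

# Made-up stand-ins for the real uri / plugin values
df <- data.frame(
  number = c(1, 2),
  uri    = c("first.zip", "second.zip"),
  plugin = c("class FirstPlugin", "class SecondPlugin"),
  stringsAsFactors = FALSE
)

doc <- xml_new_root("session")
xml_add_child(doc, "modelVersion", "1.0.0")

# Keep a handle on the single <products> container
products <- xml_add_child(doc, "products")

# One <product> per data frame row, nested inside <products>
for (i in seq_len(nrow(df))) {
  product <- xml_add_child(products, "product")
  xml_add_child(product, "refNo", as.character(df$number[i]))
  xml_add_child(product, "uri", df$uri[i])
  xml_add_child(product, "productReaderPlugin", df$plugin[i])
}

xml_add_child(doc, "views")

# Placeholder location; replace with the local path you need
out_file <- file.path(tempdir(), "session.xml")
write_xml(doc, out_file, options = "format")
```

A loop is used here because each <product> needs its own three children. The vectorised form shown earlier works when the parent nodes already exist as a node set.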


6
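Consider a DOM approach with one of the libraries you mentioned (e.g. XML), which builds the XML without concatenating or interpolating strings: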
library(XML)

# DATA
df <- data.frame(refNo = c(1, 2),
                 uri = c('S1A_IW_GRDH_1SDV_20190818T175529_20190818T175554_028627_033D25_22ED.zip', 
                         'S2A_MSIL1C_20190823T061631_N0208_R034_T42TXS_20190823T081730.zip'),
                 plugin = c('class org.esa.s1tbx.io.sentinel1.Sentinel1ProductReaderPlugIn', 
                            'class org.esa.s2tbx.dataio.s2.ortho.plugins.Sentinel2L1CProduct_Multi_UTM42N_ReaderPlugIn')
                )

# CREATE XML FILE
doc = newXMLDoc()
root = newXMLNode("session", doc = doc)

# WRITE XML NODES AND DATA
mvNode = newXMLNode("modelVersion", "1.0.0", parent = root)

for (i in 1:nrow(df)){
  prodNode = newXMLNode("products", parent = root)

  # APPEND TO PRODUCT NODE
  newXMLNode("refNo", df$refNo[i], parent = prodNode)
  newXMLNode("uri", df$uri[i], parent = prodNode)
  newXMLNode("productReaderPlugin", df$plugin[i], parent = prodNode)
}

vwNode = newXMLNode("views", parent = root)

# OUTPUT XML CONTENT TO CONSOLE
print(doc)

# OUTPUT XML CONTENT TO FILE
saveXML(doc, file="Output.xml")

Output

<?xml version="1.0"?>
<session>
  <modelVersion>1.0.0</modelVersion>
  <products>
    <refNo>1</refNo>
    <uri>S1A_IW_GRDH_1SDV_20190818T175529_20190818T175554_028627_033D25_22ED.zip</uri>
    <productReaderPlugin>class org.esa.s1tbx.io.sentinel1.Sentinel1ProductReaderPlugIn</productReaderPlugin>
  </products>
  <products>
    <refNo>2</refNo>
    <uri>S2A_MSIL1C_20190823T061631_N0208_R034_T42TXS_20190823T081730.zip</uri>
    <productReaderPlugin>class org.esa.s2tbx.dataio.s2.ortho.plugins.Sentinel2L1CProduct_Multi_UTM42N_ReaderPlugIn</productReaderPlugin>
  </products>
  <views/>
</session>
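As with the xml2 answer, this output has one <products> element per row rather than the <products>/<product> nesting shown in the question. The following is a sketch of how the same loop could be adapted with the XML package. The data values are shortened placeholders and the output path is an assumption, not part of the original answer.

```r
library(XML)

# Placeholder data standing in for the real uri / plugin values
df <- data.frame(
  refNo  = c(1, 2),
  uri    = c("first.zip", "second.zip"),
  plugin = c("class FirstPlugin", "class SecondPlugin"),
  stringsAsFactors = FALSE
)

doc  <- newXMLDoc()
root <- newXMLNode("session", doc = doc)
newXMLNode("modelVersion", "1.0.0", parent = root)

# Single <products> container created once, outside the loop
productsNode <- newXMLNode("products", parent = root)

for (i in seq_len(nrow(df))) {
  # One <product> per row, attached to the shared container
  productNode <- newXMLNode("product", parent = productsNode)
  newXMLNode("refNo", df$refNo[i], parent = productNode)
  newXMLNode("uri", df$uri[i], parent = productNode)
  newXMLNode("productReaderPlugin", df$plugin[i], parent = productNode)
}

newXMLNode("views", parent = root)

# Placeholder location; replace with the local path you need
out_file <- file.path(tempdir(), "session.xml")
saveXML(doc, file = out_file)
```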

1
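If your structure is relatively static, this may be an easy way to solve it: I would use glue (https://github.com/tidyverse/glue) and then simply cat() the result to a file. Something like this: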


## I guess your data looks like this?
df <- data.frame(number = c(1,2),
                 uri = c("S1A_IW_GRDH_1SDV_20190818T175529_20190818T175554_028627_033D25_22ED.zip",
                         "S2A_MSIL1C_20190823T061631_N0208_R034_T42TXS_20190823T081730.zip"),
                 plugin = c("class org.esa.s1tbx.io.sentinel1.Sentinel1ProductReaderPlugIn",
                            "class org.esa.s2tbx.dataio.s2.ortho.plugins.Sentinel2L1CProduct_Multi_UTM42N_ReaderPlugIn"))
df

## build a function that outputs every block in xml format
thingieBuilder <- function(number, uri, plugin){
  glue::glue("<product>
           <refNo>{number}</refNo>
           <uri>{uri}</uri>
           <productReaderPlugin>{plugin}</productReaderPlugin>
           </product>")
}

## now run that for each entry in your df, unlist it, and collapse it into one string separated by newlines
library(purrr)  # provides pmap() and re-exports the %>% pipe
xmlProducts <- df %>% purrr::pmap(thingieBuilder) %>% unlist %>% paste(collapse = "\n")

## Now stick on top and bottom, and cat it to a file!
glue::glue("<session>
  <modelVersion>1.0.0</modelVersion>
  <products>\n",
           xmlProducts,
           "\n</products>
             <views/>
           </session>") %>% 
  cat(file = "boom.xml")
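One caveat with building XML by string interpolation is that values containing characters such as & or < are not escaped automatically, so the result may not be well-formed. The sketch below illustrates this with a made-up uri value and a hypothetical helper, escape_xml(), which is not part of glue or xml2. It uses xml2::read_xml() only to check whether the text parses.

```r
library(xml2)

# Hypothetical helper (not from glue or xml2): escape the characters that
# are special inside XML text. "&" has to be handled first so the
# entities produced by the later replacements are not escaped again.
escape_xml <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  gsub(">", "&gt;", x, fixed = TRUE)
}

raw_uri <- "scene_A&B.zip"  # made-up value containing a special character

# Pasting the raw value produces markup that the parser rejects
bad <- paste0("<uri>", raw_uri, "</uri>")
bad_parses <- tryCatch({
  read_xml(bad)
  TRUE
}, error = function(e) FALSE)

# Escaping first gives well-formed XML that reads back as the original value
good <- paste0("<uri>", escape_xml(raw_uri), "</uri>")
round_trip <- xml_text(read_xml(good))
```

If the values can contain such characters, applying a helper like this inside thingieBuilder() before interpolation, or using one of the DOM-based answers, avoids the problem.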

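Thanks! I had trouble finding the pmap function. Could you list the packages used? glue + ...? - GCGM
XML is not quite a text file. The approaches the OP mentioned can build the XML using DOM methods. - Parfait
Sorry... the pmap function is in the purrr package. I have edited my answer accordingly. - Amit Kohli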
