Parsing XML with namespaces using PowerShell

4

I hope you can help me understand XML handling in PowerShell. I have several XML files like this:

<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="http://www.example.com/xml/catalog/2006-10-31">
    <product product-id="11210">
        ...
        <available-flag>true</available-flag>
        <online-flag>false</online-flag>
        <online-flag site-id="ru">true</online-flag>
        <online-flag site-id="fr">true</online-flag>
        <online-flag site-id="uk">false</online-flag>
        <online-flag site-id="de">true</online-flag>
        ...
    </product>
    <product product-id="50610">
        ...
        <available-flag>true</available-flag>
        <online-flag>true</online-flag>
        <online-flag site-id="ru">false</online-flag>
        <online-flag site-id="fr">true</online-flag>
        <online-flag site-id="uk">false</online-flag>
        <online-flag site-id="de">false</online-flag>
        ...
    </product>
    <product product-id="82929">
        ...
        <available-flag>true</available-flag>
        <online-flag>true</online-flag>
        <online-flag site-id="ru">false</online-flag>
        <online-flag site-id="fr">true</online-flag>
        <online-flag site-id="uk">false</online-flag>
        <online-flag site-id="de">true</online-flag>
        ...
    </product>
</catalog>

I need to get the values of two elements in PowerShell:
  • <online-flag> (without the site-id attribute)
  • <online-flag site-id="ru">

for the product with product-id="50610".

Here is the code I have so far:
$Path = "C:\Temp\0\2017-08-12_190211.xml"
$XPath = "/ns:catalog/ns:product[@product-id='50610']"

$files = Get-ChildItem $Path | Where {-not $_.PSIsContainer}

if ($files -eq $null) {
    return
}

foreach ($file in $files) {
    [xml]$xml = Get-Content $file
    $namespace = $xml.DocumentElement.NamespaceURI
    $ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
    $ns.AddNamespace("ns", $namespace)
    $product = $xml.SelectSingleNode($XPath, $ns)
}

I have a few questions:

  1. With this code I am able to select the needed product node. PowerShell shows:

    online-flag        : {true, online-flag, online-flag, online-flag...}
    

    But how can I then select the values of the online-flag elements I need (both ways if possible: with XPath and with object notation)?

  2. Is it possible to select a node in the "object" way? Like this:

    $product = $xml.catalog.product |
               Where-Object {$_."product-id".value -eq "50610"}
    
  3. If I have several files, what is the best way to select filename, global online-flag (without attributes), specific online-flag?

3 Answers

3

Use two different XPath expressions:

  1. for selecting a node without a particular attribute:

    //ns:product[@product-id='50610']/ns:online-flag[not(@site-id)]
    
  2. for selecting a node with a particular attribute value:

    //ns:product[@product-id='50610']/ns:online-flag[@site-id='ru']
    

You can select nodes relative to a node you have already selected by making the XPath expression relative to the current node (.):

$XPath = "/ns:catalog/ns:product[@product-id='50610']"
...
$product = $xml.SelectSingleNode($XPath, $ns)
$product.SelectSingleNode("./ns:online-flag[not(@site-id)]", $ns)
$product.SelectSingleNode("./ns:online-flag[@site-id='ru']", $ns)
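The relative lookup above can be tried without any files. The following is a minimal sketch that assumes a trimmed inline copy of the catalog (one product, three flags) in place of a real file; the variable names `$globalFlag`, `$ruFlag`, and `$fromRoot` are illustrative only. It also shows that a path starting with `/` is evaluated from the document root even when called on `$product`, so it finds nothing here.

```powershell
# Inline sample standing in for one catalog file (assumption: trimmed to one product).
[xml]$xml = @'
<catalog xmlns="http://www.example.com/xml/catalog/2006-10-31">
    <product product-id="50610">
        <online-flag>true</online-flag>
        <online-flag site-id="ru">false</online-flag>
        <online-flag site-id="fr">true</online-flag>
    </product>
</catalog>
'@

# Bind the document's default namespace to the prefix "ns" for use in XPath.
$ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
$ns.AddNamespace('ns', $xml.DocumentElement.NamespaceURI)

# Select the product once, then query relative to it.
$product = $xml.SelectSingleNode("/ns:catalog/ns:product[@product-id='50610']", $ns)

# "ns:online-flag[...]" and "./ns:online-flag[...]" are equivalent relative paths.
$globalFlag = $product.SelectSingleNode('ns:online-flag[not(@site-id)]', $ns).InnerText
$ruFlag     = $product.SelectSingleNode("./ns:online-flag[@site-id='ru']", $ns).InnerText

# A leading "/" restarts at the document root, whose only child is <catalog>,
# so this returns $null instead of the flag.
$fromRoot = $product.SelectSingleNode("/ns:online-flag[@site-id='ru']", $ns)

"global=$globalFlag ru=$ruFlag fromRootIsNull=$($null -eq $fromRoot)"
```

`InnerText` is used here instead of `'#text'`; for elements that contain only text, both give the same string.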

If you need result data containing the filename and the two node values, I suggest building custom objects:
$files | ForEach-Object {
    [xml]$xml = Get-Content $_
    ...
    New-Object -Type PSObject -Property @{
        'Filename'  = $_
        'online'    = $product.SelectSingleNode("./ns:online-flag[not(@site-id)]", $ns).'#text'
        'ru_online' = $product.SelectSingleNode("./ns:online-flag[@site-id='ru']", $ns).'#text'
    }
}
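As a self-contained variant of the same idea, here is a sketch that wraps the lookup in a helper and skips documents that do not contain the product. The function name `Get-OnlineFlag`, its parameters, and the two inline sample documents (`a.xml`, `b.xml`) are hypothetical and only stand in for real files.

```powershell
# Hypothetical helper: returns one object per document that contains the product.
function Get-OnlineFlag {
    param(
        [string]$Name,        # label for the result row, e.g. a file name
        [xml]$Document,       # parsed XML (a string is converted automatically)
        [string]$ProductId
    )

    $ns = New-Object System.Xml.XmlNamespaceManager($Document.NameTable)
    $ns.AddNamespace('ns', $Document.DocumentElement.NamespaceURI)

    $product = $Document.SelectSingleNode("/ns:catalog/ns:product[@product-id='$ProductId']", $ns)
    if ($null -eq $product) { return }   # product not in this document: emit nothing

    $globalNode = $product.SelectSingleNode('ns:online-flag[not(@site-id)]', $ns)
    $ruNode     = $product.SelectSingleNode("ns:online-flag[@site-id='ru']", $ns)

    [pscustomobject]@{
        Filename  = $Name
        online    = if ($null -ne $globalNode) { $globalNode.InnerText } else { $null }
        ru_online = if ($null -ne $ruNode)     { $ruNode.InnerText }     else { $null }
    }
}

# Inline samples in place of real files (assumption for illustration).
$samples = [ordered]@{
    'a.xml' = '<catalog xmlns="http://www.example.com/xml/catalog/2006-10-31"><product product-id="50610"><online-flag>true</online-flag><online-flag site-id="ru">false</online-flag></product></catalog>'
    'b.xml' = '<catalog xmlns="http://www.example.com/xml/catalog/2006-10-31"><product product-id="11210"><online-flag>false</online-flag></product></catalog>'
}

$rows = @(foreach ($name in $samples.Keys) {
    Get-OnlineFlag -Name $name -Document $samples[$name] -ProductId '50610'
})
$rows | Format-Table -AutoSize
```

Checking for `$null` before reading the text avoids an error when a product has no flag for the requested site.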

Using dot notation and filtering with Where-Object is possible, but I don't recommend it. I find XPath more efficient.

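Hi Ansgar! Thanks for your answer. I did mention that dot notation works, and I agree that it is less convenient. The problem with your example is that my XML files are very large, so selecting two nodes takes time. Is it possible to first select a product, as in my example, and then use XPath to select the values of the online-flag elements? What would the XPath be in that case? - Alterant
I tried all of the following, and none of them worked: $product.SelectSingleNode("/ns:product/ns:online-flag[@site-id='ru']", $ns), $product.SelectSingleNode("/ns:online-flag[@site-id='ru']", $ns), $product.SelectSingleNode("/product/online-flag[@site-id='ru']"), $product.SelectSingleNode("/online-flag[@site-id='ru']"). This one, $product.GetElementsByTagName("online-flag"), does run, but the result is a list of values rather than a single value. - Alterant
Found the answer here: https://dev59.com/7XE95IYBdhLWcg3wmvGh - Alterant
To search within the current node, you need to use $product.SelectSingleNode("ns:online-flag[not(@site-id)]", $ns) or $product.SelectSingleNode("./ns:online-flag[not(@site-id)]", $ns). Thanks a lot! - Alterant
I thought a namespace looked like <xml><ns:node>val</ns:node>, but I can't find that in the XML file. - Timo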

1
I was able to get the data I need the "object" way:
$product = $xml.catalog.product | Where-Object {$_."product-id" -eq "50610"}
$of = $product."online-flag"
$glblsid = $of | Where-Object {$_ -is [System.String]}
$specsid = ($of | Where-Object {$_."site-id" -eq "ru"})."#text"

But I don't like the way I did it. Is there a more convenient solution?

The answer to the second question is yes: see the first line.
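The `-is [System.String]` filter works because PowerShell's XML adapter returns a plain string for an element that has only text and no attributes, and an `XmlElement` otherwise. The sketch below shows this on a trimmed inline sample (an assumption, in place of a real file), followed by one possible alternative that filters on the attribute instead of on the type; the variable names are illustrative.

```powershell
# Inline sample standing in for a catalog file (assumption: one product, two flags).
[xml]$xml = @'
<catalog xmlns="http://www.example.com/xml/catalog/2006-10-31">
    <product product-id="50610">
        <online-flag>true</online-flag>
        <online-flag site-id="ru">false</online-flag>
    </product>
</catalog>
'@

$product = $xml.catalog.product | Where-Object { $_.'product-id' -eq '50610' }
$flags   = $product.'online-flag'

# The attribute-less element is surfaced as a String, the other as an XmlElement.
$types = $flags | ForEach-Object { $_.GetType().Name }
$types -join ', '

# Alternative without a type check: walk the child elements and test the attribute.
$globalFlag = ($product.ChildNodes | Where-Object {
    $_ -is [System.Xml.XmlElement] -and
    $_.LocalName -eq 'online-flag' -and
    -not $_.HasAttribute('site-id')
}).InnerText

$ruFlag = ($flags | Where-Object { $_.'site-id' -eq 'ru' }).'#text'

"global=$globalFlag ru=$ruFlag"
```

Note that dot notation ignores the namespace entirely, which is why no namespace manager is needed here.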


1
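To round off this topic, I measured the performance of the three approaches: dot style, XPath on the file, and XPath on the node. There is no noticeable difference between them. Details follow.
I parsed two files of 60 MB each, twice.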
  1. Object style (dot style)

    ...
    $StartTime = Get-Date
    foreach ($file in $files) {
        [xml]$xml = Get-Content $file
    
        #Object style
        $product = $xml.catalog.product | Where-Object {$_."product-id" -eq "50610"}
        $of = $product."online-flag"
        $glblsid = $of | Where-Object {$_ -is [System.String]}
        $specsid = ($of | Where-Object {$_."site-id" -eq "ru"})."#text"
        Write-Output "$($file.Name) $glblsid $specsid"
    }
    $EndTime = Get-Date
    $TimeSpan = New-TimeSpan -Start $StartTime -End $EndTime
    Write-Output $TimeSpan.TotalMilliseconds
    

    Results:

    PS> .\ParseXML2.ps1
    2017-08-10_190159.xml false false
    2017-08-11_190203.xml false true
    36269,535
    PS> .\ParseXML2.ps1
    2017-08-10_190159.xml false false
    2017-08-11_190203.xml false true
    36628,3304
    
  2. XPath on the file:

    ...
    $StartTime = Get-Date
    foreach ($file in $files) {
        [xml]$xml = Get-Content $file
    
        #XPath on the file
        $namespace = $xml.DocumentElement.NamespaceURI
        $ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
        $ns.AddNamespace("ns", $namespace)
        $glblsid = $xml.SelectSingleNode("/ns:catalog/ns:product[@product-id='50610']/ns:online-flag[not(@site-id)]", $ns).'#text'
        $specsid = $xml.SelectSingleNode("/ns:catalog/ns:product[@product-id='50610']/ns:online-flag[@site-id='ru']", $ns).'#text'
        Write-Output "$($file.Name) $glblsid $specsid"
    }
    $EndTime = Get-Date
    $TimeSpan = New-TimeSpan -Start $StartTime -End $EndTime
    Write-Output $TimeSpan.TotalMilliseconds
    

    Results:

    PS> .\ParseXML2.ps1
    2017-08-10_190159.xml false false
    2017-08-11_190203.xml false true
    36129,1368
    PS> .\ParseXML2.ps1
    2017-08-10_190159.xml false false
    2017-08-11_190203.xml false true
    38890,3014
    
  3. XPath on the node:

    ...
    $StartTime = Get-Date
    foreach ($file in $files) {
        [xml]$xml = Get-Content $file
    
        #XPath on the node
        $namespace = $xml.DocumentElement.NamespaceURI
        $ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
        $ns.AddNamespace("ns", $namespace)
        $product = $xml.SelectSingleNode("/ns:catalog/ns:product[@product-id='50610']", $ns)
        $glblsid = $product.SelectSingleNode("ns:online-flag[not(@site-id)]", $ns).'#text'
        $specsid = $product.SelectSingleNode("ns:online-flag[@site-id='ru']", $ns).'#text'
        Write-Output "$($file.Name) $glblsid $specsid"
    }
    $EndTime = Get-Date
    $TimeSpan = New-TimeSpan -Start $StartTime -End $EndTime
    Write-Output $TimeSpan.TotalMilliseconds
    

    Results:

    PS> .\ParseXML2.ps1
    2017-08-10_190159.xml false false
    2017-08-11_190203.xml false true
    33477,1708
    PS> .\ParseXML2.ps1
    2017-08-10_190159.xml false false
    2017-08-11_190203.xml false true
    34116,7626
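Each timing above covers reading and parsing the file as well as the lookup itself. One way to see how the two parts compare is to time them separately. The sketch below is only an illustration under stated assumptions: it generates a small synthetic catalog in memory (2000 products, far smaller than the 60 MB files) and uses `System.Diagnostics.Stopwatch`; it does not reproduce the measurements above, and no timings are claimed for it.

```powershell
# Build a synthetic catalog in memory (assumption: 2000 small products).
$sb = New-Object System.Text.StringBuilder
[void]$sb.Append('<catalog xmlns="http://www.example.com/xml/catalog/2006-10-31">')
foreach ($i in 1..2000) {
    [void]$sb.Append("<product product-id='$i'><online-flag>true</online-flag><online-flag site-id='ru'>false</online-flag></product>")
}
[void]$sb.Append('</catalog>')
$text = $sb.ToString()

# Phase 1: parsing the text into an XmlDocument.
$xml = New-Object System.Xml.XmlDocument
$sw  = [System.Diagnostics.Stopwatch]::StartNew()
$xml.LoadXml($text)
$loadMs = $sw.Elapsed.TotalMilliseconds

# Phase 2: the XPath lookup on the already loaded document.
$sw.Restart()
$ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
$ns.AddNamespace('ns', $xml.DocumentElement.NamespaceURI)
$product = $xml.SelectSingleNode("/ns:catalog/ns:product[@product-id='1500']", $ns)
$ruFlag  = $product.SelectSingleNode("ns:online-flag[@site-id='ru']", $ns).InnerText
$queryMs = $sw.Elapsed.TotalMilliseconds

"load: $loadMs ms, query: $queryMs ms, ru=$ruFlag"
```

If the load phase turns out to dominate on the real files, that would be consistent with the three approaches finishing in nearly the same total time.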
    
