How to get the XPath from an XmlNode instance

56

Could someone supply some code that gets the XPath of a System.Xml.XmlNode instance?

Thanks!


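Just to clarify, do you mean a list of node names from the root to the node, separated by /? - Chris Marasti-Georg
Exactly. Something like "root/mycars/toyota/description/paragraph". There could be multiple paragraphs within the description element, but I only want the XPath to point to the one that the XmlNode instance refers to. - joe
3
People shouldn't just "ask for code"; they should provide some code that they have at least attempted. - bgmCoder
14 Answers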

61

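Okay, I couldn't resist having a go at it. It only works for attributes and elements, but hey... what can you expect in 15 minutes :) There may well be a cleaner way of doing it, too.

Including the index on every element (particularly the root one!) is superfluous, but it is easier than trying to work out whether there would be any ambiguity without it.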

using System;
using System.Text;
using System.Xml;

class Test
{
    static void Main()
    {
        string xml = @"
<root>
  <foo />
  <foo>
     <bar attr='value'/>
     <bar other='va' />
  </foo>
  <foo><bar /></foo>
</root>";
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode("//@attr");
        Console.WriteLine(FindXPath(node));
        Console.WriteLine(doc.SelectSingleNode(FindXPath(node)) == node);
    }

    static string FindXPath(XmlNode node)
    {
        StringBuilder builder = new StringBuilder();
        while (node != null)
        {
            switch (node.NodeType)
            {
                case XmlNodeType.Attribute:
                    builder.Insert(0, "/@" + node.Name);
                    node = ((XmlAttribute) node).OwnerElement;
                    break;
                case XmlNodeType.Element:
                    int index = FindElementIndex((XmlElement) node);
                    builder.Insert(0, "/" + node.Name + "[" + index + "]");
                    node = node.ParentNode;
                    break;
                case XmlNodeType.Document:
                    return builder.ToString();
                default:
                    throw new ArgumentException("Only elements and attributes are supported");
            }
        }
        throw new ArgumentException("Node was not in a document");
    }

    static int FindElementIndex(XmlElement element)
    {
        XmlNode parentNode = element.ParentNode;
        if (parentNode is XmlDocument)
        {
            return 1;
        }
        XmlElement parent = (XmlElement) parentNode;
        int index = 1;
        foreach (XmlNode candidate in parent.ChildNodes)
        {
            if (candidate is XmlElement && candidate.Name == element.Name)
            {
                if (candidate == element)
                {
                    return index;
                }
                index++;
            }
        }
        throw new ArgumentException("Couldn't find element within parent");
    }
}
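
One way to gain some confidence in a path builder like the one above is to check that every element and attribute in a document is selected by its own generated path. What follows is only a minimal sketch of that check and is not part of the original answer: RoundTripSketch, PathOf and AllRoundTrip are hypothetical names, PathOf is a compact stand-in for FindXPath, and the sample is assumed to use no namespaces (prefixed names would need an XmlNamespaceManager).

```csharp
using System;
using System.Xml;

static class RoundTripSketch
{
    // Hypothetical compact stand-in for FindXPath: elements and attributes only.
    public static string PathOf(XmlNode node)
    {
        if (node is XmlDocument) return "";
        if (node is XmlAttribute attr)
            return PathOf(attr.OwnerElement) + "/@" + attr.Name;
        if (!(node is XmlElement))
            throw new ArgumentException("Only elements and attributes are supported");

        // 1-based position among the preceding element siblings with the same name.
        int index = 1;
        for (XmlNode s = node.PreviousSibling; s != null; s = s.PreviousSibling)
            if (s is XmlElement && s.Name == node.Name) index++;

        return PathOf(node.ParentNode) + "/" + node.Name + "[" + index + "]";
    }

    // Returns true when every element and attribute is selected by its own path.
    public static bool AllRoundTrip(XmlDocument doc)
    {
        foreach (XmlNode n in doc.SelectNodes("//* | //@*"))
            if (doc.SelectSingleNode(PathOf(n)) != n) return false;
        return true;
    }

    static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root><foo/><foo><bar attr='value'/><bar other='va'/></foo><foo><bar/></foo></root>");
        Console.WriteLine(AllRoundTrip(doc));
    }
}
```

The comparison uses reference equality, which is fine here because SelectSingleNode hands back the node objects that already live in the XmlDocument.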

4
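Thanks, Jon. I used this recently. FindElementIndex has a bug when an element is preceded by "nephews" of the same type. I'll make a slight revision to fix this. - harpo
Thanks a lot, Jon! You saved my life today! I have a source xml/xsd tree (a checkbox tree, so users can remove nodes), and I persist the user's selections as a comma-separated string of XPaths so that I can later filter the user's XML feed down to just the subset of nodes they want. This worked for me. Thanks again. - Laguna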

25

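Jon is correct that there are any number of XPath expressions that will return the same node in an instance document. The simplest way to build an expression that unambiguously returns one specific node is a chain of node tests that use the node's position in the predicate, e.g.:

/node()[1]/node()[2]/node()[6]/node()[1]/node()[2]

Obviously, this expression doesn't use element names, but if all you want is to locate a node within a document, you don't need its name. It also can't be used to find attributes (attributes are not children of their owner element and have no position among its child nodes; you can only find them by name), but it will find all other node types.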
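
To make the positional idea concrete, here is a small sketch (not from the original answer) that evaluates such paths against a tiny document. PositionalPathSketch and Select are hypothetical names, and the sample XML is an assumption chosen so that a comment and a text node are reachable by position alone. Note that XPath positions start at 1, so a [0] predicate never matches anything.

```csharp
using System;
using System.Xml;

static class PositionalPathSketch
{
    // Loads the XML and returns the first node matched by the XPath, or null.
    public static XmlNode Select(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectSingleNode(xpath);
    }

    static void Main()
    {
        string xml = "<root><!--note--><a>text</a></root>";

        // First child of the root element is the comment.
        Console.WriteLine(Select(xml, "/node()[1]/node()[1]").NodeType);          // Comment

        // Second child is <a>; its first child is the text node.
        Console.WriteLine(Select(xml, "/node()[1]/node()[2]/node()[1]").NodeType); // Text

        // Positions are 1-based, so [0] selects nothing.
        Console.WriteLine(Select(xml, "/node()[1]/node()[0]") == null);            // True
    }
}
```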

To build this expression, you need a method that returns a node's position among its parent's child nodes, because XmlNode doesn't expose that as a property:

static int GetNodePosition(XmlNode child)
{
   for (int i=0; i<child.ParentNode.ChildNodes.Count; i++)
   {
       if (child.ParentNode.ChildNodes[i] == child)
       {
          // tricksy XPath, not starting its positions at 0 like a normal language
          return i + 1;
       }
   }
   throw new InvalidOperationException("Child node somehow not found in its parent's ChildNodes property.");
}

(There's probably a more elegant way to do that using LINQ, since XmlNodeList implements IEnumerable, but I'm going with what I know here.)
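
For what it's worth, the LINQ variant hinted at above might look roughly like this. It is only a sketch under a couple of assumptions: LinqPositionSketch is a hypothetical class name, and, like the loop version, it expects a node that actually has a parent (so not an attribute or a detached node). Because XmlNodeList only implements the non-generic IEnumerable, Cast<XmlNode>() is needed before the LINQ operators apply.

```csharp
using System;
using System.Linq;
using System.Xml;

static class LinqPositionSketch
{
    // 1-based position of the child among all of its parent's child nodes.
    public static int GetNodePosition(XmlNode child)
    {
        return child.ParentNode.ChildNodes
            .Cast<XmlNode>()                 // XmlNodeList is a non-generic IEnumerable
            .TakeWhile(n => n != child)      // every node that comes before the child
            .Count() + 1;                    // XPath positions start at 1
    }

    static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<r><a/><b/><c/></r>");
        XmlNode b = doc.DocumentElement.ChildNodes[1];
        Console.WriteLine(GetNodePosition(b)); // 2
    }
}
```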

Then you can write a recursive method like this:

static string GetXPathToNode(XmlNode node)
{
    if (node.NodeType == XmlNodeType.Attribute)
    {
        // attributes have an OwnerElement, not a ParentNode; also they have
        // to be matched by name, not found by position
        return String.Format(
            "{0}/@{1}",
            GetXPathToNode(((XmlAttribute)node).OwnerElement),
            node.Name
            );            
    }
    if (node.ParentNode == null)
    {
        // the only node with no parent is the root node, which has no path
        return "";
    }
    // the path to a node is the path to its parent, plus "/node()[n]", where 
    // n is its position among its siblings.
    return String.Format(
        "{0}/node()[{1}]",
        GetXPathToNode(node.ParentNode),
        GetNodePosition(node)
        );
}

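As you can see, I hacked in a way for it to find attributes as well.

Jon slipped in with his version while I was writing mine. There's something about his code that is going to make me rant a bit now, and I apologize in advance if it sounds like I'm ragging on Jon. (I'm not. I'm pretty sure the list of things Jon has to learn from me is exceedingly short.) But I think the point I'm about to make is an important one for anyone who works with XML.

I suspect that Jon's solution emerged from something I see a lot of developers do: thinking of XML documents as trees of elements and attributes. I think this largely comes from developers whose primary use of XML is as a serialization format, because all the XML they're used to is structured this way. You can spot these developers because they use the terms "node" and "element" interchangeably. This leads them to come up with solutions that treat all other node types as special cases. (I was one of these guys myself for a very long time.)

This feels like a simplifying assumption while you're making it. But it's not. It makes problems harder and code more complex. It leads you to bypass the pieces of XML technology (like the node() node test in XPath) that are specifically designed to treat all node types generically.

There's a red flag in Jon's code that would make me query it in a code review even if I didn't know what the requirements were, and that's GetElementsByTagName. Whenever I see that method in use, the question that leaps to mind is always "why does it have to be an element?" And the answer is very often "oh, does this code need to handle text nodes too?"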
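
To illustrate the "node" versus "element" distinction, here is a small sketch (not from the original answer) that lists the child nodes of an element with mixed content. MixedContentSketch and DescribeChildren are hypothetical names and the sample markup is an assumption; the point is simply that ChildNodes contains text nodes alongside elements, so code that only looks at elements never sees them.

```csharp
using System;
using System.Linq;
using System.Xml;

static class MixedContentSketch
{
    // Describes each child node of the document element as "NodeType:Name".
    public static string[] DescribeChildren(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.ChildNodes
            .Cast<XmlNode>()
            .Select(n => n.NodeType + ":" + n.Name)
            .ToArray();
    }

    static void Main()
    {
        // <p> has three children: a text node, the <b> element, and another text node.
        foreach (string kind in DescribeChildren("<p>Hello <b>world</b>!</p>"))
            Console.WriteLine(kind);
    }
}
```

A name-based step can address only the b element here, whereas node()[1] and node()[3] reach the two text nodes.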


7

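I know this is an old post, but the version I liked most (the one with names) was flawed: when a parent node has child nodes with different names, it stops counting the index as soon as it reaches the first sibling whose name doesn't match.

Here is my fixed version: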

/// <summary>
/// Gets the X-Path to a given Node
/// </summary>
/// <param name="node">The Node to get the X-Path from</param>
/// <returns>The X-Path of the Node</returns>
public string GetXPathToNode(XmlNode node)
{
    if (node.NodeType == XmlNodeType.Attribute)
    {
        // attributes have an OwnerElement, not a ParentNode; also they have             
        // to be matched by name, not found by position             
        return String.Format("{0}/@{1}", GetXPathToNode(((XmlAttribute)node).OwnerElement), node.Name);
    }
    if (node.ParentNode == null)
    {
        // the only node with no parent is the root node, which has no path
        return "";
    }

    // Get the Index
    int indexInParent = 1;
    XmlNode siblingNode = node.PreviousSibling;
    // Loop thru all Siblings
    while (siblingNode != null)
    {
        // Increase the Index if the Sibling has the same Name
        if (siblingNode.Name == node.Name)
        {
            indexInParent++;
        }
        siblingNode = siblingNode.PreviousSibling;
    }

    // the path to a node is the path to its parent, plus "/name[n]", where n is its position among its same-named siblings.
    return String.Format("{0}/{1}[{2}]", GetXPathToNode(node.ParentNode), node.Name, indexInParent);
}
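
The difference between the two counting strategies shows up as soon as differently named siblings are interleaved. The sketch below is only an illustration: SiblingIndexSketch, IndexStoppingEarly and IndexCountingAll are hypothetical names, and the three-element sample document is an assumption.

```csharp
using System;
using System.Xml;

static class SiblingIndexSketch
{
    // Flawed strategy: stops at the first previous sibling with a different name.
    public static int IndexStoppingEarly(XmlNode node)
    {
        int index = 1;
        XmlNode s = node;
        while (s.PreviousSibling != null && s.PreviousSibling.Name == s.Name)
        {
            index++;
            s = s.PreviousSibling;
        }
        return index;
    }

    // Fixed strategy: walks all previous siblings and counts the same-named ones.
    public static int IndexCountingAll(XmlNode node)
    {
        int index = 1;
        for (XmlNode s = node.PreviousSibling; s != null; s = s.PreviousSibling)
            if (s.Name == node.Name) index++;
        return index;
    }

    static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<r><a/><b/><a/></r>");
        XmlNode secondA = doc.DocumentElement.LastChild;

        Console.WriteLine(IndexStoppingEarly(secondA)); // 1, which would point at the first <a>
        Console.WriteLine(IndexCountingAll(secondA));   // 2
    }
}
```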

4

Here's a simple method that I've used, and it worked for me.

    static string GetXpath(XmlNode node)
    {
        if (node.Name == "#document")
            return String.Empty;
        return GetXpath(node.SelectSingleNode("..")) + "/" +  (node.NodeType == XmlNodeType.Attribute ? "@":String.Empty) + node.Name;
    }
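
One thing worth keeping in mind with this approach: because the path carries no indexes, it identifies a node uniquely only when no element on the way down has same-named siblings. The sketch below (AmbiguousPathSketch and CountMatches are hypothetical names, and the sample document is an assumption) reuses the method above and counts how many nodes the generated path selects.

```csharp
using System;
using System.Xml;

static class AmbiguousPathSketch
{
    // Same logic as the simple method above: names only, no indexes.
    public static string GetXpath(XmlNode node)
    {
        if (node.Name == "#document")
            return String.Empty;
        return GetXpath(node.SelectSingleNode("..")) + "/" +
               (node.NodeType == XmlNodeType.Attribute ? "@" : String.Empty) + node.Name;
    }

    // Number of nodes in the owning document that the generated path selects.
    public static int CountMatches(XmlNode node)
    {
        return node.OwnerDocument.SelectNodes(GetXpath(node)).Count;
    }

    static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root><item/><item/></root>");
        XmlNode second = doc.DocumentElement.ChildNodes[1];

        Console.WriteLine(GetXpath(second));     // /root/item
        Console.WriteLine(CountMatches(second)); // 2, so the path is ambiguous here
    }
}
```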

3

My take is a hybrid of Robert's and Corey's answers. I can only claim credit for the actual typing of the extra lines of code.

    private static string GetXPathToNode(XmlNode node)
    {
        if (node.NodeType == XmlNodeType.Attribute)
        {
            // attributes have an OwnerElement, not a ParentNode; also they have
            // to be matched by name, not found by position
            return String.Format(
                "{0}/@{1}",
                GetXPathToNode(((XmlAttribute)node).OwnerElement),
                node.Name
                );
        }
        if (node.ParentNode == null)
        {
            // the only node with no parent is the root node, which has no path
            return "";
        }
        //get the index
        int iIndex = 1;
        XmlNode xnIndex = node;
        while (xnIndex.PreviousSibling != null) { iIndex++; xnIndex = xnIndex.PreviousSibling; }
        // the path to a node is the path to its parent, plus "/node()[n]", where 
        // n is its position among its siblings.
        return String.Format(
            "{0}/node()[{1}]",
            GetXPathToNode(node.ParentNode),
            iIndex
            );
    }

2

If you do it this way, you get a path made of node names and positions, which helps when you have nodes with the same name, like this:

"/Service[1]/System[1]/Group[1]/Folder[2]/File[2]"

public string GetXPathToNode(XmlNode node)
{         
    if (node.NodeType == XmlNodeType.Attribute)
    {             
        // attributes have an OwnerElement, not a ParentNode; also they have             
        // to be matched by name, not found by position             
        return String.Format("{0}/@{1}", GetXPathToNode(((XmlAttribute)node).OwnerElement), node.Name);
    }
    if (node.ParentNode == null)
    {             
        // the only node with no parent is the root node, which has no path
        return "";
    }

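    // get the index; note that this loop stops at the first previous sibling
    // with a different name, so interleaved siblings can yield an index that is too low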
    int iIndex = 1;
    XmlNode xnIndex = node;
    while (xnIndex.PreviousSibling != null && xnIndex.PreviousSibling.Name == xnIndex.Name)
    {
         iIndex++;
         xnIndex = xnIndex.PreviousSibling; 
    }

    // the path to a node is the path to its parent, plus "/name[n]", where
    // n is its position among its same-named siblings.
    return String.Format("{0}/{1}[{2}]", GetXPathToNode(node.ParentNode), node.Name, iIndex);
}

2

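There is no such thing as "the" XPath for a node. For any given node there may well be many XPath expressions that match it.

You can probably work up the tree to build an expression that matches it, taking into account the index of particular elements and so on, but it's not going to be terribly nice code.

Why do you need this? There may be a better solution.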


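I am calling an API of an XML editing application. I need to tell the application to hide certain nodes, which I do by calling ToggleVisibleElement, which takes an XPath. I was hoping there was an easy way to do this. - joe
@Jon Skeet: See my answer to a similar question: https://dev59.com/vHRB5IYBdhLWcg3w9b59 My solution produces an XPath expression that selects a node that may be of any type: root, element, attribute, text, comment, PI, or namespace. - Dimitre Novatchev
Validating the content of an XML document is a good reason, particularly when you report semantic errors that go beyond what validation against a schema can determine. XPath is fairly readable, but it can also be fed to automated systems that highlight the offending node and present it to someone who can correct the document. - Suncat2000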

1

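What about using extension methods? ;) My version (building on others' work) uses the syntax name[index]..., with the index omitted if the element has no same-named "brothers". The loop that gets the element index lives outside in an independent routine (also an extension method).

Just paste the following into any utility class (or the main Program class), provided that class is declared static, since extension methods must live in a non-generic static class: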

static public int GetRank( this XmlNode node )
{
    // return 0 if unique, else return position 1...n in siblings with same name
    try
    {
        if( node is XmlElement ) 
        {
            int rank = 1;
            bool alone = true, found = false;

            foreach( XmlNode n in node.ParentNode.ChildNodes )
                if( n.Name == node.Name ) // sibling with same name
                {
                    if( n.Equals(node) )
                    {
                        if( ! alone ) return rank; // no need to continue
                        found = true;
                    }
                    else
                    {
                        if( found ) return rank; // no need to continue
                        alone = false;
                        rank++;
                    }
                }

        }
    }
    catch{}
    return 0;
}

static public string GetXPath( this XmlNode node )
{
    try
    {
        if( node is XmlAttribute )
            return String.Format( "{0}/@{1}", (node as XmlAttribute).OwnerElement.GetXPath(), node.Name );

        if( node is XmlText || node is XmlCDataSection )
            return node.ParentNode.GetXPath();

        if( node.ParentNode == null )   // the only node with no parent is the root node, which has no path
            return "";

        int rank = node.GetRank();
        if( rank == 0 ) return String.Format( "{0}/{1}",        node.ParentNode.GetXPath(), node.Name );
        else            return String.Format( "{0}/{1}[{2}]",   node.ParentNode.GetXPath(), node.Name, rank );
    }
    catch{}
    return "";
}   

1
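I produced VBA for Excel to do this for a work project. It outputs tuples of the XPath and the associated text of an element or attribute. The purpose was to let business analysts identify and map some XML. I appreciate that this is a C# forum, but I thought it might be of interest.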
Sub Parse2(oSh As Long, inode As IXMLDOMNode, Optional iXstring As String = "", Optional indexes)


Dim chnode As IXMLDOMNode
Dim attr As IXMLDOMAttribute
Dim oXString As String
Dim chld As Long
Dim idx As Variant
Dim addindex As Boolean
chld = 0
idx = 0
addindex = False


'determine the node type:
Select Case inode.NodeType

    Case NODE_ELEMENT
        If inode.ParentNode.NodeType = NODE_DOCUMENT Then 'This gets the root node name but ignores all the namespace attributes
            oXString = iXstring & "//" & fp(inode.nodename)
        Else

            'Need to deal with indexing. Where an element has siblings with the same nodeName,it needs to be indexed using [index], e.g swapstreams or schedules

            For Each chnode In inode.ParentNode.ChildNodes
                If chnode.NodeType = NODE_ELEMENT And chnode.nodename = inode.nodename Then chld = chld + 1
            Next chnode

            If chld > 1 Then '//inode has siblings of the same nodeName, so needs to be indexed
                'Lookup the index from the indexes array
                idx = getIndex(inode.nodename, indexes)
                addindex = True
            Else
            End If

            'build the XString
            oXString = iXstring & "/" & fp(inode.nodename)
            If addindex Then oXString = oXString & "[" & idx & "]"

            'If type is element then check for attributes
            For Each attr In inode.Attributes
                'If the element has attributes then extract the data pair XString + Element.Name, @Attribute.Name=Attribute.Value
                Call oSheet(oSh, oXString & "/@" & attr.Name, attr.Value)
            Next attr

        End If

    Case NODE_TEXT
        'build the XString
        oXString = iXstring
        Call oSheet(oSh, oXString, inode.NodeValue)

    Case NODE_ATTRIBUTE
    'Do nothing
    Case NODE_CDATA_SECTION
    'Do nothing
    Case NODE_COMMENT
    'Do nothing
    Case NODE_DOCUMENT
    'Do nothing
    Case NODE_DOCUMENT_FRAGMENT
    'Do nothing
    Case NODE_DOCUMENT_TYPE
    'Do nothing
    Case NODE_ENTITY
    'Do nothing
    Case NODE_ENTITY_REFERENCE
    'Do nothing
    Case NODE_INVALID
    'do nothing
    Case NODE_NOTATION
    'do nothing
    Case NODE_PROCESSING_INSTRUCTION
    'do nothing
End Select

'Now call Parse2 on each of inode's children.
If inode.HasChildNodes Then
    For Each chnode In inode.ChildNodes
        Call Parse2(oSh, chnode, oXString, indexes)
    Next chnode
Set chnode = Nothing
Else
End If

End Sub

To manage the counting of elements, use the following:
Function getIndex(tag As Variant, indexes) As Variant
'Function to get the latest index for an xml tag from the indexes array
'indexes array is passed from one parser function to the next up and down the tree

Dim i As Integer
Dim n As Integer

If IsArrayEmpty(indexes) Then
    ReDim indexes(1, 0)
    indexes(0, 0) = "Tag"
    indexes(1, 0) = "Index"
Else
End If
For i = 0 To UBound(indexes, 2)
    If indexes(0, i) = tag Then
        'tag found, increment and return the index then exit
        'also destroy all recorded tag names BELOW that level
        indexes(1, i) = indexes(1, i) + 1
        getIndex = indexes(1, i)
        ReDim Preserve indexes(1, i) 'should keep all tags up to i but remove all below it
        Exit Function
    Else
    End If
Next i

'tag not found so add the tag with index 1 at the end of the array
n = UBound(indexes, 2)
ReDim Preserve indexes(1, n + 1)
indexes(0, n + 1) = tag
indexes(1, n + 1) = 1
getIndex = 1

End Function

1
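I found that none of the above worked with XDocument, so I wrote my own code to support XDocument and used recursion. I think this code handles multiple identical nodes better than some of the other code here, because it first tries to go as deep into the XML path as it can and then backs up to build only what is needed. So if you have /home/white/bob and /home/white/mike and you want to create /home/white/bob/garage, the code will know how to create that. However, I didn't want to mess with predicates or wildcards, so I explicitly disallowed them; it would be easy to add support for them, though.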
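Imports System.Linq
Imports System.Text.RegularExpressions
Imports System.Xml.Linq
Imports System.Xml.XPath

Private Sub NodeItterate(XDoc As XElement, XPath As String)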
    'get the deepest path
    Dim nodes As IEnumerable(Of XElement)

    nodes = XDoc.XPathSelectElements(XPath)

    'if it doesn't exist, try the next shallow path
    If nodes.Count = 0 Then
        NodeItterate(XDoc, XPath.Substring(0, XPath.LastIndexOf("/")))
        'by this time all the required parent elements will have been constructed
        Dim ParentPath As String = XPath.Substring(0, XPath.LastIndexOf("/"))
        Dim ParentNode As XElement = XDoc.XPathSelectElement(ParentPath)
        Dim NewElementName As String = XPath.Substring(XPath.LastIndexOf("/") + 1, XPath.Length - XPath.LastIndexOf("/") - 1)
        ParentNode.Add(New XElement(NewElementName))
    End If

    'if we find there are more than 1 elements at the deepest path we have access to, we can't proceed
    If nodes.Count > 1 Then
        Throw New ArgumentOutOfRangeException("There are too many paths that match your expression.")
    End If

    'if there is just one element, we can proceed
    If nodes.Count = 1 Then
        'just proceed
    End If

End Sub

Public Sub CreateXPath(ByVal XDoc As XElement, ByVal XPath As String)

    If XPath.Contains("//") Or XPath.Contains("*") Or XPath.Contains(".") Then
        Throw New ArgumentException("Can't create a path based on searches, wildcards, or relative paths.")
    End If

    'Use a character class so that any single predicate-related character is rejected
    If Regex.IsMatch(XPath, "[\[\]()@='<>|]") Then
        Throw New ArgumentException("Can't create a path based on predicates.")
    End If

    'we will process this recursively.
    NodeItterate(XDoc, XPath)

End Sub
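
For the original question (getting a path rather than creating one), a LINQ to XML counterpart might look roughly like the following C# sketch. It is not taken from any of the answers: XElementPathSketch and GetXPath are hypothetical names, and it assumes a document without namespaces, since it uses only the local name of each element.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

static class XElementPathSketch
{
    // Builds "/name[n]" steps from the root down to the given element.
    public static string GetXPath(XElement element)
    {
        // 1-based position among the preceding siblings with the same name.
        int index = element.ElementsBeforeSelf(element.Name).Count() + 1;
        string step = "/" + element.Name.LocalName + "[" + index + "]";
        return element.Parent == null ? step : GetXPath(element.Parent) + step;
    }

    static void Main()
    {
        var doc = XDocument.Parse("<home><white><bob/><mike/><bob/></white></home>");
        XElement target = doc.Root.Element("white").Elements("bob").Last();

        string path = GetXPath(target);
        Console.WriteLine(path);                                   // /home[1]/white[1]/bob[2]
        Console.WriteLine(doc.XPathSelectElement(path) == target); // True
    }
}
```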
