有人可以提供一些代码,以获取System.Xml.XmlNode实例的xpath吗?
谢谢!
好的,我不能抵制尝试它的诱惑。它只适用于属性和元素,但是嘿...在15分钟内你能期望什么呢 :) 同样可能有更简洁的方法来做到这一点。
在每个元素上包含索引是多余的(特别是根元素!),但这比尝试确定是否存在歧义要容易。
using System;
using System.Text;
using System.Xml;
class Test
{
static void Main()
{
string xml = @"
<root>
<foo />
<foo>
<bar attr='value'/>
<bar other='va' />
</foo>
<foo><bar /></foo>
</root>";
XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);
XmlNode node = doc.SelectSingleNode("//@attr");
Console.WriteLine(FindXPath(node));
Console.WriteLine(doc.SelectSingleNode(FindXPath(node)) == node);
}
static string FindXPath(XmlNode node)
{
StringBuilder builder = new StringBuilder();
while (node != null)
{
switch (node.NodeType)
{
case XmlNodeType.Attribute:
builder.Insert(0, "/@" + node.Name);
node = ((XmlAttribute) node).OwnerElement;
break;
case XmlNodeType.Element:
int index = FindElementIndex((XmlElement) node);
builder.Insert(0, "/" + node.Name + "[" + index + "]");
node = node.ParentNode;
break;
case XmlNodeType.Document:
return builder.ToString();
default:
throw new ArgumentException("Only elements and attributes are supported");
}
}
throw new ArgumentException("Node was not in a document");
}
static int FindElementIndex(XmlElement element)
{
XmlNode parentNode = element.ParentNode;
if (parentNode is XmlDocument)
{
return 1;
}
XmlElement parent = (XmlElement) parentNode;
int index = 1;
foreach (XmlNode candidate in parent.ChildNodes)
{
if (candidate is XmlElement && candidate.Name == element.Name)
{
if (candidate == element)
{
return index;
}
index++;
}
}
throw new ArgumentException("Couldn't find element within parent");
}
}
Jon是正确的,XPath表达式有很多种方式可以在实例文档中返回相同的节点。构建一个能够明确返回特定节点的表达式的最简单方法是使用节点位置来确定节点测试的链式结构,例如:
/node()[0]/node()[2]/node()[6]/node()[1]/node()[2]
显然,这个表达式没有使用元素名称,但是如果你只是想在文档中定位一个节点,那么你不需要它的名称。它也不能用来查找属性(因为属性不是节点,没有位置;你只能通过名称找到它们),但它会找到所有其他节点类型。
要构建这个表达式,你需要编写一个方法,返回一个节点在其父节点的子节点中的位置,因为XmlNode
不将其作为一个属性公开:
static int GetNodePosition(XmlNode child)
{
for (int i=0; i<child.ParentNode.ChildNodes.Count; i++)
{
if (child.ParentNode.ChildNodes[i] == child)
{
// tricksy XPath, not starting its positions at 0 like a normal language
return i + 1;
}
}
throw new InvalidOperationException("Child node somehow not found in its parent's ChildNodes property.");
}
(使用 LINQ 可能有更优雅的方法,因为 XmlNodeList 实现了 IEnumerable,但我会使用我知道的方法。)
然后,您可以编写这样一个递归方法:
static string GetXPathToNode(XmlNode node)
{
if (node.NodeType == XmlNodeType.Attribute)
{
// attributes have an OwnerElement, not a ParentNode; also they have
// to be matched by name, not found by position
return String.Format(
"{0}/@{1}",
GetXPathToNode(((XmlAttribute)node).OwnerElement),
node.Name
);
}
if (node.ParentNode == null)
{
// the only node with no parent is the root node, which has no path
return "";
}
// the path to a node is the path to its parent, plus "/node()[n]", where
// n is its position among its siblings.
return String.Format(
"{0}/node()[{1}]",
GetXPathToNode(node.ParentNode),
GetNodePosition(node)
);
}
如您所见,我已经以某种方式对其进行了修改,使其能够查找属性。
当我写我的版本时,Jon悄悄地加入了他的版本。他的代码中有一些东西会让我发表一些感想,如果听起来好像我在抨击Jon,那我提前向您道歉。(我没在抨击Jon。我敢肯定,通过我去指导Jon学习的东西将极为简单)但是我认为我要说的问题对于任何与XML打交道的人来说都非常重要。
我怀疑Jon的解决方案来源于我看到很多开发人员所做的事情:将XML文档视为元素和属性的树。我认为这主要源自于将XML作为序列化格式使用的开发人员,因为他们习惯使用的所有XML都是以此结构为基础的。你可以通过他们使用"node"和"element"这些术语互换使用来识别这些开发者。这导致他们提出的解决方案将所有其他节点类型视为特殊情况。(在很长时间内,我也是这些人之一)
这在你想的时候感觉像是一个简化的假设。但它并不是。这会让问题更加复杂,代码更加复杂。它会导致你绕过XML技术的某些部分(比如XPath中的node()
函数),这些部分专门设计用于通用处理所有节点类型。
Jon的代码中有一个红旗,即使我不知道要求是什么,这也会让我在代码审查中询问它,那就是GetElementsByTagName
方法。每当我看到使用该方法时,脑海中跳出的问题总是"为什么它必须是一个元素?"答案往往是"哦,这段代码是否需要处理文本节点?"
我知道这是一篇老帖子,但我最喜欢的版本(有名称的那个)存在缺陷: 当一个父节点拥有不同名称的节点时,它在找到第一个不匹配的节点名称后就停止计数索引。
这是我修复后的版本:
/// <summary>
/// Gets the X-Path to a given Node
/// </summary>
/// <param name="node">The Node to get the X-Path from</param>
/// <returns>The X-Path of the Node</returns>
public string GetXPathToNode(XmlNode node)
{
if (node.NodeType == XmlNodeType.Attribute)
{
// attributes have an OwnerElement, not a ParentNode; also they have
// to be matched by name, not found by position
return String.Format("{0}/@{1}", GetXPathToNode(((XmlAttribute)node).OwnerElement), node.Name);
}
if (node.ParentNode == null)
{
// the only node with no parent is the root node, which has no path
return "";
}
// Get the Index
int indexInParent = 1;
XmlNode siblingNode = node.PreviousSibling;
// Loop thru all Siblings
while (siblingNode != null)
{
// Increase the Index if the Sibling has the same Name
if (siblingNode.Name == node.Name)
{
indexInParent++;
}
siblingNode = siblingNode.PreviousSibling;
}
// the path to a node is the path to its parent, plus "/node()[n]", where n is its position among its siblings.
return String.Format("{0}/{1}[{2}]", GetXPathToNode(node.ParentNode), node.Name, indexInParent);
}
这是我用过并且有效的简单方法。
static string GetXpath(XmlNode node)
{
if (node.Name == "#document")
return String.Empty;
return GetXpath(node.SelectSingleNode("..")) + "/" + (node.NodeType == XmlNodeType.Attribute ? "@":String.Empty) + node.Name;
}
我的看法是罗伯特和科里的答案的混合。我只能为额外代码的实际输入声称功劳。
private static string GetXPathToNode(XmlNode node)
{
if (node.NodeType == XmlNodeType.Attribute)
{
// attributes have an OwnerElement, not a ParentNode; also they have
// to be matched by name, not found by position
return String.Format(
"{0}/@{1}",
GetXPathToNode(((XmlAttribute)node).OwnerElement),
node.Name
);
}
if (node.ParentNode == null)
{
// the only node with no parent is the root node, which has no path
return "";
}
//get the index
int iIndex = 1;
XmlNode xnIndex = node;
while (xnIndex.PreviousSibling != null) { iIndex++; xnIndex = xnIndex.PreviousSibling; }
// the path to a node is the path to its parent, plus "/node()[n]", where
// n is its position among its siblings.
return String.Format(
"{0}/node()[{1}]",
GetXPathToNode(node.ParentNode),
iIndex
);
}
如果您这样做,您将获得一个具有节点名称和位置的路径,如果您有像这样相同名称的节点:
"/Service[1]/System[1]/Group[1]/Folder[2]/File[2]"public string GetXPathToNode(XmlNode node)
{
if (node.NodeType == XmlNodeType.Attribute)
{
// attributes have an OwnerElement, not a ParentNode; also they have
// to be matched by name, not found by position
return String.Format("{0}/@{1}", GetXPathToNode(((XmlAttribute)node).OwnerElement), node.Name);
}
if (node.ParentNode == null)
{
// the only node with no parent is the root node, which has no path
return "";
}
//get the index
int iIndex = 1;
XmlNode xnIndex = node;
while (xnIndex.PreviousSibling != null && xnIndex.PreviousSibling.Name == xnIndex.Name)
{
iIndex++;
xnIndex = xnIndex.PreviousSibling;
}
// the path to a node is the path to its parent, plus "/node()[n]", where
// n is its position among its siblings.
return String.Format("{0}/{1}[{2}]", GetXPathToNode(node.ParentNode), node.Name, iIndex);
}
一个节点没有所谓的“the” xpath。对于任何给定的节点,可能有很多xpath表达式可以匹配它。
你可以向上遍历树来构建一个表达式,考虑特定元素的索引等因素,但这不会是非常好的代码。
为什么需要这个?可能有更好的解决方案。
使用类扩展怎么样? ;) 我的版本(基于其他人的工作)使用语法名称[index]...,如果元素没有“兄弟”,则省略索引。 获取元素索引的循环在外部独立的例程中(也是类扩展)。
只需将以下内容粘贴到任何实用程序类(或主程序类)中即可
static public int GetRank( this XmlNode node )
{
// return 0 if unique, else return position 1...n in siblings with same name
try
{
if( node is XmlElement )
{
int rank = 1;
bool alone = true, found = false;
foreach( XmlNode n in node.ParentNode.ChildNodes )
if( n.Name == node.Name ) // sibling with same name
{
if( n.Equals(node) )
{
if( ! alone ) return rank; // no need to continue
found = true;
}
else
{
if( found ) return rank; // no need to continue
alone = false;
rank++;
}
}
}
}
catch{}
return 0;
}
static public string GetXPath( this XmlNode node )
{
try
{
if( node is XmlAttribute )
return String.Format( "{0}/@{1}", (node as XmlAttribute).OwnerElement.GetXPath(), node.Name );
if( node is XmlText || node is XmlCDataSection )
return node.ParentNode.GetXPath();
if( node.ParentNode == null ) // the only node with no parent is the root node, which has no path
return "";
int rank = node.GetRank();
if( rank == 0 ) return String.Format( "{0}/{1}", node.ParentNode.GetXPath(), node.Name );
else return String.Format( "{0}/{1}[{2}]", node.ParentNode.GetXPath(), node.Name, rank );
}
catch{}
return "";
}
Sub Parse2(oSh As Long, inode As IXMLDOMNode, Optional iXstring As String = "", Optional indexes)
Dim chnode As IXMLDOMNode
Dim attr As IXMLDOMAttribute
Dim oXString As String
Dim chld As Long
Dim idx As Variant
Dim addindex As Boolean
chld = 0
idx = 0
addindex = False
'determine the node type:
Select Case inode.NodeType
Case NODE_ELEMENT
If inode.ParentNode.NodeType = NODE_DOCUMENT Then 'This gets the root node name but ignores all the namespace attributes
oXString = iXstring & "//" & fp(inode.nodename)
Else
'Need to deal with indexing. Where an element has siblings with the same nodeName,it needs to be indexed using [index], e.g swapstreams or schedules
For Each chnode In inode.ParentNode.ChildNodes
If chnode.NodeType = NODE_ELEMENT And chnode.nodename = inode.nodename Then chld = chld + 1
Next chnode
If chld > 1 Then '//inode has siblings of the same nodeName, so needs to be indexed
'Lookup the index from the indexes array
idx = getIndex(inode.nodename, indexes)
addindex = True
Else
End If
'build the XString
oXString = iXstring & "/" & fp(inode.nodename)
If addindex Then oXString = oXString & "[" & idx & "]"
'If type is element then check for attributes
For Each attr In inode.Attributes
'If the element has attributes then extract the data pair XString + Element.Name, @Attribute.Name=Attribute.Value
Call oSheet(oSh, oXString & "/@" & attr.Name, attr.Value)
Next attr
End If
Case NODE_TEXT
'build the XString
oXString = iXstring
Call oSheet(oSh, oXString, inode.NodeValue)
Case NODE_ATTRIBUTE
'Do nothing
Case NODE_CDATA_SECTION
'Do nothing
Case NODE_COMMENT
'Do nothing
Case NODE_DOCUMENT
'Do nothing
Case NODE_DOCUMENT_FRAGMENT
'Do nothing
Case NODE_DOCUMENT_TYPE
'Do nothing
Case NODE_ENTITY
'Do nothing
Case NODE_ENTITY_REFERENCE
'Do nothing
Case NODE_INVALID
'do nothing
Case NODE_NOTATION
'do nothing
Case NODE_PROCESSING_INSTRUCTION
'do nothing
End Select
'Now call Parser2 on each of inode's children.
If inode.HasChildNodes Then
For Each chnode In inode.ChildNodes
Call Parse2(oSh, chnode, oXString, indexes)
Next chnode
Set chnode = Nothing
Else
End If
End Sub
Function getIndex(tag As Variant, indexes) As Variant
'Function to get the latest index for an xml tag from the indexes array
'indexes array is passed from one parser function to the next up and down the tree
Dim i As Integer
Dim n As Integer
If IsArrayEmpty(indexes) Then
ReDim indexes(1, 0)
indexes(0, 0) = "Tag"
indexes(1, 0) = "Index"
Else
End If
For i = 0 To UBound(indexes, 2)
If indexes(0, i) = tag Then
'tag found, increment and return the index then exit
'also destroy all recorded tag names BELOW that level
indexes(1, i) = indexes(1, i) + 1
getIndex = indexes(1, i)
ReDim Preserve indexes(1, i) 'should keep all tags up to i but remove all below it
Exit Function
Else
End If
Next i
'tag not found so add the tag with index 1 at the end of the array
n = UBound(indexes, 2)
ReDim Preserve indexes(1, n + 1)
indexes(0, n + 1) = tag
indexes(1, n + 1) = 1
getIndex = 1
End Function
XDocument
一起使用,所以我编写了自己的代码来支持 XDocument
并使用递归。我认为这段代码比其他代码更好地处理了多个相同节点,因为它首先尝试尽可能深入 XML 路径,然后再返回构建所需的内容。因此,如果您有 /home/white/bob
和 /home/white/mike
,并且想要创建 /home/white/bob/garage
,该代码将知道如何创建它。但是,我不想搞乱谓词或通配符,所以我明确禁止了它们;但是添加对它们的支持很容易。Private Sub NodeItterate(XDoc As XElement, XPath As String)
'get the deepest path
Dim nodes As IEnumerable(Of XElement)
nodes = XDoc.XPathSelectElements(XPath)
'if it doesn't exist, try the next shallow path
If nodes.Count = 0 Then
NodeItterate(XDoc, XPath.Substring(0, XPath.LastIndexOf("/")))
'by this time all the required parent elements will have been constructed
Dim ParentPath As String = XPath.Substring(0, XPath.LastIndexOf("/"))
Dim ParentNode As XElement = XDoc.XPathSelectElement(ParentPath)
Dim NewElementName As String = XPath.Substring(XPath.LastIndexOf("/") + 1, XPath.Length - XPath.LastIndexOf("/") - 1)
ParentNode.Add(New XElement(NewElementName))
End If
'if we find there are more than 1 elements at the deepest path we have access to, we can't proceed
If nodes.Count > 1 Then
Throw New ArgumentOutOfRangeException("There are too many paths that match your expression.")
End If
'if there is just one element, we can proceed
If nodes.Count = 1 Then
'just proceed
End If
End Sub
Public Sub CreateXPath(ByVal XDoc As XElement, ByVal XPath As String)
If XPath.Contains("//") Or XPath.Contains("*") Or XPath.Contains(".") Then
Throw New ArgumentException("Can't create a path based on searches, wildcards, or relative paths.")
End If
If Regex.IsMatch(XPath, "\[\]()@='<>\|") Then
Throw New ArgumentException("Can't create a path based on predicates.")
End If
'we will process this recursively.
NodeItterate(XDoc, XPath)
End Sub