Get an XPath from org.w3c.dom.Node

Question


java xml dom

26

Can I get the full XPath from an org.w3c.dom.Node?

Say the current node points somewhere in the middle of an XML document. I would like to extract the XPath for that element.

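The output XPath I am looking for is //parent/child1/child2/child3/node, i.e. the path from the parent down to the target node. XPaths that contain expressions and merely point to the same node can be ignored.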

- srinannapa

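Unless you need an XPath 2.0 solution (this is not possible in XPath 1.0) and restrict yourself to a defined set of XPath expressions, this question cannot be answered in general: infinitely many XPath expressions select the same node in a given XML tree. - user357812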

@Alejandro: OK. My XPath will not contain any expressions. I am looking for //parent/child1/child2/node. - srinannapa

2

This is in the XPath 2.0 specification itself: string-join(ancestor-or-self::node()/name(),'/'). - user357812

The following Stack Overflow question may be relevant to you: https://dev59.com/SW445IYBdhLWcg3wq8Lp - bdoughan

6 Answers

16

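I work for the company behind jOOX, a library that provides many useful extensions to Java's standard DOM API and mimics the jQuery API. With jOOX, you can get the XPath of any element like this: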

String path = $(element).xpath();

The resulting path will look something like this:

/document[1]/library[2]/books[3]/book[1]

- Lukas Eder

11

I took this code from Mikkel Flindt's post and modified it so that it also works for attribute nodes.

import java.util.ArrayDeque;
import java.util.Deque;

import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public static String getFullXPath(Node n) {
    // abort early
    if (n == null) {
        return null;
    }

    // ancestors are pushed innermost-first, so popping yields root-to-node order
    Deque<Node> hierarchy = new ArrayDeque<>();
    StringBuilder buffer = new StringBuilder();
    hierarchy.push(n);

    // an attribute is not a child of its element, so use getOwnerElement()
    Node parent;
    switch (n.getNodeType()) {
        case Node.ATTRIBUTE_NODE:
            parent = ((Attr) n).getOwnerElement();
            break;
        case Node.ELEMENT_NODE:
        case Node.DOCUMENT_NODE:
            parent = n.getParentNode();
            break;
        default:
            throw new IllegalStateException("Unexpected node type: " + n.getNodeType());
    }

    // collect ancestors up to, but excluding, the document node
    while (parent != null && parent.getNodeType() != Node.DOCUMENT_NODE) {
        hierarchy.push(parent);
        parent = parent.getParentNode();
    }

    // construct xpath
    while (!hierarchy.isEmpty()) {
        Node node = hierarchy.pop();

        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element e = (Element) node;

            if (buffer.length() == 0) {
                // root element - simply append element name (no leading slash)
                buffer.append(node.getNodeName());
            } else {
                // child element - append slash and element name
                buffer.append("/").append(node.getNodeName());

                if (e.hasAttribute("id")) {
                    // id attribute found - use that
                    buffer.append("[@id='").append(e.getAttribute("id")).append("']");
                } else if (e.hasAttribute("name")) {
                    // name attribute found - use that
                    buffer.append("[@name='").append(e.getAttribute("name")).append("']");
                } else {
                    // no known attribute we could use - get 1-based sibling index
                    int position = 1;
                    for (Node s = node.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
                        // XML names are case-sensitive, so compare with equals()
                        if (s.getNodeType() == node.getNodeType()
                                && s.getNodeName().equals(node.getNodeName())) {
                            position++;
                        }
                    }
                    buffer.append("[").append(position).append("]");
                }
            }
        } else if (node.getNodeType() == Node.ATTRIBUTE_NODE) {
            buffer.append("/@").append(node.getNodeName());
        }
    }
    return buffer.toString();
}

- TAN70

8

In my view, the best approach is to work with the org.w3c.dom types directly (for reference):

String getXPath(Node node)
{
    Node parent = node.getParentNode();
    if (parent == null)
    {
        return "";
    }
    return getXPath(parent) + "/" + node.getNodeName();
}

- Alex

This worked better for me when the first return statement returns an empty string. Otherwise the resulting XPath starts with two forward slashes. - Adam Wise

@Adam Wise: Thanks, you're right, that slash was redundant. I'll fix the code. - Alex

3

What about counters (such as "html[1]/div[3]")? - roesslerj
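One way to add such counters to the recursive approach is to count the preceding element siblings that share the node's name. The following is only a minimal sketch under stated assumptions: the class name IndexedXPath and the helpers indexedXPath and parse are hypothetical, it handles element nodes only, and it ignores namespaces.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class IndexedXPath {

    // Hypothetical helper: builds a path such as /html[1]/div[3] for an element.
    static String indexedXPath(Node node) {
        if (node == null || node.getNodeType() != Node.ELEMENT_NODE) {
            return ""; // recursion ends at the Document node (or a detached root)
        }
        // XPath positions are 1-based and count only same-named element siblings
        int position = 1;
        for (Node s = node.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (s.getNodeType() == Node.ELEMENT_NODE
                    && s.getNodeName().equals(node.getNodeName())) {
                position++;
            }
        }
        return indexedXPath(node.getParentNode())
                + "/" + node.getNodeName() + "[" + position + "]";
    }

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><div/><div/><div><p/></div></html>");
        Node p = doc.getElementsByTagName("p").item(0);
        System.out.println(indexedXPath(p)); // prints /html[1]/div[3]/p[1]
    }
}
```

Always emitting the index, even when it is [1], keeps the path unambiguous if same-named siblings are present.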

3

Some IDEs that specialise in XML will do this for you.

Here are the best-known ones:

In oXygen, for example, you can right-click an element in an XML document and the context menu offers a "Copy XPath" option.

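There are also a number of Firefox add-ons (such as XPather) that will happily do the job for you. With XPather, you just click on a part of the web page, choose "Show in XPather" from the context menu, and you're done.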

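However, as Dan points out in his answer, such an XPath expression is of limited use. It will not include predicates, for instance. Instead, it will look like this: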

/root/nodeB[2]/subnodeX[2]

for a document like this:

<root>
   <nodeA>stuff</nodeA>
   <nodeB>more stuff</nodeB>
   <nodeB cond="thisOne">
       <subnodeX>useless stuff</subnodeX>
       <subnodeX id="MyCondition">THE STUFF YOU WANT</subnodeX>
       <subnodeX>more useless stuff</subnodeX>
   </nodeB>
</root>

The tools I listed will not generate:

/root/nodeB[@cond='thisOne']/subnodeX[@id='MyCondition']

For an HTML page, for example, you end up with a fairly useless expression:

/html/body/div[6]/p[3]

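This is to be expected. If the tools had to generate predicates, how would they know which condition is relevant? There are countless possibilities.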
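To illustrate that the positional and the predicate-based expressions address the same element, both can be evaluated against the sample document with the standard javax.xml.xpath API. This is only a sketch: the class name XPathEquivalence and the helpers parse and select are hypothetical, and the document is the sample above with the whitespace removed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XPathEquivalence {

    // The sample document from the answer, without indentation
    static final String XML =
            "<root>"
            + "<nodeA>stuff</nodeA>"
            + "<nodeB>more stuff</nodeB>"
            + "<nodeB cond=\"thisOne\">"
            + "<subnodeX>useless stuff</subnodeX>"
            + "<subnodeX id=\"MyCondition\">THE STUFF YOU WANT</subnodeX>"
            + "<subnodeX>more useless stuff</subnodeX>"
            + "</nodeB>"
            + "</root>";

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Hypothetical helper: returns the first node selected by the expression.
    static Node select(Document doc, String expression) {
        try {
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        Node byPosition = select(doc, "/root/nodeB[2]/subnodeX[2]");
        Node byPredicate = select(doc,
                "/root/nodeB[@cond='thisOne']/subnodeX[@id='MyCondition']");
        System.out.println(byPosition.getTextContent());  // prints THE STUFF YOU WANT
        System.out.println(byPredicate.getTextContent()); // prints THE STUFF YOU WANT
        System.out.println(byPosition.isSameNode(byPredicate));
    }
}
```

The positional form breaks as soon as a sibling is inserted before the target, while the predicate form survives that change but depends on the attribute values staying stable.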

- Alain Pannetier

-1

Something like this will give you a simple XPath:

public String getXPath(Node node) {
    return getXPath(node, "");
}

public String getXPath(Node node, String xpath) {
    if (node == null) {
        return "";
    }
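    String elementName = "";
    if (node instanceof Element) {
        // getLocalName() returns null for nodes created without namespace
        // support, so fall back to the qualified node name in that case
        elementName = node.getLocalName() != null ? node.getLocalName() : node.getNodeName();
    }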
    Node parent = node.getParentNode();
    if (parent == null) {
        return xpath;
    }
    return getXPath(parent, "/" + elementName + xpath);
}

- bajistaman


- Dan Breslau · Accepted Answer

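There is no generic method for getting the XPath, mainly because there is no single generic XPath that uniquely identifies a particular node in a document. In some schemas, nodes are uniquely identified by an attribute (id and name are probably the most common ones). In others, the name of each element (that is, the tag) is enough to identify a node uniquely. In a few cases (unlikely, but possible), no unique name or attribute leads to a specific node, so you need to use cardinality (get the n-th child of the m-th child of ...).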

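EDIT: In most cases, it is not hard to write a schema-dependent function that assembles an XPath for a given node. For example, suppose you have a document in which every node is uniquely identified by an id attribute, and you are not using namespaces. Then (I think) the following pseudo-Java would return an XPath based on those attributes. (Warning: I have not tested this code.)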

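String getXPath(Node node)
{
    Node parent = node.getParentNode();
    if (parent == null || parent.getNodeType() == Node.DOCUMENT_NODE) {
        // root element: its name alone is enough
        return "/" + node.getNodeName();
    }
    // assumes every non-root node is an Element with a unique id attribute
    Element element = (Element) node;
    return getXPath(parent) + "/" + element.getTagName()
            + "[@id='" + element.getAttribute("id") + "']";
}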
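A schema-dependent function like this can be sanity-checked by evaluating the generated expression and confirming that it leads back to the starting element. The sketch below assumes a made-up document in which every non-root element has a unique id attribute. The class name IdPathRoundTrip and the helpers idPath, parse and evaluate are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class IdPathRoundTrip {

    // Hypothetical helper: builds /root/child[@id='..']/... from id attributes.
    static String idPath(Element element) {
        Node parent = element.getParentNode();
        if (!(parent instanceof Element)) {
            return "/" + element.getTagName(); // root element, no predicate
        }
        return idPath((Element) parent) + "/" + element.getTagName()
                + "[@id='" + element.getAttribute("id") + "']";
    }

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Hypothetical helper: returns the first node selected by the expression.
    static Node evaluate(Document doc, String expression) {
        try {
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<lib><shelf id='s1'><book id='b1'/><book id='b2'/></shelf></lib>");
        Element book = (Element) doc.getElementsByTagName("book").item(1);
        String path = idPath(book);
        System.out.println(path); // prints /lib/shelf[@id='s1']/book[@id='b2']
        // Round trip: the generated path should select the element we started from
        Element found = (Element) evaluate(doc, path);
        System.out.println(found.getAttribute("id")); // prints b2
    }
}
```

Note that an id value containing a single quote would produce an invalid expression, so a real implementation would need to escape or reject such values.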