Case-insensitive XPath in HtmlAgilityPack

Question

c# xpath .net-2.0 html-agility-pack case-sensitive

9

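When I use

SelectSingleNode("//meta[@name='keywords']")

it doesn't work, but when I use the same casing as in the original document, it works fine:

SelectSingleNode("//meta[@name='Keywords']")

So the question is: how can I make the match case-insensitive?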

- kseen

Is XPath case-sensitive by design? - Phil C

4 Answers

5

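If you need a more comprehensive solution, you can write an extension function for the XPath processor that performs a case-insensitive comparison. It is quite a bit of code, but you only have to write it once.

Once the extension function is implemented, you can write your query as follows: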

"//meta[@name[Extensions:CaseInsensitiveComparison('Keywords')]]"

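Below is the implementation of the Extensions:CaseInsensitiveComparison extension function used in the example.

Note: this has not been thoroughly tested. I put it together quickly just to answer this question, so error handling and the like are missing!

Here is the code for a custom XSLT context that provides one or more extension functions.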

using System;
using System.Xml.XPath;
using System.Xml.Xsl;
using System.Xml;
using HtmlAgilityPack;

public class XsltCustomContext : XsltContext
{
  public const string NamespaceUri = "http://XsltCustomContext";

  public XsltCustomContext()
  {
  }

  public XsltCustomContext(NameTable nt) 
    : base(nt)
  {    
  }

  public override IXsltContextFunction ResolveFunction(string prefix, string name, XPathResultType[] ArgTypes)
  {
    // Check that the function prefix is for the correct namespace
    if (this.LookupNamespace(prefix) == NamespaceUri)
    {
      // Lookup the function and return the appropriate IXsltContextFunction implementation
      switch (name)
      {
        case "CaseInsensitiveComparison":
          return CaseInsensitiveComparison.Instance;
      }
    }

    return null;
  }

  public override IXsltContextVariable ResolveVariable(string prefix, string name)
  {
    return null;
  }

  public override int CompareDocument(string baseUri, string nextbaseUri)
  {
    return 0;
  }

  public override bool PreserveWhitespace(XPathNavigator node)
  {
    return false;
  }

  public override bool Whitespace
  {
    get { return true; }
  }

  // Class implementing the XSLT Function for Case Insensitive Comparison
  class CaseInsensitiveComparison : IXsltContextFunction
  {
    private static XPathResultType[] _argTypes = new XPathResultType[] { XPathResultType.String };
    private static CaseInsensitiveComparison _instance = new CaseInsensitiveComparison();

    public static CaseInsensitiveComparison Instance
    {
      get { return _instance; }
    }      

    #region IXsltContextFunction Members

    public XPathResultType[] ArgTypes
    {
      get { return _argTypes; }
    }

    public int Maxargs
    {
      get { return 1; }
    }

    public int Minargs
    {
      get { return 1; }
    }

    public XPathResultType ReturnType
    {
      get { return XPathResultType.Boolean; }
    }

    public object Invoke(XsltContext xsltContext, object[] args, XPathNavigator navigator)
    {                
      // Perform the function of comparing the current element to the string argument
      // NOTE: You should add some error checking here.
      string text = args[0] as string;
      return string.Equals(navigator.Value, text, StringComparison.InvariantCultureIgnoreCase);        
    }
    #endregion
  }
}

You can then use the extension function above in your XPath queries. Here is an example for our case.

class Program
{
  static string html = "<html><meta name=\"keywords\" content=\"HTML, CSS, XML\" /></html>";

  static void Main(string[] args)
  {
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);

    XPathNavigator nav = doc.CreateNavigator();

    // Create the custom context and add the namespace to the context
    XsltCustomContext ctx = new XsltCustomContext(new NameTable());
    ctx.AddNamespace("Extensions", XsltCustomContext.NamespaceUri);

    // Build the XPath query using the new function
    XPathExpression xpath = 
      XPathExpression.Compile("//meta[@name[Extensions:CaseInsensitiveComparison('Keywords')]]");

    // Set the context for the XPath expression to the custom context containing the 
    // extensions
    xpath.SetContext(ctx);

    var element = nav.SelectSingleNode(xpath);

    // Now we have the element
  }
}

- Chris Taylor

Can this be applied to node names? - Arsen Zahray

2

This is how I do it:

HtmlNodeCollection MetaDescription = document.DocumentNode.SelectNodes("//meta[@name='description' or @name='Description' or @name='DESCRIPTION']");

string metaDescription = MetaDescription != null ? HttpUtility.HtmlDecode(MetaDescription.FirstOrDefault().Attributes["content"].Value) : string.Empty;

- formatc

1

Your approach is not as general as Chris Taylor's. Chris's answer handles any combination of upper- and lower-case characters. - kseen

2

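@kseen I know, but is it really likely that someone would write something like "KeYwOrDs"? These are the three common spellings, and if someone writes a meta name like that, I doubt you could parse anything from the HTML document anyway. It is an out-of-the-box solution that takes only two lines of code and works well in most cases, but it all depends on your needs. - formatc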

1

I try to follow the "never trust user input" rule, and I kindly suggest you do the same. - kseen

1

Alternatively, use the new LINQ syntax, which should support case-insensitive matching:

        node = doc.DocumentNode.Descendants("meta")
            .Where(meta => meta.Attributes["name"] != null)
            .Where(meta => string.Equals(meta.Attributes["name"].Value, "keywords", StringComparison.OrdinalIgnoreCase))
            .Single();

But to avoid a NullReferenceException, you have to do an ugly null check on the attribute...
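One way to avoid the explicit null check is to read the attribute with a default value. The following is a minimal sketch rather than a definitive implementation; it assumes the HtmlAgilityPack package is referenced and uses its HtmlNode.GetAttributeValue(name, default) method. The MetaLookup class and the FindMetaByName helper are hypothetical names introduced for illustration. Note that it uses FirstOrDefault() instead of Single(), so it returns null when nothing matches rather than throwing.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

static class MetaLookup
{
    // Hypothetical helper: returns the first <meta> element whose name
    // attribute equals `name`, ignoring case, or null if there is none.
    public static HtmlNode FindMetaByName(HtmlDocument doc, string name)
    {
        return doc.DocumentNode
            .Descendants("meta")
            // GetAttributeValue falls back to the default when the attribute
            // is missing, so no separate null check is needed.
            .FirstOrDefault(meta => string.Equals(
                meta.GetAttributeValue("name", string.Empty),
                name,
                StringComparison.OrdinalIgnoreCase));
    }

    static void Main()
    {
        var doc = new HtmlDocument();
        // The first <meta> has no name attribute at all.
        doc.LoadHtml("<html><head><meta charset=\"utf-8\">" +
                     "<meta name=\"KeyWords\" content=\"HTML, CSS, XML\"></head></html>");

        HtmlNode node = FindMetaByName(doc, "keywords");
        Console.WriteLine(node != null
            ? node.GetAttributeValue("content", string.Empty)
            : "(not found)");
    }
}
```

Whether to return null or throw on a missing element is a design choice; Single() is still appropriate if exactly one match is a hard requirement.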

- jessehouwing

- Matthew Flaschen · Accepted Answer

If the casing of the actual value is unknown, I think you have to use the XPath translate function. I believe it should be:

SelectSingleNode("//meta[translate(@name,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')='keywords']")

It is a hack, but it is the only option in XPath 1.0 (apart from the reverse approach of translating to upper case).
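To show the translate approach end to end, here is a small sketch. It uses System.Xml's XmlDocument as a stand-in so that it runs without extra packages; the same XPath string should be usable with HtmlAgilityPack's SelectSingleNode. The TranslateDemo class and the ToLowerExpr and FindMeta helpers are hypothetical names, and the sketch assumes the searched name contains only ASCII letters and no apostrophe.

```csharp
using System;
using System.Xml;

static class TranslateDemo
{
    // Hypothetical helper: wraps an XPath expression in translate() so that
    // ASCII upper-case letters in its value are mapped to lower case.
    public static string ToLowerExpr(string expr)
    {
        return "translate(" + expr +
               ",'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')";
    }

    // Finds the first <meta> whose name attribute matches, ignoring ASCII case.
    // Assumption: `name` contains no apostrophe, which would break the literal.
    public static XmlNode FindMeta(string xml, string name)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        string xpath = "//meta[" + ToLowerExpr("@name") + "='" +
                       name.ToLowerInvariant() + "']";
        return doc.SelectSingleNode(xpath);
    }

    static void Main()
    {
        string xml = "<html><meta name=\"KeyWords\" content=\"HTML, CSS, XML\" /></html>";
        XmlNode node = FindMeta(xml, "keywords");
        Console.WriteLine(node != null ? node.Attributes["content"].Value : "(not found)");
    }
}
```

Because translate only maps the characters listed in its second argument, this handles ASCII letters only; other alphabets would need their own mappings added to both strings.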