Is there a method in C# to check whether a string is a valid identifier?

27

In Java, the Character class has two methods, isJavaIdentifierStart and isJavaIdentifierPart, that can be used to check whether a string is a valid Java identifier, like so:

public boolean isJavaIdentifier(String s) {
  int n = s.length();
  if (n==0) return false;
  if (!Character.isJavaIdentifierStart(s.charAt(0)))
      return false;
  for (int i = 1; i < n; i++)
      if (!Character.isJavaIdentifierPart(s.charAt(i)))
          return false;
  return true;
}

Is there something like this for C#?

8 Answers

39

1
This does have some performance implications that you should be aware of. Please see my post for more information. - Scott Wisniewski

10

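I would be wary of the other solutions offered here. Calling CodeDomProvider.CreateProvider requires finding and parsing the Machine.Config file as well as your app.config file. That is likely to be several times slower than simply checking the string yourself.

Instead, I would suggest making one of the following changes: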

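  1. Cache the provider in a static variable.

    This means you pay the cost of creating it only once, but it will slow down type loading.

  2. Create the provider directly, by instantiating Microsoft.CSharp.CSharpCodeProvider yourself.

    This skips the config file parsing altogether.

  3. Write the code to implement the check yourself.

    This gives you the most control over the implementation, which helps if you need to optimize performance. See section 2.2.4 of the C# language specification for the complete lexical grammar of C# identifiers.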
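Option 3 might look roughly like the sketch below. It is not from the original answer: the class and method names are hypothetical, and it assumes the identifier character classes are the Unicode categories Lu, Ll, Lt, Lm, Lo, Nl (plus underscore) for the first character, with Mn, Mc, Nd, Pc and Cf also allowed afterwards. It checks the character shape only. It does not reject keywords, handle a leading @, process Unicode escapes, or normalize the string, and because it works on UTF-16 chars it rejects letters outside the Basic Multilingual Plane.

```csharp
using System;
using System.Globalization;

public static class IdentifierCheck
{
    // Hypothetical helper: true when every character belongs to the
    // identifier character classes. Keywords are NOT rejected here.
    public static bool IsIdentifierShape(string s)
    {
        if (string.IsNullOrEmpty(s)) return false;
        if (!IsStart(s[0])) return false;
        for (int i = 1; i < s.Length; i++)
        {
            if (!IsPart(s[i])) return false;
        }
        return true;
    }

    // Start character: underscore or a letter (Lu, Ll, Lt, Lm, Lo, Nl).
    private static bool IsStart(char c)
    {
        if (c == '_') return true;
        switch (char.GetUnicodeCategory(c))
        {
            case UnicodeCategory.UppercaseLetter:
            case UnicodeCategory.LowercaseLetter:
            case UnicodeCategory.TitlecaseLetter:
            case UnicodeCategory.ModifierLetter:
            case UnicodeCategory.OtherLetter:
            case UnicodeCategory.LetterNumber:
                return true;
            default:
                return false;
        }
    }

    // Part character: a start character, or Mn, Mc, Nd, Pc, Cf.
    private static bool IsPart(char c)
    {
        if (IsStart(c)) return true;
        switch (char.GetUnicodeCategory(c))
        {
            case UnicodeCategory.NonSpacingMark:
            case UnicodeCategory.SpacingCombiningMark:
            case UnicodeCategory.DecimalDigitNumber:
            case UnicodeCategory.ConnectorPunctuation:
            case UnicodeCategory.Format:
                return true;
            default:
                return false;
        }
    }
}
```

Because it touches no configuration files and allocates nothing, a check of this kind avoids the provider start-up cost described above. A keyword lookup would still have to be added on top of it if keywords such as class must be rejected.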


8

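With Roslyn being open source, code analysis tools are right at your fingertips, and they are written for performance. (Right now they are in pre-release.)

However, I can't speak to the performance cost of loading the assembly.

Install the tools using NuGet: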

Install-Package Microsoft.CodeAnalysis -Pre

Ask your question:

var isValid = Microsoft.CodeAnalysis.CSharp.SyntaxFacts.IsValidIdentifier("I'mNotValid");
Console.WriteLine(isValid);     // False

6
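Basically something like:
// Requires: using System.Text.RegularExpressions;
// An underscore is also allowed as the first character.
const string start = @"(\p{Lu}|\p{Ll}|\p{Lt}|\p{Lm}|\p{Lo}|\p{Nl}|_)";
const string extend = @"(\p{Mn}|\p{Mc}|\p{Nd}|\p{Pc}|\p{Cf})";
// Anchor the pattern so the whole string must match, not just a substring.
Regex ident = new Regex(string.Format(@"^{0}({0}|{1})*\z", start, extend));
s = s.Normalize();
return ident.IsMatch(s);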

6
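Wow, this post has 7 upvotes, yet it doesn't work at all, and it didn't even compile until I fixed the code... - Stefan Steiger
The original source was archived before it went offline. - Palec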

5

Necromancing here.

In .NET Core/DNX, you can do it with Roslyn's SyntaxFacts:

Microsoft.CodeAnalysis.CSharp.SyntaxFacts.IsReservedKeyword(
        Microsoft.CodeAnalysis.CSharp.SyntaxFacts.GetKeywordKind("protected")
);



foreach (ColumnDefinition cl in tableColumns)
{
    sb.Append(@"         public ");
    sb.Append(cl.DOTNET_TYPE);
    sb.Append(" ");

    // for keywords
    //if (!Microsoft.CodeAnalysis.CSharp.SyntaxFacts.IsValidIdentifier(cl.COLUMN_NAME))
    if (Microsoft.CodeAnalysis.CSharp.SyntaxFacts.IsReservedKeyword(
        Microsoft.CodeAnalysis.CSharp.SyntaxFacts.GetKeywordKind(cl.COLUMN_NAME)
        ))
        sb.Append("@");

    sb.Append(cl.COLUMN_NAME);
    sb.Append("; // ");
    sb.AppendLine(cl.SQL_TYPE);
} // Next cl 


Or, in the old variant, with CodeDom. After a look at the Mono source code:

CodeDomProvider.cs

public virtual bool IsValidIdentifier (string value)
{
    ICodeGenerator cg = CreateGenerator ();
    if (cg == null)
        throw GetNotImplemented ();
    return cg.IsValidIdentifier (value);
}

Then CSharpCodeProvider.cs:
public override ICodeGenerator CreateGenerator()
{
#if NET_2_0
    if (providerOptions != null && providerOptions.Count > 0)
        return new Mono.CSharp.CSharpCodeGenerator (providerOptions);
#endif
    return new Mono.CSharp.CSharpCodeGenerator();
}

Then CSharpCodeGenerator.cs:
protected override bool IsValidIdentifier (string identifier)
{
    if (identifier == null || identifier.Length == 0)
        return false;

    if (keywordsTable == null)
        FillKeywordTable ();

    if (keywordsTable.Contains (identifier))
        return false;

    if (!is_identifier_start_character (identifier [0]))
        return false;

    for (int i = 1; i < identifier.Length; i ++)
        if (! is_identifier_part_character (identifier [i]))
            return false;

    return true;
}



private static System.Collections.Hashtable keywordsTable;
private static string[] keywords = new string[] {
    "abstract","event","new","struct","as","explicit","null","switch","base","extern",
    "this","false","operator","throw","break","finally","out","true",
    "fixed","override","try","case","params","typeof","catch","for",
    "private","foreach","protected","checked","goto","public",
    "unchecked","class","if","readonly","unsafe","const","implicit","ref",
    "continue","in","return","using","virtual","default",
    "interface","sealed","volatile","delegate","internal","do","is",
    "sizeof","while","lock","stackalloc","else","static","enum",
    "namespace",
    "object","bool","byte","float","uint","char","ulong","ushort",
    "decimal","int","sbyte","short","double","long","string","void",
    "partial", "yield", "where"
};


static void FillKeywordTable ()
{
    lock (keywords) {
        if (keywordsTable == null) {
            keywordsTable = new Hashtable ();
            foreach (string keyword in keywords) {
                keywordsTable.Add (keyword, keyword);
            }
        }
    }
}



static bool is_identifier_start_character (char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || c == '@' || Char.IsLetter (c);
}

static bool is_identifier_part_character (char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || (c >= '0' && c <= '9') || Char.IsLetter (c);
}

Which gets you this code:
public static bool IsValidIdentifier (string identifier)
{
    if (identifier == null || identifier.Length == 0)
        return false;

    if (keywordsTable == null)
        FillKeywordTable();

    if (keywordsTable.Contains(identifier))
        return false;

    if (!is_identifier_start_character(identifier[0]))
        return false;

    for (int i = 1; i < identifier.Length; i++)
        if (!is_identifier_part_character(identifier[i]))
            return false;

    return true;
}


internal static bool is_identifier_start_character(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || c == '@' || char.IsLetter(c);
}

internal static bool is_identifier_part_character(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || (c >= '0' && c <= '9') || char.IsLetter(c);
}


private static System.Collections.Hashtable keywordsTable;
private static string[] keywords = new string[] {
    "abstract","event","new","struct","as","explicit","null","switch","base","extern",
    "this","false","operator","throw","break","finally","out","true",
    "fixed","override","try","case","params","typeof","catch","for",
    "private","foreach","protected","checked","goto","public",
    "unchecked","class","if","readonly","unsafe","const","implicit","ref",
    "continue","in","return","using","virtual","default",
    "interface","sealed","volatile","delegate","internal","do","is",
    "sizeof","while","lock","stackalloc","else","static","enum",
    "namespace",
    "object","bool","byte","float","uint","char","ulong","ushort",
    "decimal","int","sbyte","short","double","long","string","void",
    "partial", "yield", "where"
};

internal static void FillKeywordTable()
{
    lock (keywords)
    {
        if (keywordsTable == null)
        {
            keywordsTable = new System.Collections.Hashtable();
            foreach (string keyword in keywords)
            {
                keywordsTable.Add(keyword, keyword);
            }
        }
    }
}

It's a good idea to check for "contextual keywords" too. You can use SyntaxFacts.GetKeywordKind(keyword) == SyntaxKind.None && SyntaxFacts.GetContextualKeywordKind(keyword) == SyntaxKind.None to check that a string is neither a reserved nor a contextual keyword, or SyntaxFacts.GetKeywordKinds().Select(SyntaxFacts.GetText) to get the full list. - EM0
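The check described in that comment could be wrapped up as in the sketch below. The class and method names are hypothetical. It assumes the Microsoft.CodeAnalysis.CSharp NuGet package is referenced, and it uses only the SyntaxFacts members already named in this thread.

```csharp
using Microsoft.CodeAnalysis.CSharp;

public static class KeywordCheck
{
    // Hypothetical helper: true when the text is a reserved keyword
    // (such as "protected") or a contextual keyword (such as "yield").
    public static bool IsAnyKeyword(string text)
    {
        return SyntaxFacts.GetKeywordKind(text) != SyntaxKind.None
            || SyntaxFacts.GetContextualKeywordKind(text) != SyntaxKind.None;
    }

    // Hypothetical helper: the characters form an identifier and the
    // text is not a keyword of either kind.
    public static bool IsPlainIdentifier(string text)
    {
        return SyntaxFacts.IsValidIdentifier(text) && !IsAnyKeyword(text);
    }
}
```

Whether contextual keywords should count as invalid depends on the use case: they are legal identifiers in most positions, so a code generator may prefer to prefix them with @ rather than refuse them.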

4

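I recently wrote an extension method that validates whether a string is a valid C# identifier.

You can find a gist with the implementation here: https://gist.github.com/FabienDehopre/5245476

It is based on the MSDN documentation for identifiers (http://msdn.microsoft.com/en-us/library/aa664670(v=vs.71).aspx).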

// Requires: using System.Linq; using System.Text.RegularExpressions;
public static bool IsValidIdentifier(this string identifier)
{
    if (String.IsNullOrEmpty(identifier)) return false;

    // C# keywords: http://msdn.microsoft.com/en-us/library/x53a06bb(v=vs.71).aspx
    var keywords = new[]
                       {
                           "abstract",  "event",      "new",        "struct",
                           "as",        "explicit",   "null",       "switch",
                           "base",      "extern",     "object",     "this",
                           "bool",      "false",      "operator",   "throw",
                           "break",     "finally",    "out",        "true",
                           "byte",      "fixed",      "override",   "try",
                           "case",      "float",      "params",     "typeof",
                           "catch",     "for",        "private",    "uint",
                           "char",      "foreach",    "protected",  "ulong",
                           "checked",   "goto",       "public",     "unchecked",
                           "class",     "if",         "readonly",   "unsafe",
                           "const",     "implicit",   "ref",        "ushort",
                           "continue",  "in",         "return",     "using",
                           "decimal",   "int",        "sbyte",      "virtual",
                           "default",   "interface",  "sealed",     "volatile",
                           "delegate",  "internal",   "short",      "void",
                           "do",        "is",         "sizeof",     "while",
                           "double",    "lock",       "stackalloc",
                           "else",      "long",       "static",
                           "enum",      "namespace",  "string"
                       };

    // definition of a valid C# identifier: http://msdn.microsoft.com/en-us/library/aa664670(v=vs.71).aspx
    const string formattingCharacter = @"\p{Cf}";
    const string connectingCharacter = @"\p{Pc}";
    const string decimalDigitCharacter = @"\p{Nd}";
    const string combiningCharacter = @"\p{Mn}|\p{Mc}";
    const string letterCharacter = @"\p{Lu}|\p{Ll}|\p{Lt}|\p{Lm}|\p{Lo}|\p{Nl}";
    const string identifierPartCharacter = letterCharacter + "|" +
                                           decimalDigitCharacter + "|" +
                                           connectingCharacter + "|" +
                                           combiningCharacter + "|" +
                                           formattingCharacter;
    const string identifierPartCharacters = "(" + identifierPartCharacter + ")+";
    const string identifierStartCharacter = "(" + letterCharacter + "|_)";
    const string identifierOrKeyword = identifierStartCharacter + "(" +
                                       identifierPartCharacters + ")*";
    var validIdentifierRegex = new Regex("^" + identifierOrKeyword + "$", RegexOptions.Compiled);
    var normalizedIdentifier = identifier.Normalize();

    // 1. check that the identifier matches validIdentifierRegex and is not a C# keyword
    if (validIdentifierRegex.IsMatch(normalizedIdentifier) && !keywords.Contains(normalizedIdentifier))
    {
        return true;
    }

    // 2. check if the identifier starts with @
    if (normalizedIdentifier.StartsWith("@") && validIdentifierRegex.IsMatch(normalizedIdentifier.Substring(1)))
    {
        return true;
    }

    // 3. it's not a valid identifier
    return false;
}

2
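The now-released Roslyn project provides Microsoft.CodeAnalysis.CSharp.SyntaxFacts, with SyntaxFacts.IsIdentifierStartCharacter(char) and SyntaxFacts.IsIdentifierPartCharacter(char) methods, just like Java.
Here it is in use, in a simple function that turns noun phrases (e.g. "Start Date") into C# identifiers (e.g. "StartDate"). Note that I'm using Humanizer to do the camel-case conversion and Roslyn to check whether a character is valid.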
    public static string Identifier(string name)
    {
        Check.IsNotNullOrWhitespace(name, nameof(name));

        // trim off leading and trailing whitespace
        name = name.Trim();

        // should deal with spaces => camel casing;
        name = name.Dehumanize();

        var sb = new StringBuilder();
        if (!SyntaxFacts.IsIdentifierStartCharacter(name[0]))
        {
            // the first character cannot start an identifier, so prefix an underscore
            sb.Append("_");
        }

        foreach(var ch in name)
        {
            if (SyntaxFacts.IsIdentifierPartCharacter(ch))
            {
                sb.Append(ch);
            }
        }

        var result = sb.ToString();

        if (SyntaxFacts.GetKeywordKind(result) != SyntaxKind.None)
        {
            result = @"@" + result;
        }

        return result;
    }

Tests:
    [TestCase("Start Date", "StartDate")]
    [TestCase("Bad*chars", "BadChars")]
    [TestCase("   leading ws", "LeadingWs")]
    [TestCase("trailing ws   ", "TrailingWs")]
    [TestCase("class", "Class")]
    [TestCase("int", "Int")]
    [Test]
    public void CSharp_GeneratesDecentIdentifiers(string input, string expected)
    {
        Assert.AreEqual(expected, CSharp.Identifier(input));
    }

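This is a useful fact, but you haven't explained how to make use of it, which isn't very helpful. I can't seem to find a "Microsoft.CodeAnalysis" NuGet package, nor an official page explaining where the library can be obtained. - NightOwl888
I provided the link in my first sentence: https://github.com/dotnet/roslyn. It says: "nuget install Microsoft.CodeAnalysis # Install Language APIs and Services". - Steve Cooper
You should install Microsoft.CodeAnalysis.CSharp to get the C# rules. - Steve Cooper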

1
