Regex to get the correct Java class name

Question

5

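For various reasons, I want to scan the contents of a Java file (e.g. TagMatchingInterface.java) and extract the class name (TagMatchingInterface) with a regular expression. However, my regex matches the wrong name, because some of the keywords (class/interface/enum) also appear inside the comments: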

/**
 *
 * @author XXXX
 * Introduction: A common interface that judges all kinds of algorithm tags.
 * some other comment
 */
public class TagMatchingInterface 
{
  // content
  public class InnerClazz{
    // content
  }
}

Here is my pattern:

public Pattern CLASS_PATTERN = Pattern.compile("(?:public\\s)?(?:.*\\s)?(class|interface|enum)\\s+([$_a-zA-Z][$_a-zA-Z0-9]*)");
....
Matcher matcher = CLASS_PATTERN.matcher(content);
if (matcher.find()) {
   System.out.println(matcher.group(2));
}

Any ideas about my regex?

- vash_ace

2

Why bother? The class name is already in the file name... - Laurel

1

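For one thing, you usually don't have the source code at runtime, because Java is a compiled language. - Elliott Frisch

I want to load classes from MySQL with a custom class loader, so I have to scan the content as a multi-line string. - vash_ace

You need a parser. - chengpohi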

2 Answers

1

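First, eliminate all comments. A suitable regex for that should be easy to write. Make sure to eliminate both kinds of comment (single-line and multi-line) in the same pass, so that one kind cannot be mistaken for the beginning of the other.

While you are at it, drop all string literals as well, because an annotation in front of the class may contain a string.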
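
The advice above can be sketched in Java roughly as follows. This is a minimal sketch rather than a real lexer: the class name StripThenMatch and the helpers stripNoise and firstTypeName are made up for illustration, and it assumes the source contains no text blocks (""") and no Unicode escapes that spell out quotes or comment markers. Comments, string literals, and char literals are removed by a single alternation, so whichever construct starts first in the text wins.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StripThenMatch {
    // One alternation, scanned left to right: the construct that starts
    // first is consumed whole, so "//" inside a string or a quote inside
    // a comment is not misread.
    private static final Pattern NOISE = Pattern.compile(
        "/\\*[\\s\\S]*?\\*/"               // block and Javadoc comments
        + "|//[^\\n]*"                     // line comments
        + "|\"(?:\\\\.|[^\"\\\\\\n])*\""   // string literals with escapes
        + "|'(?:\\\\.|[^'\\\\\\n])*'");    // char literals with escapes

    // The identifier part is the ASCII-only pattern from the question.
    private static final Pattern TYPE_DECL = Pattern.compile(
        "\\b(class|interface|enum)\\s+([$_a-zA-Z][$_a-zA-Z0-9]*)");

    // Replace every comment and literal with a single space.
    static String stripNoise(String source) {
        return NOISE.matcher(source).replaceAll(" ");
    }

    // Name of the first type declared in the stripped source, if any.
    static Optional<String> firstTypeName(String source) {
        Matcher m = TYPE_DECL.matcher(stripNoise(source));
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        String source = String.join("\n",
            "/**",
            " * Introduction: A common interface that judges all kinds of algorithm tags.",
            " */",
            "public class TagMatchingInterface ",
            "{",
            "  // content",
            "  public class InnerClazz{",
            "  }",
            "}");
        System.out.println(firstTypeName(source).orElse("<none>"));
    }
}
```

Because the comment is gone before the type pattern runs, the word "interface" in the Javadoc can no longer produce a match. Modifiers and annotations are deliberately not inspected here.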

… some keywords (class/interface/enum) are hidden in the comments:

"(?:public\\s)?(?:.*\\s)?(class|interface|enum)\\s+([$_a-zA-Z][$_a-zA-Z0-9]*)"

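Checking for public makes little sense, because the pattern matches just as well without it. In fact, if one of the class modifiers such as final or abstract follows public, only the later part of the pattern will match.

So if you want to know whether the class really is public, you have to check for those modifiers as well. That will be tricky, because you may have annotations whose parenthesized arguments are nested to arbitrary depth. That is something regular expressions cannot handle properly.

And what about classes whose names contain non-ASCII letters? What about Unicode escapes in the input?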
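
To illustrate the non-ASCII point, here is a small sketch comparing the ASCII-only identifier pattern from the question with one built on the \p{javaJavaIdentifierStart} and \p{javaJavaIdentifierPart} classes of java.util.regex.Pattern. The class name UnicodeIdentifierDemo and the helper nameOrNull are hypothetical. It does not address Unicode escapes written literally in the scanned text; those would have to be decoded before matching.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeIdentifierDemo {
    // Identifier pattern as used in the question: ASCII letters only.
    static final Pattern ASCII_DECL = Pattern.compile(
        "\\b(class|interface|enum)\\s+([$_a-zA-Z][$_a-zA-Z0-9]*)");

    // Same shape, but the identifier follows Character.isJavaIdentifierStart
    // and Character.isJavaIdentifierPart.
    static final Pattern UNICODE_DECL = Pattern.compile(
        "\\b(class|interface|enum)\\s+"
        + "(\\p{javaJavaIdentifierStart}\\p{javaJavaIdentifierPart}*)");

    // Group 2 of the first match, or null when nothing matches.
    static String nameOrNull(Pattern pattern, String source) {
        Matcher m = pattern.matcher(source);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // "Groesse" spelled with o-umlaut and sharp s, written as escapes
        // so this file itself stays ASCII.
        String source = "public class Gr\u00f6\u00dfe {}";
        System.out.println(nameOrNull(ASCII_DECL, source));   // truncated name
        System.out.println(nameOrNull(UNICODE_DECL, source)); // full name
    }
}
```

The ASCII-only pattern does not fail outright on such a name; it silently stops at the first non-ASCII letter, which is arguably worse than no match at all.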

- MvG

@vash_ace: That would not report "public @Annotated class Foo" as public. - MvG

- Ro Yo Mi · Accepted Answer

Description

(?<=\n|\A)(?:public\s)?(class|interface|enum)\s([^\n\s]*)

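This regular expression does the following:

Allows the declaration to start with or without public
Requires class, interface, or enum
Captures the name

Note: I recommend using the global and case-insensitive flags.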

Example

Live example

https://regex101.com/r/vR0iK3/1

Sample text

/**
 *
 * @author XXXX
 * Introduction: A common interface that judges all kinds of algorithm tags.
 * some other comment
 */
public class TagMatchingInterface 
{
  // content
  public class InnerClazz{
    // content
  }
}

Sample matches

[0][0] = public class TagMatchingInterface
[0][1] = class
[0][2] = TagMatchingInterface

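Capture groups:

Capture group 0 gets the entire match
Capture group 1 gets the type keyword (class, interface, or enum)
Capture group 2 gets the name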
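
For completeness, here is roughly how the expression above could be used from Java, where the backslashes have to be doubled inside the string literal. This is only a sketch: the class name LineStartTypeMatcher and the helper firstMatch are made up for illustration. The case-insensitive flag is left out here on the assumption that Java keywords should be matched in lower case only, and the "global" flag corresponds to calling find() repeatedly.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineStartTypeMatcher {
    // The expression from this answer, escaped for a Java string literal.
    static final Pattern TYPE_AT_LINE_START = Pattern.compile(
        "(?<=\\n|\\A)(?:public\\s)?(class|interface|enum)\\s([^\\n\\s]*)");

    // Returns groups 0, 1 and 2 of the first match, or null if none.
    static String[] firstMatch(String content) {
        Matcher m = TYPE_AT_LINE_START.matcher(content);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String content = String.join("\n",
            "/**",
            " *",
            " * @author XXXX",
            " * Introduction: A common interface that judges all kinds of algorithm tags.",
            " * some other comment",
            " */",
            "public class TagMatchingInterface ",
            "{",
            "  // content",
            "  public class InnerClazz{",
            "    // content",
            "  }",
            "}");
        String[] groups = firstMatch(content);
        for (int i = 0; i < groups.length; i++) {
            System.out.println("[" + i + "] = " + groups[i]);
        }
    }
}
```

Because the match must begin at the very start of a line, the indented InnerClazz declaration and the comment lines (which start with a space and an asterisk) are skipped. By the same token, a declaration that begins with another modifier such as final or abstract, or with an annotation on the same line, would not be found by this expression.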

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  (?<=                     look behind to see if there is:
----------------------------------------------------------------------
    \n                       '\n' (newline)
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    \A                        Start of the string
----------------------------------------------------------------------
  )                        end of look-behind
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    public                   'public'
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    class                    'class'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    interface                'interface'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    enum                     'enum'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    [^\n\s]*                 any character except: '\n' (newline),
                             whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------