Regular expression in Java for parsing phone numbers from a text document

Question

I am trying to use a regular expression to find phone numbers of the form (xxx) xxx-xxxx in a messy HTML document.

The lines in the text file look like this:

  <div style="font-weight:bold;">
  <div>
   <strong>Main Phone:
   <span style="font-weight:normal;">(713) 555-9539&nbsp;&nbsp;&nbsp;&nbsp;
   <strong>Main Fax:
   <span style="font-weight:normal;">(713) 555-9541&nbsp;&nbsp;&nbsp;&nbsp;
   <strong>Toll Free:
   <span style="font-weight:normal;">(888) 555-9539

My code contains:

Pattern p = Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}");
Matcher m = p.matcher(line); //from buffered reader, reading 1 line at a time

if (m.matches()) {
     stringArray.add(line);
}

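The problem is that even when I compile something simple into the pattern, it still returns nothing. If it does not even recognize something like \d, how am I supposed to get the phone numbers? For example: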

Pattern p = Pattern.compile("\\d+"); //Returns nothing
Pattern p = Pattern.compile("\\d");  //Returns nothing
Pattern p = Pattern.compile("\\s+"); //Returns lines
Pattern p = Pattern.compile("\\D");  //Returns lines

This is really confusing to me; any help would be greatly appreciated.

- James Phillips

2 Answers

Alternatively, you can use Google's libphonenumber library, as follows:

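    // Assumes libphonenumber (com.google.i18n.phonenumbers) is on the classpath
    // and that `source` holds the text to scan.
    Set<String> phones = new HashSet<>();
    PhoneNumberUtil util = PhoneNumberUtil.getInstance();

    // The second argument is the default region. It may be null only when all
    // numbers are in international format; (713) 555-9539 needs one, e.g. "US".
    for (PhoneNumberMatch match : util.findNumbers(source, "US")) {
        phones.add(match.rawString());
    }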

- Khozzy

- Ravi K Thapliyal · Accepted Answer

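Use Matcher#find() instead of matches(), because the latter tries to match the entire line against the phone-number pattern. find(), by contrast, searches the line and returns true as soon as any substring matches.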

Matcher m = p.matcher(line);

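Also, the line above suggests that you are creating the same Pattern and Matcher again on every loop iteration, which is inefficient. Move the Pattern outside the loop, then reset and reuse the same Matcher for each line.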

Pattern p = Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}");

Matcher m = null;
String line = reader.readLine();
if (line != null && (m = p.matcher(line)).find()) {
    stringArray.add(line);
}

while ((line = reader.readLine()) != null) {
  m.reset(line);
  if (m.find()) {
    stringArray.add(line);
  }
}
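
To make the difference between matches() and find() concrete, here is a minimal, self-contained sketch. The class name `PhoneFinder` and the helper `findPhones` are made up for illustration and are not part of the original code. It reuses the pattern from the question and additionally collects the matched numbers themselves with `Matcher#group()`, rather than the whole line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class PhoneFinder {
    // Same pattern as in the question: (xxx) xxx-xxxx
    private static final Pattern PHONE =
            Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}");

    // Returns every phone number found in the line, in order of appearance.
    static List<String> findPhones(String line) {
        List<String> phones = new ArrayList<>();
        Matcher m = PHONE.matcher(line);
        // find() advances to the next substring match on each call.
        while (m.find()) {
            phones.add(m.group());
        }
        return phones;
    }

    public static void main(String[] args) {
        String line = "<span style=\"font-weight:normal;\">(713) 555-9539&nbsp;&nbsp;";

        // matches() requires the WHOLE input to fit the pattern.
        System.out.println(PHONE.matcher(line).matches());             // false
        System.out.println(PHONE.matcher("(713) 555-9539").matches()); // true

        // find() locates the number inside the surrounding markup.
        System.out.println(findPhones(line)); // [(713) 555-9539]
    }
}
```

Whether you store the whole line (as in the question) or only the matched number depends on what you need downstream; if a single line could contain several numbers, the `while (m.find())` loop picks up each of them.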