RegExp exec method only returns the first match

Question

58

I am trying to implement the following regex search, which I found on the GolfScript syntax page.

var ptrn = /[a-zA-Z_][a-zA-Z0-9_]*|'(?:\\.|[^'])*'?|"(?:\\.|[^"])*"?|-?[0-9]+|#[^\n\r]*|./mg;
input = ptrn.exec(input);

After this, input holds only the first match of the regex. For example, "hello" "world" should return ["hello", "world"], but it returns only ["hello"].

- user181351

3 Answers

33

You can call the match method on the string to retrieve all matches:

var ptrn = /[a-zA-Z_][a-zA-Z0-9_]*|'(?:\\.|[^'])*'?|"(?:\\.|[^"])*"?|-?[0-9]+|#[^\n\r]*|./mg;
var results = "hello world".match(ptrn);

results then holds (given this regex):

["hello", " ", "world"]

See the specification of String.prototype.match for details.

- Eadel

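This is exactly what I was searching for, thanks! A follow-up question: can this be done with a regex that has multiple capture groups? E.g. "hello world hello world".match(/\S+ \S/g). Either a 1D or a 2D array as output is fine by me. - pixelpax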

1

Isn't the OR operator | already enough? For example, "cat dog cat tiger dog".match(/(cat)|(dog)/g) yields ["cat", "dog", "cat", "dog"], which is a one-dimensional array. What output do you expect for your example? - Eadel

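Maybe I should give a less general example. Specifically, I'm trying to parse an OBJ file whose lines carry mesh face information in this form: f 3/5/4 5/13/4 2/2/3 4/6/3. In my case there may be a number between the slashes, but if there is, I want to ignore it. So I'd like to do something like / (\d+)\/.*\/(\d+) /g to extract only the two ends of each slash-separated item. - pixelpax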

1

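There may be a better solution, but I would use something like this: "3/5/4 5/13/4 2/2/3 4/6/3".match(/(\/(\d+)(?!(\d+|\/))|((^|\s)\d+))/g). This returns ["3", "/4", " 5", "/4", " 2", "/3", " 4", "/3"], and you can tell the first and third numbers apart by checking for the leading space or slash. The regex uses a negative lookahead that checks the number is not followed by a slash or another digit, so the first alternative matches only the last number of each item. The second part of the pattern, after the | operator, looks for a number preceded by the start of the string or by whitespace. I only tested it on your example, so if you use it, please test it on more data. - Eadel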
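For the capture-group follow-up in the comments above, here is a minimal sketch using String.prototype.matchAll (ES2020), which keeps the groups of every match. It assumes each face item has the form a/b/c with an optional middle number; the names line, faceRe and pairs are made up for illustration and do not come from the thread.

```javascript
// Sample face line in the form quoted in the comment above.
const line = "f 3/5/4 5/13/4 2/2/3 4/6/3";

// Capture the first and last number of each item; the middle one may be empty.
const faceRe = /(\d+)\/\d*\/(\d+)/g;

// matchAll needs the g flag and yields one match array (with groups) per item.
const pairs = [...line.matchAll(faceRe)].map(m => [Number(m[1]), Number(m[2])]);

console.log(pairs); // [[3, 4], [5, 4], [2, 3], [4, 3]]
```

The result is the 2D array the comment asks for; calling pairs.flat() would give the 1D variant.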

2

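I can't tell whether the "hello" "world" in your question is the user input or the regex, but as I have learned, a RegExp object is stateful: its lastIndex property records where the next search starts. It does not return all results at once. It returns only the first match, and you need to keep calling .exec to get the remaining results, each search starting at lastIndex: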

const re1 = /^\s*(\w+)/mg; // find all first words in every line
const text1 = "capture discard\n me but_not_me" // two lines of text
for (let match; (match = re1.exec(text1)) !== null;) 
      console.log(match, "next search at", re1.lastIndex);

This prints:

["capture", "capture"] "next search at" 7
[" me", "me"] "next search at" 19

Here is a functional ES6 way to build an iterator over the results:

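// Yields every match; the regex needs the g flag or the loop never ends.
RegExp.prototype.execAllGen = function* (input) {
  for (let match; (match = this.exec(input)) !== null;) {
    yield match;
  }
};
// Collects all matches into an array.
RegExp.prototype.execAll = function (input) {
  return [...this.execAllGen(input)];
};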

Note that, unlike poke, I scope the match variable more neatly inside the for loop. You can now easily capture the matches in one line.

const matches = re1.execAll(text1)

const log = console.log; // shorthand assumed by the calls below
log("captured strings:", matches.map(m => m[1]));
log(matches.map(m => [m[1], m.index]));
for (const match of matches) log(match[1], "found at", match.index);

This prints:

"captured strings:" ["capture", "me"]

[["capture", 0], ["me", 16]]
"capture" "found at" 0
"me" "found at" 16

- Little Alien
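The same idea can be written without extending RegExp.prototype. The following is only a sketch: allMatches is a hypothetical helper name, and the g-flag check, the lastIndex reset and the zero-length-match guard are additions that are not in the answer above.

```javascript
// Hypothetical standalone helper: yields every match of re in input.
function* allMatches(re, input) {
  // Without the g flag, exec never advances and the loop would not end.
  if (!re.global) throw new TypeError("allMatches needs a regex with the g flag");
  re.lastIndex = 0; // start from the beginning regardless of earlier use
  for (let match; (match = re.exec(input)) !== null;) {
    // An empty match leaves lastIndex where it was; step past it manually.
    if (match[0] === "") re.lastIndex++;
    yield match;
  }
}

const words = [...allMatches(/^\s*(\w+)/mg, "capture discard\n me but_not_me")]
  .map(m => m[1]);
console.log(words); // ["capture", "me"]
```

Taking the regex as an argument avoids touching a built-in prototype, which other code on the page may also rely on.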

- poke · Accepted Answer

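RegExp.exec returns only one match at a time.

To retrieve multiple matches, you need to run exec on the expression object multiple times. For example, with a simple while loop: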

var ptrn = /[a-zA-Z_][a-zA-Z0-9_]*|'(?:\\.|[^'])*'?|"(?:\\.|[^"])*"?|-?[0-9]+|#[^\n\r]*|./mg;

var match;
while ((match = ptrn.exec(input)) != null) {
    console.log(match);
}

This logs every match to the console.
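To get the array the question asks for, the loop can collect the matches instead of logging them. This is a sketch that assumes the input is the literal text "hello" "world" including the quote characters, and that whitespace tokens should be dropped; the tokens array is an illustrative name.

```javascript
const ptrn = /[a-zA-Z_][a-zA-Z0-9_]*|'(?:\\.|[^'])*'?|"(?:\\.|[^"])*"?|-?[0-9]+|#[^\n\r]*|./mg;
const input = '"hello" "world"'; // assumed sample input from the question

const tokens = [];
let match;
while ((match = ptrn.exec(input)) !== null) {
  // The final "." alternative also matches the separating space; skip blanks.
  if (match[0].trim() !== "") tokens.push(match[0]);
}

console.log(tokens); // [ '"hello"', '"world"' ]
```

Each token keeps its surrounding quotes, because the string alternative of the pattern matches them as part of the token.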

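Note that for this to work, the regex must have the g (global) flag. This flag ensures that the lastIndex property is updated after certain methods are run on the expression, so that subsequent calls start after the previous result.

The regex also needs to be declared outside the loop (as in the example above). Otherwise the expression object would be re-created on every iteration, and lastIndex would be reset each time, resulting in an infinite loop.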
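The effect of the g flag on lastIndex can be checked with a few direct exec calls rather than a loop. This is a small sketch; the sample text and variable names are made up for illustration.

```javascript
const text = "hello world";

// Without g, exec ignores lastIndex and always searches from the start.
const noFlag = /\w+/;
const first = noFlag.exec(text)[0];
const again = noFlag.exec(text)[0];
console.log(first, again, noFlag.lastIndex); // hello hello 0

// With g, each call resumes at lastIndex; a failed search resets it to 0.
const withFlag = /\w+/g;
const one = withFlag.exec(text)[0];
const two = withFlag.exec(text)[0];
const done = withFlag.exec(text);
console.log(one, two, done, withFlag.lastIndex); // hello world null 0
```

Because the flagless regex returns "hello" on every call, a while loop around it would never see null, which is the infinite loop described above.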