Return positions of a regex match() in Javascript?

Question

javascript regex match string-matching

230

In Javascript, is there a way to retrieve the (starting) character positions inside a string of the results of a regex match()?

- stagas

12 Answers

78

This is what I came up with:

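// Finds starting and ending positions of quoted text
// in double or single quotes with escape char support like \" \'
var str = "this is a \"quoted\" string as you can 'read'";

var patt = /'((?:\\.|[^'])*)'|"((?:\\.|[^"])*)"/igm;
var match; // declared explicitly; an implicit global would throw in strict mode

while ((match = patt.exec(str)) !== null) {
  // match.index is the start; patt.lastIndex is the exclusive end of the match
  console.log(match.index + ' ' + patt.lastIndex);
}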

- stagas

26

For the end position, match.index + match[0].length also works. - Beni Cherniavsky-Paskin

1

@BeniCherniavsky-Paskin, shouldn't the end position be match.index + match[0].length - 1? - David

2

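@David, I meant the exclusive end position, as taken by .slice() and .substring(). The inclusive end would be 1 less, as you say. (Be careful: inclusive usually means the index of the last character inside the match, unless it's an empty match, in which case it's 1 before the match, and for an empty match at the very start it might be -1, entirely outside the string...) - Beni Cherniavsky-Paskin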
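To make the exclusive/inclusive distinction concrete, here is a small sketch (the helper name `matchSpan` is hypothetical). It computes the exclusive end as `match.index + match[0].length`, so `str.slice(start, end)` gives back exactly the matched text, and shows the -1 inclusive end for an empty match at index 0. It also cross-checks against the `d` (hasIndices) flag, which in ES2022+ engines exposes `match.indices[0]` as a `[start, end]` pair with the same exclusive end; older engines reject the `d` flag with a SyntaxError.

```javascript
// Hypothetical helper: returns [start, endExclusive] for the first match, or null.
function matchSpan(str, re) {
  const m = re.exec(str);
  if (m === null) return null;
  return [m.index, m.index + m[0].length];
}

const text = 'say "hi" and "bye"';
const span = matchSpan(text, /"[^"]*"/);
console.log(span);                         // [ 4, 8 ]
console.log(text.slice(span[0], span[1])); // "hi" (including the quotes)

// Inclusive end = exclusive end - 1, which for an empty match at index 0 is -1.
const empty = matchSpan(text, /^/);
console.log(empty[1] - 1);                 // -1

// ES2022 `d` flag: indices[0] holds the same [start, endExclusive] pair.
const withIndices = /"[^"]*"/d.exec(text);
console.log(withIndices.indices[0]);       // [ 4, 8 ]
```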

With patt = /.*/ this goes into an infinite loop; how can we prevent that? - abinas patra

31

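In modern browsers, you can accomplish this with string.matchAll().

The benefit of this approach over RegExp.exec() is that it does not rely on the regex being stateful, as @Gumbo's answer does.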

let regexp = /bar/g;
let str = 'foobarfoobar';

let matches = [...str.matchAll(regexp)];
matches.forEach((match) => {
    console.log("match found at " + match.index);
});

- brismuth

1

I had luck using this single-line matchAll-based solution:

let regexp = /bar/g;
let str = 'foobarfoobar';
let matchIndices = Array.from(str.matchAll(regexp)).map(x => x.index);
console.log(matchIndices);

- Steven Schkolne
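One caveat for the `matchAll()` approaches above: `String.prototype.matchAll()` throws a `TypeError` when it is given a regex without the `g` flag. A minimal sketch of both cases; the `safeMatchAll` wrapper is a hypothetical convenience (not a built-in) that adds `g` to a copy of the regex instead of failing.

```javascript
const str = 'foobarfoobar';

// Without the g flag, matchAll refuses to run.
let threw = false;
try {
  str.matchAll(/bar/);
} catch (e) {
  threw = e instanceof TypeError;
}
console.log(threw); // true

// Hypothetical wrapper: copy the regex, adding "g" if it is absent.
function safeMatchAll(s, re) {
  const flags = re.flags.includes('g') ? re.flags : re.flags + 'g';
  return [...s.matchAll(new RegExp(re.source, flags))];
}

console.log(safeMatchAll(str, /bar/).map(m => m.index)); // [ 3, 9 ]
```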

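Not sure why you say this approach doesn't depend on the regex being stateful. I tried your code without the g flag and got an error. - Ooker

The "g" flag means "global search", i.e. match all occurrences in the string. Using str.matchAll() makes no sense if you are not doing a global search. I hope that helps, but I'm not sure what you're trying to do. As for my "stateful" remark, I meant that you don't need a "while" loop that relies on the regex object's internal state, as in Gumbo's answer. Good luck! - brismuth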

27

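According to the documentation on developer.mozilla.org, the String .match() method returns an array. That array has an extra input property, which is the original string that was parsed, and an index property, which is the zero-based index of the match in the string.

When dealing with a non-global regex (i.e., no g flag at the end of your regex), the value returned by .match() has an index property... just access it.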

var index = str.match(/regex/).index;

Here is a sample showing it working:

var str = 'my string here';

var index = str.match(/here/).index;

console.log(index); // <- 10

I have successfully tested this all the way back to IE5.

- Jimbo Jonny

This returns an array, not an object with an index. - Ben Taliadoros

@BenTaliadoros, I'm afraid you are wrong: it is both an array and an object with an "index" property (see the answer). - phil294

1

Seems so! Not sure what I was thinking all those years. - Ben Taliadoros

2

Note that if you use the global flag, as in str.match(/here/g), match.index will be undefined. - SethWhite
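The three `match()` behaviours discussed in this thread, side by side: a non-global result carries `index` and `input`, a global result is a plain array of matched strings with no `index`, and a failed match returns `null` (so `str.match(...).index` throws a TypeError when nothing matches). The `firstIndex` helper is hypothetical, added only to guard the `null` case.

```javascript
const s = 'my string here, and here again';

const single = s.match(/here/);                 // non-global
console.log(single.index, single.input === s);  // 10 true

const all = s.match(/here/g);                   // global
console.log(all, all.index);                    // [ 'here', 'here' ] undefined

// Hypothetical helper (expects a NON-global regex): -1 instead of a TypeError on no match.
function firstIndex(str, re) {
  const m = str.match(re);
  return m === null ? -1 : m.index;
}
console.log(firstIndex(s, /nowhere/));          // -1
console.log(firstIndex(s, /again/));            // 25
```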

10

You can use the search method of the String object. It only works for the first match, but otherwise does what you describe. For example:

"How are you?".search(/are/);
// 4

- Jimmy
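Two details of `search()` worth knowing: it returns `-1` when nothing matches, and it always reports the first match counted from the start of the string, ignoring both the `g` flag and the regex's `lastIndex` (per the spec it saves `lastIndex`, searches from 0, and restores it). A small sketch:

```javascript
const sentence = 'How are you? How are they?';

console.log(sentence.search(/are/));   // 4
console.log(sentence.search(/were/));  // -1

// The g flag and a pre-set lastIndex make no difference to search().
const re = /are/g;
re.lastIndex = 10;
console.log(sentence.search(re));      // 4
console.log(re.lastIndex);             // 10 (restored after the search)
```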

7

I recently discovered a cool feature; I tried it in the console and it seems to work:

var text = "border-bottom-left-radius";

var newText = text.replace(/-/g,function(match, index){
    return " " + index + " ";
});

This returned: "border 6 bottom 13 left 18 radius"

So this seems to be what you're looking for.

- felipeab

6

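Note that the replacement function also receives the capture groups, so it is always the second-to-last entry in the replacement function's arguments that holds the position, not "the second argument". The function's arguments are "full match, group1, group2, ..., index of the match, full string matched against". - Mike 'Pomax' Kamermans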
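A sketch of Mike's point: once the pattern has capture groups, the offset is no longer the second argument but sits near the end of the argument list. One caveat not covered in the comment: if the regex contains named groups, an extra `groups` object is appended, so the offset becomes third-from-last. The sketch below (the `offsets` array is only for illustration) uses unnamed groups, so second-to-last is correct.

```javascript
const text = 'border-bottom-left-radius';
const offsets = [];

// Two capture groups, so the callback receives
// (match, group1, group2, offset, wholeString).
const result = text.replace(/(\w)-(\w)/g, (...args) => {
  const offset = args[args.length - 2]; // second-to-last = offset of the match
  offsets.push(offset);
  return args[1] + '_' + args[2];
});

console.log(offsets); // [ 5, 12, 17 ]
console.log(result);  // border_bottom_left_radius
```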

4

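Unfortunately, the previous answers (based on exec) don't seem to work when your regex matches a width of 0. For example (note: /\b/g is a regex that should find all word boundaries):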

var re = /\b/g,
    str = "hello world";
var guard = 10;
var match;
while ((match = re.exec(str)) != null) {
    console.log("match found at " + match.index);
    if (guard-- < 0) {
      console.error("Infinite loop detected");
      break;
    }
}

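One way around this is to force the regex to match at least one character, but this is far from ideal (and it means you have to add the index at the end of the string manually):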

var re = /\b./g,
    str = "hello world";
var guard = 10;
var match;
while ((match = re.exec(str)) != null) {
    console.log("match found at " + match.index);
    if (guard-- < 0) {
      console.error("Infinite loop detected");
      break;
    }
}

A better solution (which only works in newer browsers; older/IE versions need a polyfill) is to use String.prototype.matchAll():

var re = /\b/g,
    str = "hello world";
console.log(Array.from(str.matchAll(re)).map(match => match.index))

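Explanation:

String.prototype.matchAll() expects a global regex (one with the g flag set). It returns an iterator. To loop over it and map() it, the iterator has to be converted into an array (which is exactly what Array.from() does). Like the results of RegExp.prototype.exec(), the resulting elements have an .index field, per the specification.

See the String.prototype.matchAll() and Array.from() MDN pages for browser support and polyfill options.

Edit: digging deeper for a solution that works in all browsers. The problem with RegExp.prototype.exec() is that it updates the lastIndex pointer on the regex, and the next search starts from the previously found lastIndex.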
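As a complement to the `Array.from()` version: the iterator returned by `matchAll()` can also be consumed directly with `for...of`, which avoids building an intermediate array. Each yielded match has the same `.index` field. Note that such an iterator can only be traversed once.

```javascript
const re = /\b/g;
const str = 'hello world';

const positions = [];
for (const match of str.matchAll(re)) {
  positions.push(match.index);
}
console.log(positions); // [ 0, 5, 6, 11 ]

// A matchAll iterator is single-use: a second pass yields nothing.
const it = str.matchAll(re);
console.log([...it].length, [...it].length); // 4 0
```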

var re = /l/g,
str = "hello world";
console.log(re.lastIndex)
re.exec(str)
console.log(re.lastIndex)
re.exec(str)
console.log(re.lastIndex)
re.exec(str)
console.log(re.lastIndex)

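This works nicely as long as the regex match actually has a width. With a 0-width regex, the pointer does not advance, so you get an infinite loop (note: /(?=l)/g is a lookahead for l; it matches the 0-width string right before an l. So the first call to exec() correctly goes to index 2, and then it stays there):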

var re = /(?=l)/g,
str = "hello world";
console.log(re.lastIndex)
re.exec(str)
console.log(re.lastIndex)
re.exec(str)
console.log(re.lastIndex)
re.exec(str)
console.log(re.lastIndex)

The solution (less elegant than matchAll(), but it should work in all browsers) is therefore to manually increase lastIndex when the match width is 0 (which can be checked in different ways):

var re = /\b/g,
    str = "hello world";
var match;
while ((match = re.exec(str)) != null) {
    console.log("match found at " + match.index);

    // alternative: if (match.index == re.lastIndex) {
    if (match[0].length == 0) {
      // we need to increase lastIndex -- this location was already matched,
      // we don't want to match it again (and get into an infinite loop)
      re.lastIndex++;
    }
}

- Claude
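The workaround above can be wrapped into a reusable function. This is a sketch, assuming you want both the start and the exclusive end of every match (`allMatchSpans` is a hypothetical name). It works on a copy of the regex so the caller's `lastIndex` is never touched, forces the `g` flag, and bumps `lastIndex` on empty matches exactly as in the answer, which also covers the `/.*/` case asked about earlier. One difference from `matchAll()`: with the `u` flag, `matchAll()` advances past an empty match by a whole code point, whereas this `+1` advances by one UTF-16 code unit.

```javascript
// Hypothetical helper: [start, endExclusive] for every match, safe for zero-width patterns.
function allMatchSpans(str, regex) {
  // Work on a copy with the g flag so the caller's regex state is untouched.
  const flags = regex.flags.includes('g') ? regex.flags : regex.flags + 'g';
  const re = new RegExp(regex.source, flags);
  const spans = [];
  let match;
  while ((match = re.exec(str)) !== null) {
    spans.push([match.index, match.index + match[0].length]);
    if (match[0].length === 0) {
      // Empty match: step past this position, or exec() would return it forever.
      re.lastIndex++;
    }
  }
  return spans;
}

console.log(allMatchSpans('hello world', /\b/));   // [[0,0],[5,5],[6,6],[11,11]]
console.log(allMatchSpans('foobarfoobar', /bar/)); // [[3,6],[9,12]]
console.log(allMatchSpans('ab', /.*/));            // [[0,2],[2,2]]
```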

2

var str = "The rain in SPAIN stays mainly in the plain";

function searchIndex(str, searchValue, isCaseSensitive) {
  // the 'i' flag makes matching case-insensitive, so add it only when NOT case-sensitive
  var modifiers = isCaseSensitive ? 'g' : 'gi';
  var regExpValue = new RegExp(searchValue, modifiers);
  var matches = [];
  var startIndex = 0;
  // match() returns null when nothing matches; fall back to an empty array
  var arr = str.match(regExpValue) || [];

  [].forEach.call(arr, function(element) {
    startIndex = str.indexOf(element, startIndex);
    matches.push(startIndex++);
  });

  return matches;
}

console.log(searchIndex(str, 'ain', true));

- Yaroslav

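This is not correct. str.indexOf only finds the next occurrence of the matched text, which is not necessarily the match itself. JS regexes support lookahead conditions on text outside the captured match. For example, searchIndex("foobarfoobaz", "foo(?=baz)", true) should return [6], not [0]. - rakslice

Why use [].forEach.call(arr, function(element) instead of arr.forEach or arr.map? - Ankit Kumar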
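A `matchAll`-based variant avoids the `indexOf` re-search that rakslice points out, because it reads each position straight from the match object, so lookaheads are respected. This is a sketch that keeps the answer's function name and parameters but is not the author's code; it uses `'g'` for case-sensitive and `'gi'` for case-insensitive searches and returns `[]` when nothing matches. As in the original, `searchValue` is treated as regex source, so literal special characters would need escaping.

```javascript
function searchIndex(str, searchValue, isCaseSensitive) {
  // "i" makes the search case-INsensitive, so add it only when not case-sensitive.
  const re = new RegExp(searchValue, isCaseSensitive ? 'g' : 'gi');
  // Each match reports its own index, so no second search with indexOf is needed.
  return Array.from(str.matchAll(re), m => m.index);
}

const sample = 'The rain in SPAIN stays mainly in the plain';
console.log(searchIndex('foobarfoobaz', 'foo(?=baz)', true)); // [ 6 ]
console.log(searchIndex(sample, 'ain', true));                // [ 5, 25, 40 ]
console.log(searchIndex(sample, 'ain', false));               // [ 5, 14, 25, 40 ]
```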

2

This member function returns an array of the 0-based positions, if any, of the input word within the String object.

String.prototype.matching_positions = function( _word, _case_sensitive, _whole_words, _multiline )
{
   /*besides '_word' param, others are flags (0|1)*/
   // the 'i' flag makes matching case-insensitive, so add it only when NOT case-sensitive
   var _match_pattern = "g"+(_case_sensitive?"":"i")+(_multiline?"m":"") ;
   var _bound = _whole_words ? "\\b" : "" ;
   var _re = new RegExp( _bound+_word+_bound, _match_pattern );
   var _pos = [], _chunk ;

   while( true )
   {
      _chunk = _re.exec( this ) ;
      if ( _chunk == null ) break ;
      _pos.push( _chunk['index'] ) ;
      // restart one character after this match's start (also avoids zero-width infinite loops)
      _re.lastIndex = _chunk['index']+1 ;
   }

   return _pos ;
}

Now try it:

var _sentence = "What do doers want ? What do doers need ?" ;
var _word = "do" ;
console.log( _sentence.matching_positions( _word, 1, 0, 0 ) );
console.log( _sentence.matching_positions( _word, 1, 1, 0 ) );

You can also pass regular expressions:

var _second = "z^2+2z-1" ;
console.log( _second.matching_positions( "[0-9]z+", 0, 0, 0 ) );

This gets the position index of the linear term.

- Sandro Rosa
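A side effect of `_re.lastIndex = _chunk['index']+1` worth knowing is that matches may overlap, because each search restarts one character after the previous match's start rather than after its end. A sketch of the same loop as a plain function, which avoids modifying `String.prototype` (`overlappingPositions` is a hypothetical name), compared with the non-overlapping `matchAll()`:

```javascript
// Hypothetical standalone version of the same loop: reports overlapping match starts.
function overlappingPositions(str, pattern, flags = '') {
  const re = new RegExp(pattern, flags.includes('g') ? flags : flags + 'g');
  const positions = [];
  let match;
  while ((match = re.exec(str)) !== null) {
    positions.push(match.index);
    re.lastIndex = match.index + 1; // restart right after this match's start
  }
  return positions;
}

console.log(overlappingPositions('aaaa', 'aa'));            // [ 0, 1, 2 ]
console.log([...'aaaa'.matchAll(/aa/g)].map(m => m.index)); // [ 0, 2 ]
```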

1

I had luck with this single-line matchAll-based solution (my use case needed an array of string positions):

let regexp = /bar/g;
let str = 'foobarfoobar';

let matchIndices = Array.from(str.matchAll(regexp)).map(x => x.index);

console.log(matchIndices)

Output: [3, 9]

- Steven Schkolne

Content provided by Stack Overflow.

- Gumbo · Accepted Answer

305

exec returns an object with an index property:

var match = /bar/.exec("foobar");
if (match) {
    console.log("match found at " + match.index);
}

For multiple matches:

var re = /bar/g,
    str = "foobarfoobar";
var match;
while ((match = re.exec(str)) != null) {
    console.log("match found at " + match.index);
}

- Gumbo

6

Thanks for your help! Can you also tell me how to find the indexes of multiple matches? - stagas

32

Note: using re as a variable and adding the g modifier are both crucial! Otherwise you'll get an infinite loop. - oriadam
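Why both points matter: `exec()` only reads and advances `lastIndex` when the regex is global (or sticky). Without `g`, every call starts at index 0 and returns the same first match, so the `while` loop never ends. Likewise, writing the literal inside the loop condition (`while ((match = /bar/g.exec(str)))`) creates a fresh regex with `lastIndex` 0 on every iteration. A bounded sketch of the difference:

```javascript
const str = 'foobarfoobar';

// Without g: lastIndex is ignored and stays 0, so every call finds the first "bar".
const plain = /bar/;
console.log(plain.exec(str).index, plain.exec(str).index, plain.lastIndex); // 3 3 0

// With g: each call resumes from lastIndex, then returns null and resets it to 0.
const withG = /bar/g;
console.log(withG.exec(str).index); // 3
console.log(withG.exec(str).index); // 9
console.log(withG.exec(str));       // null
console.log(withG.lastIndex);       // 0
```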

1

@OnurYıldırım - here's a working jsfiddle of it; I've tested it all the way back to IE5... works great: https://jsfiddle.net/6uwn1vof/ - Jimbo Jonny

1

@JimboJonny, hmm, I learned something new. My test case returns undefined: https://jsfiddle.net/6uwn1vof/2/, which isn't a search-like example like yours. - Onur Yıldırım

1

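@OnurYıldırım - remove the g flag and it will work. Since match is a function of the string, not of the regex, it cannot be stateful the way exec is, so it only behaves like exec (i.e. has an index property) when you are not looking for a global match, because then statefulness doesn't matter. - Jimbo Jonny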
