Can a regular expression return the line number of a match?

Question

regex replace find editor

10

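In a text editor, I want to replace a given word with the number of the line on which that word is found. Is this possible with a regular expression?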

- Omar

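Which text editor? Some of them don't support regular expressions at all. - choroba

Short answer: no. Regular expressions don't replace or do anything else; they only define a pattern. - ikegami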

5

Stack Overflow is for programming questions. Questions about how to use an editor should be asked on Super User. - ikegami

Do you mean a text editor like Vim? - undefined

4 Answers

2

Since you didn't specify which text editor you are using: in Vim it would be :%s/searched_word/\=printf('%-4d', line('.'))/g (read more). But as others have mentioned, this is not a question for SO but rather for Super User ;)
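For readers without Vim at hand, the same substitution can be sketched in Python. This is only an illustration under assumptions: the function name replace_with_padded_lineno is made up here, and the search word is treated as literal text rather than as a Vim pattern.

```python
import re


def replace_with_padded_lineno(text: str, word: str, width: int = 4) -> str:
    """Replace each occurrence of `word` with its 1-based line number,
    left-justified to `width` characters, like printf('%-4d', line('.'))."""
    out = []
    for lineno, line in enumerate(text.splitlines(keepends=True), start=1):
        padded = f"{lineno:<{width}d}"  # e.g. "3   " for line 3 and width 4
        # A callable replacement avoids any escape processing of the padded text.
        out.append(re.sub(re.escape(word), lambda m: padded, line))
    return "".join(out)


if __name__ == "__main__":
    print(replace_with_padded_lineno("my cat\ndog\nmy pig\n", "pig"), end="")
```

The padding keeps columns aligned when the word is replaced inside tabular text; pass width=0 to get the bare number.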

- Maciek

1

I don't know of any editor that can do this, short of extending one that allows arbitrary extensions.

However, you can easily do it with Perl.

perl -i.bak -pe"s/word/$./eg" file

Or, if you want to use wildcards,

perl -MFile::DosGlob=glob -i.bak -pe"BEGIN { @ARGV = map glob($_), @ARGV } s/word/$./eg" *.txt
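For comparison, here is a minimal Python sketch of the same in-place edit with a backup copy. It relies on the standard fileinput module; the function name number_matches_in_place is hypothetical, and the word is matched literally.

```python
import fileinput
import re


def number_matches_in_place(paths, word, backup=".bak"):
    """Rewrite each file in place, replacing `word` with the current line
    number. The original is kept next to it with the `backup` suffix."""
    pattern = re.compile(re.escape(word))
    with fileinput.input(files=paths, inplace=True, backup=backup) as stream:
        for line in stream:
            # filelineno() restarts at 1 for every file. Perl's $. keeps
            # counting across files unless ARGV is closed explicitly;
            # stream.lineno() would give that cumulative behaviour.
            lineno = str(stream.filelineno())
            # With inplace=True, print() writes into the file being rewritten.
            print(pattern.sub(lineno, line), end="")
```

Calling number_matches_in_place(["file"], "word") leaves the edited text in file and the untouched original in file.bak.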

- ikegami

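@Miller, because the OP is using Windows. - ikegami

Two questions. Why do you suspect or know that the OP is on Windows? Is it based on previous questions? And if the OP is on Windows, which features of File::DosGlob do you think they might need? I have never needed anything beyond *.ext, which the normal glob handles just fine, but I suspect I don't know what extra features the DOS glob offers. - Miller

@Miller, yes, the OP posted two related questions back to back, and the other one involved UltraEdit. // The "normal" glob is very different from Windows globbing. It fails on common expressions. DosGlob's glob can't do anything the builtin one can't. It just does it differently. - ikegami

@ikegami, thanks! I'll update. - undefined

To give the OP another option (one that does not overwrite the original file), in my hands the code ~$ perl -pe 's/word/$./g;' file > tmp works. I suspect these answers become more relevant as WSL use becomes more widespread: learn.microsoft.com/en-us/windows/wsl/about - undefined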

0

Using Raku (formerly known as Perl_6)

To replace the entire line containing the target word with the line number:

~$ raku -ne 'state $i; ++$i; put m/word/ ?? $i !! $_;'  file

To replace every instance of the target word with the line number (global substitution):

~$ raku -pe 'state $i; ++$i; s:g/word/{$i}/;' file

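This answer is offered to complement the excellent Perl answer posted by @ikegami. Raku, like Perl, is cross-platform. The commands above are written for Unix/Linux systems. On Windows, use double quotes instead of single quotes (although according to @ikegami, WSL uses single quotes. Thanks!).

The first code example reads as follows: using the -ne non-autoprinting line-by-line flags, declare a counter variable $i. Increment it with ++$i. Using Raku's ternary operator Test ?? True !! False, if a match for word is found, output (i.e. put) the incremented $i; otherwise output the original line $_.

The second code example reads as follows: using the -pe autoprinting line-by-line flags, declare a counter variable $i. Increment it with ++$i. Using Raku's s:g/// global substitution operator, replace every match of word with the counter $i.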
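The two behaviours described above (whole-line replacement versus per-word substitution) can be sketched side by side in Python. The function names replace_lines and replace_words are invented for this illustration, and the word is matched literally.

```python
import re


def replace_lines(text: str, word: str) -> str:
    """Replace every line that contains `word` with its 1-based line number."""
    out = []
    for i, line in enumerate(text.splitlines(), start=1):
        out.append(str(i) if re.search(re.escape(word), line) else line)
    return "\n".join(out)


def replace_words(text: str, word: str) -> str:
    """Replace every occurrence of `word` with the number of its line."""
    out = []
    for i, line in enumerate(text.splitlines(), start=1):
        out.append(re.sub(re.escape(word), str(i), line))
    return "\n".join(out)


if __name__ == "__main__":
    sample = "my cat\ndog\nmy pig\nmy cow\nmy mouse\nmy pig also"
    print(replace_lines(sample, "pig"))
    print(replace_words(sample, "pig"))
```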

Sample input:

my cat
dog
my pig
my cow
my mouse
my pig also

Sample output for the global substitution (the second code example above), replacing pig:

my cat
dog
my 3
my cow
my mouse
my 6 also

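Note: a post-increment can be written with $++, which gives a 0-based index instead of a 1-based one. The regex matcher can also be written as / … /, i.e. without the m, when slashes are used as delimiters, or as m{ … } if you want to match slashes inside the regex.

In addition to :g (global), you can add other regex "adverbs" to the m/ … / or s/// operators. Probably the most useful is :i for case-insensitive matching, as in m:i/ … /; or s:i:g/…/…/; see the links at the bottom for more on regex adverbs.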
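The two options mentioned here, a 0-based counter and case-insensitive matching, map onto ordinary parameters in a Python version. A small sketch, with number_matches as a hypothetical name and the word matched literally:

```python
import re


def number_matches(text: str, word: str, start: int = 1, ignore_case: bool = False) -> str:
    """Replace `word` with its line index, counting from `start`."""
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(re.escape(word), flags)
    return "\n".join(
        pattern.sub(str(i), line)
        for i, line in enumerate(text.splitlines(), start=start)
    )


if __name__ == "__main__":
    print(number_matches("my Pig\nmy pig", "pig", start=0, ignore_case=True))
```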

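Finally, if you are fairly comfortable with Vim, you can open the file from the command line with ~$ vim file, then enter command-line mode with the : colon. Once at the Vim command line, type %! raku -pe 'state $i; ++$i; s:g/pig/{$i}/;' to run the Raku command. Save to a new file or overwrite the original as desired.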

https://docs.raku.org/language/regexes
https://raku.org

- jubilatious1


- zx81 · Accepted Answer

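Recursion, Self-Referencing Group (Qtax Trick), Reverse Qtax or Balancing Groups

Introduction

The idea of adding a list of integers to the bottom of the input is similar to a famous database hack (nothing to do with regex) in which one joins to a table of integers. My original answer used the @Qtax trick. The current answer uses recursion, the Qtax trick (straight or in a reversed variation), or balancing groups.

Yes, it is possible... with some caveats and regex trickery.

The solutions in this answer are meant to demonstrate some regex syntax rather than to be practical answers ready for implementation.
At the end of the file, we paste a list of numbers preceded by a unique delimiter. For this experiment, the appended string is :1:2:3:4:5:6:7. This technique is similar to the famous database hack that uses a table of integers.
For the first two solutions, we need an editor whose regex flavor supports recursion (solution 1) or self-referencing capture groups (solution 2, including its reversed variant). Notepad++ and EditPad Pro are two that come to mind. For the third solution, we need an editor that supports balancing groups. That probably limits us to EditPad Pro or Visual Studio 2013+.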

Input file:

Let's say we are searching for pig and want to replace it with the line number.

We'll use this as input:

my cat
dog
my pig
my cow
my mouse

:1:2:3:4:5:6:7
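Pasting the number list by hand gets tedious for longer files. A minimal Python sketch for generating it follows; append_number_tail is a made-up helper name, and the choice of two extra numbers (for the blank separator line and the tail line itself) simply mirrors the sample above, where five lines of text are followed by seven numbers.

```python
def append_number_tail(text: str, sep: str = ":") -> str:
    """Return `text` followed by a blank line and a tail such as :1:2:3."""
    lines = text.splitlines()
    # One number per existing line, plus two for the blank separator line
    # and the tail line, as in the sample input.
    count = len(lines) + 2
    tail = "".join(f"{sep}{n}" for n in range(1, count + 1))
    return "\n".join(lines) + "\n\n" + tail


if __name__ == "__main__":
    print(append_number_tail("my cat\ndog\nmy pig\nmy cow\nmy mouse"))
```

The regexes only need at least as many numbers as there are lines before the match, so a longer tail does no harm.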

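First Solution: Recursion

Supported languages: apart from the text editors mentioned above (Notepad++ and EditPad Pro), this solution should work in languages that use PCRE (PHP, R, Delphi), in Perl, and in Python using Matthew Barnett's regex module (untested).

The recursive structure lives in a lookahead and is optional. Its job is to balance the lines that don't contain pig, on the left, against the numbers, on the right. Think of it as balancing a nested construct such as {{{ }}}, except that on the left we have the non-matching lines and on the right we have the numbers. The point is that when we exit the lookahead, we know how many lines were skipped.

Search: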

(?sm)(?=.*?pig)(?=((?:^(?:(?!pig)[^\r\n])*(?:\r?\n))(?:(?1)|[^:]+)(:\d+))?).*?\Kpig(?=.*?(?(2)\2):(\d+))

Free-spacing version with comments:

(?xsm)             # free-spacing mode, multi-line
(?=.*?pig)        # fail right away if pig isn't there

(?=               # The Recursive Structure Lives In This Lookahead
(                 # Group 1
   (?:               # skip one line 
      ^              
      (?:(?!pig)[^\r\n])*  # zero or more chars not followed by pig
      (?:\r?\n)      # newline chars
    ) 
    (?:(?1)|[^:]+)   # recurse Group 1 OR match all chars that are not a :
    (:\d+)           # match digits
)?                 # End Group 
)                 # End lookahead. 
.*?\Kpig                # get to pig
(?=.*?(?(2)\2):(\d+))   # Lookahead: capture the next digits

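Replace: \3

In the demo, see the substitutions at the bottom. You can play with the letters on the first two lines (delete a space to make pig) to move the first occurrence of pig to a different line, and see how that affects the results.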
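Python's built-in re module has no recursion and no \K, so the pattern above cannot be run there directly. The following sketch is a procedural analogue of the bookkeeping the regex performs, not the regex itself: one recursion level per skipped line, one number accounted for on the way back out. The function names are hypothetical, and the input is assumed to end with the :1:2:3... line.

```python
def skipped_lines(lines, target):
    """Recursive counterpart of Group 1: skip one line, recurse, then
    account for one number on the way back out."""
    if not lines or target in lines[0]:
        return 0                                  # the group is optional
    deeper = skipped_lines(lines[1:], target)     # (?1): recurse on the rest
    return deeper + 1                             # (:\d+): one number per line


def line_number_by_recursion(text, target):
    """Return the number the final lookahead would capture, or None."""
    body, _, tail = text.rstrip("\n").rpartition("\n")
    lines = body.splitlines()
    numbers = tail.strip(":").split(":")
    if not any(target in line for line in lines):
        return None                               # (?=.*?pig) fails right away
    skipped = skipped_lines(lines, target)
    # The number that follows the ones already balanced against skipped lines.
    return int(numbers[skipped]) if skipped < len(numbers) else None


if __name__ == "__main__":
    sample = "my cat\ndog\nmy pig\nmy cow\nmy mouse\n\n:1:2:3:4:5:6:7"
    print(line_number_by_recursion(sample, "pig"))
```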

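Second Solution: Group that Refers to Itself ("Qtax Trick")

Supported languages: apart from the text editors mentioned above (Notepad++ and EditPad Pro), this solution should work in languages that use PCRE (PHP, R, Delphi), in Perl, and in Python using Matthew Barnett's regex module (untested). The solution is easy to adapt to .NET by converting the \K to a lookbehind and the possessive quantifier to an atomic group (see the .NET version a few lines below).

Search: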

(?sm)(?=.*?pig)(?:(?:^(?:(?!pig)[^\r\n])*(?:\r?\n))(?=[^:]+((?(1)\1):\d+)))*+.*?\Kpig(?=[^:]+(?(1)\1):(\d+))

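.NET version: Back to the Future

.NET does not have \K. In its place, we use a "back to the future" lookbehind (a lookbehind that contains a lookahead that skips ahead of the match). Also, we need to use an atomic group instead of a possessive quantifier.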

(?sm)(?<=(?=.*?pig)(?=(?>(?:^(?:(?!pig)[^\r\n])*(?:\r?\n))(?=[^:]+((?(1)\1):\d+)))*).*)pig(?=[^:]+(?(1)\1):(\d+))

Free-spacing version with comments (Perl/PCRE version):

(?xsm)             # free-spacing mode, multi-line
(?=.*?pig)        # lookahead: if pig is not there, fail right away to save the effort
(?:               # start counter-line-skipper (lines that don't include pig)
   (?:               # skip one line 
      ^              # 
      (?:(?!pig)[^\r\n])*  # zero or more chars not followed by pig
      (?:\r?\n)      # newline chars
    )   
   # for each line skipped, let Group 1 match an ever increasing portion of the numbers string at the bottom
   (?=             # lookahead
      [^:]+           # skip all chars that are not colons
      (               # start Group 1
        (?(1)\1)      # match Group 1 if set
        :\d+          # match a colon and some digits
      )               # end Group 1
   )               # end lookahead
)*+               # end counter-line-skipper: zero or more times
.*?               # match
\K                # drop everything we've matched so far
pig               # match pig (this is the match!)
(?=[^:]+(?(1)\1):(\d+))   # capture the next number to Group 2

Replace:

\2

Output:

my cat
dog
my 3
my cow
my mouse

:1:2:3:4:5:6:7

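In the demo, see the substitutions at the bottom. You can play with the letters on the first two lines (delete a space to make pig) to move the first occurrence of pig to a different line, and see how that affects the results.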
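To make the counting mechanism easier to follow, here is a Python sketch that mirrors the bookkeeping of the self-referencing group instead of running the pattern. For every skipped line, the variable group1 grows by one more ":digits" chunk taken from the start of the number string; the answer is the number right after it. The function name is hypothetical and the input is assumed to end with the :1:2:3... line.

```python
import re


def line_number_by_growing_group(text, target):
    """Simulate Group 1 growing by one ':<digits>' chunk per skipped line."""
    body, _, tail = text.rstrip("\n").rpartition("\n")
    group1 = ""                                   # Group 1 starts out unset
    for line in body.splitlines():
        if target in line:
            # (?=[^:]+(?(1)\1):(\d+)): capture the number right after Group 1
            m = re.match(re.escape(group1) + r":(\d+)", tail)
            return int(m.group(1)) if m else None
        # ((?(1)\1):\d+): the previous Group 1 plus one more colon and digits
        m = re.match(re.escape(group1) + r":\d+", tail)
        if m is None:
            return None                           # ran out of numbers
        group1 = m.group(0)
    return None                                   # target never found


if __name__ == "__main__":
    sample = "my cat\ndog\nmy pig\nmy cow\nmy mouse\n\n:1:2:3:4:5:6:7"
    print(line_number_by_growing_group(sample, "pig"))
```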

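Choice of delimiter for the digits

In our example, the delimiter : for the string of digits is rather common and could occur elsewhere in the input. We could invent a UNIQUE_DELIMITER and tweak the expression slightly. But the following optimization is more efficient and lets us keep the :

Optimization of the second solution: reversed string of digits

Instead of pasting the digits in order, we use them in reverse order: :7:6:5:4:3:2:1

In our lookaheads, this allows us to get to the bottom of the input with a simple .* and to start backtracking from there. Since we know we are at the end of the string, we don't have to worry about :digits being part of another section of the string. Here's how to do it.

Input: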

my cat pi g
dog p ig
my pig
my cow
my mouse

:7:6:5:4:3:2:1

Search:

(?xsm)             # free-spacing mode, multi-line
(?=.*?pig)        # lookahead: if pig is not there, fail right away to save the effort
(?:               # start counter-line-skipper (lines that don't include pig)
   (?:               # skip one line that doesn't have pig
      ^              # 
      (?:(?!pig)[^\r\n])*  # zero or more chars not followed by pig
      (?:\r?\n)      # newline chars
    )   
   # Group 1 matches increasing portion of the numbers string at the bottom
   (?=             # lookahead
      .*           # get to the end of the input
      (               # start Group 1
        :\d+          # match a colon and some digits
        (?(1)\1)      # match Group 1 if set
      )               # end Group 1
   )               # end lookahead
)*+               # end counter-line-skipper: zero or more times
.*?               # match
\K                # drop match so far
pig               # match pig (this is the match!)
(?=.*(\d+)(?(1)\1))   # capture the next number to Group 2

Replace: \2

See the substitutions in the demo.
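The reversed variant can be mirrored in Python in the same spirit. In this sketch, group1 grows leftward from the end of the number string, and the answer is the number immediately in front of it. The sketch anchors explicitly at the end of the tail with \Z, whereas the regex above gets there through a greedy .*; the function name is hypothetical.

```python
import re


def line_number_by_reversed_tail(text, target):
    """Simulate Group 1 growing from the end of a :7:6:...:1 tail."""
    body, _, tail = text.rstrip("\n").rpartition("\n")
    group1 = ""                                   # Group 1 starts out unset
    for line in body.splitlines():
        if target in line:
            # The number sitting directly in front of Group 1 at the end.
            m = re.search(r"(\d+)" + re.escape(group1) + r"\Z", tail)
            return int(m.group(1)) if m else None
        # (:\d+(?(1)\1)): one more chunk, followed by the previous Group 1.
        m = re.search(r":\d+" + re.escape(group1) + r"\Z", tail)
        if m is None:
            return None                           # ran out of numbers
        group1 = m.group(0)
    return None                                   # target never found


if __name__ == "__main__":
    sample = "my cat pi g\ndog p ig\nmy pig\nmy cow\nmy mouse\n\n:7:6:5:4:3:2:1"
    print(line_number_by_reversed_tail(sample, "pig"))
```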

Third Solution: Balancing Groups

This solution is specific to .NET.

Search:

(?m)(?<=\A(?<c>^(?:(?!pig)[^\r\n])*(?:\r?\n))*.*?)pig(?=[^:]+(?(c)(?<-c>:\d+)*):(\d+))

Free-spacing version with comments:

(?xm)                # free-spacing, multi-line
(?<=                 # lookbehind
   \A                # 
   (?<c>               # skip one line that doesn't have pig
                       # The length of Group c Captures will serve as a counter
     ^                    # beginning of line
     (?:(?!pig)[^\r\n])*  # zero or more chars not followed by pig
     (?:\r?\n)            # newline chars
   )                   # end skipper
   *                   # repeat skipper
   .*?                 # we're on the pig line: lazily match chars before pig
   )                # end lookbehind
pig                 # match pig: this is the match
(?=                 # lookahead
   [^:]+               # get to the digits
   (?(c)               # if Group c has been set
     (?<-c>:\d+)         # decrement c while we match a group of digits
     *                   # repeat: this will only repeat as long as the length of Group c captures > 0 
   )                   # end if Group c has been set
   :(\d+)              # Match the next digit group, capture the digits
)                    # end lookahead

Replace: $1
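The balancing-group counter behaves like a stack: one push per skipped line, one pop per number passed. A minimal Python sketch of that idea, using a plain list in place of Group c's capture stack, follows. It is an analogue for illustration rather than a port of the .NET pattern, and the function name is hypothetical.

```python
def line_number_by_balancing(text, target):
    """Simulate the capture stack of Group c with a Python list."""
    body, _, tail = text.rstrip("\n").rpartition("\n")
    numbers = tail.strip(":").split(":")
    stack = []
    for line in body.splitlines():
        if target in line:
            break
        stack.append(line)        # (?<c>...): push one capture per skipped line
    else:
        return None               # no line contains the target
    pos = 0
    while stack and pos < len(numbers):
        stack.pop()               # (?<-c>:\d+): pop one capture per number passed
        pos += 1
    # :(\d+): the next number after the ones consumed by the pops
    return int(numbers[pos]) if pos < len(numbers) else None


if __name__ == "__main__":
    sample = "my cat\ndog\nmy pig\nmy cow\nmy mouse\n\n:1:2:3:4:5:6:7"
    print(line_number_by_balancing(sample, "pig"))
```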
