"cats cats cats and dogs dogs dogs and cats cats and dogs dogs"
will return:
"cats and dogs and cats and dogs"
resultString = Regex.Replace(subjectString, @"\b(\w+)(?:\s+\1\b)+", "$1");
does all the replacements in a single call.
Explanation:
\b # assert that we are at a word boundary
# (we only want to match whole words)
(\w+) # match one word, capture into backreference #1
(?: # start of non-capturing, repeating group
\s+ # match at least one space
\1 # match the same word as previously captured
\b # as long as we match it completely
)+ # do this at least once
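The commented pattern above can be kept in that readable form in code. The following is a minimal sketch, assuming RegexOptions.IgnorePatternWhitespace is used so that whitespace and # comments inside the pattern are ignored; the class and method names (DuplicateWordsDemo, Collapse) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class DuplicateWordsDemo
{
    // Same pattern as in the explanation, with its comments kept inline.
    // IgnorePatternWhitespace makes the engine skip unescaped whitespace
    // and treat everything from # to the end of the line as a comment.
    const string Pattern = @"
        \b          # assert a word boundary (whole words only)
        (\w+)       # match one word, capture it as group 1
        (?:         # start of non-capturing, repeating group
            \s+     # at least one whitespace character
            \1      # the same word as previously captured
            \b      # as long as it is matched completely
        )+          # one or more repetitions
    ";

    // Hypothetical helper name; not part of the original answer.
    public static string Collapse(string input)
    {
        return Regex.Replace(input, Pattern, "$1", RegexOptions.IgnorePatternWhitespace);
    }

    static void Main()
    {
        string subject = "cats cats cats and dogs dogs dogs and cats cats and dogs dogs";
        Console.WriteLine(Collapse(subject)); // cats and dogs and cats and dogs
    }
}
```

Because the repeating group consumes every further copy of the word, a run of any length collapses in one pass and no loop is needed.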
Replace (\w+)\s+\1 with $1.
Do this in a loop until no more matches are found. Setting the global flag alone is not enough, because it will not replace the third cats in cats cats cats.
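The loop described here might look like the sketch below. One assumption differs from the text: the pattern is wrapped in \b word boundaries, because a bare (\w+)\s+\1 can also match across word ends (the d that ends and followed by the d that starts dogs, for example). The names LoopReplaceDemo and CollapseByLooping are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class LoopReplaceDemo
{
    // Matches exactly one adjacent pair of identical words.
    // The \b anchors are an addition to the pattern given in the text.
    static readonly Regex PairPattern = new Regex(@"\b(\w+)\s+\1\b");

    // Hypothetical helper: replace pairs repeatedly until the string
    // stops changing, so runs of three or more words collapse as well.
    public static string CollapseByLooping(string input)
    {
        string previous;
        do
        {
            previous = input;
            input = PairPattern.Replace(input, "$1");
        } while (input != previous);
        return input;
    }

    static void Main()
    {
        string subject = "cats cats cats and dogs dogs dogs and cats cats and dogs dogs";
        Console.WriteLine(CollapseByLooping(subject)); // cats and dogs and cats and dogs
    }
}
```

A single pass turns cats cats cats into cats cats, which is why the replacement has to be repeated until nothing changes.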
In a regular expression, \1 refers to the contents of the first capture group.
Try:
str = "cats cats cats and dogs dogs dogs and cats cats and dogs dogs";
str = Regex.Replace(str, @"(\b\w+\b)\s+(\1(\s+|$))+", "$1 ");
Console.WriteLine(str);
str = Regex.Replace(str, @"(\w+)\s+\1", "$1");
- Amarghosh
string somestring = "cats cats cats and dogs dogs dogs and cats cats and dogs dogs";
Regex regex = new Regex(@"(\w+)\s(?:\1\s)*(?:\1(\s|$))");
string result = regex.Replace(somestring, "$1$2");
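A pattern that starts with a bare (\w+) can begin its match in the middle of a word. The sketch below compares the pattern from this snippet with a variant that adds a leading \b; the variant and the names WordBoundaryDemo, WithoutBoundary and WithBoundary are assumptions for illustration, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class WordBoundaryDemo
{
    // Pattern exactly as in the snippet above.
    static readonly Regex Unanchored = new Regex(@"(\w+)\s(?:\1\s)*(?:\1(\s|$))");

    // Hypothetical variant: the same pattern with a leading word boundary.
    static readonly Regex Anchored = new Regex(@"\b(\w+)\s(?:\1\s)*(?:\1(\s|$))");

    public static string WithoutBoundary(string s) => Unanchored.Replace(s, "$1$2");
    public static string WithBoundary(string s) => Anchored.Replace(s, "$1$2");

    static void Main()
    {
        const string title = "Michael Bolton on CD";

        // Group 1 can capture the "on" at the end of "Bolton",
        // so the standalone word "on" is treated as a duplicate.
        Console.WriteLine(WithoutBoundary(title)); // Michael Bolton CD

        // With \b the match must start at the beginning of a word.
        Console.WriteLine(WithBoundary(title));    // Michael Bolton on CD
    }
}
```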
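cats cats cats and dogs dogs dogs and cats cats and dogs dogs becomes catsand dogsand catsand dogs. It also matches too much: Michael Bolton on CD becomes Michael BoltonCD. Apologies for the Office Space reference. - Tim Pietzcker
[...] replacing with $1$2, so the first problem I thought was there no longer exists. But Michael Bolton still has a problem. Maybe some hypnosis would help (or a word boundary \b before the \w). - Tim Pietzcker
Please try the following code.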
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
/// <summary>
///
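/// A description of the regular expression:
///
/// Match expression but don't capture it. [^|\s+]
/// Select from 2 alternatives
/// Beginning of line or string
/// Whitespace, one or more repetitions
/// [1]: A numbered capture group. [(\w+)(?:\s+|$)]
/// (\w+)(?:\s+|$)
/// [2]: A numbered capture group. [\w+]
/// Alphanumeric, one or more repetitions
/// Match expression but don't capture it. [\s+|$]
/// Select from 2 alternatives
/// Whitespace, one or more repetitions
/// End of line or string
/// [3]: A numbered capture group. [\1|\2], one or more repetitions
/// Select from 2 alternatives
/// Backreference to capture number: 1
/// Backreference to capture number: 2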
///
///
/// </summary>
class Class1
{
///
/// The main entry point for the application.
///
static void Main(string[] args)
{
Regex regex = new Regex(
// Verbatim string (@) so that \s and \w reach the regex engine unescaped.
@"(?:^|\s+)((\w+)(?:\s+|$))(\1|\2)+",
RegexOptions.IgnoreCase
| RegexOptions.Compiled
);
string str = "cats cats cats and dogs dogs dogs and cats cats and dogs dogs";
string regexReplace = " $1";
}