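I want to replace | with OR only in unquoted terms, e.g.:
"this | that" | "the | other" -> "this | that" OR "the | other"
Yes, I could split the string on spaces or quotes, take the resulting array, iterate over it and rebuild the string, but that seems inelegant. So perhaps there is a regex way to do this by counting the " characters that precede a |: an odd count means the | is quoted, an even count means it is unquoted. (Note: if there is at least one ", processing doesn't start until there is an even number of ".)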
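Regexes can't count, but they can be used to determine whether there is an odd or even number of something. The trick in this case is to examine the quotation marks after the pipe, not before it.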
str = str.replace(/\|(?=(?:(?:[^"]*"){2})*[^"]*$)/g, "OR");
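Breaking that down, (?:[^"]*"){2} matches the next pair of quotes, if there is one, along with the non-quote characters in between. After doing that as many times as possible (which may be zero), [^"]*$ consumes any remaining non-quote characters up to the end of the string. The lookahead therefore succeeds only when an even number of quotes follows the pipe, which in well-formed text means the pipe sits outside any quoted term.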
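To make the breakdown concrete, here is a minimal sketch that wraps the regex above in a function and runs it on the strings from the question. The name `replaceUnquotedPipes` is only an illustrative helper, not something from the answer.

```javascript
// Hypothetical helper around the lookahead regex discussed above.
// A pipe is replaced only when an even number of quotes follows it.
function replaceUnquotedPipes(str) {
  return str.replace(/\|(?=(?:(?:[^"]*"){2})*[^"]*$)/g, "OR");
}

console.log(replaceUnquotedPipes('"this | that" | "the | other"'));
// prints: "this | that" OR "the | other"

console.log(replaceUnquotedPipes('"this | that" | "the | other" | yet | another'));
// prints: "this | that" OR "the | other" OR yet OR another
```

Because only the pipe itself is matched, the whitespace around it is left exactly as it was.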
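Of course, this assumes the text is well formed. It doesn't address escaped quotes either, but it could be adapted to do so if you need it.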
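One possible way to extend the lookahead to escaped quotes is sketched below. It assumes a backslash escapes the character that follows it (so \" does not open or close a quoted term); that convention and the name `replaceUnquotedPipesEscaped` are assumptions, not part of the answer.

```javascript
// Sketch: same lookahead idea, but each "non-quote" unit is either an
// ordinary character or a backslash plus the character it escapes.
// Assumption: backslash is the escape character.
function replaceUnquotedPipesEscaped(str) {
  const re = /\|(?=(?:(?:[^"\\]|\\.)*"(?:[^"\\]|\\.)*")*(?:[^"\\]|\\.)*$)/g;
  return str.replace(re, "OR");
}

// The escaped quote after the first pipe is not counted as a real quote.
console.log(replaceUnquotedPipesEscaped('"a | \\" b" | c'));
// prints: "a | \" b" OR c
```

Without the escape handling, the escaped quote that follows the first pipe would be counted like any other quote and the quoted pipe would be replaced too.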
You might find the Perl FAQ entry on this issue relevant.
#!/usr/bin/perl
use strict;
use warnings;
my $x = qq{"this | that" | "the | other"};
print join('" OR "', split /" \| "/, $x), "\n";
You don't need to count, because the quotes are not nested. This will do it:
#!/usr/bin/perl
my $str = '" this \" | that" | "the | other" | "still | something | else"';
print "$str\n";
while ($str =~ /^((?:[^"|\\]*|\\.|"(?:[^\\"]|\\.)*")*)\|/) {
    $str =~ s/^((?:[^"|\\]*|\\.|"(?:[^\\"]|\\.)*")*)\|/$1OR/;
}
print "$str\n";
Now, let's explain that expression.
^          -- means you'll always match everything from the beginning of the string; otherwise
              the match might start inside a quote and break everything.
(...)\|    -- this means you'll match a certain pattern, followed by a |, which appears
              escaped here; so when you replace it with $1OR, you keep everything but
              replace the |.
(?:...)*   -- This is a non-capturing group, which can be repeated multiple times; we
              use a group here so we can repeat alternative patterns multiple times.
[^"|\\]*   -- This is the first pattern. Anything that isn't a pipe, an escape character
              or a quote.
\\.        -- This is the second pattern. Basically, an escape character and anything
              that follows it.
"(?:...)*" -- This is the third pattern. An opening quote, followed by another
              non-capturing group repeated multiple times, followed by a closing
              quote.
[^\\"]     -- This is the first pattern in the second non-capturing group. It's anything
              except an escape character or a quote.
\\.        -- This is the second pattern in the second non-capturing group. It's an
              escape character and whatever follows it.
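For readers working in JavaScript, the same building blocks (a quoted string consumed whole, an escape character plus whatever follows it, and a bare pipe) can be sketched with a replace callback instead of a loop. This is not a line-for-line port of the Perl script, and `replaceOutsideQuotes` is a hypothetical name.

```javascript
// Sketch: let the regex consume quoted strings and escape sequences whole,
// so the only pipes it ever matches on their own are outside quotes.
function replaceOutsideQuotes(str) {
  // 1st alternative: a complete quoted string, escapes included.
  // 2nd alternative: an escape character and whatever follows it.
  // 3rd alternative: a bare pipe.
  const re = /"(?:[^"\\]|\\.)*"|\\.|\|/g;
  // Quoted strings and escapes are put back unchanged; pipes become OR.
  return str.replace(re, (match) => (match === "|" ? "OR" : match));
}

const input = '" this \\" | that" | "the | other" | "still | something | else"';
console.log(input);
console.log(replaceOutsideQuotes(input));
```

A single global pass is enough here because each match starts where the previous one ended, so the scan can never begin inside a quoted string that has already been consumed.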
The result of the Perl script:
" this \" | that" | "the | other" | "still | something | else"
" this \" | that" OR "the | other" OR "still | something | else"
Another approach (similar to Alan M's working answer):
str = str.replace(/(".+?"|\w+)\s*\|\s*/g, '$1 OR ');
The part inside the first group (spaced out here for readability):
".+?" | \w+
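...basically means either something quoted or a word. The rest means that it is followed by a "|" wrapped in optional whitespace. The replacement is that first part ("$1" refers to the first group) followed by " OR ".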
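A small sketch of that replacement applied to a mixed string of quoted and unquoted terms; the wrapper name `replaceAfterTerm` is made up for illustration.

```javascript
// Hypothetical wrapper: match a quoted term or a word, then the pipe
// with its optional surrounding whitespace, and rewrite it as " OR ".
function replaceAfterTerm(str) {
  return str.replace(/(".+?"|\w+)\s*\|\s*/g, "$1 OR ");
}

console.log(replaceAfterTerm('"this | that" | "the | other" | yet | another'));
// prints: "this | that" OR "the | other" OR yet OR another
```

Since the whitespace around the pipe is part of the match, the output always has exactly one space on each side of OR, whatever the input spacing was.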
Perhaps you're looking for something like this:
(?<=^([^"]*"[^"]*")+[^"|]*)\|
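Thanks everyone. Apologies for neglecting to mention that this is in JavaScript, and that terms don't have to be quoted; there can be any number of quoted and unquoted terms, e.g.: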
"this | that" | "the | other" | yet | another -> "this | that" OR "the | other" OR yet OR another
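@Alan M, works nicely; escaping isn't necessary because of the sparseness of SQLite's FTS capabilities.
@epost, accepted solution for brevity and elegance, thanks. It merely needed to be put in a more general form for Unicode etc.: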
(".+?"|[^\"\s]+)\s*\|\s*
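The sketch below illustrates why the generalization matters: in JavaScript, \w only covers ASCII letters, digits and underscore, so unquoted non-ASCII terms are skipped by the \w+ version but picked up by [^"\s]+. Both function names are hypothetical.

```javascript
// Version with \w+ (ASCII word characters only).
function replaceAfterWord(str) {
  return str.replace(/(".+?"|\w+)\s*\|\s*/g, "$1 OR ");
}

// Generalized version: any run of characters that are neither quotes
// nor whitespace counts as an unquoted term.
function replaceAfterAnyTerm(str) {
  return str.replace(/(".+?"|[^"\s]+)\s*\|\s*/g, "$1 OR ");
}

console.log(replaceAfterWord("東京 | москва"));
// prints: 東京 | москва   (unchanged, no ASCII word precedes the pipe)

console.log(replaceAfterAnyTerm('"naïve | café" | 東京 | москва'));
// prints: "naïve | café" OR 東京 OR москва
```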
// Assumes: using System.Collections.Generic; using System.Linq;
// using System.Text.RegularExpressions; and an existing string variable searchText.
// Count the number of quotes.
var quotesOnly = Regex.Replace(searchText, @"[^""]", string.Empty);
var quoteCount = quotesOnly.Length;
if (quoteCount > 0)
{
    // If the quote count is an odd number there's a missing quote.
    // Assume a quote is missing from the end - executive decision.
    if (quoteCount % 2 == 1)
    {
        searchText += @"""";
    }
    // Get the runs of text between quotes. Exclude the quotes themselves.
    // e.g. The following line:
    // "this and that" or then and "this or other"
    // will result in the following non-empty values after trimming:
    // 1. this and that
    // 2. or then and
    // 3. this or other
    var matches = Regex.Matches(searchText, @"([^\""]*)", RegexOptions.Singleline);
    var list = new List<string>();
    foreach (var match in matches.Cast<Match>())
    {
        var value = match.Groups[0].Value.Trim();
        if (!string.IsNullOrEmpty(value))
        {
            list.Add(value);
        }
    }
    // TODO: Do something with the list of strings.
}
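The same quote-counting idea can be applied to the original pipe replacement without a lookahead: split on the quote character, and every second piece is then a quoted term. The JavaScript sketch below assumes, like the C# above, that an odd quote count means the closing quote is missing from the end; `replaceByQuoteSplit` is a hypothetical name.

```javascript
// Sketch: split on quotes so that pieces at even indexes are outside
// quotes and pieces at odd indexes are inside them.
function replaceByQuoteSplit(text) {
  const quoteCount = (text.match(/"/g) || []).length;
  // Assumption: an odd count means the final closing quote is missing.
  const balanced = quoteCount % 2 === 1 ? text + '"' : text;
  return balanced
    .split('"')
    .map((piece, i) => (i % 2 === 0 ? piece.replace(/\|/g, "OR") : piece))
    .join('"');
}

console.log(replaceByQuoteSplit('"this | that" | "the | other" | yet'));
// prints: "this | that" OR "the | other" OR yet
```

This is essentially the split-and-rebuild approach the question wanted to avoid, but it makes the odd/even reasoning explicit and is easy to adapt.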