Match everything before a specific word in a multiline string


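I'm trying to use a regular expression to filter some junk text out of a string, but I can't get it to work. I'm not a regex expert (not even close), and although I've searched for similar examples, none of them seem to solve my problem.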

I need a regular expression that matches everything from the start of the string up to a specific word, but not including that word itself.

Here is an example:

<p>This is the string I want to process with as you can see also contains HTML tags like <i>this</i> and <strong>this</strong></p>
<p>I want to remove everything in the string BEFORE the word "giraffe" (but not "giraffe" itself and keep everything after it.</p>

So how do I match everything in the string before the word "giraffe"?

Thanks!

5 Answers

using System.Text.RegularExpressions;

string resultString = Regex.Replace(subjectString,
    @"\A             # Start of string
    (?:              # Match...
     (?!""giraffe"") #  (unless we're at the start of the string ""giraffe"")
    .                #  any character (including newlines)
    )*               # zero or more times",
    "", RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace);

This should work.
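For reference, here is a minimal self-contained sketch of the same technique, often called a tempered greedy token: `(?:(?!word).)*` consumes one character at a time only while the word does not start at the current position. It assumes the target is the bare word `giraffe`; the answer's pattern instead checks for `"giraffe"` including its double quotes, because `""` inside a C# verbatim string is a single literal quote. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class TemperedDot
{
    // \A                : anchor at the very start of the input
    // (?:(?!giraffe).)* : take one character at a time, but only while
    //                     "giraffe" does not begin at the current position
    // Singleline makes '.' match '\n' as well, so the match can span lines.
    private static readonly Regex LeadingJunk =
        new Regex(@"\A(?:(?!giraffe).)*", RegexOptions.Singleline);

    // Removes everything before the first "giraffe" and keeps the word itself.
    // Caveat: if "giraffe" never occurs, the loop consumes the whole input,
    // so the result is an empty string.
    public static string RemoveBeforeGiraffe(string input) =>
        LeadingJunk.Replace(input, "");
}
```

Because of the `\A` anchor, `Replace` can only ever remove one leading span; an input that already starts with `giraffe` produces an empty match and comes back unchanged.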


Why use a regex at all?
string s = "blagiraffe";
s = s.Substring(s.IndexOf("giraffe"));
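One caveat with the one-liner: `IndexOf` returns -1 when the word is missing, and `Substring(-1)` throws `ArgumentOutOfRangeException`. A small hypothetical helper that guards against that case, and passes `StringComparison.Ordinal` for a plain character-by-character search, might look like this. Returning the input unchanged when there is no match is an assumed policy; clearing it may suit other uses better.

```csharp
using System;

public static class SubstringTrim
{
    // Hypothetical helper: returns input starting at the first occurrence of word.
    // If word does not occur, IndexOf returns -1; instead of letting
    // Substring(-1) throw, the input is returned unchanged (assumed policy).
    public static string KeepFrom(string input, string word)
    {
        // Ordinal = plain character-by-character comparison, no culture rules.
        int index = input.IndexOf(word, StringComparison.Ordinal);
        return index >= 0 ? input.Substring(index) : input;
    }
}
```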


Try this:

    using System;
    using System.Text.RegularExpressions;
    var s =
         @"<p>This is the string I want to process with as you can see also contains HTML tags like <i>this</i> and <strong>this</strong></p>
         <p>I want to remove everything in the string BEFORE the word ""giraffe"" (but not ""giraffe"" itself and keep everything after it.</p>";
    var ex = new Regex("giraffe.*$", RegexOptions.Multiline);
    Console.WriteLine(ex.Match(s).Value);

This snippet produces the following output:

giraffe" (but not "giraffe" itself and keep everything after it.</p>
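Two details of this approach are worth checking against your own input. It extracts the text from `giraffe` onward rather than deleting what comes before it, and because `RegexOptions.Singleline` is not set, `.` does not match `\n`, so `.*$` stops at the end of the line that contains `giraffe`. That works here only because the word is on the last line. A sketch assuming you want everything from the first `giraffe` to the end of the whole input (class and method names are hypothetical):

```csharp
using System.Text.RegularExpressions;

public static class MatchFromWord
{
    // Singleline lets '.' match '\n', so ".*" runs to the end of the whole
    // input instead of stopping at the end of the line containing "giraffe".
    private static readonly Regex FromGiraffe =
        new Regex("giraffe.*", RegexOptions.Singleline);

    // Returns "giraffe" plus everything after its first occurrence,
    // or null when the word does not occur.
    public static string ExtractFromGiraffe(string input)
    {
        Match m = FromGiraffe.Match(input);
        return m.Success ? m.Value : null;
    }
}
```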


Using a lookahead solves the problem:

^.*(?=\s+giraffe)
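A few caveats with this pattern, offered as a sketch to verify rather than a definitive analysis. Without `RegexOptions.Singleline`, `^` anchors at the start of the string but `.` cannot cross a newline, so there is no match when `giraffe` sits on a later line. The greedy `.*` backs off only as far as the last whitespace-preceded `giraffe`, not the first. And `\s+` requires whitespace directly before the word, so a quoted `"giraffe"`, as in the question's sample, is never matched. The wrapper below (hypothetical names) adds `Singleline` to make the greedy behavior visible:

```csharp
using System.Text.RegularExpressions;

public static class GreedyLookahead
{
    // The answer's pattern with Singleline added so '.' can cross line breaks.
    // Because ".*" is greedy, the match ends before the LAST "giraffe"
    // that is preceded by whitespace; that whitespace is not included.
    private static readonly Regex Prefix =
        new Regex(@"^.*(?=\s+giraffe)", RegexOptions.Singleline);

    // Returns the text before the last whitespace-preceded "giraffe", or null if none.
    public static string PrefixBeforeLastGiraffe(string input)
    {
        Match m = Prefix.Match(input);
        return m.Success ? m.Value : null;
    }
}
```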

You can use a pattern with a lookahead, like this:

^.*?(?=giraffe)
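Here the lazy `.*?` stops at the first position where the lookahead succeeds, so it finds the first `giraffe`, with or without quotes around it. To strip that prefix from the question's two-line sample, `.` must also match `\n`, which in .NET means passing `RegexOptions.Singleline`. A sketch under that assumption (class and method names are hypothetical):

```csharp
using System.Text.RegularExpressions;

public static class LazyLookahead
{
    // ".*?" is lazy: it grows one character at a time and stops as soon as
    // the lookahead (?=giraffe) succeeds, i.e. just before the FIRST "giraffe".
    // Singleline lets '.' match '\n' so the prefix can span several lines.
    private static readonly Regex Prefix =
        new Regex(@"^.*?(?=giraffe)", RegexOptions.Singleline);

    // Deletes the text before the first "giraffe". If the word is absent,
    // the pattern does not match and the input is returned unchanged.
    public static string RemoveBeforeFirstGiraffe(string input) =>
        Prefix.Replace(input, "");
}
```

Unlike the first answer's pattern, which consumes the whole input when the word is missing, this version leaves such input untouched.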


Content provided by Stack Overflow.