Regex: remove paragraphs containing curly braces

Question

I want to remove any body paragraph that contains curly braces.

For example, from this article:

<p>While orthotic inserts are able to provide great support and pain relief, they aren’t quite as good as a specialty shoe. Remember that an ill-fitting insert can cause permanent damage and talk to a podiatrist about your foot pain for the best recommendation. Click here&nbsp;if you want to learn more about pain in the foot arch unrelated to plantar fasciitis.</p> <h2>Related Posts</h2> <h2>So What Are These Socks Really Good For?</h2> <h2>Are the bottom of your feet causing you problems?</h2> <h2>A PF Relief Guide</h2> <h2>What is Foot Reflexology &amp; What is it Good For?</h2> <h2>Leave a Reply Cancel reply</h2> <p>Your email address will not be published. Required fields are marked *</p> <p>Name</p> <p>Email</p> <p>Website</p> <p>five &nbsp;−&nbsp; &nbsp;=&nbsp; 2 .hide-if-no-js { display: none !important; } </p><h2>Food For Thought January 2016</h2> <h2>Show Us Some Social Love!!</h2> <h2>Recent Posts</h2> <li> The Climate Pledge of Resistance</li> <li> Green Activism in Boulder, Colorado</li> <li> The Truth About Money and Happiness</li> <li> Why Is There So Much Skepticism About Climate Change?</li> <li> Which Device Would Work Best For You?</li>

I want to remove this part:

<p>five &nbsp;−&nbsp; &nbsp;=&nbsp; 2 .hide-if-no-js { display: none !important; } </p>

With the regex .*?\{.*?\}.*? the whole article gets removed instead of just the paragraph containing the curly braces, which is strange...

What is wrong with my regex? Thanks!

- Andrey Kurnikovs

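Which language are you using? In Java, I would match all the tags and then check each one for curly braces. - Tim Biegeleisen

I'm not sure whether this is what causes the whole article to be removed, but you should escape the slash in the closing paragraph tag: <\/p>, because it might otherwise be interpreted as a delimiter. - Michael

@TimBiegeleisen: That is exactly what is implemented in PHP, see the answer below :) - Jan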
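
The idea from the first comment (match every paragraph tag, then check each one for curly braces) could be sketched roughly as follows in Python. This is only an illustration under assumptions: the function name drop_brace_paragraphs and the shortened sample string are made up here, and a regex-only approach like this is reasonable only for simple markup where <p> elements are not nested.

```python
import re


def drop_brace_paragraphs(html):
    """Remove every <p>...</p> element whose content contains '{' or '}'."""

    def keep_or_drop(match):
        paragraph = match.group(0)
        # Step 2: inspect one matched paragraph at a time.
        return "" if ("{" in paragraph or "}" in paragraph) else paragraph

    # Step 1: match each paragraph. The non-greedy body stops at the nearest
    # closing tag; DOTALL lets a paragraph span several lines.
    return re.sub(r"<p\b[^>]*>.*?</p>", keep_or_drop, html, flags=re.DOTALL)


sample = "<p>Email</p> <p>x .hide-if-no-js { display: none; } </p><h2>Recent Posts</h2>"
result = drop_brace_paragraphs(sample)
print(result)
```

Splitting the work into "find the paragraphs" and "test each paragraph" keeps either step from reaching across tag boundaries, which is the failure described in the question.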

4 Answers

- Gate Holloman · Answer 1

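I suggest a two-step approach (parse the markup, then analyze the text nodes). Below you will find examples in Python and PHP (which can obviously be adapted to other languages):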

- Máté Juhász · Answer 2

Lazy/greedy quantifiers don't always work as expected. Matching runs of characters other than < instead worked for me: [^<]*\{[^<]*
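
A minimal Python sketch of the difference, using a shortened, made-up version of the question's HTML (all variable names are illustrative). It also shows one caveat: because the class [^<] still admits >, the bare pattern's match begins inside the opening tag, so the last step anchors the pattern on the <p> and </p> tags, which is an adaptation and not part of the pattern above.

```python
import re

html = (
    "<p>Name</p> <p>Email</p> "
    "<p>2 .hide-if-no-js { display: none !important; } </p>"
    "<h2>Recent Posts</h2>"
)

# The lazy pattern from the question: the engine tries the leftmost start
# position first, so the match begins at index 0 and grows until it has
# passed the first "{" and the first "}".
lazy = re.search(r".*?\{.*?\}.*?", html)
lazy_start = lazy.start()

# The negated class cannot cross a "<", so the match stays within one run
# of text. It can still cross ">", so it starts at "p>" of the opening tag.
confined = re.search(r"[^<]*\{[^<]*", html)
confined_text = confined.group(0)

# Anchoring on the tags removes the whole paragraph element cleanly.
cleaned = re.sub(r"<p>[^<]*\{[^<]*</p>", "", html)

print(lazy_start)
print(repr(confined_text))
print(cleaned)
```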

- Máté Juhász

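@Shafizadeh: "You need to create a capturing group and replace it with $1 to remove <p> and </p>" - he wants to remove everything, not just <p> and </p>. - Máté Juhász

@Michael: Thanks for your comment, but depending on the circumstances it can work without escaping (I tried it in Python :)). - Máté Juhász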
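
The point about escaping can be checked with a short Python sketch (the variable names are illustrative). Python patterns are ordinary strings and not /.../ literals, so the slash is not a delimiter there; in languages that write regexes between slashes, such as JavaScript regex literals, the slash does need escaping.

```python
import re

# "/" has no special meaning in a Python pattern, so both spellings
# match the same literal closing tag.
unescaped = re.compile(r"</p>")
escaped = re.compile(r"<\/p>")

text = "<p>Name</p>"
print(unescaped.search(text).group(0))
print(escaped.search(text).group(0))
```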

- Shafizadeh · Answer 3

Try this:

var str = '<p>While orthotic inserts are able to provide great support and pain relief, they aren’t quite as good as a specialty shoe. Remember that an ill-fitting insert can cause permanent damage and talk to a podiatrist about your foot pain for the best recommendation. Click here&nbsp;if you want to learn more about pain in the foot arch unrelated to plantar fasciitis.</p> <h2>Related Posts</h2> <h2>So What Are These Socks Really Good For?</h2> <h2>Are the bottom of your feet causing you problems?</h2> <h2>A PF Relief Guide</h2> <h2>What is Foot Reflexology &amp; What is it Good For?</h2> <h2>Leave a Reply Cancel reply</h2> <p>Your email address will not be published. Required fields are marked *</p> <p>Name</p> <p>Email</p> <p>Website</p> <p>five &nbsp;−&nbsp; &nbsp;=&nbsp; 2 .hide-if-no-js { display: none !important; } </p><h2>Food For Thought January 2016</h2> <h2>Show Us Some Social Love!!</h2> <h2>Recent Posts</h2> <li> The Climate Pledge of Resistance</li> <li> Green Activism in Boulder, Colorado</li> <li> The Truth About Money and Happiness</li> <li> Why Is There So Much Skepticism About Climate Change?</li> <li> Which Device Would Work Best For You?</li>';
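// [^<]* instead of .* keeps the match inside a single <p>; the g flag removes every such paragraph
var result = str.replace(/<p>[^<]*\{[^<]*<\/p>/g, '');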
console.log(result);

Regex demo

- Jan · Answer 4

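I suggest a two-step approach (parse the markup, then analyze the text nodes). Below you will find examples in Python and PHP (which can obviously be adapted to other languages):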

Python:

# -*- coding: utf-8 -*-
import re
from bs4 import BeautifulSoup

html = """
<html>
    <p>While orthotic inserts are able to provide great support and pain relief, they aren’t quite as good as a specialty shoe. Remember that an ill-fitting insert can cause permanent damage and talk to a podiatrist about your foot pain for the best recommendation. Click here&nbsp;if you want to learn more about pain in the foot arch unrelated to plantar fasciitis.</p> <h2>Related Posts</h2> <h2>So What Are These Socks Really Good For?</h2> <h2>Are the bottom of your feet causing you problems?</h2> <h2>A PF Relief Guide</h2> <h2>What is Foot Reflexology &amp; What is it Good For?</h2> <h2>Leave a Reply Cancel reply</h2> <p>Your email address will not be published. Required fields are marked *</p> <p>Name</p> <p>Email</p> <p>Website</p> <p>five &nbsp;−&nbsp; &nbsp;=&nbsp; 2 .hide-if-no-js { display: none !important; } </p><h2>Food For Thought January 2016</h2> <h2>Show Us Some Social Love!!</h2> <h2>Recent Posts</h2> <li> The Climate Pledge of Resistance</li> <li> Green Activism in Boulder, Colorado</li> <li> The Truth About Money and Happiness</li> <li> Why Is There So Much Skepticism About Climate Change?</li> <li> Which Device Would Work Best For You?</li>
</html>
"""

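soup = BeautifulSoup(html, 'lxml')  # the 'lxml' parser requires the lxml package
regex = r'{[^}]+}'  # anything between curly braces
# find_all with string= returns only <p> tags whose text matches the regex
for p in soup.find_all('p', string=re.compile(regex)):
    p.decompose()  # remove the tag and its contents from the tree

print(soup)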

PHP:

<?php
$html = "<html>
            <p>While orthotic inserts are able to provide great support and pain relief, they aren’t quite as good as a specialty shoe. Remember that an ill-fitting insert can cause permanent damage and talk to a podiatrist about your foot pain for the best recommendation. Click here&nbsp;if you want to learn more about pain in the foot arch unrelated to plantar fasciitis.</p> <h2>Related Posts</h2> <h2>So What Are These Socks Really Good For?</h2> <h2>Are the bottom of your feet causing you problems?</h2> <h2>A PF Relief Guide</h2> <h2>What is Foot Reflexology &amp; What is it Good For?</h2> <h2>Leave a Reply Cancel reply</h2> <p>Your email address will not be published. Required fields are marked *</p> <p>Name</p> <p>Email</p> <p>Website</p> <p>five &nbsp;−&nbsp; &nbsp;=&nbsp; 2 .hide-if-no-js { display: none !important; } </p><h2>Food For Thought January 2016</h2> <h2>Show Us Some Social Love!!</h2> <h2>Recent Posts</h2> <li> The Climate Pledge of Resistance</li> <li> Green Activism in Boulder, Colorado</li> <li> The Truth About Money and Happiness</li> <li> Why Is There So Much Skepticism About Climate Change?</li> <li> Which Device Would Work Best For You?</li>
        </html>";

$html = str_replace('&nbsp;', ' ', $html); // only because of the &nbsp;
$xml = simplexml_load_string($html);

# look for p tags
$lines = $xml->xpath("//p");

# the actual regex - match anything between curly brackets
$regex = '~{[^}]+}~';

for ($i=0;$i<count($lines);$i++) {
    if (preg_match($regex, $lines[$i]->__toString())) {
        # unset it if it matches
        unset($lines[$i][0]); 
    }
}
// vanished without a trace...
print_r($xml);

// convert it back to a string and print it
$html = $xml->asXML();
echo $html;
?>
