Using preg_replace to replace href anchors with their anchor text

Question



How can I replace every anchor with its own anchor text? My code is:

$body='<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a> while all kids were watching ... </p>';

I want the result to be:

<p>The man was dancing like a little boy while all kids were watching ... </p>

I used:

$body= preg_replace('#<a href="https?://(?:.+\.)?ok.co.*?>.*?</a>#i', '$1', $body);

And the result is:

<p>The man was while all kids were watching ... </p>

- khalil
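The empty result comes from the replacement string: `$1` refers to the first capturing group, and the pattern above has none, so each matched anchor is replaced by an empty string. A minimal sketch with the link text captured, assuming the anchors look like the one in `$body` (double-quoted `href` right after `<a `, host `www.example.com`, no nested `</a>`):

```php
<?php
$body = '<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a> while all kids were watching ... </p>';

// (.*?) is capturing group 1, so '$1' now has something to refer to.
// The host is spelled out and must be followed by "/" or the closing quote,
// so sub.example.com and www.example.com.evil.org do not match.
$pattern = '#<a href="https?://www\.example\.com(?:/[^"]*)?"[^>]*>(.*?)</a>#is';

$result = preg_replace($pattern, '$1', $body);
echo $result, "\n";
```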

At what point should the ellipsis "..." appear? After a certain number of words or characters? - Object Manipulator

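The body string contains many anchors. I want to loop through them, check for the exact "www.example.com" (not subdomains), and replace each matching anchor with its text. Thanks. - khalil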
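For many anchors where only an exact host should be unlinked, one option is to let the regex find each anchor and let `parse_url()` decide on the host. A sketch, assuming `href` values are quoted and anchors are not nested; `unlink_host()` is a hypothetical helper name, not an existing function:

```php
<?php
// Hypothetical helper: replaces each <a> whose href host is exactly $host
// with its inner content; every other anchor is returned unchanged.
function unlink_host(string $html, string $host): string
{
    // Group 2 = href value (single or double quotes), group 3 = link content.
    $pattern = '#<a\b[^>]*?\shref\s*=\s*(["\'])(.*?)\1[^>]*>(.*?)</a>#is';

    $out = preg_replace_callback($pattern, function (array $m) use ($host): string {
        // parse_url() gives null (or false for malformed URLs) when there is
        // no host, e.g. for relative links, so those never compare equal.
        $linkHost = parse_url($m[2], PHP_URL_HOST);
        return $linkHost === $host ? $m[3] : $m[0];
    }, $html);

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $out ?? $html;
}

$body = '<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a> while all kids were watching ... </p>';
echo unlink_host($body, 'www.example.com'), "\n";
```

Because the host check is a plain string comparison on the parsed host, look-alikes such as `www.example.com.evil.org` and subdomains such as `sub.example.com` are left untouched.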

@khalil Please try the answer I posted below; it will solve your problem. - Manish

Are you sure you want to go the regex route instead of using one of the many libraries that parse the DOM reliably? - Wrikken

4 Answers


Without regex...

<?php

$d = new DOMDocument();
$d->loadHTML('<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a> while all kids were watching ... </p>');
$x = new DOMXPath($d);
// The list returned by query() is a snapshot, so nodes can be replaced while iterating.
foreach($x->query('//a') as $anchor){
    $url = $anchor->getAttribute('href');
    $domain = parse_url($url,PHP_URL_HOST);
    // Exact host comparison: subdomains such as sub.example.com are left alone.
    if($domain == 'www.example.com'){
        // Swap the <a> element for a plain text node holding its text.
        $anchor->parentNode->replaceChild(new DOMText($anchor->textContent),$anchor);
    }
}

// loadHTML() wraps the fragment in <html><body>, so print only the body's children.
function get_inner_html( $node ) {
    $innerHTML= '';
    $children = $node->childNodes;
    foreach ($children as $child) {
        $innerHTML .= $child->ownerDocument->saveXML( $child );
    }
    return $innerHTML;
}
echo get_inner_html($x->query('//body')[0]);

- Wrikken

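Why use XPath when $d->getElementsByTagName('a') does the same job? XPath could become more interesting, though, if you register a function that checks the domain and select only the link nodes with that domain in the query itself: http://php.net/manual/en/domxpath.registerphpfunctions.php - Casimir et Hippolyte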
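A sketch of the approach this comment describes: a PHP function is registered with `DOMXPath::registerPhpFunctions()` so the XPath query itself selects only anchors on the exact host. `is_example_host()` is a hypothetical helper; `http://php.net/xpath` is the namespace URI PHP's DOM extension uses for `php:function()` / `php:functionString()` calls.

```php
<?php
// Hypothetical predicate called from inside the XPath expression.
function is_example_host(string $href): bool
{
    return parse_url($href, PHP_URL_HOST) === 'www.example.com';
}

$d = new DOMDocument();
$d->loadHTML('<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a> while all kids were watching ... <a href="http://sub.example.com/">other</a></p>');

$x = new DOMXPath($d);
$x->registerNamespace('php', 'http://php.net/xpath');
// Only this one function may be called from XPath.
$x->registerPhpFunctions('is_example_host');

// php:functionString() passes the string value of @href to the PHP function.
foreach ($x->query("//a[php:functionString('is_example_host', @href)]") as $anchor) {
    $anchor->parentNode->replaceChild($d->createTextNode($anchor->textContent), $anchor);
}

$html = $d->saveHTML($d->getElementsByTagName('p')->item(0));
echo $html, "\n";
```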

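You could certainly use that too; I just wanted a quick non-regex example, and XPath is my default choice. For the href, though, you don't necessarily need a PHP function: //a[starts-with(@href,'http://www.example.com')] may also work for the OP, depending on whether other variants such as https://www.example.com or //www.example.com are expected. - Wrikken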
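The plain-XPath variant from this comment can accept several prefixes joined with `or`. A sketch, assuming `http://`, `https://` and protocol-relative links should all be unlinked. A bare prefix test would also accept a look-alike host such as `www.example.com.evil.org`, so each prefix here ends in `/`; the trade-off is that a link to `http://www.example.com` with no trailing slash is not matched.

```php
<?php
$d = new DOMDocument();
$d->loadHTML('<p><a href="https://www.example.com/a">one</a> <a href="//www.example.com/b">two</a> <a href="http://www.example.com.evil.org/">three</a></p>');
$x = new DOMXPath($d);

// Each prefix ends in "/" so that look-alike hosts are not matched.
$query = "//a[starts-with(@href, 'http://www.example.com/')"
       . " or starts-with(@href, 'https://www.example.com/')"
       . " or starts-with(@href, '//www.example.com/')]";

foreach ($x->query($query) as $anchor) {
    $anchor->parentNode->replaceChild($d->createTextNode($anchor->textContent), $anchor);
}

$html = $d->saveHTML($d->getElementsByTagName('p')->item(0));
echo $html, "\n";
```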


You can simply use strip_tags() and htmlspecialchars() here.

strip_tags - strips HTML and PHP tags from a string.
htmlspecialchars - converts special characters to HTML entities.

Step 1: Use strip_tags() to remove all tags except the <p> tag.
Step 2: Since we want the output to show the string together with its HTML tags, apply htmlspecialchars().

echo htmlspecialchars(strip_tags($body, '<p>'));

When PHP already has built-in functions for this, I think using them is better and more compact than preg_replace.

- Object Manipulator
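What the two calls return for the question's `$body`, plus a second, made-up anchor on another domain (`other.example.org` is only for illustration) showing that `strip_tags()` removes every `<a>` regardless of its `href`, which matters for the exact-domain requirement:

```php
<?php
$body = '<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a> while all kids were watching ... </p>';

// Step 1: remove every tag except <p>.
$stripped = strip_tags($body, '<p>');
// Step 2: escape < and > so a browser shows the tags as text instead of rendering them.
$escaped = htmlspecialchars($stripped);

// strip_tags() is not domain-aware: this other-domain link is removed too.
$other = strip_tags('<p><a href="http://other.example.org/">other link</a></p>', '<p>');

echo $stripped, "\n", $escaped, "\n", $other, "\n";
```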

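Sorry, I should have mentioned that I don't want to replace subdomains in the href. It should match exactly: only replace the anchor if the href contains "www.example.com", and leave every other domain or subdomain alone. - khalil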


You can use this code:

Regex: /<a.*?>|<a.*?>|<\/a>/g

$body='<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a> while all kids were watching ... </p>';

echo preg_replace('/< a.*?>|<a.*?>|<\/a>/', ' ', $body);

Tested, with an example showing the matches: https://regex101.com/r/mgYjoB/1

- Farhang Negari

There shouldn't be any whitespace before a tag name, so < a isn't valid. Also, you can shorten your regex to <\/?a\b[^<>]*>. - revo
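A sketch of the shortened pattern from this comment, replacing matches with an empty string instead of the answer's `' '`. With `' '`, two spaces end up on each side of the link text in this example. The `\b` after `a` keeps tags such as `<abbr>` from matching.

```php
<?php
$body = '<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a> while all kids were watching ... </p>';

// Matches both <a ...> and </a>; \b stops it at longer tag names like <abbr>.
$pattern = '/<\/?a\b[^<>]*>/';
$short = preg_replace($pattern, '', $body);

// The answer's pattern and ' ' replacement, for comparison: doubled spaces remain.
$spaced = preg_replace('/< a.*?>|<a.*?>|<\/a>/', ' ', $body);

echo $short, "\n", $spaced, "\n";
```

Like the answer's pattern, this removes anchors on every domain; it does not check the `href`.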

Page content provided by Stack Overflow.

- Manish · Accepted Answer

Try this:

$body='<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a> while all kids were watching ... </p>';

echo preg_replace('#<a.*?>([^>]*)</a>#i', '$1', $body);
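Running the accepted pattern on the question's `$body`, plus two inputs made up for illustration. It unwraps anchors on any domain, including subdomains, and because `([^>]*)` cannot contain `>`, link text wrapped in a nested tag is lost rather than kept:

```php
<?php
$body = '<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a> while all kids were watching ... </p>';
$pattern = '#<a.*?>([^>]*)</a>#i';

// Group 1 captures the link text, and '$1' puts it back in place of the whole anchor.
$result = preg_replace($pattern, '$1', $body);

// Not domain-aware: a subdomain link is unwrapped as well.
$subdomain = preg_replace($pattern, '$1', '<a href="http://sub.example.com/">sub</a>');

// With a nested tag, the only way to reach "</a>" is an empty group 1,
// so the link text disappears together with the anchor.
$nested = preg_replace($pattern, '$1', '<a href="http://www.example.com/"><b>bold</b></a>');

echo $result, "\n";
```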