Replace text while ignoring HTML tags

3

I have a simple piece of text that contains HTML tags, for example:

Once <u>the</u> activity <a href="#">reaches</a> the resumed state, you can freely add and remove fragments to the activity. Thus, <i>only</i> while the activity is in the resumed state can the <b>lifecycle</b> of a <hr/> fragment change independently.

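When I replace text content, I need to ignore the HTML tags inside it. For example, given the string: Thus, <i>only</i> while , I need to replace it with: Hello, <i>its only</i> while. The text to be replaced and the replacement strings are dynamic. I need help building the preg_replace pattern.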

$text = '<b>Some html</b> tags with <u>and</u> there are a lot of tags <i>in</i> this text';

$arrayKeys= array('Some html' => 'My html', 'and there' => 'is there', 'in this text' => 'in this code');

foreach ($arrayKeys as $key => $value)
    $text = preg_replace('...$key...', '...$value...', $text);

echo $text; // output should be: <b>My html</b> tags with <u>is</u> there are a lot of tags <i>in</i> this code

Please help me find a solution. Thanks.


1
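Based on the examples provided, I don't think a regular expression can do what you want, because you have no concrete rule set; your requirements seem to change with each example. - qJake
OK, so regex won't work... maybe there are other tools? The thing is that the user (the site administrator) enters the data to replace, so the array is dynamic. - pleerock
It probably can't be done unless you clarify: say the string is "<b>hello</b> world <i>again</i>" and I want to replace "hello world again" with "hi there from earth". What is the output? - iWantSimpleLife
1
Hmm, I think that if the string "<b>hello</b> world <i>again</i>" were replaced with "hi there from earth" (without moving the tags), that would be fine for me. - pleerock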
2 Answers

1
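Basically, we use regular expressions to build dynamic arrays of patterns and replacements from the plain text. This code only matches what was originally asked for, but you should be able to see how to edit it from the way I have spelled everything out. We capture the opening or closing tag, together with the whitespace, as a pass-through variable and replace the text around it. The setup is based on two- and three-word combinations.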
<?php

    $text = '<b>Some html</b> tags with <u>and</u> there are a lot of tags <i>in</i> this text';

    $arrayKeys= array(
    'Some html' => 'My html',
    'and there' => 'is there',
    'in this text' =>'in this code');


    function make_pattern($string){
        $patterns = array(
                      '!(\w+)!i',
                      '#^#',
                      '! !',
                      '#$#');
        $replacements = array(
                      "($1)",
                      '!',
                //This next line is where we capture the possible tag or
                //whitespace so we can ignore it and pass it through.
                      '(\s?<?/?[^>]*>?\s?)',
                      '!i');
        $new_string = preg_replace($patterns,$replacements,$string);
        return $new_string;
    }

    function make_replacement($replacement){
        $patterns = array(
                      '!^(\w+)(\s+)(\w+)(\s+)(\w+)$!',
                      '!^(\w+)(\s+)(\w+)$!');
        $replacements = array(
                       '$1\$2$3\$4$5',
                       '$1\$2$3');
        $new_replacement = preg_replace($patterns,$replacements,$replacement);
        return $new_replacement;
    }


    foreach ($arrayKeys as $key => $value){
        $new_Patterns[] = make_pattern($key);
        $new_Replacements[] = make_replacement($value);
    }

    //For debugging
    //print_r($new_Patterns);
    //print_r($new_Replacements);

    $new_text = preg_replace($new_Patterns,$new_Replacements,$text);

    echo $new_text."\n";
    echo $text;


?>

Output

<b>My html</b> tags with <u>is</u> there are a lot of tags <i>in</i> this code
<b>Some html</b> tags with <u>and</u> there are a lot of tags <i>in</i> this text
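
The two- and three-word setup above could be generalized to phrases of any length. The following is a minimal sketch, not the answer's own code: `build_pair` is a hypothetical helper, and it assumes that search phrases start and end with word characters (because of `\b`) and that replacement words contain no `$` or backslash.

```php
<?php
// Hypothetical helper: builds one pattern/replacement pair for a phrase
// with any number of words.
function build_pair(string $search, string $replace): array
{
    $searchWords  = preg_split('/\s+/', trim($search));
    $replaceWords = preg_split('/\s+/', trim($replace));
    if (count($searchWords) !== count($replaceWords)) {
        throw new InvalidArgumentException('Phrases must have the same number of words.');
    }

    // Gap between two words: one or more whitespace characters or tags,
    // captured so it can be passed through unchanged.
    $gap = '((?:\s|<[^>]*>)+)';
    $quoted = array_map(function ($word) {
        return preg_quote($word, '!');
    }, $searchWords);
    $pattern = '!\b' . implode($gap, $quoted) . '\b!i';

    // Replacement: word1 ${1} word2 ${2} ... with each gap reused as captured.
    $replacement = $replaceWords[0];
    for ($i = 1; $i < count($replaceWords); $i++) {
        $replacement .= '${' . $i . '}' . $replaceWords[$i];
    }

    return [$pattern, $replacement];
}

$text = '<b>Some html</b> tags with <u>and</u> there are a lot of tags <i>in</i> this text';
$phrases = [
    'Some html'    => 'My html',
    'and there'    => 'is there',
    'in this text' => 'in this code',
];

foreach ($phrases as $search => $replace) {
    list($pattern, $replacement) = build_pair($search, $replace);
    $text = preg_replace($pattern, $replacement, $text);
}

echo $text, "\n";
```

Using `preg_quote` keeps characters such as `.` or `?` in a user-supplied phrase from being read as regex syntax, which matters here because the phrases come from the site administrator.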

0

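Here we go. This code should work, as long as you respect the following two restrictions:

  • The pattern and the replacement must have the same number of words. (Logical, since you want to keep the positions.)
  • You cannot split a word around a tag. (<b>Hel</b>lo World will not work.)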
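
To make the two restrictions concrete, here is a small sketch. The helper `has_same_word_count` is hypothetical and only illustrates the first restriction; the `preg_split` call illustrates why a word split by a tag cannot be found.

```php
<?php
// Hypothetical helper: checks restriction 1 for a search/replacement pair.
function has_same_word_count(string $search, string $replace): bool
{
    return count(explode(' ', $search)) === count(explode(' ', $replace));
}

var_dump(has_same_word_count('Hey you', 'Bye me'));         // bool(true)
var_dump(has_same_word_count('have to', 'really need to')); // bool(false)

// Restriction 2: once the tags are set aside, '<b>Hel</b>lo World' leaves the
// text pieces 'Hel' and 'lo World', so 'Hello' never appears in one piece.
$pieces = preg_split('#<[^>]*>#', '<b>Hel</b>lo World', -1, PREG_SPLIT_NO_EMPTY);
print_r($pieces);
```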

But if these restrictions are respected, this should work fine!

<?php
    // Splits a string into parts, wrapping each tag in the special sequence.
    // '<b>Hey</b> you' becomes '~-=<b>~-=Hey~-=</b>~-= you', which gives us
    // array ("<b>", "Hey", "</b>", " you")
    function getTextArray ($text, $special) {
        $text = preg_replace ('#(<.*>)#isU', $special . '$1' . $special, $text); // Wrap each tag in the separator so preg_split can cut around it.

        return preg_split ('#' . $special . '#', $text, -1, PREG_SPLIT_NO_EMPTY);
    }
        $text = "
    <html>
        <div>
            <p>
                <b>Hey</b> you ! No, you don't have <em>to</em> go!
            </p>
        </div>
    </html>";

    $replacement = array (
        "Hey you" => "Bye me",
        "have to" => "need to",
        "to go" => "to run");

    // This is a special sequence that you must be sure to find nowhere in your code. It is used to split sequences, and will disappear.
    $special = '~-=';

    $text_array = getTextArray ($text, $special);

    // $restore is the array that will finally contain the result.
    // For now we're only storing the tags.
    // We'll store the text later.
    //
    // $clean_text is the text without the tags, but with the special sequence instead.
    $restore = array ();
    $clean_text = '';
    for ($i = 0; $i < sizeof ($text_array); $i++) {
        $str = $text_array[$i];

        if (preg_match('#<.+>#', $str)) {
            $restore[$i] = $str;
            $clean_text .= $special;
        }

        else {
            $clean_text .= $str;
        }
    }

    // Here comes the tricky part.
    // We want to keep the position of each part of the text so the tags don't
    // move afterwards.
    // So we make the regex look like Hey((?:~\-\=|\s)+)you, where the group
    // captures the gap between two words (whitespace and separators, in any order),
    // and the replacement look like Bye${1}me, which puts the captured gap back
    // so that the separators stay in the right place.
    $sep = preg_quote ($special, '#');
    $gap = '((?:' . $sep . '|\s)+)';
    foreach ($replacement as $search => $newstr) {
        $search_array = array_map (function ($word) {
            return preg_quote ($word, '#');
        }, explode (' ', $search));
        $regex = implode ($gap, $search_array);

        $newstr_array = explode (' ', $newstr);
        $newstr = $newstr_array[0];

        for ($i = 1; $i < count ($search_array); $i++) {
            $newstr .= '${' . $i . '}' . $newstr_array[$i];
        }

        $clean_text = preg_replace ('#' . $regex . '#is', $newstr, $clean_text);
    }

    // Here we re-split one last time.
    $clean_text_array = preg_split ('#' . $special . '#', $clean_text, -1, PREG_SPLIT_NO_EMPTY);

    // And we merge with $restore.
    for ($i = 0, $j = 0; $i < count ($text_array); $i++) {
        if (!isset($restore[$i])) {
            $restore[$i] = $clean_text_array[$j];
            $j++;
        }
    }

    // Now we reorder everything, and make it go back to a string.
    ksort ($restore);
    $result = implode ($restore);

    echo $result;
?>

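This will output (as rendered HTML): Bye me ! No, you don't need to run!

[Edit] Custom patterns are now supported, which avoids adding useless spaces.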


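I see global variables and regex for HTML, hence my downvote. Regex for HTML almost always goes wrong, and this one is no exception. - Madara's Ghost
Well, what are the question's tags about? A practice being discouraged doesn't mean it can't be implemented. - Jerska
1
Since we're having this debate: PHP is a terrible language in many ways, yet some of its features make me love it. By your logic, should I give up programming in PHP? - Jerska
1
Not at all, but since you didn't mention (or even use) any of them, I don't find your answer helpful. No language is perfect, but you could have done this without global variables. If the question is about preg_replace on HTML, you can answer it and note that this may not be the best approach. Don't get me wrong, I believe your answer works, but I don't think answers that encourage bad practice are helpful - Madara's Ghost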
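
On the point about global variables raised in the comments, the same split, replace and merge idea can be wrapped in a single function with only local state. This is a rough sketch rather than anyone's posted code: `replace_ignoring_tags` is a hypothetical name, it assumes the NUL byte never occurs in the input, and it is still regex-based, so the caveats about using regex on HTML and the two restrictions from the answer continue to apply.

```php
<?php
// Hypothetical function: replaces phrases in $html while leaving tags in place.
function replace_ignoring_tags(string $html, array $pairs): string
{
    $sep = "\x00"; // assumption: this byte never occurs in the input

    // With PREG_SPLIT_DELIM_CAPTURE, even indexes hold text and odd indexes hold tags.
    $pieces = preg_split('#(<[^>]*>)#', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $texts = [];
    $tags = [];
    foreach ($pieces as $i => $piece) {
        if ($i % 2 === 0) {
            $texts[] = $piece;
        } else {
            $tags[] = $piece;
        }
    }

    // Text only, with one separator standing in for each tag.
    $clean = implode($sep, $texts);

    foreach ($pairs as $search => $replace) {
        $from = explode(' ', $search);
        $to = explode(' ', $replace);
        if (count($from) !== count($to)) {
            continue; // skip pairs that break the same-word-count restriction
        }
        $quoted = array_map(function ($word) {
            return preg_quote($word, '#');
        }, $from);
        // Each gap (whitespace and separators) is captured and put back unchanged.
        $regex = '#' . implode('((?:\x00|\s)+)', $quoted) . '#';
        $out = $to[0];
        for ($i = 1; $i < count($to); $i++) {
            $out .= '${' . $i . '}' . $to[$i];
        }
        $clean = preg_replace($regex, $out, $clean);
    }

    // Interleave the text pieces with the original tags again.
    $result = '';
    foreach (explode($sep, $clean) as $i => $text) {
        $result .= $text . ($tags[$i] ?? '');
    }
    return $result;
}

$html = "<b>Hey</b> you ! No, you don't have <em>to</em> go!";
$pairs = [
    'Hey you' => 'Bye me',
    'have to' => 'need to',
    'to go'   => 'to run',
];
echo replace_ignoring_tags($html, $pairs), "\n";
```

Because the separators are captured and reinserted, the number of text pieces is the same before and after the replacements, which is what lets the final loop pair each piece with its original tag.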
