Replacing text while ignoring HTML tags

Question

3

I have a simple piece of text that contains HTML tags, for example:

Once <u>the</u> activity <a href="#">reaches</a> the resumed state, you can freely add and remove fragments to the activity. Thus, <i>only</i> while the activity is in the resumed state can the <b>lifecycle</b> of a <hr/> fragment change independently.

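When I replace text, the HTML tags inside it must be ignored. For example, the string Thus, only while needs to be replaced with Hello, its only while. Both the text to replace and the replacement strings are dynamic. I need help building the preg_replace pattern.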

$text = '<b>Some html</b> tags with <u>and</u> there are a lot of tags <i>in</i> this text';

$arrayKeys= array('Some html' => 'My html', 'and there' => 'is there', 'in this text' => 'in this code');

foreach ($arrayKeys as $key => $value)
    $text = preg_replace('...$key...', '...$value...', $text);

echo $text; // output should be: <b>My html</b> tags with <u>is</u> there are a lot of tags <i>in</i> this code

Please help me find a solution. Thanks.

- pleerock

1

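Based on the examples provided, I don't think a regular expression can do what you want, because you don't have a concrete set of rules; your requirements seem to change with every example. - qJake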

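OK, regex won't work... maybe there are other tools?... The problem is that a user (the site administrator) enters the data to replace, so the array is dynamic. - pleerock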

Probably not doable, unless you can clarify this case: the string is "hello world again" and I want to replace "hello world again" with "hi there from earth". What should the output be? - iWantSimpleLife

1

Hmm, I think it would be fine for me if the string "hello world again" were replaced with "hi there from earth" without moving the tags. - pleerock

2 Answers

0

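Here we go. This code should work, as long as you respect the following two restrictions:

The pattern and the replacement must have the same number of words. (Logical, since you want to keep the positions.)
You cannot split a word around a tag. (Hello World will not work if a tag falls inside one of the words.)

If these restrictions are respected, this should work fine!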

<?php
    // Splits a string into parts delimited by the special sequence.
    // '<b>Hey</b> you' becomes '~-=<b>~-=Hey~-=</b>~-= you', which gives us
    // array ("<b>", "Hey", "</b>", " you")
    function getTextArray ($text, $special) {
        // Wrap every tag in the special sequence so the split isolates tags from text.
        $text = preg_replace ('#(<.*>)#isU', $special . '$1' . $special, $text);

        // preg_quote keeps the split correct even if $special contains regex metacharacters.
        return preg_split ('#' . preg_quote ($special, '#') . '#', $text, -1, PREG_SPLIT_NO_EMPTY);
    }
    $text = "
    <html>
        <div>
            <p>
                <b>Hey</b> you ! No, you don't have <em>to</em> go!
            </p>
        </div>
    </html>";

    $replacement = array (
        "Hey you" => "Bye me",
        "have to" => "need to",
        "to go" => "to run");

    // This is a special sequence that you must be sure to find nowhere in your code. It is used to split sequences, and will disappear.
    $special = '~-=';

    $text_array = getTextArray ($text, $special);

    // $restore is the array that will finally contain the result.
    // For now we only store the tags in it.
    // We'll store the text later.
    //
    // $clean_text is the text without the tags, but with the special sequence instead.
    $restore = array ();
    $clean_text = ''; // Must be initialized before the .= appends below.
    for ($i = 0; $i < sizeof ($text_array); $i++) {
        $str = $text_array[$i];

        if (preg_match('#<.+>#', $str)) {
            $restore[$i] = $str;
            $clean_text .= $special;
        }

        else {
            $clean_text .= $str;
        }
    }

    // Here comes the tricky part.
    // We want to keep the position of each part of the text so the tags don't
    // move afterwards.
    // So the regex looks like ((?:~-=)*)Hey((?:~-=)* (?:~-=)*)you((?:~-=)*)
    // (with the special sequence escaped), and the replacement looks like
    // ${1}Bye${2}me${3}, so the separators and the space stay where they were.
    $sep = '(?:' . preg_quote ($special, '#') . ')*';

    foreach ($replacement as $search => $newstr) {
        $search_words = array_map (function ($word) {
            return preg_quote ($word, '#');
        }, explode (' ', $search));
        $newstr_words = explode (' ', $newstr);

        // A gap is one space with optional separators on either side of it.
        $regex = '(' . $sep . ')' . implode ('(' . $sep . ' ' . $sep . ')', $search_words) . '(' . $sep . ')';

        // ${n} avoids ambiguity when a replacement word starts with a digit.
        $newstr = '${1}';
        for ($i = 0; $i < count ($search_words); $i++) {
            $newstr .= $newstr_words[$i] . '${' . ($i + 2) . '}';
        }

        $clean_text = preg_replace ('#' . $regex . '#i', $newstr, $clean_text);
    }

    // Here we re-split one last time.
    $clean_text_array = preg_split ('#' . $special . '#', $clean_text, -1, PREG_SPLIT_NO_EMPTY);

    // And we merge with $restore.
    for ($i = 0, $j = 0; $i < count ($text_array); $i++) {
        if (!isset($restore[$i])) {
            $restore[$i] = $clean_text_array[$j];
            $j++;
        }
    }

    // Now we reorder everything, and make it go back to a string.
    ksort ($restore);
    $result = implode ($restore);

    echo $result;
?>

This will output: Bye me ! No, you don't need to run!

[Edit] It now supports a custom pattern, which avoids adding useless spaces.
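The first restriction can be checked before any replacement runs. The following is a minimal sketch, assuming words are separated by single spaces as in the code above; findMismatchedPairs is a hypothetical helper name and is not part of the answer.

```php
<?php
// Hypothetical helper (not from the answer): returns the pairs whose search
// phrase and replacement phrase do not have the same number of
// space-separated words.
function findMismatchedPairs(array $replacement): array
{
    $mismatched = array();

    foreach ($replacement as $search => $newstr) {
        if (count(explode(' ', $search)) !== count(explode(' ', $newstr))) {
            $mismatched[$search] = $newstr;
        }
    }

    return $mismatched;
}

$replacement = array(
    "Hey you" => "Bye me",
    "have to" => "need to",
    "to go"   => "to run fast", // three words against two: reported as a mismatch
);

// Only the "to go" pair is reported.
print_r(findMismatchedPairs($replacement));
```

Rejecting or skipping the reported pairs up front avoids running a replacement that would have no matching slot for one of its words.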

- Jerska

I see global variables and regex used on HTML, hence my downvote. Regex for HTML almost always goes wrong, and this one is no exception. - Madara's Ghost

Well, what are the question's tags about? Just because a practice is not recommended doesn't mean it can't be done. - Jerska

1

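Since we're having this debate: PHP is a bad language in many respects, yet some of its features make me love it. By your reasoning, should I give up programming in PHP? - Jerska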

1

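Not at all, but since you mentioned none of this (and even used those practices), I don't find your answer helpful. No language is perfect, but you could have done this without global variables. If the question is about preg_replace on HTML, you can answer it and note that this may not be the best approach. Don't get me wrong, I believe your answer works, but I don't think an answer that encourages bad practices is helpful. - Madara's Ghost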

Content provided by Stack Overflow.

- AbsoluteƵERØ · Accepted Answer

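Basically, we use regular expressions to build dynamic arrays of match patterns and replacements from the plain text. This code only matches what was originally asked for, but the way it is spelled out should show you how to edit it. We capture an opening or closing tag, together with any whitespace, as a pass-through group and replace the text around it. The setup is based on two- and three-word combinations.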

<?php

    $text = '<b>Some html</b> tags with <u>and</u> there are a lot of tags <i>in</i> this text';

    $arrayKeys= array(
    'Some html' => 'My html',
    'and there' => 'is there',
    'in this text' =>'in this code');


    function make_pattern($string){
        $patterns = array(
                      '!(\w+)!i',
                      '#^#',
                      '! !',
                      '#$#');
        $replacements = array(
                      "($1)",
                      '!',
                //This next line is where we capture the possible tag or
                //whitespace so we can ignore it and pass it through.
                      '(\s?<?/?[^>]*>?\s?)',
                      '!i');
        $new_string = preg_replace($patterns,$replacements,$string);
        return $new_string;
    }

    function make_replacement($replacement){
        $patterns = array(
                      '!^(\w+)(\s+)(\w+)(\s+)(\w+)$!',
                      '!^(\w+)(\s+)(\w+)$!');
        $replacements = array(
                       '$1\$2$3\$4$5',
                       '$1\$2$3');
        $new_replacement = preg_replace($patterns,$replacements,$replacement);
        return $new_replacement;
    }


    foreach ($arrayKeys as $key => $value){
        $new_Patterns[] = make_pattern($key);
        $new_Replacements[] = make_replacement($value);
    }

    //For debugging
    //print_r($new_Patterns);
    //print_r($new_Replacements);

    $new_text = preg_replace($new_Patterns,$new_Replacements,$text);

    echo $new_text."\n";
    echo $text;


?>

Output

<b>My html</b> tags with <u>is</u> there are a lot of tags <i>in</i> this code
<b>Some html</b> tags with <u>and</u> there are a lot of tags <i>in</i> this text
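
The code above is set up for two- and three-word keys. As a rough sketch of the same idea for keys of any length, the gaps between words (whitespace and/or tags) can be captured and put back by a callback. The function name replace_phrase_ignoring_tags and the gap pattern are assumptions made for illustration and are not part of the answer. It still treats HTML with a regex, so it only suits simple markup like the example.

```php
<?php
// Hypothetical generalization (not from the answer): replace a phrase of any
// length while leaving the tags and whitespace between its words untouched.
function replace_phrase_ignoring_tags(string $text, string $search, string $replace): string
{
    if (trim($search) === '') {
        return $text;
    }

    $searchWords  = preg_split('/\s+/', trim($search));
    $replaceWords = preg_split('/\s+/', trim($replace));

    // Word counts must match so that every gap keeps its slot.
    if (count($searchWords) !== count($replaceWords)) {
        return $text;
    }

    // A gap is any run of whitespace and/or tags; it is captured so the
    // callback can put it back unchanged.
    $gap    = '((?:\s|<[^>]*>)+)';
    $quoted = array_map(function ($word) {
        return preg_quote($word, '~');
    }, $searchWords);

    // The lookarounds stop the phrase from matching inside a longer word.
    $pattern = '~(?<!\w)' . implode($gap, $quoted) . '(?!\w)~i';

    $result = preg_replace_callback($pattern, function (array $m) use ($replaceWords) {
        $out = '';
        foreach ($replaceWords as $i => $word) {
            // $m[$i + 1] is the gap that followed this word; the last word has none.
            $out .= $word . ($m[$i + 1] ?? '');
        }
        return $out;
    }, $text);

    return $result ?? $text; // preg_replace_callback returns null on error
}

$text = '<b>Some html</b> tags with <u>and</u> there are a lot of tags <i>in</i> this text';
$arrayKeys = array(
    'Some html'    => 'My html',
    'and there'    => 'is there',
    'in this text' => 'in this code',
);

foreach ($arrayKeys as $key => $value) {
    $text = replace_phrase_ignoring_tags($text, $key, $value);
}

echo $text . "\n";
// <b>My html</b> tags with <u>is</u> there are a lot of tags <i>in</i> this code
```

Pairs whose word counts differ are left unchanged, since there would be no one-to-one slot for each captured gap.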