Replacing nested placeholders in PHP

3
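I have strings with placeholders such as "{variant 1|variant 2}", where "|" means "or". I want to get every variant of the string with the placeholders resolved. For example, for the string "{a|b{c|d}}" I should get the strings "a", "bc" and "bd". I tried to do this recursively with the regular expression \{([^{}])\} (it captures the innermost level, which is {c|d} in my case), but on the next step I end up with two strings, {a|bc}{a|bd}, which produce "a", "bc", "a", "bd". Maybe I need to build some kind of graph or tree structure? I would also like to ask about (?<postfix>[^{}|$]*): why is the "$" there? I removed it and nothing changed.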

"{a|b{c|d}" should be "{a|b{c|d}}", right? - dfsq
Sorry :) Yes, it has to have correct syntax. I will edit the post. - Guy Fawkes
$string = str_replace('|', ',', $string); var_dump(`/bin/bash -c 'echo $string'`); :-D - cmbuckley
2 Answers

1
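Assuming that |, { and } are reserved characters (not allowed inside your variant text), here is a regular-expression approach to the problem. Note that writing a simple state machine parser would be the better choice.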
<?php // Using PHP5.3 syntax

// PCRE Recursive Pattern
// http://php.net/manual/en/regexp.reference.recursive.php

$string = "This test can be {very {cool|bad} in random order|or be just text} ddd {a|b{c|d}} bar {a|b{c{d|e|f}}} lala {b|c} baz";

if (preg_match_all('#\{((?>[^{}]+)|(?R))+\}#', $string, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // $match[0] == "{a|b{c|d}}" | "{a|b{c{d|e|f}}}" | "{b|c}"
        // have some fun splitting them up
        // I'd suggest walking the characters and building a tree
        // a simpler (slower, uglier) approach:

        // remove {}
        $set = substr($match[0], 1, -1);
        while (strpos($set, '{') !== false) {
            // explode and replace nested {}
            // reserved characters: "{" and "}" and "|"
            // (?<=^|\{|\|) -- a substring needs to begin after "|" or "{" or at the start of the string,
            //  "?<=" is a positive lookbehind assertion - the content is not captured
            // (?<prefix>[^{}|]*) -- is the prefix, the preceding literal string (anything but reserved characters, may be empty)
            // \{(?<inner>[^{}]+)\} -- is the content of an innermost nested {} group, excluding the "{" and "}"
            // (?<postfix>[^{}|$]*) -- is the postfix, the trailing literal string (anything but reserved characters; inside a character class "$" is a literal dollar sign)
            // readable: <begin-delimiter><possible-prefix>{<nested-group>}<possible-postfix>
            $set = preg_replace_callback('#(?<=^|\{|\|)(?<prefix>[^{}|]*)\{(?<inner>[^{}]+)\}(?<postfix>[^{}|$]*)#', function($m) {
                $inner = explode('|', $m['inner']);
                // implode(separator, array): the reversed argument order is no longer accepted as of PHP 8
                return $m['prefix'] . implode($m['postfix'] . '|' . $m['prefix'], $inner) . $m['postfix'];
            }, $set);
        }

        // $items = explode('|', $set);
        echo "$match[0] expands to {{$set}}\n";
    }
}

/*
    OUTPUT:
    {very {cool|bad} in random order|or be just text} expands to {very cool in random order|very bad in random order|or be just text}
    {a|b{c|d}} expands to {a|bc|bd}
    {a|b{c{d|e|f}}} expands to {a|bcd|bce|bcf}
    {b|c} expands to {b|c}
*/

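Looks cool, but for the string $string = "This test can be {very {cool|bad} in random order|or be just text}" it returns an incorrect result: {very {cool|bad} in random order|or be just text} expands to {very cool|very bad in random order|or be just text}. - Guy Fawkes
Maybe I can fix this bug... Could you describe how the regular expression #(?<=^|\{|\|)(?<prefix>[^{|]+)\{(?<inner>[^{}]+)\}# works? - Guy Fawkes
Your example did not show that trailing characters were possible. I have modified the regular expression to take a postfix into account, so it should now work the way you expect. - rodneyrehm
Thank you very much! But could you explain why "$" is used in the postfix group's set of excluded characters? - Guy Fawkes
The $ has no meaning here. I just like putting ^ and $ into character sets (where applicable) to remind myself that they exist. You can ignore it :) - rodneyrehm
Oh, thanks! Please write up your best practices for newbies! :) - Guy Fawkes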
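
The answer above notes that a simple state machine parser would be the better choice but does not show one. What follows is a minimal sketch of that idea, not the answerer's code: a small recursive-descent parser that walks the characters once and returns every expansion. The function names expandVariants, parseSequence and parseGroup are hypothetical, and the sketch assumes "{", "|" and "}" are reserved and that "|" never appears outside a group.

```php
<?php
// Hypothetical helper names; a sketch under the assumptions stated above.

// Returns every expansion of $input, e.g. "{a|b{c|d}}" gives a, bc, bd.
function expandVariants($input) {
    $pos = 0;
    $results = parseSequence($input, $pos);
    if ($pos < strlen($input)) {
        // A "|" or "}" was found outside of any group.
        throw new InvalidArgumentException("Unexpected '{$input[$pos]}' at offset $pos");
    }
    return $results;
}

// Reads literal text and nested groups until "|", "}" or the end of the input.
function parseSequence($input, &$pos) {
    $results = array('');
    $length = strlen($input);
    while ($pos < $length && $input[$pos] !== '|' && $input[$pos] !== '}') {
        if ($input[$pos] === '{') {
            $options = parseGroup($input, $pos);
        } else {
            // Consume a run of literal characters up to the next reserved one.
            $span = strcspn($input, '{|}', $pos);
            $options = array(substr($input, $pos, $span));
            $pos += $span;
        }
        // Append every option to every expansion collected so far.
        $combined = array();
        foreach ($results as $head) {
            foreach ($options as $tail) {
                $combined[] = $head . $tail;
            }
        }
        $results = $combined;
    }
    return $results;
}

// Reads "{alt1|alt2|...}" starting at the "{" and returns all expansions of all alternatives.
function parseGroup($input, &$pos) {
    $pos++; // skip "{"
    $alternatives = array();
    while (true) {
        foreach (parseSequence($input, $pos) as $expansion) {
            $alternatives[] = $expansion;
        }
        if ($pos >= strlen($input)) {
            throw new InvalidArgumentException('Unclosed "{"');
        }
        $char = $input[$pos];
        $pos++; // skip "|" or "}"
        if ($char === '}') {
            return $alternatives;
        }
    }
}
```

Calling expandVariants('{a|b{c|d}}') returns the three strings "a", "bc" and "bd". Because each alternative is expanded exactly once where it occurs, no duplicates are produced and no intermediate "{...}" string has to be rebuilt.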

0

Please check this code:

$str = "This test can be {very {cool|bad} in random order|or be just text}";

function parseVarians($str, $buffer = array()) {
    if (empty($buffer)) $buffer['tokens'] = array();
    $newStr = preg_replace_callback('|\{([^{}]+)\}|', function($m) use(&$buffer) {
        $buffer['tokens'][] = explode('|', $m[1]);
        $index = count($buffer['tokens']) - 1;
        return '__' . $index;
    }, $str);

    if ($str != $newStr && strpos($newStr, '{') !== false) {
        return parseVarians($newStr, $buffer);
    }
    else {
        $buffer['str'] = $newStr;
        return $buffer;
    }
}

function devergeVariants($data) {
    krsort($data['tokens']);
    $strings  = array($data['str']);

    foreach ($data['tokens'] as $key => $token) {
        $variants = array();
        foreach ($token as $tok) {
            foreach ($strings as $str) {
                $variants[] = str_replace('__' . $key, $tok, $str);
            }
        }
        $strings = $variants;
    }

    return array_unique($strings);
}

echo '<pre>'; print_r($str); echo '</pre>';

$tokens = parseVarians($str);
//echo '<pre>'; print_r($tokens); echo '</pre>';
$result = devergeVariants($tokens);

echo '<pre>'; print_r( $result ); echo '</pre>';

Output:

This test can be {very {cool|bad} in random order|or be just text}
Array
(
    [0] => This test can be very cool in random order
    [1] => This test can be or be just text
    [2] => This test can be very bad in random order
)

Is this what you are looking for?


Looks good, but about array_unique: the $strings array actually contains a duplicate "This test can be or be just text". - Guy Fawkes
Duplicates... yes. That is a shortcoming of my algorithm. - dfsq
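
The duplicates discussed above appear because every token is multiplied into every string, even into strings that do not contain its "__N" placeholder. One possible way around this, sketched below, is to expand only the placeholders that actually occur in a given string. The function name expandPlaceholders is hypothetical, and the sketch assumes the same data layout that parseVarians returns ('str' plus a 'tokens' array) and that the literal text never contains "__" followed by digits.

```php
<?php
// Hypothetical helper; a sketch that expects the 'tokens' array built by parseVarians().

// Expands the first "__N" placeholder found in $str with each option of
// token N, then recurses until no placeholder is left.
function expandPlaceholders($str, array $tokens) {
    if (!preg_match('/__(\d+)/', $str, $m, PREG_OFFSET_CAPTURE)) {
        return array($str); // nothing left to expand
    }
    $index  = (int) $m[1][0];   // token number N
    $start  = $m[0][1];         // byte offset of "__N"
    $length = strlen($m[0][0]); // length of "__N"

    $results = array();
    foreach ($tokens[$index] as $option) {
        $candidate = substr_replace($str, $option, $start, $length);
        foreach (expandPlaceholders($candidate, $tokens) as $expanded) {
            $results[] = $expanded;
        }
    }
    return $results;
}
```

With the data from the example above, expandPlaceholders($tokens['str'], $tokens['tokens']) yields the three sentences without needing array_unique, because the "or be just text" branch contains no "__0" and is therefore never multiplied by the inner token.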
