PHP regex help: parsing a string

3

I have a string such as the following:

Are you looking for a quality real estate company? 

<s>Josh's real estate firm specializes in helping people find homes from          
[city][State].</s>

<s>Josh's real estate company is a boutique real estate firm serving clients 
locally.</s> 

In [city][state] I am sure you know how difficult it is
to find a great home, but we work closely with you to give you exactly 
what you need

I want to split this paragraph into an array on the <s> </s> tags, so that I get the following array as the result:

[0] Are you looking for a quality real estate company?
[1] Josh's real estate firm 
    specializes in helping people find homes from [city][State].
[2] Josh's real estate company is a boutique real estate firm serving clients 
    locally.
[3] In [city][state] I am sure you know how difficult it is
    to find a great home, but we work closely with you to give you exactly 
    what you need

This is the regex I'm currently using:

$matches = array();
preg_match_all(":<s>(.*?)</s>:is", $string, $matches);
$result = $matches[1];
print_r($result);

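But this call only returns an array of the text between the <s> </s> tags; it ignores the text before and after the tags. (In the example above, it would only return array elements 1 and 2.)

Any ideas?

2 Answers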

2
The closest I could get is to use preg_split() instead:

$string = <<< STR
Are you looking for a quality real estate company? <s>Josh's real estate firm 
specializes in helping people find homes from [city][State].</s>
<s>Josh's real estate company is a boutique real estate firm serving clients 
locally.</s> In [city][state] I am sure you know how difficult it is
to find a great home, but we work closely with you to give you exactly 
what you need
STR;

print_r(preg_split(':</?s>:is', $string));

which gives this output:

Array
(
    [0] => Are you looking for a quality real estate company? 
    [1] => Josh's real estate firm 
specializes in helping people find homes from [city][State].
    [2] => 

    [3] => Josh's real estate company is a boutique real estate firm serving clients 
locally.
    [4] =>  In [city][state] I am sure you know how difficult it is
to find a great home, but we work closely with you to give you exactly 
what you need
)

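The one catch is that it produces an extra array element (index 2), because there is a newline between the fragments [city][State].</s> and <s>Josh's real estate company.

It would be trivial to add some code that drops the whitespace-only matches, but I'm not sure whether you want that.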
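
For illustration, dropping the whitespace-only elements could look roughly like the sketch below. The sample string is a shortened, made-up stand-in for the one above, and trimming every piece is an assumption about what the asker wants; if the leading and trailing whitespace matters, the array_map() step can be left out and the filter changed to test trim($p) instead.

```php
<?php
// Hypothetical sample input, shortened from the string in the question.
$string = "Intro text. <s>First tagged sentence.</s>\n<s>Second tagged sentence.</s> Closing text.";

// Split on either an opening or a closing tag, as in the call above.
$parts = preg_split(':</?s>:is', $string);

// Trim surrounding whitespace from each piece.
$parts = array_map('trim', $parts);

// Keep only the pieces that are non-empty after trimming, then reindex from 0.
$parts = array_values(array_filter($parts, function ($p) {
    return $p !== '';
}));

print_r($parts);
```

With this input the lone newline between the two tagged sentences no longer shows up as its own element.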


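The extra array element is fine, but it seems to look only for </s>, which means that sentences like my name is bob. im 17 </s>. and my name is bob. <s>im 17</s> would both be split into two elements. Can it be changed so that the first example stays a single array element? (I want a stray </s> with no matching opening tag not to be matched.) - Ali
If it's possible to remove the empty array elements, I'd prefer that. - Ali
I'll tweak my code a bit; if I can get it to match only properly opened and closed tags and drop the empty elements, I'll update my answer. - BoltClock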
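
One possible way to cover both requests in the comments above, offered as a sketch and not as the update BoltClock had in mind: treat a whole <s>...</s> pair as the delimiter and ask preg_split() to return the captured inner text via PREG_SPLIT_DELIM_CAPTURE. A stray </s> then never matches, because the pattern requires an opening tag first. The helper name split_on_paired_tags is made up for this example, and nested or stray opening <s> tags are not handled.

```php
<?php
// Hypothetical helper; the name is not from the original answer.
function split_on_paired_tags($text)
{
    // The delimiter is a complete <s>...</s> pair. The capture group holds the
    // text inside the pair, and PREG_SPLIT_DELIM_CAPTURE returns that captured
    // text as its own array element between the surrounding pieces.
    $parts = preg_split(':<s>(.*?)</s>:is', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    // Trim each piece and drop the ones that are empty after trimming.
    $parts = array_map('trim', $parts);
    return array_values(array_filter($parts, function ($p) {
        return $p !== '';
    }));
}

// A stray closing tag is left alone, so this stays one element.
print_r(split_on_paired_tags("my name is bob. im 17 </s>."));

// A properly paired tag splits the text into two elements.
print_r(split_on_paired_tags("my name is bob. <s>im 17</s>"));
```

Applied to the string from the question, the same helper should return the four elements the asker listed, with the whitespace around each piece trimmed.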

1
