Matching character sets and optional entities

Question

So I'd like to insert a word break every 5 characters using this regex:

([^\s-]{5})([^\s-]{5})

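Unfortunately, it also breaks inside entities (&#xxx;). Can someone provide an example that won't split entity codes? The string I want to break comes from XML, so the actual entities are escaped once more (&amp;#xxx;).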
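To make the "escaped once more" point concrete, here is a small illustrative PHP sketch (not from the question; the sample string is taken from the example below) that peels off one level of escaping at a time with `html_entity_decode`:

```php
<?php
// Illustrative only: text taken from XML carries an extra level of escaping.
// Each html_entity_decode call removes exactly one level.
// ENT_XML1 selects XML entity rules; ENT_QUOTES also decodes quote entities.
$fromXml = 'F&amp;#xe5;revejle';

$once  = html_entity_decode($fromXml, ENT_QUOTES | ENT_XML1, 'UTF-8');
$twice = html_entity_decode($once, ENT_QUOTES | ENT_XML1, 'UTF-8');

echo $once, "\n";  // F&#xe5;revejle
echo $twice, "\n"; // Fårevejle
```

A regex running on the raw XML text therefore sees `&amp;#xe5;` (10 characters) where the reader will eventually see a single character, which is why the entity has to be matched as one unit.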

Edit: code example

preg_replace('/([^\s-]{5})([^\s-]{5})/', '$1&shy;$2', $subject)

Given the word "F&amp;#xe5;revejle"
Expect "F&amp;#xe5;&shy;revejle" as result
But it outputs "F&shy;5;revejle" instead
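Running that `preg_replace` call on the sample word in a minimal PHP script (the variable names are illustrative) shows where the break lands: the first five-character group stops in the middle of the escaped entity.

```php
<?php
// The question's naive approach: two consecutive runs of 5 non-space,
// non-dash characters, with a soft hyphen inserted between them.
$subject = 'F&amp;#xe5;revejle';
$naive = preg_replace('/([^\s-]{5})([^\s-]{5})/', '$1&shy;$2', $subject);

// "F&amp" and ";#xe5" are the two groups, so the entity is cut in half.
echo $naive, "\n"; // F&amp&shy;;#xe5;revejle
```

The exact string printed here differs from the output quoted above, possibly because that output was copied after HTML rendering; either way the break falls inside the entity.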

- ken

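That's not code, just a regex, which by itself doesn't do much. Can you show the actual code you're using this regex in, along with a sample string and what it looks like before and after? In particular, what you want it to become, as opposed to what you're getting now. - Tim Pietzcker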

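What about the named entities &amp;, &quot;, &lt; and &gt;? What about hex entities, /&#x[A-Fa-f0-9]+;/? And what if a numeric entity represents a hyphen or a whitespace character? - Alan Moore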
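One way to cover the decimal, hex and named cases raised in this comment is to tokenize a word into "display characters" before counting. The helper below, `entityAwareChars`, is hypothetical (not from the question or the answer) and only a sketch: it treats XML-escaped entities such as `&amp;#229;`, `&amp;#xe5;` and `&amp;quot;`, as well as plain named entities such as `&amp;`, as one unit each. It does not address entities that decode to a hyphen or whitespace.

```php
<?php
// Hypothetical helper: split a word into display characters, counting each
// entity as a single unit. The escaped form (&amp;...;) is tried first, so
// "&amp;#xe5;" is not split into "&amp;" followed by "#xe5;".
function entityAwareChars(string $word): array
{
    $pattern = '/&amp;(?:\#[0-9]+|\#x[0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);'  // escaped entity
             . '|&(?:\#[0-9]+|\#x[0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);'      // plain entity
             . '|./su';                                                     // any other character
    preg_match_all($pattern, $word, $m);
    return $m[0];
}

print_r(entityAwareChars('F&amp;#xe5;revejle')); // 9 units, entity kept whole
print_r(entityAwareChars('AT&amp;T'));           // A, T, &amp;, T
```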

1 Answer

- Tim Pietzcker · Accepted Answer

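Assuming you want to split every word after each run of five characters, unless the word is already separated by a dash, and to count each entity as a single character, try this: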

$result = preg_replace(
    '/            # Start the match 
    (?:           # at one of the following positions:
     (?<=         # Either right after...
      [\s-]       # a space or dash
     )            # end of lookbehind
     |            # or...
     \G           # wherever the last match ended.
    )             # End of start condition.
    (             # Now match and capture the following:
     (?>          # Match the following in an atomic group:
      &amp;\#\w+; # an entity
      |           # or
      [^\s-]      # a non-space, non-dash character
     ){5}         # exactly 5 times.
    )             # End of capture
    (?=[^\s-])    # Assert that we\'re not at the end of a "word"/x', 
    '\1&shy;', $subject);

This changes

supercalifragilisticexpidon'tremember! 
alrea-dy se-parated 
count entity as one character&amp;#345;blahblah
F&amp;#xe5;revejle

into

super&shy;calif&shy;ragil&shy;istic&shy;expid&shy;on'tr&shy;ememb&shy;er! 
alrea-dy se-parat&shy;ed 
count entit&shy;y as one chara&shy;cter&amp;#345;&shy;blahb&shy;lah
F&amp;#xe5;rev&shy;ejle
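A self-contained script for checking this, assuming a single-line version of the pattern above with the `/x` comments removed (the comments do not change what it matches), could look like this:

```php
<?php
// Compact form of the answer's pattern:
//   start right after a space/dash, or where the previous match ended (\G);
//   capture 5 "characters", each either an escaped entity or one non-space, non-dash char;
//   require that the word continues afterwards.
$pattern = '/(?:(?<=[\s-])|\G)((?>&amp;\#\w+;|[^\s-]){5})(?=[^\s-])/';

$subject = implode("\n", [
    "supercalifragilisticexpidon'tremember!",
    'alrea-dy se-parated',
    'count entity as one character&amp;#345;blahblah',
    'F&amp;#xe5;revejle',
]);

$result = preg_replace($pattern, '$1&shy;', $subject);
echo $result, "\n";
```

The replacement uses `$1`, the form the PHP manual prefers, but `\1` as in the answer works too. Because `\w+` accepts any word characters after `#`, hex entities such as `&amp;#xe5;` are already covered; named entities such as `&amp;quot;` are not, since the pattern requires the `#`.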