How do I use preg_split() in PHP?

10

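Can someone explain how to use the preg_split() function? I don't understand pattern parameters like "/[\s,]+/".

For example:

I have this string: is is. and I want the result to be: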

array (
  0 => 'is',
  1 => 'is',
)

So it should ignore the space and the full stop. How can I do that?

- MD.MD

2

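What exact rules are you working with? Are you trying to get a list of words from a string? - ceejayoz

I don't know what your input might be, but splitting on spaces and trimming might be easier. - jeroen

@jeroen Whatever the input is, the array should store only the words. - MD.MD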

1

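preg_split() is good for cutting a string into pieces when you know exactly how it should be cut. If you know what you want to get out of the string, preg_match() may be a better choice. In this case you want to extract words, so preg_match() is probably the better fit. You should consider using it. - Sverri M. Olsen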

1

str_word_count is a better fit than preg_split here. It can return an array of words. - ceejayoz
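The comments above suggest matching the words themselves rather than describing the separators. A minimal sketch of that idea, assuming preg_match_all() (the variant of preg_match() that collects every match rather than only the first) and \w+ as the definition of a word; neither choice comes from the comments themselves:

```php
<?php
// Collect every run of word characters instead of splitting on separators.
$subject = 'is is.';
preg_match_all('/\w+/', $subject, $matches);

// $matches[0] holds the full-pattern matches, in order of appearance.
$words = $matches[0];
print_r($words);
```

Whether \w+ is the right definition of a word depends on the input; it also accepts digits and underscores.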

4 Answers

8

This should work:

$words = preg_split("/(?<=\w)\b\s*[!?.]*/", 'is is.', -1, PREG_SPLIT_NO_EMPTY);

echo '<pre>';
print_r($words);
echo '</pre>';

The output would be:

Array
(
    [0] => is
    [1] => is
)

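Before I explain the regex, a word on PREG_SPLIT_NO_EMPTY. It means that preg_split() only returns the pieces that are not empty. This ensures that the array $words contains actual data and not empty strings, which can happen when dealing with regex patterns and mixed data sources.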
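To see what the flag changes, here is a small sketch that runs the same pattern on the same input with and without PREG_SPLIT_NO_EMPTY. The variable names are only illustrative:

```php
<?php
$pattern = '/(?<=\w)\b\s*[!?.]*/';
$subject = 'is is.';

// Without the flag, the text after the final delimiter (the "." here)
// is returned as a trailing empty string.
$withoutFlag = preg_split($pattern, $subject);

// With the flag, empty pieces are dropped from the result.
$withFlag = preg_split($pattern, $subject, -1, PREG_SPLIT_NO_EMPTY);

var_export($withoutFlag);
echo PHP_EOL;
var_export($withFlag);
```

The third argument, -1, means "no limit" on the number of pieces; it has to be passed explicitly so that the flag can go in the fourth position.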

The regex itself can be broken down using this tool as follows:

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?<=                     look behind to see if there is:
--------------------------------------------------------------------------------
    \w                       word characters (a-z, A-Z, 0-9, _)
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  [!?.]*                   any character of: '!', '?', '.' (0 or more
                           times (matching the most amount possible))

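A better explanation can be found by entering the full regex pattern /(?<=\w)\b\s*[!?.]*/ in this other tool:

(?<=\w) Positive lookbehind - asserts that the regex below can be matched
\w matches any word character [a-zA-Z0-9_]
\b asserts position at a word boundary (^\w|\w$|\W\w|\w\W)
\s* matches any whitespace character [\r\n\t\f ]
Quantifier: between zero and unlimited times, as many times as possible [greedy]
[!?.]* a single character in the list !?. literally (zero or more times)

That last regex explanation can be boiled down by a human, also known as me, to the following:

Split at any word boundary that follows a word character, consuming any whitespace and any of the punctuation marks !?. that come after it.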

- Giacomo1968

1

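The documentation says:

The preg_split() function operates exactly like split(), except that regular expressions are accepted as input parameters for the pattern.

So, the following code...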

<?php

$ip = "123 ,456 ,789 ,000"; 
$iparr = preg_split ("/[\s,]+/", $ip); 
print "$iparr[0] <br />";
print "$iparr[1] <br />" ;
print "$iparr[2] <br />"  ;
print "$iparr[3] <br />"  ;

?>

This will produce the following result (each value on its own line when rendered as HTML):

123
456
789
000

So, if you have this subject: is is, and you want:

array (
  0 => 'is',
  1 => 'is',
)

You need to change your regex to "/[\s]+/"

If instead you have is ,is then you need the regex you already have, "/[\s,]+/"
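The difference between the two patterns is easiest to see side by side. A short sketch, with illustrative variable names:

```php
<?php
$spacesOnly      = '/[\s]+/';   // splits on runs of whitespace
$spacesAndCommas = '/[\s,]+/';  // splits on runs of whitespace and/or commas

// Whitespace-only pattern on a whitespace-separated subject.
$a = preg_split($spacesOnly, 'is is');

// The same pattern leaves the comma attached to the second word.
$b = preg_split($spacesOnly, 'is ,is');

// Adding the comma to the character class removes it as well.
$c = preg_split($spacesAndCommas, 'is ,is');

print_r([$a, $b, $c]);
```

Note that neither pattern removes the trailing full stop from the string in the question (is is.); for that, the full stop would have to be part of the character class too.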

- Federico Piazza

@FedericoPiazza In preg_split, \s means whitespace, so what does the + mean? - Gem

1

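@Gem \s means any whitespace character (including tabs), and + means _1 or more_. If you change + to *, it means _0 or more_. You can use regex101.com to see a detailed explanation of a regex. - Federico Piazza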

1

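If this is about programming, I'd suggest using the str_word_count function. It can collect all the words in a string (including repeated words) into an array.

For example: str_word_count($string, 2) will return an array containing all the words in the string, keyed by each word's position in the string.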
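As a quick illustration, here is str_word_count() applied to the string from the question. Format 1 is not mentioned in the answer; it is included here only for comparison, since it returns a plain list, which is closer to the array the question asks for:

```php
<?php
$string = 'is is.';

// Format 1: a plain, zero-indexed list of the words found.
$list = str_word_count($string, 1);

// Format 2: the same words, keyed by their character offset in $string.
$byPosition = str_word_count($string, 2);

print_r($list);
print_r($byPosition);
```

By default str_word_count() treats letters, apostrophes and hyphens as word characters, so digits and other punctuation are not part of any word.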

- ceejayoz

- Majenko · Accepted Answer

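preg stands for Pcre REGexp (Perl-compatible regular expressions), which is somewhat redundant, since "PCRE" already means "Perl Compatible Regexp".

Regular expressions are a nightmare for beginners. I still don't fully understand them, and I've been working with them for years.

Basically, the example you gave breaks down as: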

"/[\s,]+/"

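/ = start or end of the pattern string (the delimiter)
[ ... ] = a character class: matches any one of the characters listed inside
+ = one or more of the preceding character or group
\s = any whitespace character (space, tab, newline)
, = the literal comma character

So you have a search pattern that splits on any run of one or more whitespace characters and/or commas.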
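The role of the + is clearer when it is removed. The sketch below uses a made-up input string and compares the pattern with and without the quantifier:

```php
<?php
// Hypothetical input with mixed separators: ", ", two spaces, and ",,".
$input = 'apple, banana  cherry,,date';

// With +, each run of separators counts as a single delimiter.
$withPlus = preg_split('/[\s,]+/', $input);

// Without +, every separator character is its own delimiter, so an
// empty string appears between two adjacent separators.
$withoutPlus = preg_split('/[\s,]/', $input);

print_r($withPlus);
print_r($withoutPlus);
```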

Other common characters are:

. = any single character (except a newline, by default)
* = zero or more of the preceding character or group
^ (at start of pattern) = the start of the string
$ (at end of pattern) = the end of the string
^ (at the start of [...]) = "NOT": matches any character except those listed
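A few one-line checks can make these concrete. The patterns and test strings below are made up for illustration; preg_match() returns 1 when the pattern matches and 0 when it does not:

```php
<?php
// ^ anchors the match to the start of the string.
$startsWithIs = preg_match('/^is/', 'is is.');

// $ anchors to the end; this string ends with ".", not "is".
$endsWithIs = preg_match('/is$/', 'is is.');

// . stands for any single character, here the "s".
$anyChar = preg_match('/^i.$/', 'is');

// * allows zero occurrences, so "ac" matches a, zero b's, then c.
$zeroOrMore = preg_match('/^ab*c$/', 'ac');

// [^a-z] is a negated class: split on anything that is NOT a-z.
$negated = preg_split('/[^a-z]+/', 'is is.', -1, PREG_SPLIT_NO_EMPTY);

var_dump($startsWithIs, $endsWithIs, $anyChar, $zeroOrMore);
print_r($negated);
```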

The official PHP documentation has good information on regular expression syntax.