How can I split the word:
oneTwoThreeFour
into an array so that I can get:
one Two Three Four
How can I do this with preg_match?
I tried this, but it just returns the whole word:
$words = preg_match("/[a-zA-Z]*(?:[a-z][a-zA-Z]*[A-Z]|[A-Z][a-zA-Z]*[a-z])[a-zA-Z]*\b/", $string, $matches);
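You can use preg_split:
$arr = preg_split('/(?=[A-Z])/',$str);
I'm basically splitting the input string just before each uppercase letter. The regex used, (?=[A-Z]), is a lookahead that matches the position just before an uppercase letter, so no characters are removed.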
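A minimal runnable sketch of the lookahead split, assuming PHP 7+ and using the string from the question. It also shows one caveat: when the input starts with a capital letter, the zero-width match at offset 0 yields an empty first element, which the PREG_SPLIT_NO_EMPTY flag filters out. StartsWithCap is only an illustrative input.

```php
<?php
// Split just before every uppercase letter; the lookahead consumes no characters.
$arr = preg_split('/(?=[A-Z])/', 'oneTwoThreeFour');
print_r($arr);

// A leading capital produces an empty first element...
$withEmpty = preg_split('/(?=[A-Z])/', 'StartsWithCap');

// ...which PREG_SPLIT_NO_EMPTY removes (-1 means "no limit on the number of pieces").
$clean = preg_split('/(?=[A-Z])/', 'StartsWithCap', -1, PREG_SPLIT_NO_EMPTY);
print_r($clean);
```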
You can also use preg_match_all, as follows:
preg_match_all('/((?:^|[A-Z])[a-z]+)/',$str,$matches);
Explanation:
( - Start of capturing parenthesis.
(?: - Start of non-capturing parenthesis.
^ - Start anchor.
| - Alternation.
[A-Z] - Any one capital letter.
) - End of non-capturing parenthesis.
[a-z]+ - One or more lowercase letters.
) - End of capturing parenthesis.
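A small sketch of what ends up in $matches, assuming the default PREG_PATTERN_ORDER: index 0 holds the full matches and index 1 the capture group. With this pattern the two are identical, so the outer parentheses are optional. The variable names are only for illustration.

```php
<?php
$str = 'oneTwoThreeFour';

// With the capturing group: $matches[0] (full matches) equals $matches[1] (group 1).
preg_match_all('/((?:^|[A-Z])[a-z]+)/', $str, $matches);

// Without the group, $plain[0] already contains every word and there is no index 1.
preg_match_all('/(?:^|[A-Z])[a-z]+/', $str, $plain);

print_r($plain[0]);
```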
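For strings like HTMLParser, see: https://dev59.com/11jUa4cB1Zd3GeqPVuQL#6572999. - Maciej Sz
'/(?:^|[A-Z])[a-z]+/' is enough to produce a single array (instead of two). This is because preg_match_all() automatically captures every matched instance, without you having to specify a group explicitly. - cartbeforehorse
I know this is an old question that already has an accepted answer, but IMHO there is a better solution: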
<?php // test.php Rev:20140412_0800
$ccWord = 'NewNASAModule';
$re = '/(?#! splitCamelCase Rev:20140412)
# Split camelCase "words". Two global alternatives. Either g1of2:
(?<=[a-z]) # Position is after a lowercase,
(?=[A-Z]) # and before an uppercase letter.
| (?<=[A-Z]) # Or g2of2; Position is after uppercase,
(?=[A-Z][a-z]) # and before upper-then-lower case.
/x';
$a = preg_split($re, $ccWord);
$count = count($a);
for ($i = 0; $i < $count; ++$i) {
printf("Word %d of %d = \"%s\"\n",
$i + 1, $count, $a[$i]);
}
?>
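Note that this regex (like codaddict's solution, '/(?=[A-Z])/', which works very well for well-formed camelCase words) matches only a position within the string and consumes no text. This solution has the additional benefit that it also correctly splits less well-formed pseudo-camelCase words such as StartsWithCap and hasConsecutiveCAPS.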
Test data:
oneTwoThreeFour
StartsWithCap
hasConsecutiveCAPS
NewNASAModule
Output:
Word 1 of 4 = "one"
Word 2 of 4 = "Two"
Word 3 of 4 = "Three"
Word 4 of 4 = "Four"
Word 1 of 3 = "Starts"
Word 2 of 3 = "With"
Word 3 of 3 = "Cap"
Word 1 of 3 = "has"
Word 2 of 3 = "Consecutive"
Word 3 of 3 = "CAPS"
Word 1 of 3 = "New"
Word 2 of 3 = "NASA"
Word 3 of 3 = "Module"
Edit 2014-04-12: Modified the regex, script and test data to correctly split the "NewNASAModule" case (in response to rr's comment).
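To make the difference described above concrete, here is a comparison sketch (PHP 7+) that runs codaddict's lookahead and ridgerunner's two-alternative pattern, written on one line without the x flag, over the same test data. The helper name split_with is hypothetical. Because both patterns are zero-width, joining the pieces always reproduces the input; they only differ in where they cut.

```php
<?php
// Hypothetical helper: split $word with $pattern and return the pieces.
function split_with(string $pattern, string $word): array
{
    return preg_split($pattern, $word);
}

// codaddict: cut before every capital.
$simple = '/(?=[A-Z])/';
// ridgerunner: cut at lower->Upper, or between Upper and an Upper+lower pair.
$robust = '/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/';

foreach (['oneTwoThreeFour', 'StartsWithCap', 'hasConsecutiveCAPS', 'NewNASAModule'] as $word) {
    printf("%-20s simple: %-30s robust: %s\n",
        $word,
        implode('|', split_with($simple, $word)),
        implode('|', split_with($robust, $word)));
}
```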
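NewNASAModule (output: [New, NASAModule]; I'd expect [New, NASA, Module]). - rr
@rr - NewNASAModule: RegEx to split camelCase or TitleCase (advanced) - ridgerunner
While ridgerunner's answer works great, it doesn't seem to handle all-caps substrings that appear in the middle of the string. I use the following, and it seems to deal with them just fine: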
function splitCamelCase($input)
{
return preg_split(
'/(^[^A-Z]+|[A-Z][^A-Z]+)/',
$input,
-1, /* no limit on the number of returned pieces */
PREG_SPLIT_NO_EMPTY /* don't return empty elements */
| PREG_SPLIT_DELIM_CAPTURE /* also return the captured delimiters, i.e. the words themselves */
);
}
assert(splitCamelCase('lowHigh') == ['low', 'High']);
assert(splitCamelCase('WarriorPrincess') == ['Warrior', 'Princess']);
assert(splitCamelCase('SupportSEELE') == ['Support', 'SEELE']);
assert(splitCamelCase('LaunchFLEIAModule') == ['Launch', 'FLEIA', 'Module']);
assert(splitCamelCase('anotherNASATrip') == ['another', 'NASA', 'Trip']);
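The trick in this answer is that the words themselves are the delimiters, so PREG_SPLIT_DELIM_CAPTURE is what puts them back into the result; only the all-caps runs between matches come out as ordinary pieces. A short sketch (PHP 7+) showing what the same pattern returns without that flag:

```php
<?php
$pattern = '/(^[^A-Z]+|[A-Z][^A-Z]+)/';

// Without DELIM_CAPTURE only the text *between* matches survives: the acronym.
$betweenOnly = preg_split($pattern, 'anotherNASATrip', -1, PREG_SPLIT_NO_EMPTY);

// With DELIM_CAPTURE the captured words are interleaved back in their original order.
$full = preg_split($pattern, 'anotherNASATrip', -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

var_export($betweenOnly);
echo "\n";
var_export($full);
```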
A function version of @ridgerunner's answer.
/**
* Converts a camelCase string to have spaces between each word.
* @param string $camelCaseString
* @return string
*/
function fromCamelCase($camelCaseString) {
$re = '/(?<=[a-z])(?=[A-Z])/x';
$a = preg_split($re, $camelCaseString);
return implode(' ', $a); // separator first: the reversed join($a, " ") order is rejected by PHP 8
}
$string = preg_replace( '/([a-z0-9])([A-Z])/', "$1 $2", $string );
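This preg_replace inserts a space between a lowercase letter or digit ($1) and the uppercase letter that follows it ($2) instead of splitting. A sketch (PHP 7+) that turns the result into the array asked for in the question with explode(); the helper name camelToWords and the input version2Beta are illustrative assumptions. Note that explode() would also split on any spaces already present in the input.

```php
<?php
// Hypothetical helper: add a space at every lower/digit -> Upper boundary, then split on spaces.
function camelToWords(string $string): array
{
    $spaced = preg_replace('/([a-z0-9])([A-Z])/', '$1 $2', $string);
    return explode(' ', $spaced);
}

print_r(camelToWords('oneTwoThreeFour'));
print_r(camelToWords('version2Beta')); // the digit counts as a word character before the capital
```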
/^[^A-Z]+\K|[A-Z][^A-Z]+\K/ (21 steps)
/(^[^A-Z]+|[A-Z][^A-Z]+)/ (26 steps)
/[^A-Z]+\K(?=[A-Z])/ (43 steps)
/(?=[A-Z])/ (50 steps)
/(?=[A-Z]+)/ (50 steps)
/([a-z]{1})[A-Z]{1}/ (53 steps)
/([a-z0-9])([A-Z])/ (68 steps)
/(?<=[a-z])(?=[A-Z])/x (94 steps) ...by the way, the x is useless.
/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/ (134 steps)
/[A-Z]?[a-z]+/ (14 steps)
/((?:^|[A-Z])[a-z]+)/ (35 steps)
oneTwoThreeFour
hasConsecutiveCAPS
newNASAModule
USAIsGreatAgain
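A quick sanity-check sketch (PHP 7+) showing that several of the zero-width split patterns from the step-count list return the same array for the plain camelCase string oneTwoThreeFour. PREG_SPLIT_NO_EMPTY drops the empty piece that the \K patterns leave when they cut at the very end of the string. Step counts themselves come from the regex tester and are not reproduced here.

```php
<?php
// Zero-width split patterns taken from the list above.
$patterns = [
    '/^[^A-Z]+\K|[A-Z][^A-Z]+\K/',
    '/[^A-Z]+\K(?=[A-Z])/',
    '/(?=[A-Z])/',
    '/(?<=[a-z])(?=[A-Z])/',
];

$results = [];
foreach ($patterns as $pattern) {
    // NO_EMPTY removes the trailing empty piece produced by a cut at the end of the string.
    $results[$pattern] = preg_split($pattern, 'oneTwoThreeFour', -1, PREG_SPLIT_NO_EMPTY);
    echo $pattern, ' => ', implode(' ', $results[$pattern]), "\n";
}
```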
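preg_split() pattern:
/[a-z]+\K|(?=[A-Z][a-z]+)/ (149 steps)*
*I had to use [a-z] in the demo for the steps to be counted correctly.
/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/ (547 steps)
preg_match_all() pattern:
/[A-Z]?[a-z]+|[A-Z]+(?=[A-Z][a-z]|$)/ (75 steps)
I would choose preg_split() over preg_match_all() (even though the preg_match_all() pattern takes fewer steps), because it returns the desired output structure directly. Of course, you can choose whichever you prefer.
$noAcronyms = 'oneTwoThreeFour';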
var_export(preg_split('~^[^A-Z]+\K|[A-Z][^A-Z]+\K~', $noAcronyms, 0, PREG_SPLIT_NO_EMPTY));
echo "\n---\n";
var_export(preg_match_all('~[A-Z]?[^A-Z]+~', $noAcronyms, $out) ? $out[0] : []);
$withAcronyms = 'newNASAModule';
var_export(preg_split('~[^A-Z]+\K|(?=[A-Z][^A-Z]+)~', $withAcronyms, 0, PREG_SPLIT_NO_EMPTY));
echo "\n---\n";
var_export(preg_match_all('~[A-Z]?[^A-Z]+|[A-Z]+(?=[A-Z][^A-Z]|$)~', $withAcronyms, $out) ? $out[0] : []);
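A sketch (PHP 7+) that wraps the two acronym-aware patterns above in functions and runs them over all four test strings, so you can confirm that preg_split() and preg_match_all() agree. The function names splitWords and matchWords are hypothetical.

```php
<?php
// Acronym-aware split: cut after a run of non-capitals, or before Capital + non-capitals.
function splitWords(string $s): array
{
    return preg_split('~[^A-Z]+\K|(?=[A-Z][^A-Z]+)~', $s, 0, PREG_SPLIT_NO_EMPTY);
}

// Same words via matching: optional capital + non-capitals, or an all-caps run
// followed by Capital + non-capital or by the end of the string.
function matchWords(string $s): array
{
    return preg_match_all('~[A-Z]?[^A-Z]+|[A-Z]+(?=[A-Z][^A-Z]|$)~', $s, $out) ? $out[0] : [];
}

foreach (['oneTwoThreeFour', 'hasConsecutiveCAPS', 'newNASAModule', 'USAIsGreatAgain'] as $input) {
    printf("%-20s split: %-25s match: %s\n",
        $input, implode(' ', splitWords($input)), implode(' ', matchWords($input)));
}
```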
(?=[A-Z][a-z]|$) can be changed to (?![a-z]). - Casimir et Hippolyte
echo deliciousCamelcase('NewNASAModule');
function deliciousCamelcase($str)
{
$formattedStr = '';
$re = '/
(?<=[a-z])
(?=[A-Z])
| (?<=[A-Z])
(?=[A-Z][a-z])
/x';
$a = preg_split($re, $str);
$formattedStr = implode(' ', $a);
return $formattedStr;
}
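New NASA Module
Another option is matching /[A-Z]?[a-z]+/ - if you know your input is in the right format, it should work nicely.
[A-Z]? will match an uppercase letter (or nothing). [a-z]+ will then match all following lowercase letters, up to the next uppercase letter, where the next match begins.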
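A minimal sketch (PHP 7+) of this matching approach, including what "the right format" means in practice: an all-caps run such as NASA contains no lowercase letters after each capital, so it is silently dropped. newNASAModule is just an illustrative input.

```php
<?php
// Optional capital followed by one or more lowercase letters.
preg_match_all('/[A-Z]?[a-z]+/', 'oneTwoThreeFour', $m);
print_r($m[0]);

// Not well-formed input: the acronym has no lowercase run, so it never matches.
preg_match_all('/[A-Z]?[a-z]+/', 'newNASAModule', $acronym);
print_r($acronym[0]);
```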
This one-liner converts camelCase to a sentence:
ucfirst(strtolower(implode(' ', preg_split('/(?=[A-Z])/', $camelCaseStr))));
"helloWorld" -> "Hello world"
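The same one-liner wrapped in a function (the name camelToSentence is hypothetical), assuming PHP 7+. Since it splits before every capital and then lowercases everything, an acronym is broken into single lowercase letters, so it suits plain camelCase input best.

```php
<?php
// Hypothetical wrapper around the one-liner above.
function camelToSentence(string $camelCaseStr): string
{
    // Split before each capital, join with spaces, lowercase, then capitalise the first letter.
    return ucfirst(strtolower(implode(' ', preg_split('/(?=[A-Z])/', $camelCaseStr))));
}

echo camelToSentence('helloWorld'), "\n";
echo camelToSentence('newNASAModule'), "\n"; // acronym letters end up separated
```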