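Hi everyone, I have a small problem. I need to extract only the words from a text.
I tried strtok(), strstr() and a few regular expressions to retrieve the words, but I could only extract some of them.
The problem is complicated because the words can be surrounded by many other characters and symbols.
Here is the sample text the words must be extracted from: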
Main article: our 46,000 required, !but (1947-2011) mail@server.com March 8, 2014 Gutenberg's 34-DE 'a' 3,1415 Us: @unknown n go http://google.com or www.google.com and http://www.google.com (r) The 509th "composite" and; C-54 #dog v4.0 ¿as is done? ¿article... agriculture? x ¿cat? now! Hi!! (87 meters).
Sample text, for testing.
The result of the extraction should be:
Main article our required but March Gutenberg's a go or and The composite and dog as is done article agriculture cat now Hi meters
Sample text for testing
The first function I wrote preprocesses the text to make the job easier.
function PreText($text){
    $text = str_replace("\n", ".", $text);
    $text = str_replace("\r", ".", $text);
    $text = str_replace("'", "", $text);
    $text = str_replace("?", "", $text);
    $text = str_replace("¿", "", $text);
    $text = str_replace("(", "", $text);
    $text = str_replace(")", "", $text);
    $text = str_replace('"', "", $text);
    $text = str_replace(';', "", $text);
    $text = str_replace('!', "", $text);
    $text = str_replace('<', "", $text);
    $text = str_replace('>', "", $text);
    $text = str_replace('#', "", $text);
    $text = str_replace(",", "", $text);
    $text = str_replace(".c", "", $text);
    $text = str_replace(".C", "", $text);
    return $text;
}
The splitting function:
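function SplitWords($text){
    // Initialise the result so that .= does not act on an undefined variable.
    $NewText = "";
    $words = explode(" ", $text);
    $ContWords = count($words);
    for ($i = 0; $i < $ContWords; $i++){
        // Keep only tokens made up entirely of letters.
        if (ctype_alpha($words[$i])) {
            $NewText .= $words[$i].", ";
        }
    }
    return $NewText;
}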
The program:
<?php
include_once ('functions.php');
$text = "Main article: our 46,000 ...";
$text = PreText($text);
$text = SplitWords($text);
echo $text;
?>
The code still has a long way to go. I would appreciate your help.
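One possible direction, as a rough sketch rather than a definitive solution: split on whitespace, strip punctuation only from the two ends of each token, and keep the token if what remains is letters optionally joined by apostrophes. The function name extractWords and the exact set of edge characters are assumptions made for illustration. "@" is deliberately left out of that set so that "@unknown" and "mail@server.com" are rejected. Unlike the expected result above, this sketch still keeps "Us" and the single letters "n", "r" and "x"; dropping those would need an extra rule that the question does not spell out.

```php
<?php
// Hypothetical helper, not part of the original code.
function extractWords(string $text): array
{
    // Punctuation that may surround a word (an assumed set; "@" is absent on purpose).
    $edge = '[.,;:!?¿¡()"\'#]+';

    $words = [];
    // Split on any run of whitespace, including line breaks.
    foreach (preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY) as $token) {
        // Strip the surrounding punctuation from both ends of the token.
        $token = preg_replace('/^' . $edge . '|' . $edge . '$/u', '', $token);
        // Keep the token only if it is letters, optionally joined by apostrophes.
        if (preg_match('/^\p{L}+(?:\'\p{L}+)*$/u', $token)) {
            $words[] = $token;
        }
    }
    return $words;
}

echo implode(' ', extractWords('Sample text, for testing.')), "\n";
// prints: Sample text for testing
```

Because the apostrophe is only removed at the edges, "Gutenberg's" survives intact while 'a' is reduced to a. Tokens with digits, hyphens, inner dots or slashes (46,000, C-54, www.google.com, http://google.com) fail the final letter check and are skipped.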
Gutenberg's. - Sonic
'a' is converted to a, so it is a bit tricky. - Sonic
mail@server.com. - Toto