How do I check if a string contains a specific word?

Question

php string substring contains string-matching

2657

Consider:

$a = 'How are you?';

if ($a contains 'are')
    echo 'true';

Suppose I have the code above. What is the correct way to write the statement if ($a contains 'are')?

- Charles Yeung

36 Answers

752

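You could use regular expressions, as they are better for word matching than strpos, which other users have mentioned. A strpos check for are will also return true for strings such as fare, care, stare, and so on. These unintended matches can be avoided in a regular expression by using word boundaries.

A simple match for are could look something like this: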

$a = 'How are you?';

if (preg_match('/\bare\b/', $a)) {
    echo 'true';
}

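On the performance side, strpos is about three times faster. When I ran one million comparisons in one go, preg_match took 1.5 seconds to finish, while strpos took 0.5 seconds.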
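A comparison like the one described can be reproduced with a rough micro-benchmark. The sketch below is not the author's original test: the iteration count and variable names are assumptions, and the timings depend heavily on the machine and PHP version, so no expected numbers are shown.

```php
<?php
// Rough micro-benchmark; absolute numbers vary by machine and PHP version.
$haystack   = 'How are you?';
$iterations = 100000; // assumed count, smaller than the one million mentioned above

// Time the plain substring check.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $found = strpos($haystack, 'are') !== false;
}
$strposSeconds = microtime(true) - $start;

// Time the word-boundary regex check.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $found = preg_match('/\bare\b/', $haystack) === 1;
}
$pregSeconds = microtime(true) - $start;

printf("strpos: %.4fs, preg_match: %.4fs\n", $strposSeconds, $pregSeconds);
```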

Edit: To search any part of the string, not just word by word, I would recommend using a regular expression like

$a = 'How are you?';
$search = 'are y';
if(preg_match("/{$search}/i", $a)) {
    echo 'true';
}

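The i at the end of the regular expression makes it case-insensitive; if you do not want that, you can leave it out.

However, this can be quite problematic in some cases, because the $search string is not sanitized in any way. For example, if $search is user input, the user can supply a string that behaves like a different regular expression, so the check may not do what you expect.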
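One common way to handle this is to escape the search string with preg_quote() before building the pattern. The following is a minimal sketch, assuming the delimiter is / and using a made-up search string; it is not part of the original answer.

```php
<?php
$search = 'you?'; // imagine this came from user input

// preg_quote() escapes regex metacharacters; the second argument
// also escapes the delimiter used in the pattern.
$pattern = '/' . preg_quote($search, '/') . '/i';

var_dump(preg_match($pattern, 'How are you?')); // int(1)
var_dump(preg_match($pattern, 'yo-yo'));        // int(0)

// Without escaping, "?" acts as a quantifier and "yo" alone matches.
var_dump(preg_match("/{$search}/i", 'yo-yo'));  // int(1)
```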

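Also, here is a great tool for testing and seeing explanations of various regular expressions: Regex101

To combine both sets of functionality into a single multi-purpose function (including selectable case sensitivity), you could use something like this: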

function FindString($needle,$haystack,$i,$word)
{   // $i should be "" or "i" for case insensitive
    if (strtoupper($word)=="W")
    {   // if $word is "W" then word search instead of string in string search.
        if (preg_match("/\b{$needle}\b/{$i}", $haystack)) 
        {
            return true;
        }
    }
    else
    {
        if(preg_match("/{$needle}/{$i}", $haystack)) 
        {
            return true;
        }
    }
    return false;
    // Put quotes around true and false above to return them as strings instead of as bools/ints.
}

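One more thing to keep in mind is that \b will not work in languages other than English.

The explanation for this and the solution are taken from here:

\b represents the beginning or end of a word (a word boundary). This regex would match apple in "apple pie", but would not match apple in "pineapple", "applecarts", or "bakeapples".

How about "café"? How can we extract the word "café" with a regex? Actually, \bcafé\b does not work. Why? Because "café" contains a non-ASCII character: é. \b cannot simply be used with Unicode text such as समुद्र, 감사, and месяц.

When you want to extract Unicode characters, you should directly define the characters that represent word boundaries.

The answer: (?<=[\s,.:;"']|^)UNICODE_WORD(?=[\s,.:;"']|$)

So, in order to use the answer in PHP, you can use this function: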

function contains($str, $word) {
    // Works in Hebrew and any other unicode characters
    // Thanks https://medium.com/@shiba1014/regex-word-boundaries-with-unicode-207794f6e7ed
    // Thanks https://www.phpliveregex.com/
    // $word is inserted into the pattern as-is, so it must not contain regex metacharacters
    if (preg_match('/(?<=[\s,.:;"\']|^)' . $word . '(?=[\s,.:;"\']|$)/', $str)) return true;
    return false;
}

And if you want to search for an array of words, you can use this:

function arrayContainsWord($str, array $arr)
{
    foreach ($arr as $word) {
        // Works in Hebrew and any other unicode characters
        // Thanks https://medium.com/@shiba1014/regex-word-boundaries-with-unicode-207794f6e7ed
        // Thanks https://www.phpliveregex.com/
        if (preg_match('/(?<=[\s,.:;"\']|^)' . $word . '(?=[\s,.:;"\']|$)/', $str)) return true;
    }
    return false;
}
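As a variation on the loop above, the words can be combined into a single alternation so that preg_match() runs once. This is only a sketch: the function name containsAnyWord is hypothetical, each word is escaped with preg_quote(), and the u modifier assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper: true if any word in $words occurs in $str as a whole word.
function containsAnyWord(string $str, array $words): bool
{
    if ($words === []) {
        return false; // an empty alternation would match everywhere
    }
    // Escape each word so regex metacharacters are matched literally.
    $escaped = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $words);
    // Same explicit boundary characters as above, wrapped around one alternation.
    $pattern = '/(?<=[\s,.:;"\']|^)(?:' . implode('|', $escaped) . ')(?=[\s,.:;"\']|$)/u';

    return preg_match($pattern, $str) === 1;
}

var_dump(containsAnyWord('un café, please', ['thé', 'café'])); // bool(true)
var_dump(containsAnyWord('la cafétéria', ['thé', 'café']));   // bool(false)
```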

Also, starting with PHP 8.0.0 you can now use str_contains:

<?php
    if (str_contains('abc', '')) {
        echo "Checking the existence of the empty string will always return true";
    }

- Breezer

11

@Alexander.Plutov, first of all, you gave me a -1 and not the question? Come on, it takes two seconds to google the answer: http://www.google.com/search?btnG=1&pws=0&q=find+word+in+string+php - Breezer

65

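This is a horrible way to search for a simple string, but many visitors to Stack Overflow are looking for a way to search for arbitrary substrings of their own, and it is helpful that the suggestion has been brought up. Even the OP might have oversimplified - let him know of his alternatives. - SamGoody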

77

Technically, the question asks how to find words, not a substring. This actually helped me, as I can use it with regex word boundaries. Alternatives are always useful. - user764357

16

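+1 for the answer and -1 to the @plutov.by comment, because strpos is just a single check, while a regular expression can check many words at the same time, e.g. preg_match(/are|you|not/). - albanx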

7

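Regular expressions should be a last resort. Their use in trivial tasks should be discouraged. I insist on this from the height of many years of digging through bad code. - yentsun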

300

Here is a little utility function that is useful in situations like this:

// returns true if $needle is a substring of $haystack
function contains($needle, $haystack)
{
    return strpos($haystack, $needle) !== false;
}

- ejunker

75

Actually, it improves code readability. Also, downvotes should be reserved for (very) bad answers, not for "neutral" ones. - Xaqq

40

Functions exist largely for readability (to communicate the idea of what you are doing). Compare which is more readable: if ($email->contains("@") && $email->endsWith(".com")) { ... or if (strpos($email, "@") !== false && substr($email, -strlen(".com")) == ".com") { ... - Brandin

3

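In the end, rules are meant to be broken. Otherwise people wouldn't come up with new, inventive ways of doing things :). Plus, I have to admit I have trouble wrapping my mind around material like that on martinfowler.com. I guess the right thing to do is to try things out yourself and find out which approaches are the most convenient. - James P.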

6

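Another opinion: having a utility function that you can easily wrap can help with debugging. It also underlines the need for good optimizers that eliminate such overhead in production services. So all opinions have valid points. ;) - Tino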

20

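Of course this is useful. You should encourage it. What happens if, in PHP 100, there is a new and faster way to find string positions? Do you want to change every place where you call strpos, or just the contents of the function? - Cosmin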

177

You can use the PHP function strpos() to determine whether a string contains another string.

int strpos ( string $haystack , mixed $needle [, int $offset = 0 ] )

<?php

$haystack = 'how are you';
$needle = 'are';

if (strpos($haystack,$needle) !== false) {
    echo "$haystack contains $needle";
}

?>

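Note:

If the needle you are searching for is at the beginning of the haystack, strpos() returns position 0. A == comparison will not work in that case; you need to use ===.

== is a comparison operator that tests whether the variable, expression, or constant on the left has the same value as the one on the right.

=== is a comparison operator that tests whether two variables, expressions, or constants are equal AND have the same type - that is, both are strings or both are integers.

One advantage of this approach is that every PHP version supports this function, unlike str_contains(), which requires PHP 8.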
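The difference between the two operators is easiest to see when the needle sits at offset 0. The snippet below is a small illustration with a made-up haystack, not part of the original answer.

```php
<?php
$haystack = 'are you there?';
$pos = strpos($haystack, 'are');

var_dump($pos);           // int(0): found at the very start
var_dump($pos == false);  // bool(true): loose comparison treats 0 like false
var_dump($pos === false); // bool(false): strict comparison also checks the type

// The strict form therefore reports the match correctly.
if ($pos !== false) {
    echo "found at offset $pos";
}
```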

- Jose Vega

If I use "care", it returns true as well :( - Jahirul Islam Mamun

162

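While most of these answers will tell you whether a substring appears in your string, that is usually not what you want if you are looking for a particular word rather than a substring.

What's the difference? Substrings can appear within other words:

The "are" at the beginning of "area"
The "are" at the end of "hare"
The "are" in the middle of "fares"

One way to mitigate this is to use a regular expression coupled with word boundaries (\b):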

function containsWord($str, $word)
{
    return !!preg_match('#\\b' . preg_quote($word, '#') . '\\b#i', $str);
}

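This method doesn't have the same false positives noted above, but it does have some edge cases of its own. Word boundaries match on non-word characters (\W), which are anything that isn't a-z, A-Z, 0-9, or _. That means digits and underscores are counted as word characters, and scenarios like these will fail:

The "are" in "What _are_ you thinking?"
The "are" in "lol u dunno wut those are4?"

If you want anything more accurate than this, you'll have to start doing English-language syntax parsing, and that's a pretty big can of worms (and it assumes proper use of syntax anyway, which isn't always a given).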
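Short of full syntax parsing, one possible workaround for the two cases listed above is to replace \b with lookarounds that treat only ASCII letters as word characters. This is a sketch under that assumption; the function name is hypothetical, and whether "are4" should count as the word "are" is a judgment call.

```php
<?php
// Hypothetical variant: treat only ASCII letters as part of a word,
// so digits and underscores next to the word no longer block a match.
function containsWordLettersOnly(string $str, string $word): bool
{
    $pattern = '#(?<![A-Za-z])' . preg_quote($word, '#') . '(?![A-Za-z])#i';
    return preg_match($pattern, $str) === 1;
}

var_dump(containsWordLettersOnly('What _are_ you thinking?', 'are'));    // bool(true)
var_dump(containsWordLettersOnly('lol u dunno wut those are4?', 'are')); // bool(true)
var_dump(containsWordLettersOnly('Do you care?', 'are'));                // bool(false)
```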

- FtDRbwLXw6

25

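This should be the canonical answer. Because we're looking for words and not substrings, regex is appropriate. I'll also add that \b matches two things that \W doesn't, which makes it great for finding words in a string: it matches the beginning of the string (^) and the end of the string ($). - code_monk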

1

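This should be the correct answer. The rest of the answers will find "are" in a string like "do you care", as noted by @Dtest. - Robert Sinclair

@RobertSinclair Is that so bad? If you asked me whether the string "do you care" contains the word "are", I would say "yes". The word "are" is clearly a substring of that string. That is a separate question from "Is 'are' one of the words in the string 'do you care'?" - Paul

@Paulpro Even though the OP didn't specify that $a is a phrase, I'm pretty sure it was implied. So his question was how to detect a word inside a phrase, not whether a word contains another word, which I'd assume is irrelevant more often than not. - Robert Sinclair

@Jimbo, it does work, you're just missing the \\. https://3v4l.org/ZRpYi - MetalWeirdo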

79

Have a look at strpos():

<?php
$mystring = 'abc';
$findme   = 'a';
$pos = strpos($mystring, $findme);

// Note our use of ===. Simply, == would not work as expected
// because the position of 'a' was the 0th (first) character.
if ($pos === false) {
    echo "The string '$findme' was not found in the string '$mystring'.";
} else {
    echo "The string '$findme' was found in the string '$mystring',";
    echo " and exists at position $pos.";
}

- Haim Evgi

66

Another option is strstr(), or stristr() if your search should be case-insensitive.
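A short illustration of the difference, using a made-up string. stripos() is not mentioned in the answer; it is shown here only as the case-insensitive counterpart of strpos().

```php
<?php
$a = 'How ARE you?';

// strstr() and stristr() return the rest of the string from the match, or false.
var_dump(strstr($a, 'are'));            // bool(false): strstr() is case-sensitive
var_dump(stristr($a, 'are'));           // string(8) "ARE you?"

// stripos() returns the offset instead of a substring.
var_dump(stripos($a, 'are') !== false); // bool(true)
```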

- glutorange

9

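Note: if you only want to determine whether a particular needle occurs within the haystack, use the faster and less memory-intensive function strpos() instead of strstr(). - Jo Smo

@tastro Are there any reputable benchmarks on this? - Wayne Whitty

This might be slower, but IMHO strstr($a, 'are') is much more elegant than the ugly strpos($a, 'are') !== false. PHP really needs a str_contains() function. - Paul Spiegel

It blows my mind that this is not the accepted answer. - kurdtpage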

57

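Seconding the comments by SamGoody and Lego Stormtroopr.

If you are looking for a PHP algorithm to rank search results based on the proximity/relevance of multiple words, here is a quick and easy way of generating search results with PHP only:

Issues with the other boolean search methods such as strpos(), preg_match(), strstr() or stristr():

cannot search for multiple words
results are unranked

PHP method based on the Vector Space Model and tf-idf (term frequency-inverse document frequency):

It sounds difficult but is surprisingly easy.

If we want to search for multiple words in a string, the core problem is how to assign a weight to each one of them.

If we could weight the terms in a string based on how representative they are of the string as a whole, we could order our results by the ones that best match the query.

This is the idea of the vector space model, which is not far from how SQL full-text search works: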

function get_corpus_index($corpus = array(), $separator=' ') {

    $dictionary = array();

    $doc_count = array();

    foreach($corpus as $doc_id => $doc) {

        $terms = explode($separator, $doc);

        $doc_count[$doc_id] = count($terms);

        // tf–idf, short for term frequency–inverse document frequency, 
        // according to wikipedia is a numerical statistic that is intended to reflect 
        // how important a word is to a document in a corpus

        foreach($terms as $term) {

            if(!isset($dictionary[$term])) {

                $dictionary[$term] = array('document_frequency' => 0, 'postings' => array());
            }
            if(!isset($dictionary[$term]['postings'][$doc_id])) {

                $dictionary[$term]['document_frequency']++;

                $dictionary[$term]['postings'][$doc_id] = array('term_frequency' => 0);
            }

            $dictionary[$term]['postings'][$doc_id]['term_frequency']++;
        }

        //from http://phpir.com/simple-search-the-vector-space-model/

    }

    return array('doc_count' => $doc_count, 'dictionary' => $dictionary);
}

function get_similar_documents($query='', $corpus=array(), $separator=' '){

    $similar_documents=array();

    if($query!=''&&!empty($corpus)){

        $words=explode($separator,$query);

        $corpus=get_corpus_index($corpus, $separator);

        $doc_count=count($corpus['doc_count']);

        foreach($words as $word) {

            if(isset($corpus['dictionary'][$word])){

                $entry = $corpus['dictionary'][$word];


                foreach($entry['postings'] as $doc_id => $posting) {

                    //get term frequency–inverse document frequency
                    $score=$posting['term_frequency'] * log($doc_count + 1 / $entry['document_frequency'] + 1, 2);

                    if(isset($similar_documents[$doc_id])){

                        $similar_documents[$doc_id]+=$score;

                    }
                    else{

                        $similar_documents[$doc_id]=$score;

                    }
                }
            }
        }

        // length normalise
        foreach($similar_documents as $doc_id => $score) {

            $similar_documents[$doc_id] = $score/$corpus['doc_count'][$doc_id];

        }

        // sort from  high to low

        arsort($similar_documents);

    }   

    return $similar_documents;
}
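One detail of the scoring line in get_similar_documents() is worth spelling out: because / binds tighter than +, the logarithm's argument is evaluated as $doc_count + (1 / document_frequency) + 1. The results listed below are consistent with the expression exactly as written. The parenthesized variant in this sketch is only an assumption about a possible intent, not the answer's code, and it would produce different scores.

```php
<?php
$doc_count = 1;          // one document in the corpus
$document_frequency = 1; // the term occurs in that one document

// As written in the code: division binds tighter than addition,
// so this is log2($doc_count + (1 / $document_frequency) + 1).
$as_written = log($doc_count + 1 / $document_frequency + 1, 2);

// Parenthesized variant (an assumption about the intent, not the answer's code).
$parenthesized = log(($doc_count + 1) / ($document_frequency + 1), 2);

var_dump($as_written);    // log2(3), roughly 1.585
var_dump($parenthesized); // float(0)
```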

CASE 1

$query = 'are';

$corpus = array(
    1 => 'How are you?',
);

$match_results=get_similar_documents($query,$corpus);
echo '<pre>';
    print_r($match_results);
echo '</pre>';

RESULT

Array
(
    [1] => 0.52832083357372
)

CASE 2

$query = 'are';

$corpus = array(
    1 => 'how are you today?',
    2 => 'how do you do',
    3 => 'here you are! how are you? Are we done yet?'
);

$match_results=get_similar_documents($query,$corpus);
echo '<pre>';
    print_r($match_results);
echo '</pre>';

RESULT

Array
(
    [1] => 0.54248125036058
    [3] => 0.21699250014423
)

CASE 3

$query = 'we are done';

$corpus = array(
    1 => 'how are you today?',
    2 => 'how do you do',
    3 => 'here you are! how are you? Are we done yet?'
);

$match_results=get_similar_documents($query,$corpus);
echo '<pre>';
    print_r($match_results);
echo '</pre>';

RESULT

Array
(
    [3] => 0.6813781191217
    [1] => 0.54248125036058
)

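There are plenty of improvements to be made, but the model provides a way of getting good results from natural queries, without the boolean matching of strpos(), preg_match(), strstr() or stristr(). Nota bene: optionally eliminate redundancy prior to searching the words,

thereby reducing index size and storage requirements
less disk I/O
faster indexing and consequently a faster search.

1. Normalisation

Convert all text to lower case

2. Stop-word elimination

Eliminate words from the text that carry no real meaning (like "and", "or", "the", "for", etc.)

3. Dictionary substitution

Replace words with others that have an identical or similar meaning. (For example: replace instances of "hungrily" and "hungry" with "hunger".)
Further algorithmic measures (such as snowball) may be performed to further reduce words to their essential meaning.
Replace colour names with their hexadecimal equivalents.
Reducing numeric values by reducing precision is another way of normalising the text.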
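The first two steps above can be sketched in a few lines. This is a minimal illustration, not part of the original answer: the function name normalize_terms and the default stop-word list are assumptions, it handles ASCII text only, and dictionary substitution (step 3) is left out.

```php
<?php
// Hypothetical helper: lowercases, strips punctuation, and drops stop words.
function normalize_terms(string $text, array $stop_words = ['and', 'or', 'the', 'for']): array
{
    // 1. Normalisation: lower-case, then split on anything that is not a letter or digit.
    $text  = strtolower($text);
    $terms = preg_split('/[^a-z0-9]+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // 2. Stop-word elimination.
    $terms = array_diff($terms, $stop_words);

    return array_values($terms); // reindex from 0
}

print_r(normalize_terms('Here you are! How are you? And the rest?'));
// terms: here, you, are, how, are, you, rest
```

Running the corpus and the query through the same helper before indexing would let "are!" and "Are" count as occurrences of "are".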

- RafaSashi

54

Use strpos() for substring matching:

if (strpos($string,$stringToSearch) !== false) {
    echo 'true';
}

- Shankar Narayana Damodaran

43

If you want to avoid the "falsey" and "truthy" problem, you can use substr_count:

if (substr_count($a, 'are') > 0) {
    echo "at least one 'are' is present!";
}

It's a bit slower than strpos, but it avoids the comparison problems.

- Alan Piralla

It returns false for "are you sure?" since the position for strpos is 0. - Hafenkranich

- codaddict · Accepted Answer

Now with PHP 8 you can do this using str_contains:

if (str_contains('How are you', 'are')) { 
    echo 'true';
}

Please note: the str_contains function will always return true if $needle (the substring to search for in your string) is empty.

$haystack = 'Hello';
$needle   = '';

if (str_contains($haystack, $needle)) {
    echo "This returned true!";
}

You should first make sure that $needle (your substring) is not empty.

$haystack = 'How are you?';
$needle   = '';

if ($needle !== '' && str_contains($haystack, $needle)) {
    echo "This returned true!";
} else {
    echo "This returned false!";
}

Output: This returned false!

It is also worth noting that the new str_contains function is case-sensitive.

$haystack = 'How are you?';
$needle   = 'how';

if ($needle !== '' && str_contains($haystack, $needle)) {
    echo "This returned true!";
} else {
    echo "This returned false!";
}

Output: This returned false!

RFC

Before PHP 8

You can use the strpos() function, which finds the position of the occurrence of one string inside another:

$haystack = 'How are you?';
$needle   = 'are';

if (strpos($haystack, $needle) !== false) {
    echo 'true';
}

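Note that the use of !== false is deliberate (neither != false nor === true will return the desired result); strpos() returns either the offset at which the needle string begins in the haystack string, or the boolean false if the needle isn't found. Since 0 is a valid offset and 0 is "falsey", we can't use simpler constructs like !strpos($a, 'are').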
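For code that has to run on both sides of the PHP 8 boundary, the two approaches can be combined in a small polyfill. This is a sketch rather than an official implementation: it defines str_contains() only when the runtime does not already provide it, and it mirrors the empty-needle behaviour described above.

```php
<?php
// Define str_contains() only when the runtime does not already provide it.
if (!function_exists('str_contains')) {
    function str_contains(string $haystack, string $needle): bool
    {
        // Mirror PHP 8: an empty needle is always considered contained.
        return $needle === '' || strpos($haystack, $needle) !== false;
    }
}

var_dump(str_contains('How are you?', 'are')); // bool(true)
var_dump(str_contains('How are you?', 'How')); // bool(true), even at offset 0
var_dump(str_contains('How are you?', 'how')); // bool(false): case-sensitive
```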