在字符串中查找子字符串的数量

Question

在字符串中查找子字符串的数量

cstringsearchsubstring

16

我需要使用C语言查找字符串中子串出现的次数。我正在使用strstr函数，但它只能找到第一个出现的子串。

我的算法想法是在字符串中搜索，当strstr返回null时停止，并在每个循环中对主字符串进行子串操作。我的问题是如何实现这个算法？

- Jordan Borisov

6个回答

6

已经处理过的字符串部分是否应被消耗？

例如，在foooo中搜索oo的情况下，期望的答案是2还是3？

If the latter (we allow substring overlapping, and the answer is three), then Joachim Isaksson suggested the right code.

If we search for distinct substrings (the answer should be two), then see the code below (and online example here):

char *str = "This is a simple string";
char *what = "is";

int what_len = strlen(what);
int count = 0;

char *where = str;

if (what_len) 
    while ((where = strstr(where, what))) {
        where += what_len;
        count++;
    }

- Eldar Abusalimov

6

使用 KMP 算法，你可以在 O(n) 时间内完成它。

int fail[LEN+1];
char s[LEN];
void getfail()
{
    //f[i+1]= max({j|s[i-j+1,i]=s[0,j-1],j!=i+1})
    //the correctness can be proved by induction
    for(int i=0,j=fail[0]=-1;s[i];i++)
    {
        while(j>=0&&s[j]!=s[i]) j=fail[j];
        fail[i+1]=++j;
        if (s[i+1]==s[fail[i+1]]) fail[i+1]=fail[fail[i+1]];//optimizing fail[]
    }
}

int kmp(char *t)// String s is pattern and String t is text!
{
    int cnt=0;
    for(int i=0,j=0;t.s[i];i++)
    {
        while(j>=0&&t.s[i]!=s[j]) j=fail[j];
        if (!s[++j])
        {
            j=fail[j];
            cnt++;
        }
    }
    return cnt;// how many times s appeared in t.
}

- user2376234

2

根据是否允许重叠，结果可能会有所不同：

// gcc -std=c99
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static int
count_substr(const char *str, const char* substr, bool overlap) {
  if (strlen(substr) == 0) return -1; // forbid empty substr

  int count = 0;
  int increment = overlap ? 1 : strlen(substr);
  for (char* s = (char*)str; (s = strstr(s, substr)); s += increment)
    ++count;
  return count;
}

int main() {
  char *substrs[] = {"a", "aa", "aaa", "b", "", NULL };
  for (char** s = substrs; *s != NULL; ++s)
    printf("'%s' ->  %d, no overlap: %d\n", *s, count_substr("aaaaa", *s, true),
       count_substr("aaaaa", *s, false));
}

输出结果

'a' ->  5, no overlap: 5
'aa' ->  4, no overlap: 2
'aaa' ->  3, no overlap: 1
'b' ->  0, no overlap: 0
'' ->  -1, no overlap: -1

- jfs

0

假设s和substr都不为空且非空字符串：

/* #times substr appears in s, no overlaps */
int nappear(const char *s, const char *substr)
{
    int n = 0;
    const char *p = s;

    size_t lenSubstr = strlen(substr);

    while (*p) {
        if (memcmp(p, substr, lenSubstr) == 0) {
            ++n;
            p += lenSubstr;
        } else 
            ++p;
    }
    return n;
}

- lost_in_the_source

-2

/* 
 * C Program To Count the Occurence of a Substring in String 
 */
#include <stdio.h>
#include <string.h>

char str[100], sub[100];
int count = 0, count1 = 0;

void main()
{
    int i, j, l, l1, l2;

    printf("\nEnter a string : ");
    scanf("%[^\n]s", str);

    l1 = strlen(str);

    printf("\nEnter a substring : ");
    scanf(" %[^\n]s", sub);

    l2 = strlen(sub);

    for (i = 0; i < l1;)
    {
        j = 0;
        count = 0;
        while ((str[i] == sub[j]))
        {
            count++;
            i++;
            j++;
        }
        if (count == l2)
        {
            count1++;                                   
            count = 0;
        }
        else
            i++;
    }    
    printf("%s occurs %d times in %s", sub, count1, str);
}

- sankar

不要无缘无故使用全局变量。void main是错误的；应该是int main。"%[^\n]s"不能做你想要的事情；s不是%指令的一部分，需要输入一个字面上的s。你没有为输入指定上限；这是潜在的缓冲区溢出。如果必须使用scanf，请始终检查其返回值。不要使用scanf进行用户输入。strlen返回的是size_t而不是int。在while条件中有多余的括号；虽然不是一个错误，但如果你将==误写为=，这会使gcc给出警告。 - melpomene

while循环不会检查字符串的结尾并且如果所有字符匹配，它会超出str和sub的结尾。 j和count总是一起设置; 它们实际上是相同的变量。你的算法完全错误：例如它无法在"aab"中找到“ab”。 - melpomene

通常情况下，避免仅发布代码的答案。算法的描述或解释您的答案与其他答案的不同之处会有所帮助。 - melpomene

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Joachim Isaksson · Accepted Answer

你可以像这样做：

int countString(const char *haystack, const char *needle){
    int count = 0;
    const char *tmp = haystack;
    while(tmp = strstr(tmp, needle))
    {
        count++;
        tmp++;
    }
    return count;
}

也就是说，当你得到一个结果后，从字符串的下一个位置开始再次搜索。

strstr() 不仅可以从字符串的开头开始工作，还可以从任何位置开始。