How do I count the occurrences of each word in a string?

3
I am using the following code to extract words from a string input. How can I get the number of occurrences of each word?
var words = Regex.Split(input, @"\W+")
                        .AsEnumerable()
                        .GroupBy(w => w)
                        .Where(g => g.Count() > 10)
                        .Select(g => g.Key);
4 Answers

5

You can use string.Split instead of Regex.Split and get a count for each word, for example:

string str = "Some string with Some string repeated";
var result  = str.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
                .GroupBy(r => r)
                .Select(grp => new
                    {
                        Word = grp.Key,
                        Count = grp.Count()
                    });

If you want to keep only the words that occur at least 10 times, add a condition before the Select, such as Where(grp => grp.Count() >= 10). To print the result:
foreach (var item in result)
{
    Console.WriteLine("Word: {0}, Count:{1}", item.Word, item.Count);
}

Output:

Word: Some, Count:2
Word: string, Count:2
Word: with, Count:1
Word: repeated, Count:1

For case-insensitive grouping, replace the current GroupBy with:

.GroupBy(r => r, StringComparer.InvariantCultureIgnoreCase)

So your query would be:
var result = str.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
                .GroupBy(r => r, StringComparer.InvariantCultureIgnoreCase)
                .Where(grp => grp.Count() >= 10)
                .Select(grp => new
                    {
                        Word = grp.Key,
                        Count = grp.Count()
                    });
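
For reference, the pieces of this answer can be combined into one compilable program. This is only a minimal sketch: the class name WordCounter, the method CountWords and the minCount parameter are hypothetical names chosen for illustration and do not come from the answer itself. It also counts once and filters afterwards, so Count() is not evaluated twice per group.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCounter
{
    // Hypothetical helper: counts words case-insensitively and keeps
    // only those that occur at least minCount times.
    public static List<KeyValuePair<string, int>> CountWords(string text, int minCount)
    {
        return text.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
                   .GroupBy(w => w, StringComparer.InvariantCultureIgnoreCase)
                   .Select(grp => new KeyValuePair<string, int>(grp.Key, grp.Count()))
                   .Where(pair => pair.Value >= minCount)
                   .ToList();
    }

    public static void Main()
    {
        // "Some"/"some" and "string"/"STRING" fall into the same groups.
        foreach (var pair in CountWords("Some string with some STRING repeated", 2))
        {
            Console.WriteLine("Word: {0}, Count:{1}", pair.Key, pair.Value);
        }
    }
}
```

With the sample sentence and a threshold of 2, only the two words that occur twice are printed; "with" and "repeated" are filtered out.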

2

Try this:

var words = Regex.Split(input, @"\W+")
                        .AsEnumerable()
                        .GroupBy(w => w)
                        .Select(g => new {key = g.Key, count = g.Count()});
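
One thing to watch for with this approach: Regex.Split returns an empty string when the input starts or ends with a non-word character (for example a trailing period), and that empty string would be counted as a word. The sketch below filters those entries out before grouping. The class name RegexSplitSketch and the helper SplitWords are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexSplitSketch
{
    // Hypothetical helper: splits on runs of non-word characters and
    // drops the empty strings Regex.Split yields at the boundaries.
    public static string[] SplitWords(string input)
    {
        return Regex.Split(input, @"\W+")
                    .Where(w => w.Length > 0)
                    .ToArray();
    }

    public static void Main()
    {
        var counts = SplitWords("Hello, world! Hello.")
            .GroupBy(w => w)
            .Select(g => new { key = g.Key, count = g.Count() });

        foreach (var item in counts)
        {
            Console.WriteLine("{0}: {1}", item.key, item.count);
        }
    }
}
```

Without the Where filter, the trailing "." in the sample input would produce an extra group whose key is the empty string.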

1

Remove the Select statement and keep the IGrouping; you can then use it to read both the key and the count.

var words = Regex.Split(input, @"\W+")
                    .AsEnumerable()
                    .GroupBy(w => w)
                    .Where(g => g.Count() > 10);

foreach (var wordGrouping in words)
{
    var word = wordGrouping.Key;
    var count = wordGrouping.Count();
}

0
You can create a dictionary like this:
var words = Regex.Split(input, @"\W+")
                 .GroupBy(w => w)
                 .Where(g => g.Count() > 10)
                 .ToDictionary(g => g.Key, g => g.Count());

Or, if you want to avoid computing the count twice, you can do this:

var words = Regex.Split(input, @"\W+")
                 .GroupBy(w => w)
                 .Select(g => new { g.Key, Count = g.Count() })
                 .Where(g => g.Count > 10)
                 .ToDictionary(g => g.Key, g => g.Count);

Now you can get the count for a word like this (assuming the word "foo" appears more than 10 times in input):
var fooCount = words["foo"];
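
Note that the indexer throws a KeyNotFoundException for any word that did not pass the threshold, because such words are never added to the dictionary. A hedged sketch of a safer lookup with TryGetValue follows. The class name DictionaryLookupSketch, the helper BuildCounts and its threshold parameter are hypothetical, and the empty-string filter is an addition that is not in the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DictionaryLookupSketch
{
    // Hypothetical helper mirroring the query above, with the threshold
    // passed in as a parameter instead of the hard-coded 10.
    public static Dictionary<string, int> BuildCounts(string input, int threshold)
    {
        return Regex.Split(input, @"\W+")
                    .Where(w => w.Length > 0)   // added: skip empty split results
                    .GroupBy(w => w)
                    .Select(g => new { g.Key, Count = g.Count() })
                    .Where(g => g.Count > threshold)
                    .ToDictionary(g => g.Key, g => g.Count);
    }

    public static void Main()
    {
        var words = BuildCounts("foo bar foo baz foo", 2);

        // "bar" occurs only once, so it is not in the dictionary;
        // TryGetValue avoids the exception that words["bar"] would throw.
        int barCount;
        if (!words.TryGetValue("bar", out barCount))
        {
            barCount = 0;
        }

        Console.WriteLine("foo: {0}, bar: {1}", words["foo"], barCount);
    }
}
```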
