Counting occurrences of a subset of words in R?

Question

3

In R, suppose I have a list of strings like the following:

str_list <- list("corn is food", "corn is good")

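Suppose I want to count, within each element, how many times the words from some vocabulary subset (say "corn" and "food") occur. Is there a way to do this? For example, for str_list I would like to get the vector [2, 1], which counts food (1 occurrence) and corn (1 occurrence) in the first element, and corn (1 occurrence) in the second element. I don't want to count just a single word such as "corn", which can already be done with stringr::str_count().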

- James Rider

Which package does the str_count function belong to? - Edward

4 Answers

4

You can solve your problem as follows:

colSums(sapply(words, stringi::stri_count_fixed, str=str_list))
# corn food 
#    2    1

# or 
stringi::stri_count_fixed(paste0(str_list, collapse=" "), words)
# [1] 2 1

Data

str_list <- list("corn is food", "corn is good")
words <- c("corn", "food")

- B. Christian Kamgang

An amazing solution indeed - Onyambu
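
To make the first line of the answer above easier to follow, here is a minimal sketch of the intermediate matrix that colSums() collapses. It uses only base R as a stand-in for stringi::stri_count_fixed(); the helper name count_fixed is hypothetical and introduced purely for illustration.

```r
# Data from the answer above
str_list <- list("corn is food", "corn is good")
words <- c("corn", "food")

# Hypothetical helper (not part of any package): count fixed-string matches
# of one word in each string, using base R only.
count_fixed <- function(word, strings) {
  lengths(regmatches(strings, gregexpr(word, strings, fixed = TRUE)))
}

# One row per string, one column per word:
# the "corn" column is 1 1, the "food" column is 1 0
counts <- sapply(words, count_fixed, strings = unlist(str_list))
print(counts)

# Summing down each column gives the total per word: corn = 2, food = 1
per_word <- colSums(counts)
print(per_word)
```

With a single input string sapply() would return a plain vector rather than a matrix, so this sketch assumes at least two strings.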

3

If I understand your requirement correctly, the code below should solve it, although it does use str_count:

library(stringr)

str_list <- list("corn is food", "corn is good")
word_list <- c("corn", "food")

count_words <- function(string, words) {
  sum(sapply(words, function(word) str_count(string, word)))
}

result <- sapply(str_list, count_words, word_list)

This outputs the desired vector:

> print(result)
[1] 2 1

- Marc

You need to fix something! Try this example: str_list <- list("corn is good", "corn is good"). There are 2 "corn" and 0 "food", so the correct output should be 2 0, but your code gives 1 1, right? - Darren Tsai
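
The comment and the answer appear to read the question differently: one counts matches per word, the other counts matches per list element. On the original data both readings give 2 1, which may be why the difference is easy to miss. A small base R sketch on the comment's example, assuming fixed-string matching, shows the two results side by side:

```r
# Example from the comment above: two strings, no "food" in either
str_list <- list("corn is good", "corn is good")
words <- c("corn", "food")
strings <- unlist(str_list)

# Count matrix built with base R: rows are strings, columns are words
counts <- sapply(words, function(w) {
  lengths(regmatches(strings, gregexpr(w, strings, fixed = TRUE)))
})

per_string <- rowSums(counts)  # matches per list element: 1 1
per_word   <- colSums(counts)  # matches per word: corn = 2, food = 0
```

Which of the two is "correct" depends on how the question is interpreted; its wording describes counts per element.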

2

You can try the following approach: strsplit + table.

> table(unlist(strsplit(unlist(str_list), "\\W+")))[word_list]

corn food
   2    1

- ThomasIsCoding
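
One caveat with indexing a table by name is that a word that never occurs comes back as NA rather than 0. A possible variation, sketched here with an extra word "rice" added only for illustration, is to convert the tokens to a factor with fixed levels first:

```r
str_list <- list("corn is food", "corn is good")
word_list <- c("corn", "food", "rice")  # "rice" is added here for illustration

# Split every string on runs of non-word characters
tokens <- unlist(strsplit(unlist(str_list), "\\W+"))

# Indexing the table by a name it does not contain yields NA
by_name <- table(tokens)[word_list]

# A factor with fixed levels yields 0 for absent words instead
by_level <- table(factor(tokens, levels = word_list))
print(by_level)
```

Because the strings are split into whole tokens, this approach counts whole words only, unlike the pattern-based answers, which also match substrings.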


- Darren Tsai · Accepted Answer

With base R, you can use sapply + grep + lengths:

lengths(sapply(words, grep, str_list))

# corn food 
#    2    1

Update

As @Onyambu pointed out, if a word is repeated within a sentence, grep will not capture the repeats. The answer is revised by replacing grep() with gregexpr().

sapply(words, \(x) sum(gregexpr(x, toString(str_list))[[1]] > 0))

An equivalent solution using stringr::str_count():

colSums(sapply(words, stringr::str_count, string = str_list))

Data

str_list <- list("corn is food", "corn is good")
words <- c("corn", "food")
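
To illustrate the update above, here is a short sketch of why grep() undercounts repeated words and why the gregexpr() version filters with > 0. The input strings are made up for this example and are not from the original post.

```r
# Illustrative data: the first string repeats "corn"
strings <- c("corn corn is food", "corn is good")

# grep() reports which strings match, so a repeated word counts once per string
n_grep <- length(grep("corn", strings))                          # 2

# gregexpr() reports every match position in the combined string
n_gregexpr <- sum(gregexpr("corn", toString(strings))[[1]] > 0)  # 3

# With no match gregexpr() returns -1, so the > 0 filter turns that into 0
n_absent <- sum(gregexpr("rice", toString(strings))[[1]] > 0)    # 0
```

Both functions match substrings, so "corn" would also be found inside "popcorn"; wrapping the pattern as "\\bcorn\\b" is one way to restrict it to whole words.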