How to get the most frequent character in a string?

Question

3

Suppose we have the following character string:

test_string <- "A A B B C C C H I"

Is there a way to extract the most frequent value from test_string?

Something like this:

extract_most_frequent_character(test_string)

Output:

#C

- AlSub

4 Answers

2

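We can use scan to split the string on whitespace into a vector of individual elements, get the frequency count of each element with table, find the position of the highest count (which.max), and take the name of that element with names.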

extract_most_frequent_character <- function(x) {
     names(which.max(table(scan(text = x, what = '', quiet = TRUE))))
}

-testing

extract_most_frequent_character(test_string)
[1] "C"
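
To make the nested call easier to follow, the same pipeline can be unpacked one step at a time. This is only an illustrative sketch; the intermediate variable names (tokens, counts, top, most_frequent) are not part of the original answer.

```r
# Step-by-step view of the scan / table / which.max / names pipeline.
test_string <- "A A B B C C C H I"

# 1. Split on whitespace into a character vector of tokens.
tokens <- scan(text = test_string, what = "", quiet = TRUE)

# 2. Count how often each distinct token occurs.
counts <- table(tokens)

# 3. Position of the first largest count (a named integer).
top <- which.max(counts)

# 4. The name attached to that position is the most frequent token.
most_frequent <- names(top)
print(most_frequent)
```

Note that which.max reports only the first maximum, so if two tokens are tied, the one that sorts first in the table is returned.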

Or with strsplit:

extract_most_frequent_character <- function(x) {
     names(which.max(table(unlist(strsplit(x, "\\s+")))))
}

- akrun

2

Here is another base R option (not as elegant as @akrun's answer).

> intToUtf8(names(which.max(table(utf8ToInt(gsub("\\s", "", test_string))))))
[1] "C"
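
The one-liner works on code points rather than on whitespace-separated tokens, which may be easier to see when it is split up. The following is a sketch with hypothetical intermediate names; the explicit as.integer call is added here for clarity and is not in the original answer.

```r
test_string <- "A A B B C C C H I"

# Drop all whitespace so only the characters of interest remain.
no_space <- gsub("\\s", "", test_string)

# Convert the string to a vector of integer code points (e.g. "C" is 67).
code_points <- utf8ToInt(no_space)

# Tabulate the code points; names of the table are the codes as strings.
counts <- table(code_points)
top_code <- names(which.max(counts))

# Convert the winning code point back to a character.
result <- intToUtf8(as.integer(top_code))
print(result)
```

Because the counting is done per character, this variant would treat a multi-character token such as "AB" as two separate characters, unlike the scan and strsplit versions.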

- ThomasIsCoding

2

Here is a solution using the stringr package, table, and which.max:

library(stringr)
test_string <- str_split(test_string, " ")

test_string <- table(test_string)

names(test_string)[which.max(test_string)]

[1] "C"

- TarJae

- tmfmnk · Accepted Answer

One possible way using stringr:

names(which.max(table(str_extract_all(test_string, "[A-Z]", simplify = TRUE))))

[1] "C"

Or slightly shorter:

names(which.max(table(str_extract_all(test_string, "[A-Z]")[[1]])))
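
All of the approaches in this thread rely on which.max, which returns only the first maximum. If every tied value is wanted instead, something along these lines might work. It is a minimal base R sketch that assumes a non-empty, whitespace-separated input; the helper name extract_all_most_frequent is hypothetical and does not appear in any of the answers.

```r
# Hypothetical helper: return every value that shares the highest count.
extract_all_most_frequent <- function(x) {
  counts <- table(strsplit(x, "\\s+")[[1]])
  # Compare plain counts against the maximum and keep all matching names.
  names(counts)[as.vector(counts) == max(counts)]
}

single <- extract_all_most_frequent("A A B B C C C H I")
tied <- extract_all_most_frequent("A A B B H")
print(single)
print(tied)
```

With the question's test_string this still yields "C"; for "A A B B H" it yields both "A" and "B", where which.max alone would report only "A".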