Find the most frequently occurring digit in a vector of integers

Question


3

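I'm working on a practice problem that asks me to write a function that finds the digit(s) occurring most often across all numbers in an array.

An example input:

x = c(25, 2, 3, 57, 38, 41)

The return value is 2, 3, 5, because the digits 2, 3 and 5 each occur twice, which is the highest count.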

- Adam Young

5 Answers

1

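A solution that uses table() to get the frequency of each digit (rather than counting in a for loop), converts the result to a data frame, sorts that data frame by frequency, and extracts the top three digits directly: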

input_vector <- c(25, 2, 3, 57, 38, 41)

top_digits <- function(my_array, n=3) {
  
  # `as.character` converts the numbers to strings,
  # `strsplit` splits each one into individual characters (e.g. "23" into "2" and "3"),
  # and `unlist` flattens the result into a single character vector
  my_array_splitted <- unlist(strsplit(as.character(my_array), ""))
  
  # `table` creates a vector of frequencies
  # `as.data.frame` converts the vector into a DF with 2 columns: digits and frequencies
  df_digits <- as.data.frame(table(my_array_splitted))
  
  # Sorting the DF by frequency
  df_digits <- df_digits[order(df_digits$Freq, decreasing = TRUE),]
  
  # Extracting the first `n` elements of the digits column (which is now sorted) and converting back to integer
  # (we need the intermediate step as character because the column is originally a factor, and converting directly to integer is unsafe)
  as.integer(as.character(df_digits$my_array_splitted[1:n]))
}

- Francisco Yirá

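I think you need to name the first function argument "input_vector". - user438383

@user438383 That's not necessary, because the parameter name in a function declaration is only used inside the function. The example was missing the function call, though: top_digits(input_vector). This returns the top three digits of the input. - Francisco Yirá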
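
Note that a fixed n = 3 returns three digits whether or not there is a tie. As a minimal sketch, assuming the goal is to return every digit tied for the highest count, the same table() idea could be adapted as follows (`most_frequent_digits` is a hypothetical name, not part of the answer above):

```r
# Hypothetical helper: returns all digits tied for the highest frequency.
# Assumes non-negative whole numbers that as.character() prints without
# scientific notation.
most_frequent_digits <- function(v) {
  # Split every number into its individual digit characters
  digits <- unlist(strsplit(as.character(v), ""))
  # Count how often each digit occurs
  freq <- table(digits)
  # Keep every digit whose count equals the maximum count
  as.integer(names(freq)[freq == max(freq)])
}

most_frequent_digits(c(25, 2, 3, 57, 38, 41))
# [1] 2 3 5
```

Comparing against max(freq) rather than taking the first n rows means the length of the result follows the data.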

1

This approach is similar and also uses table:

count = function(x) {
    # make a table of counts of all the digits
    tab = table(strsplit(paste(x, collapse=""), "")[[1]])
    # return the names of the digits with the highest count
    names(tab)[tab == max(tab)]
}

Since it's Christmas, let's do a fun benchmark:

x = sample(1:1000, 100000, replace=T)

Unit: milliseconds
                expr       min        lq      mean    median        uq      max
               me(x)  46.63262  52.34020  57.33796  53.87266  58.91561 123.5481
             anou(x) 319.14199 351.43877 381.35371 374.78037 408.67354 490.3464
 digit_occurrence(x) 149.83663 151.61908 160.47220 156.88108 161.57646 245.5067
       top_digits(x)  42.40598  49.92426  55.87991  51.90813  56.61563 109.5608

- user438383
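
The table above does not show the benchmarking call itself. For a rough check of a single function, base R's system.time() may be enough. The sketch below is an assumption about how one might time a call, not the code that produced the numbers above, and `count_digits` is a hypothetical stand-in for any of the functions being compared:

```r
# Hypothetical stand-in for any of the functions being compared
count_digits <- function(x) {
  table(unlist(strsplit(as.character(x), "")))
}

set.seed(1)  # arbitrary seed so the sample is reproducible
x <- sample(1:1000, 100000, replace = TRUE)

# system.time() reports user, system and elapsed time in seconds
timing <- system.time(res <- count_digits(x))
timing[["elapsed"]]
```

A single system.time() call is noisy, so repeating the measurement a few times gives a more reliable picture.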

0

This may be another option for you:

fn <- function(x) {
  # First we separate every single digit in each element, but we need to turn
  # each element into a character string beforehand. We then use do.call
  # to apply the c function to every element of the resulting list, which
  # flattens the list into a vector
  digits <- do.call(c, sapply(x, function(y) strsplit(as.character(y), "")))
  
  # In the end we calculate the frequencies and sort them in decreasing order
  most_freq <- sort(table(digits), decreasing = TRUE)
  most_freq
}

fn(x)

digits
2 3 5 1 4 7 8 
2 2 2 1 1 1 1

- Anoushiravan R

1

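This is the approach I came up with after posting the question here: convert the values to strings, split them, convert them back to numeric, and then find the frequencies. But it isn't very efficient. The problem requires an execution time <= 5 seconds. - Adam Young

If that's the case, you can omit the part that converts back to numbers and the code will still run. - Anoushiravan R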
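
On the efficiency concern, one possible way to avoid both the per-element split and the conversion back to numbers is to work on character codes directly. The following is only a sketch, under the assumption that the input holds non-negative whole numbers that paste() prints without scientific notation; `digit_counts_fast` is a hypothetical name, and whether it is actually faster on a given input would need to be measured:

```r
# Hypothetical helper: counts the digits 0-9 via their character codes.
digit_counts_fast <- function(x) {
  # Concatenate all numbers into one string, e.g. "2523573841"
  s <- paste(x, collapse = "")
  # utf8ToInt() gives one code per character; "0" is 48, so
  # subtracting 47 maps "0".."9" to bins 1..10
  counts <- tabulate(utf8ToInt(s) - 47L, nbins = 10)
  names(counts) <- 0:9
  counts
}

counts <- digit_counts_fast(c(25, 2, 3, 57, 38, 41))
# Digits whose count equals the maximum count
as.integer(names(counts)[counts == max(counts)])
# [1] 2 3 5
```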

0

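Another approach is to convert your numeric vector to a character vector, split every string into single characters, and then build a frequency table: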

table(unlist(strsplit(as.character(x), ""))) -> t

# 1 2 3 4 5 7 8 
# 1 2 2 1 2 1 1

If you want to extract the most frequent digits:

as.integer(names(t[t == max(t)]))
# 2 3 5

- AlexB
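
When filtering a frequency table like this, it helps to keep in mind that which.max() returns the position of the first maximum, while max() returns the maximum count itself. A small sketch with made-up data (the vector `y` is an assumption, chosen so that the two values differ):

```r
# Made-up input: the digit 7 occurs three times, every other digit once
y <- c(17, 27, 37)
tab <- table(unlist(strsplit(as.character(y), "")))

max(tab)                # the highest count: 3
unname(which.max(tab))  # the position of that count in the table: 4

# Comparing against max() selects the most frequent digit
as.integer(names(tab)[tab == max(tab)])
# [1] 7
```

With the input from the question the two values happen to coincide (both are 2), which is why the distinction is easy to miss there.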


- Dion Groothof · Accepted Answer

One approach could be the following, although I'm sure there are more efficient ways:

my_vector <- c(25, 2, 3, 57, 38, 41)

# function to evaluate the number of times a certain digit occurs
digit_occurrence <- function(vector) {
  
  # collapse vector to a single string without commas
  x <- paste(vector, sep = '', collapse = '')

  # create empty vector
  digit <- c()

  # loop over each unique digit and store its occurrence
  for(i in as.character(0:9)) {
    digit[i] <- lengths(regmatches(x, gregexpr(i, x)))
  }

  digit

}

> digit_occurrence(my_vector)
0 1 2 3 4 5 6 7 8 9 
0 1 2 2 1 2 0 1 1 0
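
The function returns a count for every digit, so the digits with the highest count still have to be picked out to get the answer the question asks for. A minimal sketch, re-creating the output above by hand as a named vector (`occ` is a hypothetical name):

```r
# Counts for the digits 0-9, copied from the output shown above
occ <- c("0" = 0, "1" = 1, "2" = 2, "3" = 2, "4" = 1,
         "5" = 2, "6" = 0, "7" = 1, "8" = 1, "9" = 0)

# Names of all entries that equal the maximum count
as.integer(names(occ)[occ == max(occ)])
# [1] 2 3 5
```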