How to reproduce a loop using functions from the R package purrr

Question

3

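I often use loops in my code. I've been told that I should use functions instead of loops, and that loops can be rewritten with functions from the R package purrr.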

As an example, this code simply counts, for each species in the iris dataset, the rows where Sepal.Width <= 3.

 library(dplyr)
 #dataframe to put the output in
 sepaltable <- data.frame(Species=character(),
                     Total=numeric(), 
                     stringsAsFactors=FALSE) 

 #list of species to iterate over
 specieslist<-unique(iris$Species)

 #loop to populate the dataframe with the name of the species 
 #and the count of how many there were in the iris dataset

 for (i in seq_along(specieslist)) {
   a <- paste(specieslist[i])
   b <- filter(iris, Species == a & Sepal.Width <= 3)
   c <- nrow(b)
   sepaltable[i, "Species"] <- a
   sepaltable[i, "Total"] <- c
 }

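The loop fills the sepaltable data frame with the name of each species and its count in the iris dataset. I would like to reproduce the effect of this loop with functions from the R package purrr, without using a loop. Can anyone help?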

- Basil

2 Answers

2

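For the kind of example you provided, akrun's answer is the most straightforward approach, especially since you are already using dplyr. The dplyr package is written for basic summaries of data tables, and in particular for the grouped statistics used in your example.
In more complicated situations, though, most of the time where you would write a loop you can achieve the same thing with a function and the apply family.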

Using your example:

# write function that does the stuff you put in your loop
summSpecies <- function(a) {
  b <- filter(iris, Species == a & Sepal.Width <= 3)
  c <- nrow(b)
  return(c)
}

# apply the function over your list
sapply(specieslist, summSpecies) # sapply simplifies the output to return a vector (in this case)
#[1]  8 42 33

# You can build this into a data frame
sepaltable <- data.frame(Species=specieslist,
                         Total=sapply(specieslist,summSpecies), 
                         stringsAsFactors=FALSE) 
sepaltable
#      Species Total
# 1     setosa     8
# 2 versicolor    42
# 3  virginica    33

For what it's worth, I ran a comparison of the approaches suggested in the examples:

# Unit: microseconds
#         expr      min        lq      mean    median        uq       max neval
#   ForLoop.OP 2548.519 2725.9020  3107.153  2819.837 3006.5915 11654.194   100
#  Apply.Brian 2385.638 2534.2390  2810.854  2625.050 2822.5145  9641.172   100
#  dplyr.akrun  721.136  837.6065  1180.244   864.604  902.9815 13440.076   100
#  purrr.akrun 3572.656 3783.2845  4147.900  3874.095 4073.5690 10517.602   100
# purrr.Axeman 2440.973  2527.322 2866.7686 2586.8960  2774.097  9577.360   100

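Not surprisingly, the existing function that is optimized for this kind of task (the dplyr approach) is the clear winner, with a median of about 0.86 ms against roughly 2.6 to 3.9 ms for the others. The for loop lags slightly behind the apply-family approach.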
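
The layout of the table above looks like the printed output of the microbenchmark package, so a comparison along these lines could be set up roughly as follows. This is only a sketch under assumptions: the wrapper for_loop() is a hypothetical helper, the species are taken as a character vector, and the exact code timed under each label (in particular purrr.Axeman) is a guess, not the original benchmark script.

```r
library(dplyr)
library(purrr)
library(microbenchmark)

# Species names as a plain character vector (assumption: the original used a factor)
specieslist <- as.character(unique(iris$Species))

# Helper from the answer: rows of one species with Sepal.Width <= 3
summSpecies <- function(a) {
  nrow(filter(iris, Species == a & Sepal.Width <= 3))
}

# Hypothetical wrapper around the loop from the question
for_loop <- function() {
  out <- data.frame(Species = character(), Total = numeric(),
                    stringsAsFactors = FALSE)
  for (i in seq_along(specieslist)) {
    out[i, "Species"] <- specieslist[i]
    out[i, "Total"] <- summSpecies(specieslist[i])
  }
  out
}

# Each named argument is one expression to time; times = 100 matches neval
bench <- microbenchmark(
  ForLoop.OP   = for_loop(),
  Apply.Brian  = sapply(specieslist, summSpecies),
  dplyr.akrun  = iris %>%
    group_by(Species) %>%
    summarise(Total = sum(Sepal.Width <= 3)),
  purrr.akrun  = map_dfr(specieslist, ~ iris %>%
    summarise(Total = sum(Species == .x & Sepal.Width <= 3),
              Species = .x)),
  purrr.Axeman = map_int(specieslist, summSpecies),
  times = 100
)
print(bench)
```

Absolute timings will differ by machine and package version, so only the relative ordering is worth reading into.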

- Brian Fisher

1

If you want to use purrr, replace sapply with map_int. - Axeman
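
To make that suggestion concrete, here is a minimal sketch that swaps sapply for map_int while keeping the summSpecies helper from the answer. Converting the species to a character vector first is my own assumption for simplicity, not something the comment requires.

```r
library(dplyr)
library(purrr)

# Species names as a plain character vector
specieslist <- as.character(unique(iris$Species))

# Same helper as in the answer: rows of one species with Sepal.Width <= 3
summSpecies <- function(a) {
  nrow(filter(iris, Species == a & Sepal.Width <= 3))
}

# map_int() returns an integer vector, or fails if a result is not a single integer
totals <- map_int(specieslist, summSpecies)

sepaltable <- data.frame(Species = specieslist,
                         Total = totals,
                         stringsAsFactors = FALSE)
sepaltable
```

Unlike sapply, map_int never silently changes its return type, which makes it a bit safer inside larger scripts.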

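In the loop example, an empty data frame is created first and then filled in by the loop. Can the same be done with a function, i.e. have the function fill the empty data frame, rather than having to combine the results into a vector at the end? - Basil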

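@Basil, you can create an empty data frame first and then fill it with sapply(specieslist,summSpecies). This works whether or not sepaltable already has a column named Total. You can also use a column of the data frame as the input (i.e. sepaltable$Total<- sapply(sepaltable$Species,summSpecies)). - Brian Fisher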
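
A small sketch of the second variant from that comment, where the data frame's own Species column is the input. One caveat worth stating as my own note: a data frame with zero rows cannot take a length-3 column through `$<-`, so this sketch assumes the Species column is already populated before Total is added. USE.NAMES = FALSE is an optional extra to keep the new column unnamed.

```r
library(dplyr)

# Same helper as in the answer: rows of one species with Sepal.Width <= 3
summSpecies <- function(a) {
  nrow(filter(iris, Species == a & Sepal.Width <= 3))
}

# Start with only the Species column filled in; Total does not exist yet
sepaltable <- data.frame(Species = as.character(unique(iris$Species)),
                         stringsAsFactors = FALSE)

# Create (or overwrite) Total from the data frame's own Species column
sepaltable$Total <- sapply(sepaltable$Species, summSpecies, USE.NAMES = FALSE)
sepaltable
```

Running the last assignment again simply overwrites Total, which is the "whether or not the column exists" behaviour mentioned in the comment.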

- akrun · Accepted Answer

We can do a group-by in dplyr and take the sum of a logical expression:

library(dplyr)
iris %>% 
   group_by(Species) %>%
   summarise(Total = sum(Sepal.Width <=3))
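
The reason summing a logical expression gives a count is that sum() treats TRUE as 1 and FALSE as 0. As an illustration only, the same grouped count can be sketched in base R with tapply (the variable names narrow and totals are just placeholders):

```r
# Sepal.Width <= 3 is a logical vector with one entry per row of iris
narrow <- iris$Sepal.Width <= 3

# Grouped sum in base R: one count of TRUE values per species
totals <- tapply(narrow, iris$Species, sum)
totals
```

This is not a replacement for the dplyr version, just a way to see what the sum is doing.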

Or, if purrr is required:

library(purrr)
map_dfr(specieslist,  ~iris %>% 
      summarise(Total = sum(Species == .x & Sepal.Width <=3),
          Species = .x )) %>%
   select(Species, Total)
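
As a side note, map_dfr has been superseded since purrr 1.0.0 in favour of map() followed by list_rbind(). A rough equivalent, assuming purrr 1.0.0 or later and a character vector of species (both my assumptions), might look like this:

```r
library(dplyr)
library(purrr)

# Species names as a plain character vector
specieslist <- as.character(unique(iris$Species))

# Build one single-row tibble per species, then bind the rows together
sepaltable <- specieslist %>%
  map(~ tibble(Species = .x,
               Total = sum(iris$Species == .x & iris$Sepal.Width <= 3))) %>%
  list_rbind()
sepaltable
```

Building each row with the columns already in the desired order also removes the need for the final select().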

Note: the map and apply families of functions (lapply/sapply/vapply/rapply/mapply/Map/apply) are all looping constructs themselves.
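
To illustrate that note, here is a base R sketch in which an explicit for loop over a preallocated vector and a vapply call perform the same iteration and give the same result. The helper count_narrow is a hypothetical name introduced only for this comparison.

```r
species <- as.character(unique(iris$Species))

# Hypothetical helper: count rows of one species with Sepal.Width <= 3
count_narrow <- function(a) sum(iris$Species == a & iris$Sepal.Width <= 3)

# Explicit loop, writing into a preallocated integer vector
loop_totals <- integer(length(species))
for (i in seq_along(species)) {
  loop_totals[i] <- count_narrow(species[i])
}

# The same iteration expressed with vapply()
apply_totals <- vapply(species, count_narrow, integer(1), USE.NAMES = FALSE)

identical(loop_totals, apply_totals)
```

The practical difference is mostly readability and how the results are collected, not whether a loop happens.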