如何延迟评估作为参数传递给purrr::pmap的函数

3

我正在尝试使用嵌套数据框(https://r4ds.had.co.nz/many-models.html)方法,使用lcmm::lcmm()purrr::pmap()来拟合多个潜在类成长曲线。

这个过程需要使用lcmm()来拟合一个单一类别的模型(k = 1),然后将该模型作为输入传递给lcmm::gridsearch(),该函数使用来自k = 1模型的起始值来喂入一个k = 2+类别的模型。 gridsearch()还需要k = 2+模型的模型调用(加上另外两个参数),这些参数作为对gridsearch()中的lcmm()的调用传递。我的常规方法是使用pmap()将参数列表传递给gridsearch(),但list()会立即评估对lcmm()的模型调用,并尝试拟合模型,而不是将模型调用传递给gridsearch()(参见purrr::pmap与rlang的混乱行为; "引用"或不引用参数是Q)。
NB:使用RStudio的函数查看器(F2),似乎lcmm :: gridsearch()使用match.call()来调整带有用户定义数量的随机起始值的k = 2+模型调用,然后迭代这些值以找到首选的k = 2+解决方案。
我在下面包含了一个reprex。当将对gridsearch的调用包装在pmap中时,该命令会失败,并显示“Error in mutate_impl(.data,dots):Evaluation error:argument is of length zero。”-我认为这是因为R正在尝试评估对lcmm()的调用,用于k = 2+模型,但我可能是错的。
如何延迟将lcmm()作为参数传递给pmap()的评估?
下面是Reprex:
library(lcmm)
#> Warning: package 'lcmm' was built under R version 3.5.2
#> Loading required package: survival
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
library(purrr)

# load lcmm example data

data("data_lcmm")

# take sample

set.seed(123)

data_lcmm <-
  data_lcmm %>%
  sample_frac(0.1)




# NB grouping variable is needed to reproduce desired data structure 

data_lcmm <-
  data_lcmm %>%
  mutate(group_var = sample(c(0, 1),
    size = nrow(data_lcmm),
    replace = TRUE
  ))



data_lcmm_nest <-
  data_lcmm %>%
  group_by(group_var) %>%
  nest() %>% 
  mutate(data= map(data, as.data.frame))


# lcmm call from ?lcmm

lcmm_k1 <- function(df) {
  lcmm(Ydep2 ~ Time + I(Time^2),
    random = ~Time, subject = "ID", ng = 1,
    data = data_lcmm_nest$data[[1]], link = "linear"
  )
}


# fit k = 1 models
data_lcmm_nest <-
  data_lcmm_nest %>%
  mutate(lcgm = map(data, lcmm_k1))
#> Be patient, lcmm is running ... 
#> The program took 0.18 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.19 seconds

# this works for a single row
desired_result <-
  gridsearch(
    m = lcmm(Ydep2 ~ Time + I(Time^2),
      mixture = ~Time,
      random = ~Time, subject = "ID", ng = 2,
      data = data_lcmm_nest$data[[1]], link = "linear"
    ),
    rep = 5,
    maxiter = 2,
    minit = data_lcmm_nest$lcgm[[1]]
  )
#> Be patient, lcmm is running ... 
#> The program took 0.45 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.45 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.45 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.45 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.47 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.61 seconds


# this fails with Error in mutate_impl(.data, dots) :
# Evaluation error: argument is of length zero.

data_lcmm_nest %>%
  mutate(lcgm_2 = pmap(
    list(
      m = lcmm(Ydep2 ~ Time + I(Time^2),
        mixture = ~Time,
        random = ~Time, subject = "ID", ng = 2,
        data = data, link = "linear"
      ),
      rep = 5,
      maxiter = 2,
      minit = lcgm
    ), gridsearch
  ))
#> Error in mutate_impl(.data, dots): Evaluation error: argument is of length zero.


# wrapping gridsearch in helper also fails

grid_search_helper <- function(g_rep, g_maxiter, g_minit, g_m) {
  gridsearch(
    m = lcmm(Ydep2 ~ Time + I(Time^2),
      mixture = ~Time,
      random = ~Time, subject = "ID", ng = 2,
      data = g_m, link = "linear"
    ),
    rep = g_rep,
    maxiter = g_maxiter,
    minit = g_minit
  )
}


data_lcmm_nest %>%
  mutate(lcgm_2 = pmap(
    list(
      5,
      2,
      lcgm,
      data
    ), grid_search_helper
  ))
#> Error in mutate_impl(.data, dots): Evaluation error: object 'g_m' not found.

reprex package(v0.2.1)于2019年01月24日创建

2个回答

1

使用 purrr,我相信以下代码可以创建您需要的输出结果,即拟合模型对象的列表。

它通过在匿名函数中使用 ..n 语法引用从 data_lcmm_nest 提供给 purrr 的参数来工作,在函数之前加上 ~,其中 n 指的是提供的数据框或列表中参数的位置。

library(lcmm)
#> Warning: package 'lcmm' was built under R version 4.0.5
#> Loading required package: survival
#> Loading required package: parallel
#> Loading required package: mvtnorm
#> Loading required package: randtoolbox
#> Loading required package: rngWELL
#> Warning: package 'rngWELL' was built under R version 4.0.5
#> This is randtoolbox. For an overview, type 'help("randtoolbox")'.
#> 
#> Attaching package: 'lcmm'
#> The following object is masked from 'package:randtoolbox':
#> 
#>     permut
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
library(purrr)

# load lcmm example data

data("data_lcmm")

# take sample

set.seed(123)

data_lcmm <-
  data_lcmm %>%
  sample_frac(0.1)

# NB grouping variable is needed to reproduce desired data structure 

data_lcmm <-
  data_lcmm %>%
  mutate(group_var = sample(c(0, 1),
                            size = nrow(data_lcmm),
                            replace = TRUE
  ))

data_lcmm_nest <-
  data_lcmm %>%
  group_by(group_var) %>%
  nest() %>% 
  mutate(data= map(data, as.data.frame))


# lcmm call from ?lcmm

lcmm_k1 <- function(df) {
  lcmm(Ydep2 ~ Time + I(Time^2),
       random = ~Time, subject = "ID", ng = 1,
       data = data_lcmm_nest$data[[1]], link = "linear"
  )
}

# fit k = 1 models
data_lcmm_nest <-
  data_lcmm_nest %>%
  mutate(lcgm = map(data, lcmm_k1))
#> Be patient, lcmm is running ... 
#> The program took 0.18 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.17 seconds

# this works for n rows
desired_result_list <- pmap(
    data_lcmm_nest,
    ~ gridsearch(
      m = lcmm(Ydep2 ~ Time + I(Time^2),
               mixture = ~Time,
               random = ~Time, subject = "ID", ng = 2,
               data = ..2, link = "linear"
      ),
      rep = 5,
      maxiter = 2,
      minit = ..3
    )
  )
#> Be patient, lcmm is running ... 
#> The program took 0.38 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.41 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.41 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.43 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.44 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.46 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.33 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.33 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.31 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.31 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.31 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.37 seconds
Created on 2022-04-28 by the reprex package (v2.0.0)

0

这并不完全回答了我的原始问题,因为它没有使用purrr,但是使用for循环进行迭代不会有这个延迟评估的问题:

library(lcmm)
#> Loading required package: survival
#> Loading required package: parallel
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
library(purrr)


data("data_lcmm")

# take sample

set.seed(123)

data_lcmm <-
  data_lcmm %>%
  sample_frac(0.1)




# NB grouping variable is needed to reproduce desired data structure 

data_lcmm <-
  data_lcmm %>%
  mutate(group_var = sample(c(0, 1),
                            size = nrow(data_lcmm),
                            replace = TRUE
  ))



data_lcmm_nest <-
  data_lcmm %>%
  group_by(group_var) %>%
  nest() %>% 
  mutate(data= map(data, as.data.frame))



# lcmm call from ?lcmm

lcmm_k1 <- function(df) {
  lcmm(Ydep2 ~ Time + I(Time^2),
       random = ~Time, subject = "ID", ng = 1,
       data = data_lcmm_nest$data[[1]], link = "linear"
  )
}


# fit k = 1 models
data_lcmm_nest <-
  data_lcmm_nest %>%
  mutate(lcgm = map(data, lcmm_k1))
#> Be patient, lcmm is running ... 
#> The program took 0.19 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.22 seconds

# set-up output vector

results <- vector(mode = "list", length = nrow(data_lcmm_nest))

# fit models

for(i in 1:nrow(data_lcmm_nest)){
  
  results[[i]] <- gridsearch(
    m = lcmm(Ydep2 ~ Time + I(Time^2),
             mixture = ~Time,
             random = ~Time, subject = "ID", ng = 2,
             data = data_lcmm_nest$data[[i]], link = "linear"
    ),
    rep = 5,
    maxiter = 2,
    minit = data_lcmm_nest$lcgm[[i]]
  )
}
#> Be patient, lcmm is running ... 
#> The program took 0.56 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.42 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.47 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.48 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.52 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.5 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.33 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.32 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.39 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.38 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.37 seconds 
#> Be patient, lcmm is running ... 
#> The program took 0.47 seconds

data_lcmm_nest <- 
data_lcmm_nest %>% 
  ungroup() %>% 
  mutate(res = results)

本文档由reprex package (v0.3.0)于2021-04-20创建

devtools::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 4.0.3 (2020-10-10)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  English_United Kingdom.1252 
#>  ctype    English_United Kingdom.1252 
#>  tz       Europe/London               
#>  date     2021-04-20                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version date       lib source        
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.3)
#>  callr         3.5.1   2020-10-13 [1] CRAN (R 4.0.3)
#>  cli           2.2.0   2020-11-20 [1] CRAN (R 4.0.3)
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 4.0.3)
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 4.0.3)
#>  devtools      2.3.2   2020-09-18 [1] CRAN (R 4.0.3)
#>  digest        0.6.27  2020-10-24 [1] CRAN (R 4.0.3)
#>  dplyr       * 1.0.2   2020-08-18 [1] CRAN (R 4.0.3)
#>  ellipsis      0.3.1   2020-05-15 [1] CRAN (R 4.0.3)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.3)
#>  fansi         0.4.1   2020-01-08 [1] CRAN (R 4.0.3)
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.3)
#>  generics      0.1.0   2020-10-31 [1] CRAN (R 4.0.3)
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.3)
#>  highr         0.8     2019-03-20 [1] CRAN (R 4.0.3)
#>  htmltools     0.5.0   2020-06-16 [1] CRAN (R 4.0.3)
#>  knitr         1.30    2020-09-22 [1] CRAN (R 4.0.3)
#>  lattice       0.20-41 2020-04-02 [2] CRAN (R 4.0.3)
#>  lcmm        * 1.9.2   2020-07-07 [1] CRAN (R 4.0.3)
#>  lifecycle     0.2.0   2020-03-06 [1] CRAN (R 4.0.3)
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.0.3)
#>  Matrix        1.2-18  2019-11-27 [2] CRAN (R 4.0.3)
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 4.0.3)
#>  pillar        1.4.7   2020-11-20 [1] CRAN (R 4.0.3)
#>  pkgbuild      1.2.0   2020-12-15 [1] CRAN (R 4.0.3)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.3)
#>  pkgload       1.1.0   2020-05-29 [1] CRAN (R 4.0.3)
#>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.0.3)
#>  processx      3.4.5   2020-11-30 [1] CRAN (R 4.0.3)
#>  ps            1.5.0   2020-12-05 [1] CRAN (R 4.0.3)
#>  purrr       * 0.3.4   2020-04-17 [1] CRAN (R 4.0.3)
#>  R6            2.5.0   2020-10-28 [1] CRAN (R 4.0.3)
#>  remotes       2.2.0   2020-07-21 [1] CRAN (R 4.0.3)
#>  rlang         0.4.10  2020-12-30 [1] CRAN (R 4.0.3)
#>  rmarkdown     2.6     2020-12-14 [1] CRAN (R 4.0.3)
#>  rprojroot     2.0.2   2020-11-15 [1] CRAN (R 4.0.3)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.3)
#>  stringi       1.5.3   2020-09-09 [1] CRAN (R 4.0.3)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.3)
#>  survival    * 3.2-7   2020-09-28 [1] CRAN (R 4.0.3)
#>  testthat      3.0.1   2020-12-17 [1] CRAN (R 4.0.3)
#>  tibble        3.0.4   2020-10-12 [1] CRAN (R 4.0.3)
#>  tidyr       * 1.1.2   2020-08-27 [1] CRAN (R 4.0.3)
#>  tidyselect    1.1.0   2020-05-11 [1] CRAN (R 4.0.3)
#>  usethis       2.0.0   2020-12-10 [1] CRAN (R 4.0.3)
#>  vctrs         0.3.6   2020-12-17 [1] CRAN (R 4.0.3)
#>  withr         2.3.0   2020-09-22 [1] CRAN (R 4.0.3)
#>  xfun          0.20    2021-01-06 [1] CRAN (R 4.0.3)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.3)
#> 
#> [1] M:/R/win-library/3.6
#> [2] C:/Program Files/R/R-4.0.3/library

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接