Using purrr::map to pass column arguments from a list of data frames into a custom function

Question


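I'm writing a custom function that uses purrr::map to fit a linear mixed-effects model to each element of a list. The code works perfectly on its own, but once I wrap it in a custom function it's unclear how to pass arguments that correspond to individual columns in the list elements.

If I can get the custom function working, I can use it for any number of variables. Otherwise, I'll have to copy and paste the same code for each variable.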

```r
# libraries needed
library(purrr)
library(lmerTest)
data(mtcars)

# create a list of data frames from mtcars based on a split
group_list <- split(mtcars, mtcars$am)

# goal: fit a linear mixed-effects model to each data frame and combine the results neatly in one data frame

# achieving this outside of a custom function
group_list %>%
  purrr::map(.x = (.),
             .f = ~ lmerTest::lmer(
               scale(mpg) ~ scale(wt) + (wt | cyl),
               data = (.),
               REML = FALSE
             )) %>%
  purrr::map(.f = ~ coef(summary(.))[-c(1),]) %>%
  base::do.call(what = cbind.data.frame, args = .) %>%
  tibble::rownames_to_column(df = ., var = "Effect")
#>       Effect          0             1
#> 1   Estimate -0.3318711 -9.089148e-01
#> 2 Std. Error  0.2104268  1.156500e-01
#> 3         df  0.6084658  1.300000e+01
#> 4    t value -1.5771334 -7.859187e+00
#> 5   Pr(>|t|)  0.4558206  2.714599e-06

# preparing the custom function to do the same
lmer_group <- function(list, x, y) {
  list %>%
    purrr::map(
      .x = (.),
      .f = ~ lmerTest::lmer(
        scale(y) ~ scale(x) + (x | cyl),
        data = (.),
        REML = FALSE
      )
    ) %>%
    purrr::map(.f = ~ coef(summary(.))[-c(1),]) %>%
    base::do.call(what = cbind.data.frame, args = .) %>%
    tibble::rownames_to_column(df = ., var = "Effect")
}

# doing the same analysis with a custom function
lmer_group(list = group_list, x = wt, y = mpg) # attempt 1
#> Error in scale(y): object 'mpg' not found
lmer_group(list = group_list, x = 'wt', y = 'mpg') # attempt 2
#> Error in colMeans(x, na.rm = TRUE): 'x' must be numeric
lmer_group(
  list = group_list,
  x = lapply(group_list, `[`, 'wt'),
  y = lapply(group_list, `[`, 'mpg')
) # attempt 3
#> Error in colMeans(x, na.rm = TRUE): 'x' must be numeric
```

Created on 2018-01-28 by the reprex package (v0.1.1.9000).

- Indrajeet Patil
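To see why the three attempts fail, here is a minimal base-R sketch (make_formula() is a hypothetical helper, not part of the question). A formula is not evaluated when it is created, so the x and y written inside it stay the literal symbols x and y. When lmer() finds no columns named x or y in the data, it falls back to the formula's environment, where y is the function argument: the string "mpg" in attempt 2 (so scale("mpg") fails), a list of data frames in attempt 3 (same colMeans() failure), or an unevaluated mpg in attempt 1 (hence "object 'mpg' not found").

```r
# Hypothetical helper mirroring the question's formula.
# The formula is created, not evaluated: `x` and `y` stay literal symbols.
make_formula <- function(x, y) {
  scale(y) ~ scale(x) + (x | cyl)
}

f <- make_formula(x = "wt", y = "mpg")

# Variables the formula refers to: "y", "x", "cyl" (no "wt" or "mpg")
all.vars(f)

# The formula's environment is the function's frame, where y is "mpg",
# which is what lmer() ends up using when there is no column named y
get("y", envir = environment(f))

# Scaling a string reproduces the error from attempt 2
tryCatch(scale("mpg"), error = conditionMessage)
```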


Read the tidyverse documentation on non-standard evaluation inside functions, particularly rlang::quo() and related functions. See also https://dev59.com/bFcQ5IYBdhLWcg3wCvc4#44080671. - wibeasley

2 Answers

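All of the indirection happens inside the formula, so I don't think rlang is needed at all.

You can pass the names of the desired variables as strings and paste them together into a formula string for lmer. Then stats::as.formula() converts that string into a proper formula object that lmer can use.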

```r
lmer_group <- function(l, x_name, y_name) {
  # glue the column names into the text of the model formula
  fx <- paste0("scale(", y_name, ") ~ scale(", x_name, ") + (", x_name, " | cyl)")
  print(paste("Evaluating: ", fx))

  l %>% 
    purrr::map(
      .f = ~ lmerTest::lmer(
        as.formula(fx),   # string -> formula object that lmer understands
        data = (.),
        REML = FALSE
      )
    ) %>%
    # drop the intercept row, keeping the slope row of each fixed-effects table
    purrr::map(.f = ~ coef(summary(.))[-c(1),]) %>%
    base::do.call(what = cbind.data.frame, args = .) %>%
    tibble::rownames_to_column(df = ., var = "Effect")
}

lmer_group(l = group_list, x_name = "wt", y_name = "mpg")
```

Result:

```
[1] "Evaluating:  scale(mpg) ~ scale(wt) + (wt | cyl)"
      Effect          0             1
1   Estimate -0.3318712 -9.089148e-01
2 Std. Error  0.2104267  1.156500e-01
3         df  0.6084632  1.300000e+01
4    t value -1.5771343 -7.859187e+00
5   Pr(>|t|)  0.4558213  2.714599e-06
```
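Because this version takes the variable names as strings, the formula-building step can be looped over a character vector of predictors, which is what the question was ultimately after. A minimal sketch, where build_formula() is a hypothetical helper wrapping the same paste0() call and the predictor list is only illustrative:

```r
library(purrr)

# Hypothetical helper: the same paste0() construction as in the answer,
# factored out so it can be reused for any predictor/outcome pair.
build_formula <- function(x_name, y_name) {
  stats::as.formula(
    paste0("scale(", y_name, ") ~ scale(", x_name, ") + (", x_name, " | cyl)")
  )
}

# One formula per predictor, with mpg as the outcome throughout
predictors <- c("wt", "hp", "disp")
formulas <- purrr::map(predictors, build_formula, y_name = "mpg")
names(formulas) <- predictors

formulas$hp
```

Each element can be passed to lmerTest::lmer() in place of as.formula(fx); equivalently, purrr::map(predictors, ~ lmer_group(group_list, x_name = .x, y_name = "mpg")) reuses the function above once per predictor.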

I bet there's a way to make the rlang quo() approach work too. If you go with this solution, the question is essentially a duplicate of Formula with dynamic number of variables.

- wibeasley
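The rlang route mentioned above can be made to work. Since a formula needs plain symbols rather than quosures, this sketch uses rlang::ensym() and rlang::new_formula() instead of quo(); it lets the function be called with bare column names, as in the question's attempt 1. The name lmer_group_sym() is hypothetical, and the sketch assumes rlang, lmerTest and tibble are installed.

```r
library(purrr)
library(lmerTest)

group_list <- split(mtcars, mtcars$am)

# Hypothetical variant of lmer_group() that accepts bare column names
# (x = wt, y = mpg), as in the question's attempt 1. Strings also work.
lmer_group_sym <- function(l, x, y) {
  # ensym() captures the caller's argument as a symbol (or converts a string)
  x <- rlang::ensym(x)
  y <- rlang::ensym(y)

  # Splice the symbols into a formula with !!; for x = wt, y = mpg this
  # yields scale(mpg) ~ scale(wt) + (wt | cyl)
  fx <- rlang::new_formula(
    rlang::expr(scale(!!y)),
    rlang::expr(scale(!!x) + (!!x | cyl))
  )

  l %>%
    purrr::map(.f = ~ lmerTest::lmer(fx, data = .x, REML = FALSE)) %>%
    # keep the slope row of each fixed-effects table, as in the question
    purrr::map(.f = ~ coef(summary(.x))[-1, ]) %>%
    base::do.call(what = cbind.data.frame, args = .) %>%
    tibble::rownames_to_column(df = ., var = "Effect")
}

lmer_group_sym(group_list, x = wt, y = mpg)
```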

Content provided by Stack Overflow.

- wibeasley · Accepted Answer

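Here is a similar approach that transposes the result. I think it's more useful to have all the t values in one column rather than in one row; that makes them easier to query and manipulate.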

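```r
lmer_group <- function(l, x_name, y_name) {
  # build the formula text with glue instead of paste0()
  fx <- glue::glue("scale({y_name}) ~ scale({x_name}) + ({x_name} | cyl)")
  cat(paste("Evaluating: ", fx, "\n"))

  # name of the fixed-effect row to keep, e.g. "scale(wt)"
  filter_name  <- glue::glue("scale({x_name})")

  l %>% 
    purrr::map(
      .f = ~ lmerTest::lmer(
        as.formula(fx),
        data = (.),
        REML = FALSE
      )
    ) %>%
    # stack one tidy table per group; .id stores the list name (the am value).
    # The merMod tidiers now live in broom.mixed (the original used broom::tidy).
    purrr::map_dfr(.f = ~ broom.mixed::tidy(., effects = "fixed"), .id = "am") %>% 
    dplyr::filter(term == !!filter_name) %>% 
    dplyr::select(
      am, 
      estimate,
      std.error,
      t           = statistic
    )
}

lmer_group(l = group_list, x_name = "wt", y_name = "mpg")
```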

The df and p-values aren't shown because I don't think they are implemented in the lme4 tidier. That may not be solvable.

```
Evaluating:  scale(mpg) ~ scale(wt) + (wt | cyl)
  am   estimate std.error         t
1  0 -0.3318712 0.2104267 -1.577134
2  1 -0.9089148 0.1156500 -7.859187
```

For a bit of variety, I used glue instead of paste0().
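On the df and p-value point: for models fitted with lmerTest, coef(summary(fit)) already contains df and Pr(>|t|) columns (the question's first output shows them), so the same long layout, one row per group, can be built without a tidier. A sketch assuming the same group_list and formula; fixed_effect_row() is a hypothetical helper, and purrr::map_dfr() needs dplyr installed.

```r
library(purrr)
library(lmerTest)

group_list <- split(mtcars, mtcars$am)
fx <- stats::as.formula("scale(mpg) ~ scale(wt) + (wt | cyl)")

# Hypothetical helper: pull one fixed-effect row out of an lmerTest fit.
# For lmerModLmerTest objects, coef(summary(fit)) has the columns
# Estimate, Std. Error, df, t value and Pr(>|t|).
fixed_effect_row <- function(fit, term) {
  cf <- coef(summary(fit))
  data.frame(
    estimate  = cf[term, "Estimate"],
    std.error = cf[term, "Std. Error"],
    df        = cf[term, "df"],
    t         = cf[term, "t value"],
    p.value   = cf[term, "Pr(>|t|)"]
  )
}

# One row per group; .id = "am" keeps the list names ("0", "1")
res <- group_list %>%
  purrr::map(.f = ~ lmerTest::lmer(fx, data = .x, REML = FALSE)) %>%
  purrr::map_dfr(.f = fixed_effect_row, term = "scale(wt)", .id = "am")

res
```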