Passing a list of data frames to lm() and viewing the results

3

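I have three data frames: dfLON, dfMOS and dfATA. They all have the same variables: y is a continuous variable; a, b and c are binary categorical variables; and there are some NAs.

I want to build a separate linear regression model for each data set.

With my current code I have managed to create a list of data frames and pass it to lm(). But is there a more concise way to view the results than, for example, fitdfLON <- DfList[[1]]? I have given three data frames in this example, but I actually have about 25, so I would have to type that 25 times!

Any help would be greatly appreciated.

Starting point (dfs):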

dfLON <- data.frame(y=c(1.23,2.32,3.21,2.43),a=c(1,NA,1,2),b=c(1,1,2,2),c=c(2,1,2,1))
dfMOS <- data.frame(y=c(4.56,6.54,4.43,5.78),a=c(2,1,2,1),b=c(2,1,1,2),c=c(1,2,1,2))
dfATA <- data.frame(y=c(1.22,6.54,3.23,4.23),a=c(2,2,2,1),b=c(1,2,1,2),c=c(1,NA,1,2))

Current code:

Mylm <- function(df){
  fit <- lm(y ~ a + b + c, data=df)
  return(fit)
}
DfList <- lapply(list(dfLON, dfMOS, dfATA), Mylm)

fitdfLON <- DfList[[1]]
fitdfMOS <- DfList[[2]]
fitdfATA <- DfList[[3]]

Just use a named list instead of separate objects: DfList <- setNames(lapply(list(dfLON, dfMOS, dfATA), Mylm), c("fitdfLON", "fitdfMOS", "fitdfATA")); then refer to the items: DfList$fitdfLON; DfList$fitdfMOS; DfList$fitdfATA - Parfait
3 Answers
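The named-list idea in the comment above can be sketched as follows. This is a minimal illustration rather than the commenter's exact code: it names the inputs instead of the outputs, and the names `dfs`, `fits`, `LON`, `MOS` and `ATA` are assumptions chosen for the example. Because lapply() keeps the names of its input, naming the data frames once is enough.

```r
# Sample data copied from the question
dfLON <- data.frame(y = c(1.23, 2.32, 3.21, 2.43), a = c(1, NA, 1, 2), b = c(1, 1, 2, 2), c = c(2, 1, 2, 1))
dfMOS <- data.frame(y = c(4.56, 6.54, 4.43, 5.78), a = c(2, 1, 2, 1), b = c(2, 1, 1, 2), c = c(1, 2, 1, 2))
dfATA <- data.frame(y = c(1.22, 6.54, 3.23, 4.23), a = c(2, 2, 2, 1), b = c(1, 2, 1, 2), c = c(1, NA, 1, 2))

# Name the inputs once (hypothetical names); lapply() carries them over
dfs  <- list(LON = dfLON, MOS = dfMOS, ATA = dfATA)
fits <- lapply(dfs, function(df) lm(y ~ a + b + c, data = df))

names(fits)            # "LON" "MOS" "ATA"
coef(fits$LON)         # coefficients of one model, looked up by name
lapply(fits, summary)  # summaries of every model in one call
```

With about 25 data frames, only the list() call grows; everything after it stays the same.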

1
If the names of the data frames share a common pattern, you can use a combination of mget and ls to fetch them, then run lm on each with lapply:
fit = lapply(mget(ls(pattern = "^df[A-Z]{3}")), function(x) lm(y ~ a + b + c, data = x))
fit$dfATA

#Call:
#lm(formula = y ~ a + b + c, data = x)

#Coefficients:
#(Intercept)            a            b            c  
#      6.235       -2.005           NA           NA  

If you only need all the coefficients, you can do this:
do.call(rbind,
        lapply(X = mget(ls(pattern = "^df[A-Z]{3}")),
               FUN = function(x) lm(formula = y ~ a + b + c, data = x)[[1]]))
#      (Intercept)      a      b  c
#dfATA      6.2350 -2.005     NA NA
#dfLON      0.0300 -0.780  1.980 NA
#dfMOS      8.2975 -1.665 -0.315 NA

Instead of ls(pattern = "^df[A-Z]{3}"), you can supply a vector containing the names of all the data.frames.
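A minimal sketch of that variant, assuming the three data frames from the question exist in the current environment. The vector name `df_names` is made up for this example.

```r
# Sample data copied from the question
dfLON <- data.frame(y = c(1.23, 2.32, 3.21, 2.43), a = c(1, NA, 1, 2), b = c(1, 1, 2, 2), c = c(2, 1, 2, 1))
dfMOS <- data.frame(y = c(4.56, 6.54, 4.43, 5.78), a = c(2, 1, 2, 1), b = c(2, 1, 1, 2), c = c(1, 2, 1, 2))
dfATA <- data.frame(y = c(1.22, 6.54, 3.23, 4.23), a = c(2, 2, 2, 1), b = c(1, 2, 1, 2), c = c(1, NA, 1, 2))

# Explicit vector of object names (hypothetical name) instead of a regex
df_names <- c("dfLON", "dfMOS", "dfATA")

# mget() looks the objects up by name and returns a list named after them
dfs <- mget(df_names)
fit <- lapply(dfs, function(x) lm(y ~ a + b + c, data = x))

names(fit)  # "dfLON" "dfMOS" "dfATA"
fit$dfMOS   # print one model, looked up by its data frame name
```

An explicit vector avoids accidentally picking up unrelated objects whose names happen to match the pattern.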


1
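Whenever you run models on many different data sets, it makes sense to tidy them with the broom library. This produces a clean data frame for each model, which you can output or use in downstream analysis.
The simplest example: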
library(broom)
library(magrittr) # provides the %>% pipe used below

Mylm <- function(df){
  fit <- lm(y ~ a + b + c, data=df)
  tidy(fit) # tidy the fit object
}

list(dfLON, dfMOS, dfATA) %>% lapply(Mylm)

#[[1]]
#         term estimate std.error statistic p.value
#1 (Intercept)     0.03       NaN       NaN     NaN
#2           a    -0.78       NaN       NaN     NaN
#3           b     1.98       NaN       NaN     NaN
#
#[[2]]
#         term estimate std.error  statistic    p.value
#1 (Intercept)   8.2975  0.969855  8.5554025 0.07407531
#2           a  -1.6650  0.445000 -3.7415730 0.16626155
#3           b  -0.3150  0.445000 -0.7078652 0.60785169
#
#[[3]]
#         term estimate std.error statistic   p.value
#1 (Intercept)    6.235  3.015000  2.067993 0.2867398
#2           a   -2.005  1.740711 -1.151828 0.4551559

Now you can use the map_dfr() function from purrr to combine everything into a single data frame:

library(purrr)

# note the named list entries; these will go into the "model" column
# without them, you'd just get a model number
list("LON" = dfLON, "MOS" = dfMOS, "ATA" = dfATA) %>% 
  map_dfr(Mylm, .id = "model")

#  model        term estimate std.error  statistic    p.value
#1   LON (Intercept)   0.0300       NaN        NaN        NaN
#2   LON           a  -0.7800       NaN        NaN        NaN
#3   LON           b   1.9800       NaN        NaN        NaN
#4   MOS (Intercept)   8.2975  0.969855  8.5554025 0.07407531
#5   MOS           a  -1.6650  0.445000 -3.7415730 0.16626155
#6   MOS           b  -0.3150  0.445000 -0.7078652 0.60785169
#7   ATA (Intercept)   6.2350  3.015000  2.0679934 0.28673976
#8   ATA           a  -2.0050  1.740711 -1.1518281 0.45515586

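To make things more compact, you can define the function on the fly inside map_dfr. That seems appropriate when all you are doing is fitting a linear model.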

list("LON" = dfLON, "MOS" = dfMOS, "ATA" = dfATA) %>% 
  map_dfr(~ tidy(lm(y ~ a + b + c, data = .)),
          .id = "model")

#  model        term estimate std.error  statistic    p.value
#1   LON (Intercept)   0.0300       NaN        NaN        NaN
#2   LON           a  -0.7800       NaN        NaN        NaN
#3   LON           b   1.9800       NaN        NaN        NaN
#4   MOS (Intercept)   8.2975  0.969855  8.5554025 0.07407531
#5   MOS           a  -1.6650  0.445000 -3.7415730 0.16626155
#6   MOS           b  -0.3150  0.445000 -0.7078652 0.60785169
#7   ATA (Intercept)   6.2350  3.015000  2.0679934 0.28673976
#8   ATA           a  -2.0050  1.740711 -1.1518281 0.45515586
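If adding broom and purrr is not an option, a similar combined table can be approximated in base R. The sketch below is an assumption-laden alternative, not part of the answer above: `coef_table`, `fits` and `combined` are hypothetical names, and the column names come from summary() rather than from tidy(). As in the output above, coefficients that cannot be estimated (the NA ones) do not appear, because summary()$coefficients omits them.

```r
# Sample data copied from the question
dfLON <- data.frame(y = c(1.23, 2.32, 3.21, 2.43), a = c(1, NA, 1, 2), b = c(1, 1, 2, 2), c = c(2, 1, 2, 1))
dfMOS <- data.frame(y = c(4.56, 6.54, 4.43, 5.78), a = c(2, 1, 2, 1), b = c(2, 1, 1, 2), c = c(1, 2, 1, 2))
dfATA <- data.frame(y = c(1.22, 6.54, 3.23, 4.23), a = c(2, 2, 2, 1), b = c(1, 2, 1, 2), c = c(1, NA, 1, 2))

fits <- lapply(list(LON = dfLON, MOS = dfMOS, ATA = dfATA),
               function(df) lm(y ~ a + b + c, data = df))

# Hypothetical helper: turn one model's coefficient matrix into a data frame
coef_table <- function(fit, model) {
  cf <- summary(fit)$coefficients  # matrix; one row per estimable term
  data.frame(model = model, term = rownames(cf), cf,
             row.names = NULL, check.names = FALSE)
}

# Build one table per model, then stack them row-wise
combined <- do.call(rbind, Map(coef_table, fits, names(fits)))
combined
```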

0
# Make a named list of all the data frames
dfs = list(dfATA = dfATA, dfLON = dfLON, dfMOS = dfMOS)

# Fit the model to each data frame; columns are looked up via `data`
lmr = lapply(dfs, function(x) lm(y ~ a + b + c, data = x))

# Get the coefficients for each model
coefs = lapply(lmr, coef)
# Flatten into one named vector (names look like "dfATA.(Intercept)")
coefs = unlist(coefs)
