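I have a dataset with 14 binary variables. I have already tested the significance of each individual variable, but I would also like to check the significance of interactions. However, I know that high-order interactions are unlikely to be significant and would only clutter the model. Is there a way to run a linear model in R but tell it to test only interactions among at most 3 variables?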
lm(y1 ~ .^3, anscombe[1:5])
giving:
Call:
lm(formula = y1 ~ .^3, data = anscombe[1:5])

Coefficients:
(Intercept)           x1           x2           x3           x4        x1:x2
   12.81992     -2.60371           NA           NA     -0.16258      0.36279
      x1:x3        x1:x4        x2:x3        x2:x4        x3:x4     x1:x2:x3
         NA           NA           NA           NA           NA     -0.01345
   x1:x2:x4     x1:x3:x4     x2:x3:x4
         NA           NA           NA
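To see what `.^3` expands to at the scale in the question, here is a minimal sketch on simulated data. The data frame `X`, its column names `x1` to `x14`, and the random response `y` are stand-ins I made up, not the asker's data. It counts the terms by interaction order without fitting anything, then compares the two-way and three-way models with a joint F-test.

```r
# Simulated stand-in for the asker's data: 14 binary predictors, numeric response.
set.seed(42)
n <- 2000
X <- as.data.frame(matrix(rbinom(n * 14, 1, 0.5), ncol = 14))
names(X) <- paste0("x", 1:14)
X$y <- rnorm(n)

# Expand the formula without fitting anything and count terms by interaction order.
tt <- terms(y ~ .^3, data = X)
order_counts <- table(attr(tt, "order"))
print(order_counts)  # 14 main effects, 91 two-way and 364 three-way terms

# Joint F-test of all three-way terms against the model with two-way terms only.
fit2 <- lm(y ~ .^2, data = X)
fit3 <- lm(y ~ .^3, data = X)
cmp <- anova(fit2, fit3)
print(cmp)
```

With 469 terms the model needs far more rows than `anscombe` has. `anscombe` has only 11 rows and its `x1`, `x2` and `x3` columns are identical, so most coefficients in the output above cannot be estimated and show as `NA`.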
Use combn to generate all three-way combinations of the features:
Comb <- combn(names(iris)[1:4], 3)
Comb
Output
[,1] [,2] [,3] [,4]
[1,] "Sepal.Length" "Sepal.Length" "Sepal.Length" "Sepal.Width"
[2,] "Sepal.Width" "Sepal.Width" "Petal.Length" "Petal.Length"
[3,] "Petal.Length" "Petal.Width" "Petal.Width" "Petal.Width"
Then use as.formula to build a formula by hand from each combination of 3 features.
ans <- apply(Comb, 2, function(x) glm(as.formula(paste0("Species ~ ", paste0(x, collapse=" + "))), data=iris, family=binomial()))
ans
Output
[[1]]
Call: glm(formula = as.formula(paste0("Species ~ ", paste0(x, collapse = " + "))),
family = binomial(), data = iris)
Coefficients:
(Intercept) Sepal.Length Sepal.Width Petal.Length
71.80 -23.91 -13.51 34.95
Degrees of Freedom: 149 Total (i.e. Null); 146 Residual
Null Deviance: 191
Residual Deviance: 3.523e-09 AIC: 8
[[2]]
Call: glm(formula = as.formula(paste0("Species ~ ", paste0(x, collapse = " + "))),
family = binomial(), data = iris)
Coefficients:
(Intercept) Sepal.Length Sepal.Width Petal.Width
-25.477 6.762 -19.057 59.292
Degrees of Freedom: 149 Total (i.e. Null); 146 Residual
Null Deviance: 191
Residual Deviance: 4.144e-09 AIC: 8
# etc
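Note that collapsing with " + " fits main effects only. If the goal is the interaction terms themselves, the same combn idea can build them by collapsing with ":" instead. This is a rough sketch on simulated data: the data frame `d` and the names `vars`, `pairs` and `triples` are my own, chosen for illustration.

```r
# Simulated data: 4 binary predictors and a numeric response (illustrative only).
set.seed(1)
d <- as.data.frame(matrix(rbinom(200 * 4, 1, 0.5), ncol = 4))
names(d) <- paste0("x", 1:4)
d$y <- rnorm(200)

vars <- paste0("x", 1:4)

# combn() applies FUN to each combination; extra arguments are passed on to paste().
pairs   <- combn(vars, 2, FUN = paste, collapse = ":")
triples <- combn(vars, 3, FUN = paste, collapse = ":")

# Assemble main effects plus two-way and three-way interactions into one formula.
f <- reformulate(c(vars, pairs, triples), response = "y")
print(f)

fit <- lm(f, data = d)

# The hand-built formula contains the same terms as the y ~ .^3 shorthand.
shorthand <- attr(terms(y ~ .^3, data = d), "term.labels")
print(setequal(attr(terms(f), "term.labels"), shorthand))
```

Building the term list by hand is only worth it when you want to drop or keep particular interactions. Otherwise `.^3` is shorter.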
x1:x2, is that y ~ 0.36(x1) + 0.36(x2)? (By the way, I upvoted this answer.) - CPak
The columns of anscombe are numeric rather than factors. - G. Grothendieck
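On both comments, a small sketch may help. For numeric columns, a term like `x1:x2` enters the design matrix as the elementwise product `x1 * x2`, so its coefficient multiplies that product rather than each variable separately. For binary variables coded 0/1, converting them to factors (with R's default treatment contrasts) gives the same fit. The data below are simulated and the names `d` and `d_fac` are made up for illustration.

```r
# Simulated 0/1 predictors and a numeric response (illustrative only).
set.seed(7)
d <- data.frame(x1 = rbinom(300, 1, 0.5),
                x2 = rbinom(300, 1, 0.5),
                x3 = rbinom(300, 1, 0.5))
d$y <- rnorm(300)

# For numeric columns the x1:x2 column of the design matrix is the product x1 * x2.
mm <- model.matrix(y ~ .^2, data = d)
product_matches <- all(mm[, "x1:x2"] == d$x1 * d$x2)
print(product_matches)

# Same data with the predictors stored as factors.
d_fac <- d
d_fac[c("x1", "x2", "x3")] <- lapply(d_fac[c("x1", "x2", "x3")], factor)

fit_num <- lm(y ~ .^3, data = d)
fit_fac <- lm(y ~ .^3, data = d_fac)

# Coefficient names differ (x1 vs x11), but the fitted values agree.
same_fit <- isTRUE(all.equal(fitted(fit_num), fitted(fit_fac)))
print(same_fit)
```

This equivalence relies on the 0/1 coding. With other numeric codings, or factors with more than two levels, the numeric and factor fits differ.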