Gist
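Error: Error in predmat[which, seq(nlami)] = preds : replacement has length zero
Context: the data are simulated with a binary y, but there are n coders of the true y. The data are stacked n times and a model is fitted, trying to recover the true y.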
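The error occurs:
- with the L2 penalty, but not with the L1 penalty.
- when Y is the coders' Y, but not when it is the true Y.
- The error is not deterministic; it depends on the seed.
Update: the error appears in versions after 1.9-8. Version 1.9-8 does not fail.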
Reproduction
Base data:
library(glmnet)
rm(list=ls())
set.seed(123)
num_obs=4000
n_coders=2
precision=.8
X <- matrix(rnorm(num_obs*20, sd=1), nrow=num_obs)
prob1 <- plogis(X %*% c(2, -2, 1, -1, rep(0, 16))) # yes many zeros, ignore
y_true <- rbinom(num_obs, 1, prob1)
dat <- data.frame(y_true = y_true, X = X)
Create the coders
classify <- function(true_y,precision){
n=length(true_y)
y_coder <- numeric(n)
y_coder[which(true_y==1)] <- rbinom(n=length(which(true_y==1)),
size=1,prob=precision)
y_coder[which(true_y==0)] <- rbinom(n=length(which(true_y==0)),
size=1,prob=(1-precision))
return(y_coder)
}
y_codings <- sapply(rep(precision,n_coders),classify,true_y = dat$y_true)
Stack everything
expanded_data <- do.call(rbind,rep(list(dat),n_coders))
expanded_data$y_codings <- matrix(y_codings, ncol = 1)
Reproduce the error
Because the error depends on the seed, a loop is needed. Only the first loop fails; the other two run to completion.
X <- as.matrix(expanded_data[,grep("X",names(expanded_data))])
for (i in 1:1000) cv.glmnet(x = X,y = expanded_data$y_codings,
family="binomial", alpha=0) # will fail
for (i in 1:1000) cv.glmnet(x = X,y = expanded_data$y_codings,
family="binomial", alpha=1) # will not fail
for (i in 1:1000) cv.glmnet(x = X,y = expanded_data$y_true,
family="binomial", alpha=0) # will not fail
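Any ideas where in glmnet this comes from and how to avoid it? From my reading of cv.glmnet, it happens after the cv routine, inside cvstuff = do.call(fun, list(outlist, lambda, x, y, weights, offset, foldid, type.measure, grouped, keep)). I do not understand what that call does, so I cannot tell why it fails or how to avoid it.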
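Since the failure depends on the random state, one stopgap is to retry the fit when it errors. Below is a minimal sketch of that idea in base R. `retry_fit` and `flaky_fit` are hypothetical names made up for illustration, and `flaky_fit` is only a stand-in for the real call so the snippet runs without glmnet.

```r
# Hypothetical helper (not part of glmnet): call fit_fun() until it succeeds
# or max_tries attempts have failed, then re-raise the last error.
retry_fit <- function(fit_fun, max_tries = 5) {
  last_error <- NULL
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fit_fun(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(list(fit = result, attempts = attempt))
    }
    last_error <- result
    message("attempt ", attempt, " failed: ", conditionMessage(result))
  }
  stop(last_error)
}

# Stand-in for the real fit (an assumption for this sketch):
# fails on the first call and succeeds afterwards.
calls <- 0
flaky_fit <- function() {
  calls <<- calls + 1
  if (calls == 1) stop("replacement has length zero")
  "fitted"
}

out <- retry_fit(flaky_fit)
print(out$attempts)
```

With the reproduction above, `fit_fun` would be something like `function() cv.glmnet(x = X, y = expanded_data$y_codings, family = "binomial", alpha = 0)`. When `foldid` is not supplied, cv.glmnet assigns folds at random, so each retry sees a different split. This only hides the failure; it does not explain it.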
Session info (Ubuntu and PC)
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] glmnet_2.0-2 foreach_1.4.3 Matrix_1.2-7.1 devtools_1.12.0
loaded via a namespace (and not attached):
[1] httr_1.2.1 R6_2.2.0 tools_3.3.1 withr_1.0.2 curl_2.1
[6] memoise_1.0.0 codetools_0.2-15 grid_3.3.1 iterators_1.0.8 knitr_1.14
[11] digest_0.6.10 lattice_0.20-34
and
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] glmnet_2.0-2 foreach_1.4.3 Matrix_1.2-7.1 devtools_1.12.0
loaded via a namespace (and not attached):
[1] httr_1.2.1 R6_2.2.0 tools_3.3.1 withr_1.0.2 curl_2.1
[6] memoise_1.0.0 codetools_0.2-15 grid_3.3.1 iterators_1.0.8 digest_0.6.10
[11] lattice_0.20-34
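I get the same error with glmnet_2.0-5. As mentioned in the comments (https://github.com/lmweber/glmnet-error-example/blob/master/glmnet_error_example.R), after stepping through the code, the problem is that `mlami` is greater than all `lambda` values. Has this bug been reported to the `glmnet` developers? - rwolst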
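To see why `mlami` exceeding every `lambda` value would produce this particular message, here is a small base R sketch that does not use glmnet at all. The names `predmat`, `preds`, `nlami`, `mlami` and `lambda` mirror the error message and the comment above. The toy values and the `which_lam` mask are assumptions for illustration, not the package's actual code.

```r
# Toy values (assumed for illustration, not taken from a real fit).
lambda <- c(0.5, 0.4, 0.3)    # lambda sequence handed to the CV step
mlami  <- 0.6                 # larger than every value in lambda

which_lam <- lambda >= mlami  # hypothetical mask: all FALSE here
nlami <- sum(which_lam)       # 0, and seq(0) is c(1, 0), not an empty index

fold_rows <- 1:4                                  # rows of one fold
predmat <- matrix(NA_real_, nrow = 4, ncol = length(lambda))
preds <- matrix(numeric(0), nrow = 4, ncol = 0)   # no lambda kept, so no predictions

# The index selects column 1, but preds has nothing to put there.
msg <- tryCatch({
  predmat[fold_rows, seq(nlami)] <- preds
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```

The point of the sketch is that `seq(nlami)` with `nlami` equal to 0 still selects a column, so an empty `preds` cannot fill it; `seq_len(nlami)` would give an empty index instead.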