I am trying to run ridge/lasso regression using the glmnet and onehot packages, and I am running into an error.
library(glmnet)
library(onehot)
set.seed(123)
Sample <- HouseData[1:1460, ]
smp_size <- floor(0.5 * nrow(Sample))
train_ind <- sample(seq_len(nrow(Sample)), size = smp_size)
train <- Sample[train_ind, ]
test <- Sample[-train_ind, ]
############Ridge & Lasso Regressions ################
# Define the response for the training + test set
y_train <- train$SalePrice
y_test <- test$SalePrice
# Define the x training and test
x_train <- train[, names(train) != "SalePrice"]
x_test <- test[, names(test) != "SalePrice"]
str(y_train)
## encoding information for training set
x_train_encoded_data_info <- onehot(x_train,stringsAsFactors = TRUE, max_levels = 50)
x_train_matrix <- (predict(x_train_encoded_data_info,x_train))
x_train_matrix <- as.matrix(x_train_matrix)
# create encoding information for x test
x_test_encoded_data_info <- onehot(x_test,stringsAsFactors = TRUE, max_levels = 50)
x_test_matrix <- (predict(x_test_encoded_data_info,x_test))
str(x_train_matrix)
###Calculate best lambda
cv.out <- cv.glmnet(x_train_matrix, y_train,
alpha = 0, nlambda = 100,
lambda.min.ratio = 0.0001)
best.lambda <- cv.out$lambda.min
best.lambda
model <- glmnet(x_train_matrix, y_train, alpha = 0, lambda = best.lambda)
results_ridge <- predict(model,newx=x_test_matrix)
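I know my data is clean and the matrices are the same size, but I still get this error when I try to run the prediction.
Error message: error in evaluating the argument 'x' in selecting a method for function 'as.matrix': Cholmod error 'X and/or Y have wrong dimensions' at file ../MatrixOps/cholmod_sdmult.c, line 90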
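One way to check the "same size" assumption is to compare the column sets of the two encoded matrices, since the error complains about dimensions. The sketch below uses a small made-up data frame (`toy_train`, `toy_test` and their columns are hypothetical, not HouseData) and base R's `model.matrix()` as a stand-in for the onehot encoder. It only illustrates how encoding each split separately can produce different columns when a factor level is missing from one split.

```r
# Toy data (hypothetical): the level "C" appears only in the training rows.
toy_train <- data.frame(Zone = factor(c("A", "B", "C")), Area = c(10, 20, 30))
toy_test  <- data.frame(Zone = factor(c("A", "B", "A")), Area = c(15, 25, 35))

# Encode each split on its own, mirroring the two separate encoder fits above.
# model.matrix() stands in for the onehot encoder; "- 1" drops the intercept
# so every level of Zone gets its own indicator column.
m_train <- model.matrix(~ . - 1, data = toy_train)
m_test  <- model.matrix(~ . - 1, data = toy_test)

# Compare shapes and find columns that exist in one matrix but not the other.
dim(m_train)
dim(m_test)
missing_in_test <- setdiff(colnames(m_train), colnames(m_test))
missing_in_test
```

Running the same `dim()` and `setdiff(colnames(...), colnames(...))` checks on `x_train_matrix` and `x_test_matrix` would show whether the real matrices differ in their columns.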
My professor also told me to one-hot encode before splitting the data, but that makes no sense to me.
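A possible reading of the professor's advice: if the encoding is built once from all rows, every factor level gets a column, and the train and test matrices share the same columns by construction. Below is a minimal sketch of that idea, assuming a made-up data frame `toy` in place of HouseData and using base R's `model.matrix()` rather than onehot. Note that `model.matrix()` drops rows containing `NA` by default, so this assumes the data has no missing values.

```r
set.seed(123)

# Hypothetical stand-in for HouseData: one factor, one numeric predictor, one response.
toy <- data.frame(
  Zone      = factor(c("A", "B", "C", "A", "B", "A")),
  Area      = c(10, 20, 30, 15, 25, 35),
  SalePrice = c(100, 200, 300, 150, 250, 350)
)

# Encode all predictor rows at once, before splitting.
# "- 1" drops the intercept so each level of Zone gets an indicator column.
x_all <- model.matrix(SalePrice ~ . - 1, data = toy)
y_all <- toy$SalePrice

# Split afterwards, using the same 50/50 scheme as in the question.
train_ind <- sample(seq_len(nrow(toy)), size = floor(0.5 * nrow(toy)))
x_train_matrix <- x_all[train_ind, , drop = FALSE]
x_test_matrix  <- x_all[-train_ind, , drop = FALSE]
y_train <- y_all[train_ind]
y_test  <- y_all[-train_ind]

# Both matrices now have identical columns, whichever levels landed in each split.
same_columns <- identical(colnames(x_train_matrix), colnames(x_test_matrix))
same_columns
```

With onehot, the analogous approach would presumably be to fit a single encoder and pass it to `predict()` for both splits instead of fitting a second encoder on `x_test`. How that encoder treats levels it has not seen is not verified here.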