What are the PCA preprocessing parameters in caret's train function?

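I am running knn regression on my data and would like to:
a) cross-validate with repeatedcv to find the optimal k
b) reduce dimensionality with PCA at a 90% variance threshold when building the knn model.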
library(caret)
library(dplyr)
set.seed(0)
data = cbind(rnorm(20, 100, 10), matrix(rnorm(400, 10, 5), ncol = 20)) %>% 
  data.frame()
colnames(data) = c('True', paste0('Day',1:20))
tr = data[1:15, ] #training set
tt = data[16:20,] #test set

train.control = trainControl(method = "repeatedcv", number = 5, repeats=3)
k = train(True ~ .,
          method     = "knn",
          tuneGrid   = expand.grid(k = 1:10), 
          #trying to find the optimal k from 1:10
          trControl  = train.control, 
          preProcess = c('scale','pca'),
          metric     = "RMSE",
          data       = tr)

My questions: (1) I noticed that someone suggested changing the PCA parameter in trainControl:
ctrl <- trainControl(preProcOptions = list(thresh = 0.8))
mod <- train(Class ~ ., data = Sonar, method = "pls",
              trControl = ctrl)

If I change the parameter in trainControl, does that mean PCA is still performed during KNN? This is similar to a concern raised in another question.

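(2) I found another example that fits my case. I want to change the threshold to 90%, but I don't know where to change it in caret's train function, especially since I still need the scale option.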

Apologies for the lengthy description and the scattered references. Thank you in advance!

(Thanks to Camille for the suggestion that got the code working!)


I don't have much experience with caret, but it looks like preProcess should be an argument to train rather than a function call. Change preProcess(c('scale','pca')) to preProcess = c('scale','pca'). - camille
1 Answer


To answer your questions:

I noticed that someone suggested changing the PCA parameter in trainControl:

mod <- train(Class ~ ., data = Sonar, method = "pls",trControl = ctrl)

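If I change the parameter in trainControl, does that mean PCA is still performed during KNN?
Yes. PCA is still applied as long as 'pca' stays in the preProcess argument of train; preProcOptions in trainControl only sets the threshold. For example: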
train.control = trainControl(method = "repeatedcv", number = 5, repeats=3,preProcOptions = list(thresh = 0.9))

k = train(True ~ .,
          method     = "knn",
          tuneGrid   = expand.grid(k = 1:10), 
          trControl  = train.control, 
          preProcess = c('scale','pca'),
          metric     = "RMSE",
          data       = tr)

You can check this in the preProcess element of the fitted object:
k$preProcess
Created from 15 samples and 20 variables

Pre-processing:
  - centered (20)
  - ignored (0)
  - principal component signal extraction (20)
  - scaled (20)

PCA needed 9 components to capture 90 percent of the variance
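
Besides reading the printed summary, the same information can be pulled out programmatically. The following is a minimal sketch, assuming the toy data from the question; the object names `ctrl` and `fit` are chosen here for illustration. It reads the `thresh` and `numComp` elements stored on the preProcess object that train keeps.

```r
library(caret)

# Rebuild the question's toy data (same seed and layout as in the question).
set.seed(0)
data <- data.frame(cbind(rnorm(20, 100, 10),
                         matrix(rnorm(400, 10, 5), ncol = 20)))
colnames(data) <- c("True", paste0("Day", 1:20))
tr <- data[1:15, ]  # training set

# The PCA threshold goes through trainControl; the PCA request itself stays in train().
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 3,
                     preProcOptions = list(thresh = 0.9))

fit <- train(True ~ .,
             data       = tr,
             method     = "knn",
             tuneGrid   = expand.grid(k = 1:10),
             trControl  = ctrl,
             preProcess = c("scale", "pca"),
             metric     = "RMSE")

fit$preProcess$thresh   # variance threshold actually used by the PCA step
fit$preProcess$numComp  # number of principal components retained
names(fit$preProcess$method)  # pre-processing steps that were applied
```

Requesting 'pca' also brings in centering and scaling, which is why "centered" shows up in the summary above even though only 'scale' and 'pca' were listed.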

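This also answers the second question. Alternatively, you can run preProcess on its own with thresh = 0.9 and train on the transformed data: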
mdl = preProcess(tr[,-1],method=c("scale","pca"),thresh=0.9)
mdl
Created from 15 samples and 20 variables

Pre-processing:
  - centered (20)
  - ignored (0)
  - principal component signal extraction (20)
  - scaled (20)

PCA needed 9 components to capture 90 percent of the variance

train.control = trainControl(method = "repeatedcv", number = 5, repeats=3)

k = train(True ~ .,
          method     = "knn",
          tuneGrid   = expand.grid(k = 1:10), 
          trControl  = train.control,
          metric     = "RMSE",
          data       = predict(mdl,tr))
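
Neither variant above shows how the held-out rows are scored, so here is a hedged sketch of prediction under both approaches, assuming the question's toy data and its 15/5 split. The names `fit_a`, `fit_b`, `pp`, `pred_a` and `pred_b` are made up for this example. With pre-processing declared inside train, predict applies the stored transformation to new data automatically; with a standalone preProcess object, the test rows have to be transformed by hand first.

```r
library(caret)

# Rebuild the question's toy data and split.
set.seed(0)
data <- data.frame(cbind(rnorm(20, 100, 10),
                         matrix(rnorm(400, 10, 5), ncol = 20)))
colnames(data) <- c("True", paste0("Day", 1:20))
tr <- data[1:15, ]   # training set
tt <- data[16:20, ]  # test set

# Approach A: pre-processing inside train(), threshold set via trainControl.
ctrl_a <- trainControl(method = "repeatedcv", number = 5, repeats = 3,
                       preProcOptions = list(thresh = 0.9))
fit_a <- train(True ~ ., data = tr, method = "knn",
               tuneGrid = expand.grid(k = 1:10), trControl = ctrl_a,
               preProcess = c("scale", "pca"), metric = "RMSE")
pred_a <- predict(fit_a, newdata = tt)  # scaling and PCA are applied to tt by caret

# Approach B: standalone preProcess, then train on the transformed data.
pp <- preProcess(tr[, -1], method = c("scale", "pca"), thresh = 0.9)
ctrl_b <- trainControl(method = "repeatedcv", number = 5, repeats = 3)
fit_b <- train(True ~ ., data = predict(pp, tr), method = "knn",
               tuneGrid = expand.grid(k = 1:10), trControl = ctrl_b,
               metric = "RMSE")
pred_b <- predict(fit_b, newdata = predict(pp, tt))  # transform tt manually first

# Test-set error for each approach.
RMSE(pred_a, tt$True)
RMSE(pred_b, tt$True)
```

One design difference worth keeping in mind: in approach B the PCA is estimated once on all 15 training rows before resampling, whereas in approach A caret re-estimates the pre-processing within each resample, so the cross-validated RMSE in A reflects the variability of the PCA step as well.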
