Why does a model fitted with ranger through tidymodels differ from one fitted by calling ranger directly?

Why do I get different models when I fit ranger through tidymodels and when I call ranger directly?
Here is a reproducible example:
```r
library(tidymodels)
library(ranger)

# load data
data("iris")
train <- iris |> slice_sample(prop = 0.7)
test <- iris |> anti_join(train)

# rf model specs
rf_mod <- 
  rand_forest(trees = 10)  |>  
  set_engine("ranger", respect.unordered.factors = TRUE, probability = FALSE) |> 
  set_mode("classification")

# fit model using tidymodels
set.seed(100)
rf_mod |> fit(Species ~ ., data = train) # OOB=4.81%

# fit model using ranger directly
set.seed(100)
ranger(Species ~ ., data = train, 
       num.trees=10, respect.unordered.factors = TRUE, probability = FALSE) # OOB=5.77%
```
1 Answer

You get different models because ranger's seed argument is not set. If you pass the same seed in both approaches, you get the same model fit.
```r
library(tidymodels)
library(ranger)

# load data
data("iris")
train <- iris |> slice_sample(prop = 0.7)
test <- iris |> anti_join(train)
#> Joining with `by = join_by(Sepal.Length, Sepal.Width, Petal.Length,
#> Petal.Width, Species)`

# rf model specs
rf_mod <- 
  rand_forest(trees = 10)  |>  
  set_engine("ranger", respect.unordered.factors = TRUE, probability = FALSE, 
             seed = 1234) |> 
  set_mode("classification")

# fit model using tidymodels
set.seed(100)
rf_mod |> fit(Species ~ ., data = train)
#> parsnip model object
#> 
#> Ranger result
#> 
#> Call:
#>  ranger::ranger(x = maybe_data_frame(x), y = y, num.trees = ~10,      respect.unordered.factors = ~TRUE, probability = ~FALSE,      seed = ~1234, num.threads = 1, verbose = FALSE) 
#> 
#> Type:                             Classification 
#> Number of trees:                  10 
#> Sample size:                      105 
#> Number of independent variables:  4 
#> Mtry:                             2 
#> Target node size:                 1 
#> Variable importance mode:         none 
#> Splitrule:                        gini 
#> OOB prediction error:             2.88 %

# fit model using ranger directly
set.seed(100)
ranger(Species ~ ., data = train, 
       num.trees=10, respect.unordered.factors = TRUE, probability = FALSE, 
       seed = 1234)
#> Ranger result
#> 
#> Call:
#>  ranger(Species ~ ., data = train, num.trees = 10, respect.unordered.factors = TRUE,      probability = FALSE, seed = 1234) 
#> 
#> Type:                             Classification 
#> Number of trees:                  10 
#> Sample size:                      105 
#> Number of independent variables:  4 
#> Mtry:                             2 
#> Target node size:                 1 
#> Variable importance mode:         none 
#> Splitrule:                        gini 
#> OOB prediction error:             2.88 %
```

Created on 2023-11-15 with reprex v2.0.2
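
The two printed summaries report the same OOB error, but the agreement can also be checked programmatically. Below is a minimal sketch of such a check. It assumes that the parsnip fit keeps the underlying ranger object in its `$fit` element. The base-R split and its seed (42) are illustrative choices and are not taken from the original post.

```r
library(parsnip)
library(ranger)

# Illustrative split (not from the original post): 105 of the 150 iris rows
set.seed(42)
train <- iris[sample(nrow(iris), 105), ]

# Same engine arguments as in the answer, including an explicit seed
rf_spec <- rand_forest(trees = 10) |>
  set_engine("ranger", respect.unordered.factors = TRUE,
             probability = FALSE, seed = 1234) |>
  set_mode("classification")

fit_parsnip <- fit(rf_spec, Species ~ ., data = train)
fit_direct <- ranger(Species ~ ., data = train, num.trees = 10,
                     respect.unordered.factors = TRUE,
                     probability = FALSE, seed = 1234)

# Compare the OOB prediction error stored on both ranger objects
oob_equal <- isTRUE(all.equal(fit_parsnip$fit$prediction.error,
                              fit_direct$prediction.error))

# Compare the OOB class predictions row by row (NA where a row was never OOB)
pred_equal <- identical(as.character(fit_parsnip$fit$predictions),
                        as.character(fit_direct$predictions))

oob_equal
pred_equal
```

If the two forests are the same, both flags should be `TRUE`. Removing `seed = 1234` from both calls is a quick way to reproduce the mismatch described in the question.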


Thanks. I didn't realize that the seed argument of ranger() could be used in the tidymodels engine options :-)
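
To see which arguments parsnip hands to ranger, the model specification can be inspected with `translate()`. The sketch below does this for a specification without a seed (as in the question) and one with a seed (as in the answer). The exact template printed may vary between parsnip versions. In the versions I am aware of, when no `seed` is given to `set_engine()`, the template shows parsnip supplying a seed drawn from R's random number generator, while a direct `ranger()` call with the default `seed = NULL` derives its seed from R on its own. That would explain why the two fits differ even after the same `set.seed()` call. The `eng_args` field used at the end is an internal detail of the specification object and is an assumption here, not a documented interface.

```r
library(parsnip)

# Specification without an explicit seed, as in the question
rf_no_seed <- rand_forest(trees = 10) |>
  set_engine("ranger", respect.unordered.factors = TRUE,
             probability = FALSE) |>
  set_mode("classification")

# Specification with an explicit seed, as in the answer
rf_with_seed <- rand_forest(trees = 10) |>
  set_engine("ranger", respect.unordered.factors = TRUE,
             probability = FALSE, seed = 1234) |>
  set_mode("classification")

# Print the ranger call template parsnip would use for each specification
translate(rf_no_seed)
translate(rf_with_seed)

# Names of the engine arguments stored on each specification
args_no_seed <- names(rf_no_seed$eng_args)
args_with_seed <- names(rf_with_seed$eng_args)
args_no_seed
args_with_seed
```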
