What does the mlr warning "NA used as a default value for learner parameter missing" mean?

Question

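I am running a classification xgboost model with the mlr package. My data contain missing values, and I would like to keep those observations rather than impute them. My understanding is that the xgboost implementation in mlr can handle missing values. However, I do not understand the warning that mlr's makeLearner function gives.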

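I have tried reading the documentation and have found this warning in other people's code. However, I have not seen an explanation of the warning that makes sense to me.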

For example, I have read this discussion of the warning, but it did not clarify the matter for me: https://github.com/mlr-org/mlr/pull/1225

The warning appears when makeLearner is called:

xgb_learner <- makeLearner(
  "classif.xgboost",
  predict.type = "prob",
  par.vals = list(
    objective = "binary:logistic",
    eval_metric = "error",
    nrounds = 200,
    missing = NA,
    max_depth = 6,
    eta = 0.1,
    gamma = 5,
    colsample_bytree = 0.5,
    min_child_weight = 1,
    subsample = 0.7

  )
)
Warning in makeParam(id = id, type = "numeric", learner.param = TRUE, lower = lower,  :
  NA used as a default value for learner parameter missing.
ParamHelpers uses NA as a special value for dependent parameters.

My missing values are currently encoded as NA. R clearly recognizes them as missing, as the following shows:

> sum(is.na(training$day))
[1] 58

The output of getParamSet shows that the missing parameter accepts numeric values from -Inf to Inf. So perhaps NA is not a valid value?

> getParamSet("classif.xgboost")
Warning in makeParam(id = id, type = "numeric", learner.param = TRUE, lower = lower,  :
  NA used as a default value for learner parameter missing.
ParamHelpers uses NA as a special value for dependent parameters.
                                Type  len             Def               Constr Req Tunable Trafo
booster                     discrete    -          gbtree gbtree,gblinear,dart   -    TRUE     -
watchlist                    untyped    -          <NULL>                    -   -   FALSE     -
eta                          numeric    -             0.3               0 to 1   -    TRUE     -
gamma                        numeric    -               0             0 to Inf   -    TRUE     -
max_depth                    integer    -               6             1 to Inf   -    TRUE     -
min_child_weight             numeric    -               1             0 to Inf   -    TRUE     -
subsample                    numeric    -               1               0 to 1   -    TRUE     -
colsample_bytree             numeric    -               1               0 to 1   -    TRUE     -
colsample_bylevel            numeric    -               1               0 to 1   -    TRUE     -
num_parallel_tree            integer    -               1             1 to Inf   -    TRUE     -
lambda                       numeric    -               1             0 to Inf   -    TRUE     -
lambda_bias                  numeric    -               0             0 to Inf   -    TRUE     -
alpha                        numeric    -               0             0 to Inf   -    TRUE     -
objective                    untyped    - binary:logistic                    -   -   FALSE     -
eval_metric                  untyped    -           error                    -   -   FALSE     -
base_score                   numeric    -             0.5          -Inf to Inf   -   FALSE     -
max_delta_step               numeric    -               0             0 to Inf   -    TRUE     -
missing                      numeric    -                          -Inf to Inf   -   FALSE     -

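Do I need to recode these to a specific value and then pass that to mlr via missing = [specific value] in makeLearner? Is there something else I should do? Or is this warning nothing to worry about?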

Thanks very much for any clarification.

- PBB

1 Answer

- Lars Kotthoff · Accepted Answer

The warning comes from ParamHelpers, but in this case it is harmless. It is a generic check that does not take this specific situation into account.
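If the message is only noise in your logs, one option is to muffle that specific warning and let every other warning through. The following is a minimal sketch in base R, not something mlr or ParamHelpers provides: the helper name `muffle_na_default_warning` is hypothetical, and `fake_make_learner` is a stand-in that simply re-emits the warning text quoted in the question so the sketch runs without mlr installed.

```r
# Hypothetical helper: evaluate `expr`, muffling only warnings whose message
# contains the ParamHelpers text quoted in the question. Any other warning
# is left alone and propagates as usual.
muffle_na_default_warning <- function(expr) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl("NA used as a default value for learner parameter",
                conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# Stand-in for makeLearner() (an assumption for illustration only): it emits
# the same warning text and returns a dummy value.
fake_make_learner <- function() {
  warning("NA used as a default value for learner parameter missing.\n",
          "ParamHelpers uses NA as a special value for dependent parameters.")
  "learner"
}

result <- muffle_na_default_warning(fake_make_learner())

# With mlr installed, the same wrapper would presumably be used like this:
# xgb_learner <- muffle_na_default_warning(
#   makeLearner("classif.xgboost", predict.type = "prob",
#               par.vals = list(missing = NA))
# )
```

Matching on the message text rather than calling suppressWarnings() keeps unrelated warnings visible, at the cost of breaking silently if the wording of the message ever changes.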

- Lars Kotthoff

Would you mind elaborating on what it is trying to tell me? In what circumstances would it be a cause for concern? - PBB

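In many cases NA is probably not a sensible default and was only added because the author of the code did not know any better. That is not the case here, though. Even so, you could argue about whether this check is really needed at all. - Lars Kotthoff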

It may be the way ParamHelpers handles xgboost's missing parameter, whose default value is NA. I tried setting it to other values, but the warning is still there. - xm1