LightGBM: Sklearn和Native API的等效性

Question

LightGBM: Sklearn和Native API的等效性

7

我正在通过Training API http://lightgbm.readthedocs.io/en/latest/Python-API.html#training-api和Scikit-learn API http://lightgbm.readthedocs.io/en/latest/Python-API.html#scikit-learn-api来尝试LightGBM。但是，我无法在以下示例中清晰地映射两个API之间的关系。基本想法是使用50%的合成数据集进行训练。

import numpy as np
import lightgbm as lgbm

# Generate Data Set
xs = np.linspace(0, 10, 100).reshape((-1, 1)) 
ys = xs**2 + 4*xs + 5.2
ys = ys.reshape((-1,))

# LGBM configuration
alg_conf = {
    "num_boost_round":25,
    "max_depth" : 3,
    "num_leaves" : 31,
    'learning_rate' : 0.1,
    'boosting_type' : 'gbdt',
    'objective' : 'regression_l2',
    "early_stopping_rounds": None,
}

# Calling Regressor using scikit-learn API 
sk_reg = lgbm.sklearn.LGBMRegressor(
    num_leaves=alg_conf["num_leaves"], 
    n_estimators=alg_conf["num_boost_round"], 
    max_depth=alg_conf["max_depth"],
    learning_rate=alg_conf["learning_rate"],
    objective=alg_conf["objective"]
)
sk_reg.fit(xs[::2], ys[::2])

print("Scikit-learn API results")
print(sk_reg.predict(xs[1::2]))


# Calling Regressor using native API 
train_dataset = lgbm.Dataset(xs[::2], ys[::2])
lg_reg = lgbm.train(alg_conf.copy(), train_dataset)

print("Native API results")
print(lg_reg.predict(xs[1::2]))

输出

Scikit-learn API results
[  14.35693851   14.35693851   14.35693851   14.35693851   14.35693851
   14.35693851   14.35693851   14.35693851   14.35693851   14.35693851
   25.37944751   25.37944751   25.37944751   25.37944751   25.37944751
   35.10572544   35.10572544   35.10572544   35.10572544   35.10572544
   46.50667974   46.50667974   46.50667974   46.50667974   46.50667974
   59.44952419   59.44952419   59.44952419   59.44952419   59.44952419
   75.42846332   75.42846332   75.42846332   75.42846332   75.42846332
  109.4610814   109.4610814   109.4610814   109.4610814   109.4610814
  109.4610814   109.4610814   109.4610814   109.4610814   109.4610814
  109.4610814   109.4610814   109.4610814   109.4610814   109.4610814 ]
Native API results
[ 22.55947971  22.55947971  22.55947971  22.55947971  22.55947971
  22.55947971  22.55947971  22.55947971  22.55947971  22.55947971
  22.55947971  22.55947971  22.55947971  22.55947971  22.55947971
  22.55947971  22.55947971  22.55947971  22.55947971  22.55947971
  45.33537795  45.33537795  45.33537795  45.33537795  45.33537795
  91.6376959   91.6376959   91.6376959   91.6376959   91.6376959
  91.6376959   91.6376959   91.6376959   91.6376959   91.6376959
  91.6376959   91.6376959   91.6376959   91.6376959   91.6376959
  91.6376959   91.6376959   91.6376959   91.6376959   91.6376959
  91.6376959   91.6376959   91.6376959   91.6376959   91.6376959 ]

问题

我在哪里可以找到两个API参数之间的明确等价关系？

非常感谢。

- dokteurwho

1个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- dokteurwho · Accepted Answer

我在LightGBM的GitHub上找到了答案。以下是分享的结果：

添加 alg_conf "min_child_weight": 1e-3, "min_child_samples": 20) 可以解决差异问题：

import numpy as np
import lightgbm as lgbm

# Generate Data Set
xs = np.linspace(0, 10, 100).reshape((-1, 1)) 
ys = xs**2 + 4*xs + 5.2
ys = ys.reshape((-1,))

# Or you could add to your alg_conf "min_child_weight": 1e-3, "min_child_samples": 20.

# LGBM configuration
alg_conf = {
    "num_boost_round":25,
    "max_depth" : 3,
    "num_leaves" : 31,
    'learning_rate' : 0.1,
    'boosting_type' : 'gbdt',
    'objective' : 'regression_l2',
    "early_stopping_rounds": None,
    "min_child_weight": 1e-3, 
    "min_child_samples": 20
}

# Calling Regressor using scikit-learn API 
sk_reg = lgbm.sklearn.LGBMRegressor(
    num_leaves=alg_conf["num_leaves"], 
    n_estimators=alg_conf["num_boost_round"], 
    max_depth=alg_conf["max_depth"],
    learning_rate=alg_conf["learning_rate"],
    objective=alg_conf["objective"],
    min_sum_hessian_in_leaf=alg_conf["min_child_weight"],
    min_data_in_leaf=alg_conf["min_child_samples"]
)
sk_reg.fit(xs[::2], ys[::2])

print("Scikit-learn API results")
print(sk_reg.predict(xs[1::2]))


# Calling Regressor using native API 
train_dataset = lgbm.Dataset(xs[::2], ys[::2])
lg_reg = lgbm.train(alg_conf.copy(), train_dataset)

print("Native API results")
print(lg_reg.predict(xs[1::2]))

它运行良好。