LightGBM: Sklearn和Native API的等效性

7
我正在通过Training API http://lightgbm.readthedocs.io/en/latest/Python-API.html#training-api和Scikit-learn API http://lightgbm.readthedocs.io/en/latest/Python-API.html#scikit-learn-api来尝试LightGBM。但是,我无法在以下示例中清晰地映射两个API之间的关系。基本想法是使用50%的合成数据集进行训练。
import numpy as np
import lightgbm as lgbm

# Generate Data Set
xs = np.linspace(0, 10, 100).reshape((-1, 1)) 
ys = xs**2 + 4*xs + 5.2
ys = ys.reshape((-1,))

# LGBM configuration
alg_conf = {
    "num_boost_round":25,
    "max_depth" : 3,
    "num_leaves" : 31,
    'learning_rate' : 0.1,
    'boosting_type' : 'gbdt',
    'objective' : 'regression_l2',
    "early_stopping_rounds": None,
}

# Calling Regressor using scikit-learn API 
sk_reg = lgbm.sklearn.LGBMRegressor(
    num_leaves=alg_conf["num_leaves"], 
    n_estimators=alg_conf["num_boost_round"], 
    max_depth=alg_conf["max_depth"],
    learning_rate=alg_conf["learning_rate"],
    objective=alg_conf["objective"]
)
sk_reg.fit(xs[::2], ys[::2])

print("Scikit-learn API results")
print(sk_reg.predict(xs[1::2]))


# Calling Regressor using native API 
train_dataset = lgbm.Dataset(xs[::2], ys[::2])
lg_reg = lgbm.train(alg_conf.copy(), train_dataset)

print("Native API results")
print(lg_reg.predict(xs[1::2]))

输出

Scikit-learn API results
[  14.35693851   14.35693851   14.35693851   14.35693851   14.35693851
   14.35693851   14.35693851   14.35693851   14.35693851   14.35693851
   25.37944751   25.37944751   25.37944751   25.37944751   25.37944751
   35.10572544   35.10572544   35.10572544   35.10572544   35.10572544
   46.50667974   46.50667974   46.50667974   46.50667974   46.50667974
   59.44952419   59.44952419   59.44952419   59.44952419   59.44952419
   75.42846332   75.42846332   75.42846332   75.42846332   75.42846332
  109.4610814   109.4610814   109.4610814   109.4610814   109.4610814
  109.4610814   109.4610814   109.4610814   109.4610814   109.4610814
  109.4610814   109.4610814   109.4610814   109.4610814   109.4610814 ]
Native API results
[ 22.55947971  22.55947971  22.55947971  22.55947971  22.55947971
  22.55947971  22.55947971  22.55947971  22.55947971  22.55947971
  22.55947971  22.55947971  22.55947971  22.55947971  22.55947971
  22.55947971  22.55947971  22.55947971  22.55947971  22.55947971
  45.33537795  45.33537795  45.33537795  45.33537795  45.33537795
  91.6376959   91.6376959   91.6376959   91.6376959   91.6376959
  91.6376959   91.6376959   91.6376959   91.6376959   91.6376959
  91.6376959   91.6376959   91.6376959   91.6376959   91.6376959
  91.6376959   91.6376959   91.6376959   91.6376959   91.6376959
  91.6376959   91.6376959   91.6376959   91.6376959   91.6376959 ]

问题

我在哪里可以找到两个API参数之间的明确等价关系?

非常感谢。

1个回答

5

我在LightGBM的GitHub上找到了答案。以下是分享的结果:

添加 alg_conf "min_child_weight": 1e-3, "min_child_samples": 20) 可以解决差异问题:

import numpy as np
import lightgbm as lgbm

# Generate Data Set
xs = np.linspace(0, 10, 100).reshape((-1, 1)) 
ys = xs**2 + 4*xs + 5.2
ys = ys.reshape((-1,))

# Or you could add to your alg_conf "min_child_weight": 1e-3, "min_child_samples": 20.

# LGBM configuration
alg_conf = {
    "num_boost_round":25,
    "max_depth" : 3,
    "num_leaves" : 31,
    'learning_rate' : 0.1,
    'boosting_type' : 'gbdt',
    'objective' : 'regression_l2',
    "early_stopping_rounds": None,
    "min_child_weight": 1e-3, 
    "min_child_samples": 20
}

# Calling Regressor using scikit-learn API 
sk_reg = lgbm.sklearn.LGBMRegressor(
    num_leaves=alg_conf["num_leaves"], 
    n_estimators=alg_conf["num_boost_round"], 
    max_depth=alg_conf["max_depth"],
    learning_rate=alg_conf["learning_rate"],
    objective=alg_conf["objective"],
    min_sum_hessian_in_leaf=alg_conf["min_child_weight"],
    min_data_in_leaf=alg_conf["min_child_samples"]
)
sk_reg.fit(xs[::2], ys[::2])

print("Scikit-learn API results")
print(sk_reg.predict(xs[1::2]))


# Calling Regressor using native API 
train_dataset = lgbm.Dataset(xs[::2], ys[::2])
lg_reg = lgbm.train(alg_conf.copy(), train_dataset)

print("Native API results")
print(lg_reg.predict(xs[1::2]))

它运行良好。


在Lightgbm Scikit learn api中,我们可以使用print(sk_reg)来获取lightgbm模型/参数。你知道如何在原生api中实现这个功能吗?使用print(lg_reg)将返回对booster对象的引用。 - M Hendra Herviawan
为什么在更改了两个参数之后,这两个API的结果相似了,而之前不是呢? - Sift

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接