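This question is about understanding, internally, how LightGBM predicts class probabilities. Other packages, such as sklearn, provide thorough detail for their classifiers. For example:
LogisticRegression returns:
Probability estimates.
The returned estimates for all classes are ordered by the label of classes.
For a multi_class problem, if multi_class is set to "multinomial", the softmax function is used to find the predicted probability of each class. Otherwise, a one-vs-rest approach is used, i.e. the probability of each class is calculated with the logistic function, assuming that class to be positive, and these values are normalized across all the classes.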
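To make the two cases in that description concrete, here is a minimal NumPy sketch. It is not sklearn's actual implementation: the function names `softmax_proba` and `ovr_proba` and the example scores are made up for illustration, and the input is assumed to be an array of per-class decision scores of shape (n_samples, n_classes).

```python
import numpy as np

def softmax_proba(scores):
    """Multinomial case: softmax over the per-class decision scores."""
    scores = np.asarray(scores, dtype=float)
    # Subtract the row maximum for numerical stability; the result is unchanged.
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def ovr_proba(scores):
    """One-vs-rest case: logistic function per class, then row normalization."""
    scores = np.asarray(scores, dtype=float)
    per_class = 1.0 / (1.0 + np.exp(-scores))
    return per_class / per_class.sum(axis=1, keepdims=True)

# Hypothetical decision scores for two samples and three classes.
scores = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 0.0]])
p_softmax = softmax_proba(scores)
p_ovr = ovr_proba(scores)
```

Both functions return rows that sum to 1, but they generally give different probabilities for the same scores. This is the kind of detail I would like to have for LightGBM.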
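RandomForest returns:
Predict class probabilities for X.
The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest. The class probability of a single tree is the fraction of samples of the same class in a leaf.
There are also other Stack Overflow questions that provide additional details.
I am trying to uncover those same details for LightGBM's predict_proba function. The documentation does not list the details of how the probabilities are calculated. It simply states:
Return the predicted probability for each class for each sample.
The source code is below: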
    def predict_proba(self, X, raw_score=False, start_iteration=0, num_iteration=None,
                      pred_leaf=False, pred_contrib=False, **kwargs):
        """Return the predicted probability for each class for each sample.

        Parameters
        ----------
        X : array-like or sparse matrix of shape = [n_samples, n_features]
            Input features matrix.
        raw_score : bool, optional (default=False)
            Whether to predict raw scores.
        start_iteration : int, optional (default=0)
            Start index of the iteration to predict.
            If <= 0, starts from the first iteration.
        num_iteration : int or None, optional (default=None)
            Total number of iterations used in the prediction.
            If None, if the best iteration exists and start_iteration <= 0, the best iteration is used;
            otherwise, all iterations from ``start_iteration`` are used (no limits).
            If <= 0, all iterations from ``start_iteration`` are used (no limits).
        pred_leaf : bool, optional (default=False)
            Whether to predict leaf index.
        pred_contrib : bool, optional (default=False)
            Whether to predict feature contributions.

            .. note::

                If you want to get more explanations for your model's predictions using SHAP values,
                like SHAP interaction values,
                you can install the shap package (https://github.com/slundberg/shap).
                Note that unlike the shap package, with ``pred_contrib`` we return a matrix with an extra
                column, where the last column is the expected value.

        **kwargs
            Other parameters for the prediction.

        Returns
        -------
        predicted_probability : array-like of shape = [n_samples, n_classes]
            The predicted probability for each class for each sample.
        X_leaves : array-like of shape = [n_samples, n_trees * n_classes]
            If ``pred_leaf=True``, the predicted leaf of every tree for each sample.
        X_SHAP_values : array-like of shape = [n_samples, (n_features + 1) * n_classes] or list with n_classes length of such objects
            If ``pred_contrib=True``, the feature contributions for each sample.
        """
        result = super(LGBMClassifier, self).predict(X, raw_score, start_iteration, num_iteration,
                                                     pred_leaf, pred_contrib, **kwargs)
        if callable(self._objective) and not (raw_score or pred_leaf or pred_contrib):
            warnings.warn("Cannot compute class probabilities or labels "
                          "due to the usage of customized objective function.\n"
                          "Returning raw scores instead.")
            return result
        elif self._n_classes > 2 or raw_score or pred_leaf or pred_contrib:
            return result
        else:
            return np.vstack((1. - result, result)).transpose()
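As far as I can tell, the only probability-related logic in this Python wrapper is the last branch, which turns a 1-D array of positive-class probabilities into two columns. The sketch below reproduces just that step with NumPy. The raw scores are made-up numbers, and it is my assumption (not something this source code shows) that with the built-in binary objective and default settings, `result` is the logistic sigmoid of the raw score.

```python
import numpy as np

def sigmoid(raw_score):
    """Logistic function mapping a raw score (log-odds) to a probability."""
    return 1.0 / (1.0 + np.exp(-np.asarray(raw_score, dtype=float)))

# Hypothetical raw scores for three samples (made-up numbers).
raw = np.array([-2.0, 0.0, 1.5])

# Assumption: the positive-class probability is the sigmoid of the raw score.
result = sigmoid(raw)

# Same stacking as the final branch of predict_proba:
# column 0 is P(class 0) = 1 - p, column 1 is P(class 1) = p.
proba = np.vstack((1. - result, result)).transpose()
```

So the wrapper itself only reshapes the output. How `result` is actually computed from the trees happens below this layer, and that is the part I am asking about.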
How can I understand how exactly LightGBM's predict_proba function works internally?
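written in C++ or C. The goal is to have an answer that describes how the probabilities are calculated, not necessarily the logic or code flow. There are several examples of questions with answers similar to what I am looking for, as mentioned above. The best one, in my opinion, is SVM [1] [https://dev59.com/rGUp5IYBdhLWcg3wtZE0]. - artemis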