How are class_weights applied in sklearn logistic regression?

Question


python scikit-learn logistic-regression


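I'm interested in how sklearn applies the class weights we supply. The documentation doesn't say explicitly where and how the class weights are applied. Reading the source code didn't help either (it seems sklearn.svm.liblinear is used for the optimization, but I can't read its source because it's a .pyd file...).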

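My guess is that it acts on the cost function: when class weights are specified, the cost of each class is multiplied by that class's weight. For example, if I have 2 observations, one from class 0 (weight = 0.5) and one from class 1 (weight = 1), the cost function would be: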

Cost = 0.5 * log(...X_0, y_0...) + 1 * log(...X_1, y_1...) + penalty
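A minimal numeric sketch of this weighted cost, assuming the standard logistic (log) loss with labels mapped to -1/+1 and an L2 penalty of 0.5 * alpha * ||w||^2. The feature values, coefficients and alpha below are made-up illustration values, and `log_logistic` here is a small hypothetical helper. Note that each per-sample term is the negative log of the logistic function, so the weighted terms are -log(...).

```python
import numpy as np

def log_logistic(z):
    # log(1 / (1 + exp(-z))), computed stably as -log(1 + exp(-z))
    return -np.logaddexp(0.0, -z)

# Two hypothetical observations: one from class 0, one from class 1
X = np.array([[1.0, 2.0],
              [0.5, -1.0]])
y = np.array([0, 1])
w = np.array([0.3, -0.2])   # hypothetical coefficients (no intercept)
alpha = 1.0                 # hypothetical penalty strength

class_weight = {0: 0.5, 1: 1.0}
sample_weight = np.array([class_weight[label] for label in y])

# Map labels {0, 1} to {-1, +1}, as in the logistic loss formulation
y_pm = np.where(y == 1, 1.0, -1.0)
yz = y_pm * (X @ w)

per_sample_loss = -log_logistic(yz)
penalty = 0.5 * alpha * np.dot(w, w)
cost = np.sum(sample_weight * per_sample_loss) + penalty

# Same value written out term by term: 0.5 * loss_0 + 1 * loss_1 + penalty
by_hand = 0.5 * per_sample_loss[0] + 1.0 * per_sample_loss[1] + penalty
print(cost, by_hand)
```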

Does anyone know whether this is correct?

- lizardfireman

1 answer

Content from Stack Overflow.

- MaxU - stand with Ukraine · Accepted Answer

Check the following lines in the source code:

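le = LabelEncoder()
if isinstance(class_weight, dict) or multi_class == 'multinomial':
    # One weight per class, in the same order as `classes`
    class_weight_ = compute_class_weight(class_weight, classes, y)
    # Multiply each sample's weight by the weight of its class
    sample_weight *= class_weight_[le.fit_transform(y)]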
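To see what the last line does, here is a small sketch with hypothetical string labels and a hypothetical weight vector already ordered like `classes` (sorted, which is also the order LabelEncoder uses). `le.fit_transform(y)` turns each label into its class index, and indexing `class_weight_` with that array gives every sample the weight of its class.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

y = np.array(["spam", "ham", "ham", "spam", "ham"])
classes = np.unique(y)                 # sorted: ['ham', 'spam']
class_weight_ = np.array([0.5, 2.0])   # hypothetical: ham -> 0.5, spam -> 2.0

le = LabelEncoder()
idx = le.fit_transform(y)              # class index per sample: 1, 0, 0, 1, 0

sample_weight = np.ones(len(y))        # default: every sample weighs 1
sample_weight *= class_weight_[idx]    # scale each sample by its class weight
print(sample_weight)                   # weights 2.0, 0.5, 0.5, 2.0, 0.5
```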

And here is the relevant part of the compute_class_weight() function:

...
else:
    # user-defined dictionary
    weight = np.ones(classes.shape[0], dtype=np.float64, order='C')
    if not isinstance(class_weight, dict):
        raise ValueError("class_weight must be dict, 'balanced', or None,"
                         " got: %r" % class_weight)
    for c in class_weight:
        i = np.searchsorted(classes, c)
        if i >= len(classes) or classes[i] != c:
            raise ValueError("Class label {} not present.".format(c))
        else:
            weight[i] = class_weight[c]
...
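The function can also be called directly. A small sketch, assuming a recent scikit-learn in which `classes` and `y` are passed as keyword arguments; the labels and weights are hypothetical. With a dict, the result is just the user's weights rearranged into the order of `classes` (a class missing from the dict keeps weight 1); with 'balanced' the weights are n_samples / (n_classes * count of each class).

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0, 0, 0, 1])   # hypothetical labels: 3 x class 0, 1 x class 1
classes = np.unique(y)

# User-defined dict: weights copied into an array ordered like `classes`
w_dict = compute_class_weight({1: 1.0, 0: 0.5}, classes=classes, y=y)

# 'balanced': 4 / (2 * [3, 1]) -> roughly [0.667, 2.0]
w_bal = compute_class_weight("balanced", classes=classes, y=y)

print(w_dict)
print(w_bal)
```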

In the snippets above, class_weight is folded into sample_weight, which is then used in internal functions such as _logistic_loss_and_grad, _logistic_loss, etc.:

# Logistic loss is the negative of the log of the logistic function.
out = -np.sum(sample_weight * log_logistic(yz)) + .5 * alpha * np.dot(w, w)
# NOTE: --->  ^^^^^^^^^^^^^^^
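One way to check this empirically: if class_weight is only folded into sample_weight, then fitting with `class_weight` should give the same coefficients as fitting with the equivalent per-sample `sample_weight`. A small sketch on synthetic data, assuming the default lbfgs solver; the dataset and the weights are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
class_weight = {0: 0.5, 1: 1.0}

# Model 1: let sklearn apply the class weights
clf_cw = LogisticRegression(class_weight=class_weight).fit(X, y)

# Model 2: pass the same weights explicitly, one per sample
sample_weight = np.array([class_weight[label] for label in y])
clf_sw = LogisticRegression().fit(X, y, sample_weight=sample_weight)

# Both checks are expected to print True
print(np.allclose(clf_cw.coef_, clf_sw.coef_))
print(np.allclose(clf_cw.intercept_, clf_sw.intercept_))
```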