What is the difference between class_weight=None and class_weight='auto' in scikit-learn's SVM?

Question

5

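In scikit-learn's SVM classifiers, what is the difference between class_weight=None and class_weight='auto'?

The documentation gives the following explanation:

Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The 'auto' mode uses the values of y to automatically adjust weights inversely proportional to class frequencies.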

class sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma=0.0, coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, random_state=None)

But what is the advantage of using the auto mode? I could not understand how it is implemented.

- dangerous

2 Answers

6

This is quite an old post, but for anyone who has just run into this issue, note that class_weight='auto' was deprecated in version 0.17. Use class_weight='balanced' instead.

http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

This is how the 'balanced' weights are computed:

n_samples / (n_classes * np.bincount(y))
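To make the formula concrete, here is a minimal sketch that applies it by hand to a tiny label vector. The function name balanced_class_weights and the toy data are my own, and it assumes y contains non-negative integer labels 0..K-1 with every class present at least once (otherwise np.bincount would produce a zero count and the division would blow up).

```python
import numpy as np


def balanced_class_weights(y):
    """Apply n_samples / (n_classes * np.bincount(y)) to integer labels.

    Hypothetical helper for illustration; assumes labels are 0..K-1
    and that every class occurs at least once in y.
    """
    y = np.asarray(y)
    counts = np.bincount(y)        # number of samples per class
    n_samples = y.shape[0]
    n_classes = counts.shape[0]
    return n_samples / (n_classes * counts)


# Toy data: class 0 has three samples, class 1 has one.
y = np.array([0, 0, 0, 1])
weights = balanced_class_weights(y)
print(weights)  # class 0 gets 4 / (2 * 3), class 1 gets 4 / (2 * 1)
```

The rare class ends up with three times the weight of the common one, and the weighted sample counts still add up to n_samples.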

Cheers!

- chakrr

- IVlad · Accepted Answer

This happens in the class_weight.py file:

elif class_weight == 'auto':
    # Find the weight of each class as present in y.
    le = LabelEncoder()
    y_ind = le.fit_transform(y)
    if not all(np.in1d(classes, le.classes_)):
        raise ValueError("classes should have valid labels that are in y")

    # inversely proportional to the number of samples in the class
    recip_freq = 1. / bincount(y_ind)
    weight = recip_freq[le.transform(classes)] / np.mean(recip_freq)

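This means that each class you have (in classes) gets a weight equal to 1 divided by the number of times that class appears in y, so classes that appear more often get lower weights. That value is then divided by the mean of all the inverse class frequencies.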
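As a rough illustration of that normalization, here is a small pure-Python sketch of the same arithmetic. It is not the library code: auto_class_weights is a hypothetical name, and it uses collections.Counter in place of LabelEncoder and bincount so that it runs without any dependencies.

```python
from collections import Counter


def auto_class_weights(y):
    """Sketch of the old 'auto' heuristic (hypothetical helper).

    Each class weight is 1 / count, divided by the mean of all the
    1 / count values, so the weights average to 1 across classes.
    """
    counts = Counter(y)
    recip_freq = {label: 1.0 / n for label, n in counts.items()}
    mean_recip = sum(recip_freq.values()) / len(recip_freq)
    return {label: r / mean_recip for label, r in recip_freq.items()}


# Toy data: class 0 has three samples, class 1 has one.
weights = auto_class_weights([0, 0, 0, 1])
print(weights)
```

On this toy input the inverse frequencies are 1/3 and 1, their mean is 2/3, and so the weights come out as 0.5 for the common class and 1.5 for the rare one.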

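The advantage is that you no longer have to worry about setting the class weights yourself: this should be good enough for most applications.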

If you look at the None case in the source code, weight is filled with ones, so every class gets equal weight.