Scikit Learn SVC decision_function and predict

Question

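I'm trying to understand the relationship between decision_function and predict, which are instance methods of SVC (http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html). So far I've gathered that decision_function returns pairwise scores between classes. I was under the impression that predict chooses the class that maximizes its pairwise score, but when I tested this I got different results. Below is the code I was using to try to understand the relationship between the two. First I build the pairwise score matrix, then I print the class that has the maximal pairwise score, which differs from the class predicted by clf.predict.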

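        import numpy as np

        # Assumes decision_function returns the n*(n-1)/2 one-vs-one scores
        # (decision_function_shape='ovo' in current scikit-learn releases)
        result = clf.decision_function(vector)[0]
        counter = 0
        num_classes = len(clf.classes_)
        pairwise_scores = np.zeros((num_classes, num_classes))
        for r in range(num_classes):
            for j in range(r + 1, num_classes):
                pairwise_scores[r][j] = result[counter]
                pairwise_scores[j][r] = -result[counter]
                counter += 1

        # flat index of the single largest pairwise score -> its row (class index)
        index = np.argmax(pairwise_scores)
        class_index = index // num_classes
        print(clf.classes_[class_index])
        print(clf.predict(vector)[0])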

Does anyone know how predict and decision_function are related?

- Peter Tseng

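"decision function returns pairwise scores between classes" is incorrect. According to the decision_function section of the documentation page, it should be a score per class: "Distance of the samples X to the separating hyperplane." Please correct this. - justhalf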

@justhalf: no, the OP is right. sklearn.svm.SVC uses a one-vs-one decomposition by default and returns, for each sample, the distances to all n(n-1)/2 hyperplanes. - Fred Foo

Oops, yes, I remember reading that somewhere. I was misled by the documentation. Sorry! - justhalf

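Having tried to answer this myself, I think bcorso's answer is the correct one. The relationship is captured by the code he translated from the C++ implementation: "decision = decision_function(params, sv, nv, a, b, X); votes = [(i if decision[p] > 0 else j) for p,(i,j) in enumerate((i,j) for i in range(len(cs)) for j in range(i+1,len(cs)))]". Taking the class with the most votes in votes is essentially what predict does. - Eric Platon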

Can anyone provide a link to the official documentation for decision_function()? I can't find it online. - Katya
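Whether decision_function returns n(n-1)/2 pairwise scores or one score per class is controlled by SVC's decision_function_shape parameter. A minimal check, assuming a recent scikit-learn release (where the default is 'ovr') and using the bundled digits data set (10 classes) only for illustration: 'ovo' gives the 45 raw pairwise scores, 'ovr' gives 10 per-class scores derived from them, while predict is the same either way.

```python
from sklearn import datasets, svm

X, y = datasets.load_digits(return_X_y=True)

# Same hyperparameters, two output shapes for decision_function
ovo = svm.SVC(gamma=0.001, C=100., decision_function_shape='ovo').fit(X, y)
ovr = svm.SVC(gamma=0.001, C=100., decision_function_shape='ovr').fit(X, y)

print(ovo.decision_function(X[:1]).shape)  # (1, 45): one column per pair of classes
print(ovr.decision_function(X[:1]).shape)  # (1, 10): one column per class
```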

6 Answers

For those interested, I'll post a quick example of the predict function translated from C++ (here) into Python:

import math

import numpy as np

# I've only implemented the linear and rbf kernels
def kernel(params, sv, X):
    # params is the dict returned by clf.get_params(), so use item access
    if params['kernel'] == 'linear':
        return [np.dot(vi, X) for vi in sv]
    elif params['kernel'] == 'rbf':
        # assumes a numeric gamma was passed to SVC (e.g. gamma=0.001), not 'scale' or 'auto'
        return [math.exp(-params['gamma'] * np.dot(vi - X, vi - X)) for vi in sv]
    raise ValueError("unsupported kernel: %r" % (params['kernel'],))

# This replicates clf.decision_function(X) (one-vs-one shape) for a single sample X
def decision_function(params, sv, nv, a, b, X):
    # calculate the kernels between X and every support vector
    k = kernel(params, sv, X)

    # define the start and end index for support vectors for each class
    start = [sum(nv[:i]) for i in range(len(nv))]
    end = [start[i] + nv[i] for i in range(len(nv))]

    # calculate: sum(a_p * k(x_p, x)) between every 2 classes
    c = [ sum(a[ i ][p] * k[p] for p in range(start[j], end[j])) +
          sum(a[j-1][p] * k[p] for p in range(start[i], end[i]))
                for i in range(len(nv)) for j in range(i+1,len(nv))]

    # add the intercept
    return [sum(x) for x in zip(c, b)]

# This replicates clf.predict(X) for a single sample X
def predict(params, sv, nv, a, b, cs, X):
    ''' params = model parameters (the dict from clf.get_params())
        sv = support vectors
        nv = # of support vectors per class
        a  = dual coefficients
        b  = intercepts
        cs = list of class names
        X  = a single (1-D) feature vector to predict
    '''
    decision = decision_function(params, sv, nv, a, b, X)
    # each pairwise classifier votes for i if its score is positive, else for j
    votes = [(i if decision[p] > 0 else j) for p,(i,j) in enumerate((i,j)
                                           for i in range(len(cs))
                                           for j in range(i+1,len(cs)))]

    # most votes wins; ties go to the lowest class index, as in libsvm
    return cs[max(range(len(cs)), key=votes.count)]

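Although predict and decision_function take many input arguments, note that all of them are used internally by the model when you call predict(X). In fact, after fitting you can access every one of them from the model: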

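import numpy as np
from sklearn import datasets, svm

# Any features X and labels y will do; the iris data set is only an example
X, y = datasets.load_iris(return_X_y=True)

# Create model
clf = svm.SVC(gamma=0.001, C=100.)

# Fit model using features, X, and labels, y.
clf.fit(X, y)

# Get parameters from model
params = clf.get_params()
sv = clf.support_vectors_
nv = clf.n_support_
# The private _dual_coef_ / _intercept_ keep libsvm's sign convention; the public
# dual_coef_ / intercept_ are negated for two-class problems
a  = clf._dual_coef_
b  = clf._intercept_
cs = clf.classes_

# Use the functions to predict, one sample at a time
print(np.array([predict(params, sv, nv, a, b, cs, x) for x in X]))

# Compare with the builtin predict
print(clf.predict(X))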
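The same formula can be cross-checked in vectorized form using only public attributes. This sketch assumes a multiclass problem (three-class iris, chosen only for illustration), where dual_coef_ and intercept_ have the same sign as the private _dual_coef_ and _intercept_, and an RBF kernel with an explicit numeric gamma; sklearn.metrics.pairwise.rbf_kernel builds the kernel matrix, and ovo_decision is a hypothetical helper name.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.metrics.pairwise import rbf_kernel

X, y = datasets.load_iris(return_X_y=True)
clf = svm.SVC(kernel='rbf', gamma=0.001, C=100., decision_function_shape='ovo').fit(X, y)

def ovo_decision(clf, X, gamma):
    """Rebuild the (n_samples, n*(n-1)/2) one-vs-one scores from the fitted model."""
    K = rbf_kernel(X, clf.support_vectors_, gamma=gamma)   # (n_samples, n_SV)
    # support vectors are stored grouped by class; compute each class's slice
    starts = np.concatenate(([0], np.cumsum(clf.n_support_)))
    n = len(clf.classes_)
    cols, k = [], 0
    for i in range(n):
        for j in range(i + 1, n):
            si = slice(starts[i], starts[i + 1])
            sj = slice(starts[j], starts[j + 1])
            # class-i SVs use coefficient row j-1, class-j SVs use row i
            cols.append(K[:, si] @ clf.dual_coef_[j - 1, si]
                        + K[:, sj] @ clf.dual_coef_[i, sj]
                        + clf.intercept_[k])
            k += 1
    return np.column_stack(cols)

dec = ovo_decision(clf, X, gamma=0.001)
print(np.allclose(dec, clf.decision_function(X)))
```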

- bcorso

Hey! Thanks for your answer. However, I tried your solution and got different results... - lilouch

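Hi bcorso! Thanks for the answer, but as @lilouch pointed out, I can't get the same values either. The decision function is described as $\langle \mathbf{w},\mathbf{x} \rangle + b$, which must be greater than 1 for the positive class and less than -1 for the negative class. The problem is that I don't know how to take the dot product between a new example and the hyperplane vector. Could you help me? - vhcandido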

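sklearn seems to keep two complementary sets of dual coefficients and intercepts. If you change a = clf.dual_coef_ to a = clf._dual_coef_, the output of decision_function becomes identical to clf._decision_function, and the result of predict agrees with clf.predict as well. - TurtleIzzy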

I got the error ValueError: shapes (60000,784) and (60000,784) not aligned: 784 (dim 1) != 60000 (dim 0). Changing return [math.exp(-params.gamma * np.dot(vi - X, vi - X)) for vi in sv] to return [np.exp(-gamma * np.linalg.norm(vi - X)) for vi in sv], together with kernel = clf.kernel and gamma = clf._gamma, may work. - Riko
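Two points are worth checking here. That shape error is what np.dot(vi - X, vi - X) raises when the whole 2-D X is passed where a single sample is expected. Also, scikit-learn's RBF kernel is exp(-gamma * ||x - x'||^2), with the squared distance, whereas np.linalg.norm returns the unsquared distance. A small check with hand-picked vectors (assumed only for illustration), compared against sklearn.metrics.pairwise.rbf_kernel:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([0.0, 0.0])
v = np.array([3.0, 4.0])          # ||v - x|| = 5, ||v - x||^2 = 25
gamma = 0.5

reference = rbf_kernel(x.reshape(1, -1), v.reshape(1, -1), gamma=gamma)[0, 0]
squared = np.exp(-gamma * np.sum((v - x) ** 2))      # exp(-gamma * ||v - x||^2)
unsquared = np.exp(-gamma * np.linalg.norm(v - x))   # exp(-gamma * ||v - x||)

print(np.isclose(squared, reference))    # True
print(np.isclose(unsquared, reference))  # False
```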

There is a very good Q&A about the multiclass one-vs-one scenario on datascience.sx:

Question

I have a multiclass SVM classifier with labels 'A', 'B', 'C', 'D'.

This is the code I'm running:
>>>print clf.predict([predict_this])
['A']
>>>print clf.decision_function([predict_this])
[[ 185.23220833   43.62763596  180.83305074  -93.58628288   62.51448055  173.43335293]]
How can I use the output of decision function to predict the class (A/B/C/D) with the highest probability and, if possible, its value? I have visited https://dev59.com/uGIj5IYBdhLWcg3wk142#20114601 but it is for binary classifiers, and I could not find a good resource which explains the output of decision_function for multiclass classifiers with shape ovo (one-vs-one).

Edit:

The above example is for class 'A'. For another input the classifier predicted 'C' and gave the following result in decision_function
[[ 96.42193513 -11.13296606 111.47424538 -88.5356536 44.29272494 141.0069203 ]]
Another input, which the classifier also predicted as 'C', gave the following result from decision_function:
[[ 290.54180354 -133.93467605  116.37068951 -392.32251314 -130.84421412   284.87653043]]
Had it been ovr (one-vs-rest), it would be easy: just select the one with the highest value. But in ovo (one-vs-one) there are (n * (n - 1)) / 2 values in the resulting list.

How to deduce which class would be selected based on the decision function?

Answer

Your link has enough resources, so let's go through it:

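When you call decision_function(), you get the output from each of the pairwise classifiers (n*(n-1)/2 numbers in total). See pages 127 and 128 of "Support Vector Machines for Pattern Classification".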

Click on the "page 127 and 128" link (not shown here, but present in the Stack Overflow answer). You should see:

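Python's SVM implementation uses one-vs-one. That's exactly what the book is talking about.
For each pairwise comparison, we measure the decision function.
The decision function is just the regular binary SVM decision boundary.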

What does that have to do with your question?

clf.decision_function() will give you the $D$ for each pairwise comparison.
The class with the most votes wins.

For instance,

[[ 96.42193513 -11.13296606 111.47424538 -88.5356536 44.29272494 141.0069203 ]]

is comparing:

[AB, AC, AD, BC, BD, CD]

We label each comparison by the sign of its score and get:

[A, C, A, C, B, C]

For instance, 96.42193513 is positive, so A is the label for AB.

Now we have three Cs, so C is your prediction. If you repeat my procedure for the other two examples, you will get Python's prediction. Try it!
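The labelling procedure fits in a short function. This sketch assumes, as the answer does, that the columns are ordered AB, AC, AD, BC, BD, CD and that a positive score is a vote for the first class of the pair; ovo_predict is a hypothetical helper name. Run on the three rows from the question, it reproduces the predictions A, C and C.

```python
from itertools import combinations

def ovo_predict(decision_row, classes):
    # Column order of the ovo output: (0,1), (0,2), ..., (n-2,n-1), i.e. AB, AC, AD, BC, BD, CD
    pairs = combinations(range(len(classes)), 2)
    votes = [0] * len(classes)
    for score, (i, j) in zip(decision_row, pairs):
        votes[i if score > 0 else j] += 1   # positive -> first class of the pair
    # max() keeps the first maximum, so ties go to the lower class index
    winner = max(range(len(classes)), key=lambda k: votes[k])
    return classes[winner], votes

classes = ['A', 'B', 'C', 'D']
rows = [
    [185.23220833, 43.62763596, 180.83305074, -93.58628288, 62.51448055, 173.43335293],
    [96.42193513, -11.13296606, 111.47424538, -88.5356536, 44.29272494, 141.0069203],
    [290.54180354, -133.93467605, 116.37068951, -392.32251314, -130.84421412, 284.87653043],
]
for row in rows:
    print(ovo_predict(row, classes))
# ('A', [3, 1, 2, 0])
# ('C', [2, 1, 3, 0])
# ('C', [2, 0, 3, 1])
```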

- serv-inc

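Should the intercept (b) be added to or subtracted from the dot product? On Wikipedia it is subtracted, but in the article it is added. Does it really matter? I'm quite worried, because I computed the decision function as w.x + b rather than w.x - b. - fabda01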

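You could ask about that in the original code, but intuitively, using +b instead of -b should just give you a b with the opposite sign. It shouldn't really be a problem. - serv-inc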
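In scikit-learn's own convention the intercept is added: for a two-class SVC with a linear kernel, decision_function(X) equals X @ coef_.T + intercept_, and a positive value corresponds to classes_[1]. A minimal check, reusing the toy data from the accepted answer purely for illustration:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])
clf = SVC(kernel='linear').fit(X, y)

# w.x + b with scikit-learn's public coef_ (w) and intercept_ (b)
manual = X @ clf.coef_.ravel() + clf.intercept_[0]
print(np.allclose(manual, clf.decision_function(X)))  # True

# positive -> classes_[1], negative -> classes_[0]
print(clf.classes_[(manual > 0).astype(int)], clf.predict(X))
```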

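When you call decision_function(), you get the output of each of the pairwise classifiers (n*(n-1)/2 numbers in total). See pages 127 and 128 of "Support Vector Machines for Pattern Classification".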

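Each classifier casts a vote for what it considers the right answer (based on the sign of its output); predict() returns the class with the most votes.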
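The voting rule can also be applied to a whole batch at once and compared against predict() for every sample. This sketch assumes decision_function_shape='ovo' and uses the digits data set only as an example; ovo_votes is a hypothetical helper name.

```python
from itertools import combinations

import numpy as np
from sklearn import datasets, svm

X, y = datasets.load_digits(return_X_y=True)
clf = svm.SVC(gamma=0.001, C=100., decision_function_shape='ovo').fit(X, y)

def ovo_votes(dec, n_classes):
    """Count one-vs-one votes for every row of an (n_samples, n*(n-1)/2) array."""
    votes = np.zeros((dec.shape[0], n_classes), dtype=int)
    for col, (i, j) in enumerate(combinations(range(n_classes), 2)):
        positive = dec[:, col] > 0
        votes[positive, i] += 1    # positive score: vote for class i
        votes[~positive, j] += 1   # otherwise: vote for class j
    return votes

votes = ovo_votes(clf.decision_function(X), len(clf.classes_))
# argmax returns the lowest class index on ties
voted = clf.classes_[votes.argmax(axis=1)]
print((voted == clf.predict(X)).mean())
```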

- RomanCobra

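Thanks, Roman! I tested it, and predict does mostly seem to pick the class with the most votes. What I did wrong at first was to pick the class with the best cumulative margin score. - Peter Tseng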

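There may be some complicated mathematical relationship between the two. But if you use decision_function with a LinearSVC classifier, the relationship becomes much clearer: decision_function then gives one score per class label (unlike SVC), and predict returns the class with the highest score.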
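A minimal sketch of that relationship, using the iris data set only as an example (max_iter is raised just to reduce convergence warnings): for a multiclass LinearSVC, taking the argmax of the per-class scores gives exactly the predict() output.

```python
import numpy as np
from sklearn import datasets
from sklearn.svm import LinearSVC

X, y = datasets.load_iris(return_X_y=True)
clf = LinearSVC(max_iter=10000).fit(X, y)

scores = clf.decision_function(X)            # (n_samples, n_classes): one score per class
best = clf.classes_[scores.argmax(axis=1)]   # class with the highest score
print(scores.shape)                          # (150, 3)
print((best == clf.predict(X)).all())
```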

- Bilal Dadanlar

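predict() uses a pairwise voting scheme and returns the class with the most votes across all pairwise comparisons. When two classes tie on votes, the class with the lowest index is returned.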

Below is a Python example that applies this voting scheme to the n*(n-1)/2 pairwise scores returned by the one-versus-one decision_function().

from itertools import combinations

import numpy as np
from sklearn import datasets, svm

# do pairwise comparisons, return the class with the most votes
def ovo_vote(classes, decision_row):
    # the ovo columns are ordered (0,1), (0,2), ..., (n-2,n-1) by class index
    combos = list(combinations(range(len(classes)), 2))
    votes = np.zeros(len(classes))
    for score, (i, j) in zip(decision_row, combos):
        if score > 0:
            votes[i] += 1   # positive score: vote for the first class of the pair
        else:
            votes[j] += 1   # otherwise: vote for the second class
    # np.argmax returns the lowest index on ties, matching predict()
    winner = np.argmax(votes)
    return classes[winner]

# load the digits data set
digits = datasets.load_digits()

X, y = digits.data, digits.target

# set the SVC's decision function shape to "ovo"
estimator = svm.SVC(gamma=0.001, C=100., decision_function_shape='ovo')

# train SVC on all but the last digit
estimator.fit(X[:-1], y[:-1])

# print the value of the last digit
print("To be classified digit: ", y[-1])

# print the predicted class
pred = estimator.predict(X[-1:])
print("Perform classification using predict: ", pred[0])

# get decision function
df = estimator.decision_function(X[-1:])

# print the decision function itself
print("Decision function consists of", len(df[0]), "elements:")
print(df)

# get classes, here, numbers 0 to 9
classes = estimator.classes_

# print which class has most votes
vote = ovo_vote(classes, df[0])
print("Perform classification using decision function: ", vote)

- Robin van Emden

- Martin Böschen · Accepted Answer

I don't fully understand your code, but let's go through the example from the documentation page you referenced.

import numpy as np
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])
from sklearn.svm import SVC
clf = SVC()
clf.fit(X, y)

Now let's apply both decision_function() and predict() to the samples:

clf.decision_function(X)
clf.predict(X)

The output we get is:

array([[-1.00052254],
       [-1.00006594],
       [ 1.00029424],
       [ 1.00029424]])
array([1, 1, 2, 2])

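This is easy to interpret: the decision function tells us on which side of the hyperplane generated by the classifier each sample lies (and how far from it). Based on that information, the estimator assigns the corresponding label: here, negative values give label 1 and positive values give label 2.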
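For a two-class SVC, that sign rule can be written out directly: a negative score maps to classes_[0] and a positive score to classes_[1]. A minimal sketch on the same toy data, assuming a recent scikit-learn release (which returns a 1-D array for two classes rather than the single column shown above):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])
clf = SVC().fit(X, y)

dec = clf.decision_function(X)          # shape (4,) for a two-class problem
# negative -> classes_[0] (label 1), positive -> classes_[1] (label 2)
from_sign = clf.classes_[(dec > 0).astype(int)]
print(from_sign, clf.predict(X))
```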