10-fold cross-validation in one-against-all SVM (using LibSVM)

Question

Tags: matlab, machine-learning, classification, svm, libsvm

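I want to do 10-fold cross-validation in my one-against-all support vector machine classification in MATLAB.

I tried to combine these two related answers:

But since I'm new to MATLAB and its syntax, I haven't managed to make it work so far.

On the other hand, the only thing I found about cross-validation in the LibSVM README file is the following few lines, and I couldn't find any related example:

option -v randomly splits the data into n parts and calculates cross validation accuracy/mean squared error on them.

See libsvm FAQ for the meaning of outputs.

Could anyone provide an example of 10-fold cross-validation with one-against-all classification?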

- Zahra E

As carlosdc pointed out, the second link shows the use of the SVM functions from the Bioinformatics Toolbox (not libsvm). - Amro

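FYI, starting with R2013a, MATLAB's SVM functions have moved from the Bioinformatics Toolbox to the Statistics Toolbox (where I think they should have been in the first place!) - Amro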

2 Answers

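What may be confusing you is that one of the two questions is not about LIBSVM. You should try to adapt this answer and ignore the other one.

You should choose the folds yourself and then do the rest exactly as in the linked question. Assume the data has been loaded into data and the labels into labels: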

n = size(data,1);
ns = floor(n/10);                     %# nominal fold size; the last fold absorbs any remainder
numLabels = numel(unique(labels));    %# assumes the classes are labelled 1..numLabels
%# note: folds are contiguous blocks, so shuffle the rows first if they are sorted by class
for fold=1:10
    first = (fold-1)*ns + 1;
    if fold < 10, last = fold*ns; else, last = n; end
    testindices = first:last;
    trainindices = [1:first-1, last+1:n];
    %# use testindices only for testing and trainindices only for training
    trainLabel = labels(trainindices);
    trainData = data(trainindices,:);
    testLabel = labels(testindices);
    testData = data(testindices,:);
    %# train one-against-all models
    model = cell(numLabels,1);
    for k=1:numLabels
        model{k} = svmtrain(double(trainLabel==k), trainData, '-c 1 -g 0.2 -b 1');
    end

    %# get probability estimates of test instances using each model
    prob = zeros(size(testData,1),numLabels);
    for k=1:numLabels
        [~,~,p] = svmpredict(double(testLabel==k), testData, model{k}, '-b 1');
        prob(:,k) = p(:,model{k}.Label==1);    %# probability of class==k
    end

    %# predict the class with the highest probability
    [~,pred] = max(prob,[],2);
    acc = sum(pred == testLabel) ./ numel(testLabel)    %# accuracy
    C = confusionmat(testLabel, pred)                   %# confusion matrix
end
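
The loop above cuts the data into contiguous blocks, so if the rows happen to be sorted by class, some folds can miss whole classes. One possible way around that, sketched here with base MATLAB only, is to give every row a random fold number and select rows with a logical mask. The names foldId, testMask and trainMask, and the sample count, are illustrative assumptions and not part of LibSVM. Note that this split is random but not stratified.

```matlab
n = 150;                 % number of samples (illustrative value)
nfold = 10;
rng(0);                  % fix the seed so the split is reproducible
perm = randperm(n);      % random ordering of the row indices

% deal the shuffled rows out to folds 1..nfold in round-robin fashion
foldId = zeros(n,1);
foldId(perm) = mod((0:n-1)', nfold) + 1;

for fold = 1:nfold
    testMask  = (foldId == fold);   % rows held out in this fold
    trainMask = ~testMask;          % all remaining rows
    % train on data(trainMask,:) and labels(trainMask),
    % evaluate on data(testMask,:) and labels(testMask)
end
```

Inside the loop, the masks would simply take the place of testindices and trainindices.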

- carlosdc

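In the line prob = zeros(numTest,numLabels); by numTest you mean ns, right? - Zahra E

No, I mean the number of data points you are testing on. I have edited the code. - carlosdc

What about the -v option? Don't we need to use it? - Zahra E

But it says here that -v is for cross-validation, not for one-against-one or one-against-all. Am I right? - Zahra E

@carlosdc - Thanks for your effort :) - Zahra E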

Content provided by Stack Overflow.

- Amro · Accepted Answer

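There are mainly two reasons we do cross-validation:

as a testing method that gives us a nearly unbiased estimate of the generalization power of our model (by avoiding overfitting)
as a way of model selection (e.g., finding the best C and gamma parameters over the training data; see this post for an example)

For the first case, which is the one we are interested in, the process involves training k models, one for each fold, and then training one final model over the entire training set. We report the average accuracy over the k folds.

Now, since we are using the one-vs-all approach to handle the multi-class problem, each model consists of N support vector machines (one for each class).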
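
To make the decision rule concrete, here is a small sketch with made-up numbers. The probabilities, label values and ground truth are assumptions for illustration, not LibSVM output. Each column holds the positive-class probability reported by one binary machine, and the predicted class is the one whose machine is most confident. Because the N machines are trained independently, the rows need not sum to 1.

```matlab
labels = [10; 20; 30];          % hypothetical class labels (need not be 1..N)

% rows = test instances, columns = P(class k | x) from the k-th binary SVM
prob = [0.80 0.15 0.30;
        0.10 0.25 0.70;
        0.20 0.60 0.35];

[~,idx] = max(prob, [], 2);     % column of the most confident machine
pred = labels(idx);             % map the column index back to a class label

ytrue = [10; 30; 30];           % hypothetical ground truth
acc = mean(pred == ytrue);      % fraction of correct predictions
```

Mapping idx through labels matters whenever the class labels are not exactly 1..N.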

The following are wrapper functions that implement the one-vs-all approach:

function mdl = libsvmtrain_ova(y, X, opts)
    if nargin < 3, opts = ''; end

    %# classes
    labels = unique(y);
    numLabels = numel(labels);

    %# train one-against-all models
    models = cell(numLabels,1);
    for k=1:numLabels
        models{k} = libsvmtrain(double(y==labels(k)), X, strcat(opts,' -b 1 -q'));
    end
    mdl = struct('models',{models}, 'labels',labels);
end

function [pred,acc,prob] = libsvmpredict_ova(y, X, mdl)
    %# classes
    labels = mdl.labels;
    numLabels = numel(labels);

    %# get probability estimates of test instances using each 1-vs-all model
    prob = zeros(size(X,1), numLabels);
    for k=1:numLabels
        [~,~,p] = libsvmpredict(double(y==labels(k)), X, mdl.models{k}, '-b 1 -q');
        prob(:,k) = p(:, mdl.models{k}.Label==1);
    end

    %# predict the class with the highest probability
    [~,idx] = max(prob, [], 2);
    pred = labels(idx);    %# map the column index back to the original class label
    %# compute classification accuracy
    acc = mean(pred == y);
end

The following are functions to support cross-validation:

function acc = libsvmcrossval_ova(y, X, opts, nfold, indices)
    if nargin < 3, opts = ''; end
    if nargin < 4, nfold = 10; end
    if nargin < 5, indices = crossvalidation(y, nfold); end

    %# N-fold cross-validation testing
    acc = zeros(nfold,1);
    for i=1:nfold
        testIdx = (indices == i); trainIdx = ~testIdx;
        mdl = libsvmtrain_ova(y(trainIdx), X(trainIdx,:), opts);
        [~,acc(i)] = libsvmpredict_ova(y(testIdx), X(testIdx,:), mdl);
    end
    acc = mean(acc);    %# average accuracy
end

function indices = crossvalidation(y, nfold)
    %# stratified n-fold cross-validation
    %#indices = crossvalind('Kfold', y, nfold);  %# Bioinformatics toolbox
    cv = cvpartition(y, 'kfold',nfold);          %# Statistics toolbox
    indices = zeros(size(y));
    for i=1:nfold
        indices(cv.test(i)) = i;
    end
end

Finally, here is a simple demo to illustrate the usage:

%# load dataset
S = load('fisheriris');
data = zscore(S.meas);
labels = grp2idx(S.species);

%# cross-validate using one-vs-all approach
opts = '-s 0 -t 2 -c 1 -g 0.25';    %# libsvm training options
nfold = 10;
acc = libsvmcrossval_ova(labels, data, opts, nfold);
fprintf('Cross Validation Accuracy = %.4f%%\n', 100*mean(acc));

%# compute final model over the entire dataset
mdl = libsvmtrain_ova(labels, data, opts);

Compare that against the one-vs-one approach, which is what libsvm uses by default:

acc = libsvmtrain(labels, data, sprintf('%s -v %d -q',opts,nfold));
model = libsvmtrain(labels, data, strcat(opts,' -q'));
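
For the second use of cross-validation mentioned earlier (model selection), a common pattern is a grid search over C and gamma that keeps the pair with the best cross-validated accuracy. The sketch below is only an outline under stated assumptions: cvAccuracy is a hypothetical function handle, filled here with a synthetic stand-in so that the snippet runs without LibSVM, and the grid ranges are illustrative. With the functions above, one could try replacing it with @(C,g) libsvmcrossval_ova(labels, data, sprintf('-c %g -g %g', C, g), nfold).

```matlab
% hypothetical scorer: returns a cross-validated accuracy for a (C, gamma) pair.
% Synthetic stand-in that peaks at C = 2^1, gamma = 2^-3, so the sketch runs
% without LibSVM installed.
cvAccuracy = @(C,g) exp(-((log2(C)-1).^2 + (log2(g)+3).^2) / 8);

% exponentially spaced grid of candidate values
[Cgrid, Ggrid] = meshgrid(2.^(-5:2:15), 2.^(-15:2:3));

% keep the pair with the highest cross-validated accuracy
bestAcc = -Inf;
for i = 1:numel(Cgrid)
    a = cvAccuracy(Cgrid(i), Ggrid(i));
    if a > bestAcc
        bestAcc = a; bestC = Cgrid(i); bestG = Ggrid(i);
    end
end
fprintf('best C = %g, gamma = %g (CV accuracy = %.4f)\n', bestC, bestG, bestAcc);
```

The accuracy of the winning pair is an optimistic estimate, so the final model should still be evaluated on data that played no part in the search.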