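I'm having a hard time understanding how to implement a least squares linear classifier for my data in Matlab. My data has N rows, each 10 columns wide. Each row represents a data point with 10 features. There are only two classes: the first N/2 rows of my test data are class 1 and the rest are class 2.
All the explanations of least squares make sense, but I can't adapt them to my data. I just need a bit of conceptual explanation relating my data to the least squares method.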
The idea of using least squares to create a linear classifier is to define a linear function f(x) = w<sup>T</sup>x and adjust w so that f(x) is close to 1 for your data points of one class and close to -1 for the other class. The adjustment of w is done by minimizing the sum, over all data points, of the squared distance between f(x) and either 1 or -1, depending on the point's class.
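Written out, the objective described above can be sketched as follows. The notation is an assumption for illustration: x_i is the i-th data point, y_i is its target (+1 or -1), X stacks the points as rows, and b stacks the targets, matching the roles of `X` and `b` in the code.

```latex
\min_{w} \sum_{i=1}^{N} \left( w^{T} x_i - y_i \right)^2
  \;=\; \min_{w} \lVert X w - b \rVert_2^2 ,
\qquad y_i \in \{+1, -1\}
```

When X has full column rank, the minimizer satisfies the normal equations X<sup>T</sup>X w = X<sup>T</sup>b.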
% Create a two-cluster data set with 100 points in each cluster
N = 100;
X1 = 0.3*bsxfun(@plus, randn(N, 2), [6 6]);
X2 = 0.6*bsxfun(@plus, randn(N, 2), [-2 -1]);
% Create a 200 by 3 data matrix similar to the one you have
% (see note below why 200 by 3 and not 200 by 2)
X = [[X1; X2] ones(2*N, 1)];
% Create 200 by 1 vector containing 1 for each point in the first cluster
% and -1 for each point in the second cluster
b = [ones(N, 1); -ones(N, 1)];
% Solve least squares problem (no constraints, hence the two empty
% arguments; lsqlin needs the Optimization Toolbox, and z = X \ b
% solves the same unconstrained problem without it)
z = lsqlin(X, b, [], []);
% Plot data points and linear separator found above:
% the line z(1)*x + z(2)*y + z(3) = 0, solved for y
x = linspace(-3, 3, 100);
y = -z(3)/z(2) - (z(1)/z(2))*x;
hold on;
plot(X(:, 1), X(:, 2), 'bx'); xlim([-3 3]); ylim([-3 3]);
plot(x, y, 'r');
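Note on the extra column: I added a column of ones to the data matrix to make the separator more flexible, since it allows the separator to be shifted. Without it, you force the separator to pass through the origin, which usually gives worse classification results.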
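To classify a test point x, use the sign of w<sup>T</sup>x (here w is the vector z computed above, and x includes the appended 1): if w<sup>T</sup>x > 0, the point belongs to the first class; otherwise, if w<sup>T</sup>x < 0, it belongs to the second class. - 3lectrologos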
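For the N by 10 layout described in the question, the same steps might look like the sketch below. It is only an illustration under stated assumptions: the data is synthetic (two Gaussian clusters standing in for the real features), N is assumed even, the names `D`, `w`, `predicted`, and `accuracy` are hypothetical, and the backslash operator is used in place of `lsqlin` to solve the unconstrained least squares problem.

```matlab
% Hypothetical stand-in for the asker's data: N rows, 10 features,
% first N/2 rows are class 1, remaining rows are class 2 (N assumed even)
rng(0);                                   % reproducible synthetic data
N = 200;
D = [randn(N/2, 10) + 1; randn(N/2, 10) - 1];

% Append a column of ones so the separator need not pass through the origin
X = [D ones(N, 1)];

% Targets: +1 for class 1, -1 for class 2
b = [ones(N/2, 1); -ones(N/2, 1)];

% Unconstrained least squares fit via backslash (w is 11 by 1)
w = X \ b;

% Classify by the sign of w'*x; the training rows are reused here as a check
predicted = sign(X * w);
accuracy = mean(predicted == b);
fprintf('Training accuracy: %.2f\n', accuracy);
```

To classify new points, append the same column of ones to the new data matrix before multiplying by `w`. Accuracy measured on the training rows is optimistic, so a held-out set gives a fairer estimate.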