Fitting a straight line using gradient descent


I am trying to fit a line to a few points using gradient descent. I am not very proficient at this, but I tried to write down the math as a Python algorithm. It runs for a few iterations, but at some point my predictions seem to blow up. Here is the code:

import numpy as np
import matplotlib.pyplot as plt

def mean_squared_error(n, A, b, m, c):
    e = 0
    for i in range(n):
        e += (b[i] - (m*A[i] + c)) ** 2   
    return e/n

def der_wrt_m(n,A,b,m,c):
    d = 0
    for i in range(n):
        d += (2 * (b[i] - (m*A[i] + c)) * (-A[i]))
    return d/n

def der_wrt_c(n,A,b,m,c):
    d = 0
    for i in range(n):
        d += (2 * (b[i] - (m*A[i] + c)))
    return d/n

def update(n,A,b,m,c,descent_rate):
    return descent_rate * der_wrt_m(n,A,b,m,c), descent_rate * der_wrt_c(n,A,b,m,c)

A = np.array(((0,1),
             (1,1),
             (2,1),
             (3,1)))
x = A.T[0]
b = np.array((1,2,0,3), ndmin=2 ).T
y = b.reshape(4)

def descent(x,y):
    m = 0
    c = 0

    descent_rate = 0.00001
    iterations = 100

    n = len(x)
    plt.scatter(x, y)
    u = np.linspace(0,3,100)
    prediction = 0
    for itr in range(iterations):
        print(m,c)
        prediction = prediction + m * x + c
        m,c = update(n,x,y,m,c,descent_rate)

    plt.plot(u, u * m + c, '-')   


descent(x,y)

Here is my output:
0 0
19.25 -10.5
-71335.1953125 24625.9453125
5593771382944640.0 -2166081169939480.2
-2.542705027685638e+48 9.692684648057364e+47
2.40856742196228e+146 -9.202614421953049e+145
-inf inf
nan nan
nan nan
nan nan
nan nan
nan nan
nan nan
etc...

Update: the values no longer blow up, but convergence is still not ideal.
# We could also solve it using gradient descent
import numpy as np
import matplotlib.pyplot as plt

def mean_squared_error(n, A, b, m, c):
    e = 0
    for i in range(n):
        e += ((b[i] - (m * A[i] + c)) ** 2)   
    #print("mse:",e/n)
    return e/n

def der_wrt_m(n,A,b,m,c):
    d = 0
    for i in range(n):
        # d += (2 * (b[i] - (m*A[i] + c)) * (-A[i]))
        d += (A[i] * (b[i] - (m*A[i] + c)))
    #print("Dm",-2 * d/n)
    return (-2 * d/n)

def der_wrt_c(n,A,b,m,c):
    d = 0
    for i in range(n):
        d += (2 * (b[i] - (m*A[i] + c)))
    #print("Dc",d/n)
    return d/n

def update(n,A,b,m,c, descent_rate):
    return (m - descent_rate * der_wrt_m(n,A,b,m,c)),(c - descent_rate * der_wrt_c(n,A,b,m,c))

A = np.array(((0,1),
             (1,1),
             (2,1),
             (3,1)))
x = A.T[0]
b = np.array((1,2,0,3), ndmin=2 ).T
y = b.reshape(4)

def descent(x,y):
    m = 0
    c = 0

    descent_rate = 0.0001
    iterations = 10000

    n = len(x)
    plt.scatter(x, y)
    u = np.linspace(0,3,100)
    prediction = 0
    for itr in range(iterations):
        prediction = prediction + m * x + c
        m,c = update(n,x,y,m,c,descent_rate)
        loss = mean_squared_error(n, A, b, m, c)

    print(loss)
    print(m,c)
    plt.plot(u, u * m + c, '-')    

descent(x,y)

After about 10000 iterations with a learning rate of 0.0001, this is what I get now:

[4.10833186 5.21468937]
1.503547594304175 -1.9947003678083184

[plot: line fitted by gradient descent]

However, a least-squares fit produces the following shape:

[plot: least-squares fit]
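For reference, the least-squares line being compared against can be computed in closed form from the same A and b defined in the code above; a minimal sketch using NumPy's lstsq (added here for comparison, not part of the original post):

import numpy as np

A = np.array(((0, 1),
              (1, 1),
              (2, 1),
              (3, 1)))
b = np.array((1, 2, 0, 3), ndmin=2).T

# Solve A @ [m, c] ~= b in the least-squares sense
sol, *_ = np.linalg.lstsq(A, b, rcond=None)
m_ls, c_ls = sol.ravel()
print(m_ls, c_ls)   # approximately 0.4 and 0.9 for these points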

1 Answer

In your update function, you should subtract the computed gradients from the current m and c:
def update(n,A,b,m,c,descent_rate):
    return m - (descent_rate * der_wrt_m(n,A,b,m,c)), c - (descent_rate * der_wrt_c(n,A,b,m,c))

Update: here is the working version. After obtaining x and y, I dropped the matrix A because it confused me =). For example, in your gradient calculation you have the expression d += (A[i] * (b[i] - (m*A[i] + c))), but it should be d += (x[i] * (b[i] - (m*x[i] + c))), because x[i] gives you a single element while A[i] gives you a whole row of elements.
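A quick illustration of that difference, using the arrays defined in the question (added for clarity):

import numpy as np

A = np.array(((0, 1),
              (1, 1),
              (2, 1),
              (3, 1)))
x = A.T[0]

print(A[0])   # [0 1]  -> a whole row of A
print(x[0])   # 0      -> a single element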

Also, you forgot a minus sign when taking the derivative with respect to c. If your expression is (y - (m*x + c))^2, its derivative with respect to c is 2 * (-1) * (y - (m*x + c)), because of the minus sign in front of c.
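Spelled out with the notation above (this derivation is only implicit in the answer), the mean squared error and its two partial derivatives are:

\[
E(m, c) = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - (m x_i + c)\bigr)^2
\]
\[
\frac{\partial E}{\partial m} = -\frac{2}{n}\sum_{i=1}^{n} x_i \bigl(y_i - (m x_i + c)\bigr),
\qquad
\frac{\partial E}{\partial c} = -\frac{2}{n}\sum_{i=1}^{n} \bigl(y_i - (m x_i + c)\bigr)
\]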
# We could also solve it using gradient descent
import numpy as np
import matplotlib.pyplot as plt

def mean_squared_error(n, x, y, m, c):
    e = 0
    for i in range(n):
        e += (m*x[i]+c - y[i])**2
    return e/n

def der_wrt_m(n, x, y, m, c):
    # dE/dm = -(2/n) * sum(x_i * (y_i - (m*x_i + c)))
    d = 0
    for i in range(n):
        d += x[i] * (y[i] - (m*x[i] + c))
    d = -2 * d/n
    return d

def der_wrt_c(n, x, y, m, c):
    # dE/dc = -(2/n) * sum(y_i - (m*x_i + c)); the minus sign comes from the -c inside the square
    d = 0
    for i in range(n):
        d += (y[i] - (m*x[i] + c))
    d = -2 * d/n
    return d


def update(n,x,y,m,c, descent_rate):
    return (m - descent_rate * der_wrt_m(n,x,y,m,c)),(c - descent_rate * der_wrt_c(n,x,y,m,c))


A = np.array(((0,1),
             (1,1),
             (2,1),
             (3,1)))
x = A.T[0]
b = np.array((1,2,0,3), ndmin=2 ).T
y = b.reshape(4)

print(x)
print(y)

def descent(x,y):
    m = 0.0
    c = 0.0

    descent_rate = 0.01
    iterations = 10000

    n = len(x)
    plt.scatter(x, y)
    u = np.linspace(0,3,100)
    prediction = 0
    for itr in range(iterations):
        prediction = prediction + m * x + c
        m,c = update(n,x,y,m,c,descent_rate)
        loss = mean_squared_error(n, x, y, m, c)
        print(loss)

    print(loss)
    print(m,c)
    plt.plot(u, u * m + c, '-')    
    plt.show()

descent(x,y)
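The per-point loops above can also be collapsed into vectorized NumPy operations; a minimal equivalent sketch (not part of the original answer):

import numpy as np

def descent_vectorized(x, y, descent_rate=0.01, iterations=10000):
    m, c = 0.0, 0.0
    n = len(x)
    for _ in range(iterations):
        residual = y - (m * x + c)                        # errors for all points at once
        m -= descent_rate * (-2.0 / n) * np.dot(x, residual)
        c -= descent_rate * (-2.0 / n) * np.sum(residual)
    return m, c

x = np.array([0, 1, 2, 3], dtype=float)
y = np.array([1, 2, 0, 3], dtype=float)
print(descent_vectorized(x, y))   # approaches roughly (0.4, 0.9)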

@AhmadMoussa, your main problem was the missing minus sign when differentiating with respect to c; using A[i] instead of x[i] was also incorrect. - unlut
Thanks, that solved it! It now fits the line nicely, thanks for your help. If I may ask, how would I extend this to polynomial regression? Also, I just noticed that the loss converges at some point. Is that normal, or is there some way to reduce it further? - Ahmad Moussa
The approach is basically the same as linear regression. Take your equation y = m * x + c and rename the symbols to y = a * x + b: you are really performing a degree-1 polynomial regression with 2 parameters to optimize. For a degree-n polynomial regression you have y = a * x^n + b * x^(n-1) + ... + CONSTANT, so there are n + 1 parameters to optimize. Follow the same procedure as for linear regression: compute the gradient for each parameter and update it in the same way. Unlike linear regression, though, your solution may depend on your initial choice (not sure). - unlut
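A minimal sketch of that degree-n extension, storing all coefficients in one array so there is one partial derivative per coefficient (the names poly_descent, powers, etc. are illustrative, not from the thread):

import numpy as np

def poly_descent(x, y, degree, rate=0.01, iterations=20000):
    # w[k] multiplies x**k, so there are degree + 1 parameters including the constant
    w = np.zeros(degree + 1)
    n = len(x)
    powers = np.vander(x, degree + 1, increasing=True)   # powers[i, k] = x[i]**k
    for _ in range(iterations):
        residual = y - powers @ w
        grad = (-2.0 / n) * (powers.T @ residual)         # one derivative per coefficient
        w -= rate * grad
    return w

x = np.array([0, 1, 2, 3], dtype=float)
y = np.array([1, 2, 0, 3], dtype=float)
print(poly_descent(x, y, degree=2))   # [constant, linear, quadratic] coefficients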
So if I want to fit a degree-n polynomial, I have to compute n partial derivatives? And by initial choice you mean the starting point, because I might get stuck in a local minimum? Anyway, thanks a lot! - Ahmad Moussa
On convergence: with linear regression you will at some point find the optimal solution, or something close to it (higher-degree polynomial regression may get stuck in a local minimum). What you should actually do (I am only advising for your regression problem, not for gradient descent in general) is not run a fixed number of steps, but keep doing gradient descent until the updates become very small, e.g. 1e-5 or 1e-6. - unlut
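One way to implement that stopping rule is to loop until both updates fall below a tolerance rather than running a fixed number of iterations; a sketch that reuses der_wrt_m and der_wrt_c from the answer above (the tolerance value is illustrative):

def descent_until_converged(x, y, descent_rate=0.01, tol=1e-6, max_iterations=1000000):
    m, c = 0.0, 0.0
    n = len(x)
    for _ in range(max_iterations):
        step_m = descent_rate * der_wrt_m(n, x, y, m, c)
        step_c = descent_rate * der_wrt_c(n, x, y, m, c)
        m -= step_m
        c -= step_c
        if abs(step_m) < tol and abs(step_c) < tol:   # updates are tiny, stop
            break
    return m, c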
