Gradient descent with multiple variables

5

I am learning how to compute coefficients with gradient descent. Here is what I did:

    #!/usr/bin/python

    import numpy as np


    # m denotes the number of examples here, not the number of features
    def gradientDescent(x, y, theta, alpha, m, numIterations):
        xTrans = x.transpose()
        for i in range(0, numIterations):
            hypothesis = np.dot(x, theta)
            loss = hypothesis - y
            # avg cost per example (the 2 in 2*m doesn't really matter here.
            # But to be consistent with the gradient, I include it)
            cost = np.sum(loss ** 2) / (2 * m)
            #print("Iteration %d | Cost: %f" % (i, cost))
            # avg gradient per example
            gradient = np.dot(xTrans, loss) / m
            # update
            theta = theta - alpha * gradient
        return theta

    X = np.array([41.9,43.4,43.9,44.5,47.3,47.5,47.9,50.2,52.8,53.2,56.7,57.0,63.5,65.3,71.1,77.0,77.8])
    y = np.array([251.3,251.3,248.3,267.5,273.0,276.5,270.3,274.9,285.0,290.0,297.0,302.5,304.5,309.3,321.7,330.7,349.0])
    n = np.max(X.shape)
    x = np.vstack([np.ones(n), X]).T
    m, n = np.shape(x)
    numIterations = 100000
    alpha = 0.0005
    theta = np.ones(n)
    theta = gradientDescent(x, y, theta, alpha, m, numIterations)
    print(theta)

The code above runs fine. If I now try multiple variables and replace X with X1, as follows:

    X1 = np.array([[41.9,43.4,43.9,44.5,47.3,47.5,47.9,50.2,52.8,53.2,56.7,57.0,63.5,65.3,71.1,77.0,77.8], [29.1,29.3,29.5,29.7,29.9,30.3,30.5,30.7,30.8,30.9,31.5,31.7,31.9,32.0,32.1,32.5,32.9]])

then my code fails with the following error:

    JustTestingSGD.py:14: RuntimeWarning: overflow encountered in square
    cost = np.sum(loss ** 2) / (2 * m)
    JustTestingSGD.py:19: RuntimeWarning: invalid value encountered in subtract
    theta = theta - alpha * gradient
    [ nan  nan  nan]

Can anyone tell me how to run gradient descent with X1? The output I expect with X1 is:

[-153.5 1.24 12.08]

I am also open to other Python implementations. I just need the coefficients (also called thetas) for X1 and y.

1 Answer

3
The problem is that the algorithm does not converge; it diverges instead. The first error:

    JustTestingSGD.py:14: RuntimeWarning: overflow encountered in square
    cost = np.sum(loss ** 2) / (2 * m)

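comes from the fact that at some point a value can no longer be squared, because a 64-bit float cannot store the result (it exceeds the float64 maximum of about 1.8 × 10^308).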

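    JustTestingSGD.py:19: RuntimeWarning: invalid value encountered in subtract
    theta = theta - alpha * gradient

This is just a consequence of the previous error: once the values have overflowed, they are no longer valid for arithmetic.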
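As a small illustration of the two warnings (a sketch, not taken from the original script; `1e200` is an arbitrary large number chosen only for this demo): squaring a float64 beyond the representable range gives `inf`, and subtracting `inf` from `inf` gives `nan`.

```python
import numpy as np

# Largest finite float64, roughly 1.8e308.
print(np.finfo(np.float64).max)

big = np.float64(1e200)  # arbitrary large value, chosen only for this demo

# Silence the RuntimeWarnings so the demo output stays readable.
with np.errstate(over="ignore", invalid="ignore"):
    squared = big ** 2              # 1e400 does not fit in float64 -> inf
    difference = squared - squared  # inf - inf is undefined -> nan

print(squared, difference)
```

Once `theta` contains `nan`, every later update stays `nan`, which matches the `[ nan  nan  nan]` printed in the question.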

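If you uncomment the debug print line, you can actually watch the divergence: since there is no convergence, the cost keeps increasing.

If you run the function with X1 and a smaller alpha, it converges.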
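One way to see why alpha = 0.0005 works for X but not for X1 is to compare it with the classical stability bound for gradient descent on a least-squares cost, alpha < 2 / lambda_max, where lambda_max is the largest eigenvalue of x.T @ x / m. The sketch below is an illustration and not part of the original answer; the helper names `design_matrix`, `max_stable_alpha`, and `cost_history`, and the choice of 200 iterations, are made up for this example.

```python
import numpy as np

X = np.array([41.9, 43.4, 43.9, 44.5, 47.3, 47.5, 47.9, 50.2, 52.8, 53.2,
              56.7, 57.0, 63.5, 65.3, 71.1, 77.0, 77.8])
X2 = np.array([29.1, 29.3, 29.5, 29.7, 29.9, 30.3, 30.5, 30.7, 30.8, 30.9,
               31.5, 31.7, 31.9, 32.0, 32.1, 32.5, 32.9])
X1 = np.vstack([X, X2])  # same values as X1 in the question
y = np.array([251.3, 251.3, 248.3, 267.5, 273.0, 276.5, 270.3, 274.9, 285.0,
              290.0, 297.0, 302.5, 304.5, 309.3, 321.7, 330.7, 349.0])


def design_matrix(features):
    """Hypothetical helper: features given as rows -> matrix with a ones column."""
    features = np.atleast_2d(features)
    return np.vstack([np.ones(features.shape[1]), features]).T


def max_stable_alpha(x):
    """Step-size bound 2 / lambda_max for the quadratic least-squares cost."""
    m = x.shape[0]
    hessian = x.T @ x / m
    return 2.0 / np.linalg.eigvalsh(hessian).max()


def cost_history(x, y, alpha, num_iterations):
    """Run the same update as gradientDescent and record the cost at each step."""
    m, n = x.shape
    theta = np.ones(n)
    costs = []
    for _ in range(num_iterations):
        loss = x @ theta - y
        costs.append(np.sum(loss ** 2) / (2 * m))
        theta = theta - alpha * (x.T @ loss) / m
    return costs


print("bound for X :", max_stable_alpha(design_matrix(X)))
print("bound for X1:", max_stable_alpha(design_matrix(X1)))

diverging = cost_history(design_matrix(X1), y, alpha=0.0005, num_iterations=200)
converging = cost_history(design_matrix(X1), y, alpha=0.0001, num_iterations=200)
print("alpha=0.0005:", diverging[0], "->", diverging[-1])
print("alpha=0.0001:", converging[0], "->", converging[-1])
```

With the second feature added, the bound drops just below 0.0005, so the original alpha is slightly too large for X1 even though it is fine for X.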


If I compute with X1 using alpha = 0.0001, it converges and gives [0.92429681 1.80242842 6.07549978], but I expected something like [-153.5 1.24 12.08]. How can I get the desired result? - user227666
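The thread does not answer this comment. A plausible explanation (an assumption here, not something the answer states) is that the run has simply not finished converging: the raw features are large and strongly correlated, so gradient descent makes very slow progress along some directions. Two common workarounds are sketched below: standardizing the features before running gradient descent, and solving the least-squares problem directly with `np.linalg.lstsq`. The helper name `gradient_descent_scaled` and the values alpha = 0.5 and 5000 iterations are choices made for this sketch.

```python
import numpy as np

X1 = np.array([[41.9, 43.4, 43.9, 44.5, 47.3, 47.5, 47.9, 50.2, 52.8, 53.2,
                56.7, 57.0, 63.5, 65.3, 71.1, 77.0, 77.8],
               [29.1, 29.3, 29.5, 29.7, 29.9, 30.3, 30.5, 30.7, 30.8, 30.9,
                31.5, 31.7, 31.9, 32.0, 32.1, 32.5, 32.9]])
y = np.array([251.3, 251.3, 248.3, 267.5, 273.0, 276.5, 270.3, 274.9, 285.0,
              290.0, 297.0, 302.5, 304.5, 309.3, 321.7, 330.7, 349.0])

features = X1.T          # shape (17, 2): one row per example
m = features.shape[0]

# Option 1: direct least-squares solve, no step size to tune.
x = np.column_stack([np.ones(m), features])
theta_lstsq, *_ = np.linalg.lstsq(x, y, rcond=None)


def gradient_descent_scaled(features, y, alpha=0.5, num_iterations=5000):
    """Option 2 (hypothetical helper): gradient descent on standardized features."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    z = np.column_stack([np.ones(len(y)), (features - mean) / std])
    theta = np.zeros(z.shape[1])
    for _ in range(num_iterations):
        gradient = z.T @ (z @ theta - y) / len(y)
        theta = theta - alpha * gradient
    # Convert the coefficients back to the original feature units.
    slopes = theta[1:] / std
    intercept = theta[0] - np.sum(slopes * mean)
    return np.concatenate([[intercept], slopes])


theta_gd = gradient_descent_scaled(features, y)
print("lstsq:           ", theta_lstsq)
print("scaled gradient: ", theta_gd)
```

Both approaches minimize the same cost, so they should agree with each other; if the expected values in the question are the least-squares fit, the printed coefficients should land close to them.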
