梯度下降算法用于线性回归,但不会优化y截距参数。

3

我正在跟随 Andrew Ng 在 Coursera 上的机器学习课程,并尝试在 Python 中实现梯度下降算法。但是,我在 y 轴截距参数上遇到了麻烦,因为它似乎无法达到最佳值。以下是我的代码:

# IMPORTS
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# Acquiring Data
# Source: https://github.com/mattnedrich/GradientDescentExample
data = pd.read_csv('data.csv')

def cost_function(a, b, x_values, y_values):
    '''
    Calculates the square mean error for a given dataset
    with (x,y) pairs and the model y' = a + bx

    a: y-intercept for the model
    b: slope of the curve
    x_values, y_values: points (x,y) of the dataset
    '''
    data_len = len(x_values)
    total_error = sum([((a + b * x_values[i]) - y_values[i])**2
                       for i in range(data_len)])
    return total_error / (2 * float(data_len))


def a_gradient(a, b, x_values, y_values):
    '''
    Partial derivative of the cost_function with respect to 'a'

    a, b: values for 'a' and 'b'
    x_values, y_values: points (x,y) of the dataset
    '''
    data_len = len(x_values)
    a_gradient = sum([((a + b * x_values[i]) - y_values[i])
                      for i in range(data_len)])
    return a_gradient / float(data_len)


def b_gradient(a, b, x_values, y_values):
    '''
    Partial derivative of the cost_function with respect to 'b'

    a, b: values for 'a' and 'b'
    x_values, y_values: points (x,y) of the dataset
    '''
    data_len = len(x_values)
    b_gradient = sum([(((a + b * x_values[i]) - y_values[i]) * x_values[i])
                      for i in range(data_len)])
    return b_gradient / float(data_len)


def gradient_descent_step(a_current, b_current, x_values, y_values, alpha):
    '''
    Give a step in direction of the minimum of the cost_function using
    the 'a' and 'b' gradiants. Return new values for 'a' and 'b'.

    a_current, b_current: the current values for 'a' and 'b'
    x_values, y_values: points (x,y) of the dataset

    '''
    new_a = a_current - alpha * a_gradient(a_current, b_current, x_values, y_values)
    new_b = b_current - alpha * b_gradient(a_current, b_current, x_values, y_values)
    return (new_a, new_b)


def run_gradient_descent(a, b, x_values, y_values, alpha, precision, plot=False, verbose=False):
    '''
    Runs the gradient_descent_step function and updates (a,b) until
    the value of the cost function varies less than 'precision'.

    a, b: initial values for the point a and b in the cost_function
    x_values, y_values: points (x,y) of the dataset
    alpha: learning rate for the algorithm
    precision: value for the algorithm to stop calculation
    '''
    iterations = 0
    delta_cost = cost_function(a, b, x_values, y_values)

    error_list = [delta_cost]
    iteration_list = [0]

    # The loop runs until the delta_cost reaches the precision defined
    # When the variation in cost_function is small it means that the
    # the function is near its minimum and the parameters 'a' and 'b'
    # are a good guess for modeling the dataset.
    while delta_cost > precision:
        iterations += 1
        iteration_list.append(iterations)

        # Calculates the initial error with current a,b values
        prev_cost = cost_function(a, b, x_values, y_values)

        # Calculates new values for a and b
        a, b = gradient_descent_step(a, b, x_values, y_values, alpha)

        # Updates the value of the error
        actual_cost = cost_function(a, b, x_values, y_values)
        error_list.append(actual_cost)

        # Calculates the difference between previous and actual error values.
        delta_cost = prev_cost - actual_cost

    # Plot the error in each iteration to see how it decreases
    # and some information about our final results
    if plot:
        plt.plot(iteration_list, error_list, '-')
        plt.title('Error Minimization')
        plt.xlabel('Iteration',fontsize=12)
        plt.ylabel('Error',fontsize=12)
        plt.show()
    if verbose:
        print('Iterations = ' + str(iterations))
        print('Cost Function Value = '+ str(cost_function(a, b, x_values, y_values)))
        print('a = ' + str(a) + ' and b = ' + str(b))

    return (actual_cost, a, b)

当我使用以下参数运行算法时:
run_gradient_descent(0, 0, data['x'], data['y'], 0.0001, 0.01)

我得到了(a = 0.0496688656535,b = 1.47825808018)。

但是,在线性回归的其他资源中尝试后,最佳'a'值约为7.9。

此外,如果我改变参数'a'的初始猜测,算法会尝试调整参数'b'。

例如,如果我设置a = 200和b = 0。

run_gradient_descent(200, 0, data['x'], data['y'], 0.0001, 0.01)

我得到了(a = 199.933763331,b = -2.44824996193)

我没有发现代码有任何问题,我意识到问题在于参数a的初始猜测值。请参见上面我的回答,我定义了一个辅助函数来获取搜索一些值以获得初始a猜测的范围。


我正在处理和你一样的问题,但是...我卡在了同一个步骤上。我用我的输入尝试了你的程序:令人惊讶的是,我的程序和你的程序输出相同。除了代码相关的问题外,我们可能忘记了某些东西。如果你有联系方式,我们可以一起解决这个问题。 - IMCoins
我在SO这里没有找到发送直接消息的方法。而且我不想在这里透露我的电子邮件地址 :D 你有什么建议吗?谢谢! - floggio
2个回答

1
梯度下降并不能保证找到全局最优解。你找到全局最优解的机会取决于你的起始值。为了得到参数的真实值,首先要解决最小二乘问题,这可以保证全局最小值。
data = pd.read_csv('data.csv',header=-1)

x,y = data[0],data[1]

from scipy.stats import linregress

linregress(x,y)

这导致以下统计数据:
LinregressResult(slope=1.32243102275536, intercept=7.9910209822703848, rvalue=0.77372849988782377, pvalue=3.855655536990139e-21, stderr=0.109377979589804)

因此,b = 1.32243102275536a = 7.9910209822703848。鉴于此,使用您的代码,我使用随机起始值ab多次解决了问题。
a,b = np.random.rand()*10,np.random.rand()*10

print("Initial values of parameters: ")

print("a=%f\tb=%f" % (a,b))

run_gradient_descent(a, b,x,y,1e-4,1e-2)

这是我得到的解决方案:

Initial values of parameters: 
a=6.100305  b=2.606448

Iterations = 21
Cost Function Value = 55.2093808263
a = 6.07601889437 and b = 1.36310312751

因此,似乎您无法接近最小值的原因是由于您初始参数值的选择。如果您将最小二乘得到的 ab 放入梯度下降算法中,您也会发现它只会迭代一次并停留在原地。
不知何时出现 delta_cost > precisionTrue,它就会认为这是一个局部最优解而停止。如果您减小precision 并运行足够长的时间,则可能能找到全局最优解。

我正在考虑实现一些测试初值的东西来测试 ab。据我所记,这种情况下的成本函数只有一个全局最小值,因此不可能陷入其他局部最小值。我认为代码中有些问题,但我还没有看出来。顺便说一句,谢谢。 - floggio
我改了答案,请看上面。 - relay
你的解决方案有效吗?我在这里尝试了一下,遇到了同样的问题。我认为在求导之前添加这个常数不会改变解决方案。谢谢。 - floggio
我测试了您的代码,并将结果添加到答案中。请查看答案。 - relay
感谢您提供这么详细的解释。最近我很忙,但我已经快完成一个实现,可以在缩小的范围内随机搜索最佳解决方案。我会在接下来的几天里在这里发布它。再次感谢! - floggio
显示剩余3条评论

0

我的梯度下降实现的完整代码可以在我的Github存储库中找到: 线性回归的梯度下降

考虑到@relay所说的梯度下降算法不能保证找到全局最小值,我尝试想出一个辅助函数来限制参数a在某个搜索范围内的猜测,如下所示:

def search_range(x, y, plot=False):
    '''
    Given a dataset with points (x, y) searches for a best guess for 
    initial values of 'a'.
    '''
    data_lenght = len(x)             # Total size of of the dataset
    q_lenght = int(data_lenght / 4)  # Size of a quartile of the dataset

    # Finding the max and min value for y in the first quartile
    min_Q1 = (x[0], y[0])
    max_Q1 = (x[0], y[0])

    for i in range(q_lenght):
        temp_point = (x[i], y[i])
        if temp_point[1] < min_Q1[1]:
            min_Q1 = temp_point
        if temp_point[1] > max_Q1[1]:
            max_Q1 = temp_point

    # Finding the max and min value for y in the 4th quartile
    min_Q4 = (x[data_lenght - 1], y[data_lenght - 1])
    max_Q4 = (x[data_lenght - 1], y[data_lenght - 1])

    for i in range(data_lenght - 1, data_lenght - q_lenght, -1):
        temp_point = (x[i], y[i])
        if temp_point[1] < min_Q4[1]:
            min_Q4 = temp_point
        if temp_point[1] > max_Q4[1]:
            max_Q4 = temp_point

    mean_Q4 = (((min_Q4[0] + max_Q4[0]) / 2), ((min_Q4[1] + max_Q4[1]) / 2))

    # Finding max_y and min_y given the points found above
    # Two lines need to be defined, L1 and L2.
    # L1 will pass through min_Q1 and mean_Q4
    # L2 will pass through max_Q1 and mean_Q4

    # Calculatin slope for L1 and L2 given m = Delta(y) / Delta (x)
    slope_L1 = (min_Q1[1] - mean_Q4[1]) / (min_Q1[0] - mean_Q4[0])
    slope_L2 = (max_Q1[1] - mean_Q4[1]) / (max_Q1[0] -mean_Q4[0])

    # Calculating y-intercepts for L1 and L2 given line equation in the form y = mx + b
    # Float numbers are converted to int because they will be used as range for itaration
    y_L1 = int(min_Q1[1] - min_Q1[0] * slope_L1)
    y_L2 = int(max_Q1[1] - max_Q1[0] * slope_L2)

    # Ploting L1 and L2
    if plot:
        L1 = [(y_L1 + slope_L1 * x) for x in data['x']]
        L2 = [(y_L2 + slope_L2 * x) for x in data['x']]

        plt.plot(data['x'], data['y'], '.')
        plt.plot(data['x'], L1, '-', color='r') 
        plt.plot(data['x'], L2, '-', color='r') 
        plt.title('Scatterplot of Sample Data')
        plt.xlabel('x',fontsize=12)
        plt.ylabel('y',fontsize=12)
        plt.show()

    return y_L1, y_L2

这个想法是在search_range()函数给定的范围内使用猜测的a运行梯度下降,并获得cost_function()的最小可能值。新的梯度下降运行方式如下:

def run_search_gradient_descent(x_values, y_values, alpha, precision, verbose=False):
    '''
    Runs the gradient_descent_step function and updates (a,b) until
    the value of the cost function varies less than 'precision'.

    x_values, y_values: points (x,y) of the dataset
    alpha: learning rate for the algorithm
    precision: value for the algorithm to stop calculation
    '''    
    from math import inf

    a1, a2 = search_range(x_values, y_values)

    best_guess = [inf, 0, 0]

    for a in range(a1, a2):

        cost, linear_coef, slope = run_gradient_descent(a, 0, x_values, y_values, alpha, precision)

        # Saving value for cost_function and parameters (a,b)        
        if cost < best_guess[0]:
            best_guess = [cost, linear_coef, slope]
    if verbose:        
        print('Cost Function = ' + str(best_guess[0]))
        print('a = ' + str(best_guess[1]) + ' and b = ' + str(best_guess[2]))

    return (best_guess[0], best_guess[1], best_guess[2])

运行代码:

run_search_gradient_descent(data['x'], data['y'], 0.0001, 0.001, verbose=True)

输出结果为:

代价函数 = 55.1294483959 a = 8.02595996606,b = 1.3209768383

与scipy.stats中的线性回归进行比较,其返回值为:

a = 7.99102098227,b = 1.32243102276


网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接