Minimizing a multivariable, differentiable function using scipy.optimize

Question

python numpy scipy mathematical-optimization

8

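I am trying to use scipy.optimize to minimize the following function:

[image: formula of the objective function, not available]

whose gradient is:

[image: formula of the gradient, not available]

(For those who are interested, this is the likelihood function of a Bradley-Terry-Luce model, which is closely related to logistic regression.)

Adding a constant to all the parameters clearly does not change the value of the function, so I fix \theta_1 = 0. Here are the Python implementations of the objective function and the gradient (theta becomes x here):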

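import numpy as np
from scipy.special import logsumexp  # formerly scipy.misc.logsumexp

def objective(x):
    zeros = np.zeros(cijs.shape)  # the exp(0) = 1 term inside each log
    x = np.insert(x, 0, 0.0)  # put back the fixed first parameter
    tiles = np.tile(x, (len(x), 1))
    combs = tiles.T - tiles  # combs[i, j] = x[i] - x[j]
    exps = np.dstack((zeros, combs))
    # logsumexp over the last axis gives log(1 + exp(x[i] - x[j]))
    return np.sum(cijs * logsumexp(exps, axis=2))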

def gradient(x):
    zeros = np.zeros(cijs.shape)
    x = np.insert(x, 0, 0.0)
    tiles = np.tile(x, (len(x), 1))
    combs = tiles - tiles.T
    one = 1.0 / (np.exp(combs) + 1)
    two = 1.0 / (np.exp(combs.T) + 1)
    mat = (cijs * one) + (cijs.T * two)
    grad = np.sum(mat, axis=0)
    return grad[1:]  # Don't return the first element

Here is an example of what cijs might look like:

[[ 0  5  1  4  6]
 [ 4  0  2  2  0]
 [ 6  4  0  9  3]
 [ 6  8  3  0  5]
 [10  7 11  4  0]]
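
As a quick sanity check on the claim that adding a constant to every parameter leaves the function unchanged, the following minimal sketch evaluates the objective on the example matrix above before and after a shift. It assumes the objective has the form implemented by `objective` (a sum of `cijs[i, j] * log(1 + exp(x[i] - x[j]))`). The name `full_objective`, which takes all parameters without pinning the first one, and the use of `np.logaddexp` are choices made for this illustration, not part of the original code.

```python
import numpy as np

# Example count matrix from the question.
cijs = np.array([
    [0, 5, 1, 4, 6],
    [4, 0, 2, 2, 0],
    [6, 4, 0, 9, 3],
    [6, 8, 3, 0, 5],
    [10, 7, 11, 4, 0],
])

def full_objective(theta):
    """Sum of c_ij * log(1 + exp(theta_i - theta_j)) over all pairs (i, j)."""
    diffs = theta[:, None] - theta[None, :]  # diffs[i, j] = theta_i - theta_j
    # logaddexp(0, d) equals log(1 + exp(d)) and avoids overflow for large d.
    return np.sum(cijs * np.logaddexp(0.0, diffs))

rng = np.random.default_rng(0)
theta = rng.random(cijs.shape[0])

base = full_objective(theta)
shifted = full_objective(theta + 3.7)      # same constant added everywhere
pinned = full_objective(theta - theta[0])  # shift so the first parameter is 0

print(base, shifted, pinned)  # the three values agree up to rounding
```

Because only differences between parameters enter the function, one parameter can be pinned to zero without losing any solutions.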

This is the code I run to perform the minimization:

x0 = numpy.random.random(nb_items - 1)
# Let's try one algorithm...
xopt1 = scipy.optimize.fmin_bfgs(objective, x0, fprime=gradient, disp=True)
# And another one...
xopt2 = scipy.optimize.fmin_cg(objective, x0, fprime=gradient, disp=True)

However, it always fails at the first iteration:

Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 73.290610
         Iterations: 0
         Function evaluations: 38
         Gradient evaluations: 27

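I don't know why it fails. The message is printed because of this line: https://github.com/scipy/scipy/blob/master/scipy/optimize/optimize.py#L853 so the "Wolfe line search" does not seem to succeed, but I don't know how to proceed from here... Any help is appreciated!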

- lum

1

Your gradient function is probably incorrect. Try verifying it against finite differences (for example with scipy.optimize.check_grad). - pv.

@pv. You bet ;) Thanks! - lum
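
To illustrate the point made in the comment, here is a minimal sketch of how an inconsistent gradient can make the line search give up immediately. The quadratic `f` and the functions `good_grad` and `bad_grad` are hypothetical stand-ins, not the functions from the question; `bad_grad` simply has its sign flipped.

```python
import numpy as np
from scipy.optimize import fmin_bfgs

def f(x):
    return np.sum(x ** 2)

def good_grad(x):
    return 2.0 * x

def bad_grad(x):
    # Deliberately wrong: the sign is flipped, so the "descent" direction
    # built from this gradient actually increases f.
    return -2.0 * x

x0 = np.array([1.0, -2.0, 0.5])

# With full_output=True the returned tuple ends with `warnflag`:
# 0 means success, 2 corresponds to the "precision loss" warning.
good = fmin_bfgs(f, x0, fprime=good_grad, disp=False, full_output=True)
bad = fmin_bfgs(f, x0, fprime=bad_grad, disp=False, full_output=True)

good_warnflag = good[-1]
bad_warnflag = bad[-1]
print("correct gradient: warnflag =", good_warnflag, "xopt =", good[0])
print("wrong gradient:   warnflag =", bad_warnflag, "xopt =", bad[0])
```

With the flipped gradient no step length can satisfy the sufficient-decrease condition, so the optimizer stops without making progress, much like the zero-iteration output shown in the question.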

2 Answers

1

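It seems that you can transform this into a (non-linear) least squares problem. To do so, you have to define an interval for each of the n variables and the number of sample points for each variable in order to build the coefficient matrix.

In this example I use the same number of points and the same interval for all the variables: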

from scipy.optimize import leastsq
from numpy import exp, linspace, zeros, ones

n = 4
npts = 1000
xs = [linspace(0, 1, npts) for _ in range(n)]

c = ones(n**2)

a = zeros((n*npts, n**2))
def residual(c):
    a.fill(0)
    for i in range(n):
        for j in range(n):
            for k in range(npts):
                a[i+k*n, i*n+j] = 1/(exp(xs[i][k] - xs[j][k]) + 1)
                a[i+k*n, j*n+i] = 1/(exp(xs[j][k] - xs[i][k]) + 1)

    return a.dot(c)

popt, ier = leastsq(residual, x0=c)  # without full_output, the second value is the integer status flag
print(popt.reshape(n, n))
#[[ -1.24886411   1.07854552  -2.67212118   1.86334625]
# [ -7.43330057   2.0935734   37.85989442   1.37005925]
# [ -3.51761322 -37.49627917  24.90538136  -4.23103535]
# [ 11.93000731   2.52750715 -14.84822686   1.38834225]]

EDIT: more details about the coefficient matrix built above:

[image: structure of the coefficient matrix, not available]

- Saullo G. P. Castro

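Thank you for trying to help. I more or less understand what you mean, but I would like to avoid least squares fitting. My objective function is convex, so I believe I should be able to minimize it directly; I see no reason why that would not work. - lum

@lum I see your point... In any case, this is a very robust approach if you ever need it. - Saullo G. P. Castro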

- lum · Accepted Answer

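As @pv. pointed out in the comments, I made a mistake when computing the gradient. First, the correct mathematical expression for the gradient of my objective function is:

[image: corrected formula of the gradient, not available]

(Note the minus sign.) Furthermore, my Python implementation was completely wrong, beyond just a sign error. Here is my updated gradient: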

def gradient(x):
    nb_comparisons = cijs + cijs.T
    x = np.insert(x, 0, 0.0)
    tiles = np.tile(x, (len(x), 1))
    combs = tiles - tiles.T
    probs = 1.0 / (np.exp(combs) + 1)
    mat = (nb_comparisons * probs) - cijs
    grad = np.sum(mat, axis=1)
    return grad[1:]  # Don't return the first element.
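
For reference, the expression below is read off the code above. It is a transcription of what the function computes, not a copy of the missing image. With c_{ij} the entries of `cijs`, \theta_1 = 0 held fixed and k \ge 2:

```latex
\frac{\partial F}{\partial \theta_k}
  = \sum_{j} \left[ \frac{c_{kj} + c_{jk}}{1 + \exp(\theta_j - \theta_k)} - c_{kj} \right]
```

This is the gradient of F(\theta) = \sum_{i,j} c_{ij} \log(1 + \exp(\theta_j - \theta_i)). Note that `objective` as posted in the question builds `combs = tiles.T - tiles`, that is x[i] - x[j]; for that sign convention the subtracted term would be c_{jk} (`cijs.T` in the code) instead. It is worth confirming with scipy.optimize.check_grad against whichever form of the objective is actually in use.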

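To debug it, I used:

scipy.optimize.check_grad: showed that my gradient function was producing results very far from the approximated (finite-difference) gradient.
scipy.optimize.approx_fprime: to get an idea of what the values should look like.
A few hand-picked simple examples that can be analyzed by hand if needed, plus a few Wolfram Alpha queries for sanity checking.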
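
The first two checks can be sketched as follows. This is an illustration under stated assumptions rather than the exact script used: the objective is assumed to be the sum of `cijs[i, j] * log(1 + exp(theta[j] - theta[i]))`, which is the form whose derivative matches the corrected gradient above, `cijs` is the example matrix from the question, and `wrong_gradient` is a hypothetical sign-flipped gradient included only to show what a failing check looks like.

```python
import numpy as np
from scipy.optimize import approx_fprime, check_grad, minimize

# Example count matrix from the question.
cijs = np.array([
    [0, 5, 1, 4, 6],
    [4, 0, 2, 2, 0],
    [6, 4, 0, 9, 3],
    [6, 8, 3, 0, 5],
    [10, 7, 11, 4, 0],
])

def objective(x):
    # Assumed form: sum_ij c_ij * log(1 + exp(theta_j - theta_i)), theta_1 = 0.
    theta = np.insert(x, 0, 0.0)
    diffs = theta[None, :] - theta[:, None]  # diffs[i, j] = theta_j - theta_i
    return np.sum(cijs * np.logaddexp(0.0, diffs))

def gradient(x):
    # Same computation as the corrected gradient, written with broadcasting.
    theta = np.insert(x, 0, 0.0)
    diffs = theta[None, :] - theta[:, None]
    probs = 1.0 / (np.exp(diffs) + 1.0)
    mat = (cijs + cijs.T) * probs - cijs
    return np.sum(mat, axis=1)[1:]  # drop the pinned first parameter

def wrong_gradient(x):
    # Hypothetical bad gradient (sign error), for comparison only.
    return -gradient(x)

rng = np.random.default_rng(0)
x0 = rng.random(cijs.shape[0] - 1)

# check_grad returns the 2-norm of (analytic - finite-difference) gradient.
err_ok = check_grad(objective, gradient, x0)
err_bad = check_grad(objective, wrong_gradient, x0)
print("check_grad, consistent gradient:", err_ok)   # close to zero
print("check_grad, sign-flipped gradient:", err_bad)  # far from zero

# approx_fprime shows what the gradient values should look like.
numeric = approx_fprime(x0, objective, 1e-6)
print("finite differences:", numeric)
print("analytic gradient: ", gradient(x0))

# Once the gradient passes the check, the minimization can be run directly.
result = minimize(objective, x0, jac=gradient, method="BFGS")
print(result.success, result.x, result.fun)
```

A small value from check_grad only says that the gradient and the objective agree with each other at the tested point, so it is worth running it at a few different points.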