Why does my linear regression produce NaN values instead of learning?

I am running the following code:
import tensorflow as tf

# data set
x_data = [10., 20., 30., 40.]
y_data = [20., 40., 60., 80.]

# try to find values for w and b that compute y_data = W * x_data + b
# range is -1000 ~ 1000
W = tf.Variable(tf.random_uniform([1], -1000., 1000.))
b = tf.Variable(tf.random_uniform([1], -1000., 1000.))

X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)

# my hypothesis
hypothesis = W * X + b

# Simplified cost function
cost = tf.reduce_mean(tf.square(hypothesis - Y))

# minimize
a = tf.Variable(0.1)  # learning rate, alpha
optimizer = tf.train.GradientDescentOptimizer(a)
train = optimizer.minimize(cost)  # goal is minimize cost

# before starting, initialize the variables
init = tf.initialize_all_variables()

# launch
sess = tf.Session()
sess.run(init)

# fit the line
for step in xrange(2001):
    sess.run(train, feed_dict={X: x_data, Y: y_data})
    if step % 100 == 0:
        print step, sess.run(cost, feed_dict={X: x_data, Y: y_data}), sess.run(W), sess.run(b)

print sess.run(hypothesis, feed_dict={X: 5})
print sess.run(hypothesis, feed_dict={X: 2.5})

And here is the result:
0 1.60368e+10 [ 4612.54003906] [ 406.81304932]
100 nan [ nan] [ nan]
200 nan [ nan] [ nan]
300 nan [ nan] [ nan]
400 nan [ nan] [ nan]
500 nan [ nan] [ nan]
600 nan [ nan] [ nan]
700 nan [ nan] [ nan]
800 nan [ nan] [ nan]
900 nan [ nan] [ nan]
1000 nan [ nan] [ nan]
1100 nan [ nan] [ nan]
1200 nan [ nan] [ nan]
1300 nan [ nan] [ nan]
1400 nan [ nan] [ nan]
1500 nan [ nan] [ nan]
1600 nan [ nan] [ nan]
1700 nan [ nan] [ nan]
1800 nan [ nan] [ nan]
1900 nan [ nan] [ nan]
2000 nan [ nan] [ nan]
[ nan]
[ nan]

I don't understand why the result is nan.

If I change the initial data to this:

x_data = [1., 2., 3., 4.]
y_data = [2., 4., 6., 8.]

then it runs fine. Why?
1 Answer


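Your problem is a float32 overflow caused by a learning rate that is too high: on every gradient descent step the variable W oscillates to a value of larger and larger magnitude instead of converging, until it overflows and the results turn into nan.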
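
The overflow can be reproduced without TensorFlow. What follows is a minimal pure-Python sketch, not the asker's code: the function name `fit` and the fixed starting point (W = 500, b = -300) are assumptions chosen for illustration. Python floats are 64-bit, so the overflow takes more steps than with float32, but the mechanism is the same.

```python
import math


def fit(x_data, y_data, lr, steps, w=500.0, b=-300.0):
    """Plain gradient descent on mean((w*x + b - y)^2).

    The starting point (w, b) is a hypothetical fixed value; the question
    draws it at random from [-1000, 1000].
    """
    n = len(x_data)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(x_data, y_data)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, x_data)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


x_data = [10.0, 20.0, 30.0, 40.0]
y_data = [20.0, 40.0, 60.0, 80.0]

# Learning rate 0.1: every step overshoots, |w| grows until it overflows to
# inf, and the following update computes inf - inf, which is nan.
w_bad, b_bad = fit(x_data, y_data, lr=0.1, steps=2001)
print(math.isnan(w_bad), math.isnan(b_bad))

# Learning rate 0.001: each update shrinks the error instead of amplifying it.
w_ok, b_ok = fit(x_data, y_data, lr=0.001, steps=50000)
print(w_ok, b_ok)  # close to W = 2 and b = 0
```

With the smaller rate the bias converges slowly, which is why many more iterations are needed than the 2001 used in the question.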

If you change

a = tf.Variable(0.1)

to

a = tf.Variable(0.001)

the weights should converge better. You will probably also want to increase the number of iterations (to around 50000).
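Choosing a suitable learning rate is often the first challenge when implementing or using machine learning algorithms. Loss values that rise instead of converging to a minimum are usually a sign that the learning rate is too high.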
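
For this particular cost, which is quadratic in W and b, "too high" can be made concrete: plain gradient descent is stable only while the learning rate stays below 2 divided by the largest eigenvalue of the cost's Hessian. The sketch below is a worked calculation added for illustration, not part of the original answer, and `max_stable_lr` is a hypothetical helper name.

```python
import math


def max_stable_lr(x_data):
    """Upper bound on the learning rate for the cost mean((W*x + b - y)^2).

    The Hessian of this cost with respect to (W, b) is
        2 * [[mean(x^2), mean(x)],
             [mean(x),   1      ]]
    and plain gradient descent converges only if lr < 2 / largest eigenvalue.
    """
    n = len(x_data)
    h_ww = 2.0 * sum(x * x for x in x_data) / n
    h_wb = 2.0 * sum(x_data) / n
    h_bb = 2.0
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    trace = h_ww + h_bb
    det = h_ww * h_bb - h_wb * h_wb
    largest = (trace + math.sqrt(trace * trace - 4.0 * det)) / 2.0
    return 2.0 / largest


print(max_stable_lr([10.0, 20.0, 30.0, 40.0]))  # roughly 0.0013
print(max_stable_lr([1.0, 2.0, 3.0, 4.0]))      # roughly 0.12
```

Under these assumptions a rate of 0.1 is far above the bound for the original data but just below it for the data divided by ten, which is consistent with the behaviour reported in the question.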
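In your case, fitting a line is made more vulnerable to this problem by the larger magnitudes in the training data, which is one reason it is common to normalize the data before training, for example in neural networks.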
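
As a rough illustration of normalization, the sketch below standardizes the inputs to zero mean and unit variance, trains with the original learning rate of 0.1, and then maps the result back to the original units. It is pure Python rather than TensorFlow; `standardize`, `fit`, and the starting point (500, -300) are hypothetical choices, not taken from the question.

```python
def standardize(values):
    """Return (scaled values, mean, std), using the population std."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) * (v - mean) for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values], mean, std


def fit(x_data, y_data, lr, steps, w=500.0, b=-300.0):
    """Plain gradient descent on mean((w*x + b - y)^2)."""
    n = len(x_data)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(x_data, y_data)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, x_data)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


x_data = [10.0, 20.0, 30.0, 40.0]
y_data = [20.0, 40.0, 60.0, 80.0]

x_scaled, x_mean, x_std = standardize(x_data)

# The same learning rate that diverged on the raw data now converges.
w_scaled, b_scaled = fit(x_scaled, y_data, lr=0.1, steps=500)

# Undo the scaling: y = w_scaled * (x - mean) / std + b_scaled
w = w_scaled / x_std
b = b_scaled - w_scaled * x_mean / x_std
print(w, b)  # close to W = 2 and b = 0
```

Only the inputs are scaled here; scaling the targets as well is a common variation.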
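In addition, the starting weight and bias are drawn from a very large range, so they can be very far from the ideal values and produce very large loss values and gradients at the start. Picking a good range for the initial values is another critical thing to get right as you move on to more advanced learning algorithms.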
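
To get a feel for the scale involved, this small sketch compares the cost and the gradient with respect to W at two hypothetical starting points: one at the edge of the question's [-1000, 1000] range and one close to zero. The helper `cost_and_grad` is an illustrative name, not part of the original code.

```python
def cost_and_grad(w, b, x_data, y_data):
    """Return (cost, d cost / d w, d cost / d b) for mean((w*x + b - y)^2)."""
    n = len(x_data)
    errors = [w * x + b - y for x, y in zip(x_data, y_data)]
    cost = sum(e * e for e in errors) / n
    grad_w = 2.0 * sum(e * x for e, x in zip(errors, x_data)) / n
    grad_b = 2.0 * sum(errors) / n
    return cost, grad_w, grad_b


x_data = [10.0, 20.0, 30.0, 40.0]
y_data = [20.0, 40.0, 60.0, 80.0]

# Hypothetical starting points, chosen only for comparison.
big_cost, big_grad_w, _ = cost_and_grad(1000.0, 1000.0, x_data, y_data)
small_cost, small_grad_w, _ = cost_and_grad(1.0, 1.0, x_data, y_data)

print(big_cost, big_grad_w)      # cost on the order of 1e8
print(small_cost, small_grad_w)  # cost on the order of 1e2 to 1e3
```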
