Why can't a deep neural network approximate a simple ln(x) function?

4

I created an ANN with two ReLU hidden layers and a linear output layer, and tried to approximate the simple function ln(x), but I'm not doing well. I'm confused, because ln(x) on the range x: [0.0, 1.0] should be approximated without problems (I'm using a learning rate of 0.01 and basic gradient descent optimization).

import tensorflow as tf
import numpy as np

def GetTargetResult(x):
    curY = np.log(x)
    return curY

# Create model
def multilayer_perceptron(x, weights, biases):
    # Hidden layer with RELU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    # Hidden layer with RELU activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)

    # Output layer with linear activation
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer

# Parameters
learning_rate = 0.01
training_epochs = 10000
batch_size = 50
display_step = 500

# Network Parameters
n_hidden_1 = 50 # 1st layer number of features
n_hidden_2 = 10 # 2nd layer number of features
n_input =  1


# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_uniform([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_uniform([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_uniform([n_hidden_2, 1]))
}
biases = {
    'b1': tf.Variable(tf.random_uniform([n_hidden_1])),
    'b2': tf.Variable(tf.random_uniform([n_hidden_2])),
    'out': tf.Variable(tf.random_uniform([1]))
}

x_data = tf.placeholder(tf.float32, [None, 1])
y_data = tf.placeholder(tf.float32, [None, 1])

# Construct model
pred = multilayer_perceptron(x_data, weights, biases)

# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(pred - y_data))
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train = optimizer.minimize(loss)

# Before starting, initialize the variables.  We will 'run' this first.
init = tf.initialize_all_variables()

# Launch the graph.
sess = tf.Session()
sess.run(init)

for step in range(training_epochs):
    x_in = np.random.rand(batch_size, 1).astype(np.float32)
    y_in = GetTargetResult(x_in)
    sess.run(train, feed_dict = {x_data: x_in, y_data: y_in})
    if(step % display_step == 0):
        curX = np.random.rand(1, 1).astype(np.float32)
        curY =  GetTargetResult(curX)

        curPrediction = sess.run(pred, feed_dict={x_data: curX})
        curLoss = sess.run(loss, feed_dict={x_data: curX, y_data: curY})
        print("For x = {0} and target y = {1} prediction was y = {2} and squared loss was = {3}".format(curX, curY,curPrediction, curLoss))

With the configuration above, the NN just learns to guess y = -1.00. I have tried different learning rates, different optimizers, and different configurations, without success: learning does not converge in any case. I have done similar things with a log function in other deep learning frameworks without problems. Is this a TF-specific issue? What am I doing wrong?


Isn't ln(0.0) equal to negative infinity? – Aaron
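Indeed, np.random.rand samples from [0, 1), so a training batch can contain x values arbitrarily close to 0, where the target diverges. A quick check:

import numpy as np

print(np.log(0.0))    # -inf, with a RuntimeWarning: divide by zero
print(np.log(1e-45))  # ≈ -103.6: targets explode in magnitude as x approaches 0

A single such sample can dominate the mean squared loss of an entire batch.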
3 Answers

7

What your network has to predict

[plot of ln(x); source: WolframAlpha]

What your architecture is

ReLU(ReLU(x * W_1 + b_1) * W_2 + b_2) * W_out + b_out

Ideas

My first thought was that the ReLU might be the problem. However, no ReLU is applied to the output, so it shouldn't be causing this.

Changing the initialization (from uniform to truncated normal) and the optimizer (from SGD to Adam) seems to fix the problem:

#!/usr/bin/env python
import tensorflow as tf
import numpy as np


def get_target_result(x):
    return np.log(x)


def multilayer_perceptron(x, weights, biases):
    """Create model."""
    # Hidden layer with RELU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    # Hidden layer with RELU activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)

    # Output layer with linear activation
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer

# Parameters
learning_rate = 0.01
training_epochs = 10**6
batch_size = 500
display_step = 500

# Network Parameters
n_hidden_1 = 50  # 1st layer number of features
n_hidden_2 = 10  # 2nd layer number of features
n_input = 1


# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.truncated_normal([n_input, n_hidden_1], stddev=0.1)),
    'h2': tf.Variable(tf.truncated_normal([n_hidden_1, n_hidden_2], stddev=0.1)),
    'out': tf.Variable(tf.truncated_normal([n_hidden_2, 1], stddev=0.1))
}

biases = {
    'b1': tf.Variable(tf.constant(0.1, shape=[n_hidden_1])),
    'b2': tf.Variable(tf.constant(0.1, shape=[n_hidden_2])),
    'out': tf.Variable(tf.constant(0.1, shape=[1]))
}

x_data = tf.placeholder(tf.float32, [None, 1])
y_data = tf.placeholder(tf.float32, [None, 1])

# Construct model
pred = multilayer_perceptron(x_data, weights, biases)

# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(pred - y_data))
# train = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
train = tf.train.AdamOptimizer(1e-4).minimize(loss)

# Before starting, initialize the variables.  We will 'run' this first.
init = tf.initialize_all_variables()

# Launch the graph.
sess = tf.Session()
sess.run(init)

for step in range(training_epochs):
    x_in = np.random.rand(batch_size, 1).astype(np.float32)
    y_in = get_target_result(x_in)
    sess.run(train, feed_dict={x_data: x_in, y_data: y_in})
    if(step % display_step == 0):
        curX = np.random.rand(1, 1).astype(np.float32)
        curY = get_target_result(curX)

        curPrediction = sess.run(pred, feed_dict={x_data: curX})
        curLoss = sess.run(loss, feed_dict={x_data: curX, y_data: curY})
        print(("For x = {0} and target y = {1} prediction was y = {2} and "
               "squared loss was = {3}").format(curX, curY,
                                                curPrediction, curLoss))

I trained this for 1 minute and got the following results:

For x = [[ 0.19118255]] and target y = [[-1.65452647]] prediction was y = [[-1.65021849]] and squared loss was = 1.85587377928e-05
For x = [[ 0.17362741]] and target y = [[-1.75084364]] prediction was y = [[-1.74087048]] and squared loss was = 9.94640868157e-05
For x = [[ 0.60853624]] and target y = [[-0.4966988]] prediction was y = [[-0.49964082]] and squared loss was = 8.65551464813e-06
For x = [[ 0.33864763]] and target y = [[-1.08279514]] prediction was y = [[-1.08586168]] and squared loss was = 9.4036658993e-06
For x = [[ 0.79126364]] and target y = [[-0.23412406]] prediction was y = [[-0.24541236]] and squared loss was = 0.000127425722894
For x = [[ 0.09994856]] and target y = [[-2.30309963]] prediction was y = [[-2.29796076]] and squared loss was = 2.6408026315e-05
For x = [[ 0.31053194]] and target y = [[-1.16946852]] prediction was y = [[-1.17038012]] and squared loss was = 8.31002580526e-07
For x = [[ 0.0512077]] and target y = [[-2.97186542]] prediction was y = [[-2.96796203]] and squared loss was = 1.52364455062e-05
For x = [[ 0.120253]] and target y = [[-2.11815739]] prediction was y = [[-2.12729549]] and squared loss was = 8.35050013848e-05

So the answer is probably that your optimizer is not good enough / that the optimization problem starts from a bad point. See:

- Xavier Glorot, Yoshua Bengio: "Understanding the difficulty of training deep feedforward neural networks"
- Visualizing optimization algorithms. The image here comes from Alec Radford's nice GIFs; it does not include Adam, but it gives you a feeling for how much better one can do than SGD. [animation of optimization algorithms on a loss surface]

Two ideas that might improve on this (see the sketch after this list):

- Try dropout
- Try not to use x values close to 0. I would rather sample values in [0.01, 1].

However, my experience with regression problems is quite limited.
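A minimal sketch of the second idea, reusing the names from the code above; the 0.01 lower bound is the one suggested in the list:

x_in = (0.01 + 0.99 * np.random.rand(batch_size, 1)).astype(np.float32)  # uniform on [0.01, 1.0)
y_in = get_target_result(x_in)  # no target is now more negative than ln(0.01) ≈ -4.6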

0

First of all, your input data is in the range [0, 1), which is not a great input for a neural network. After computing y, subtract the mean from x to normalize it (ideally, also divide by the standard deviation).
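A minimal sketch of that normalization, assuming x is drawn uniformly from [0, 1) as in the question; the constants mean = 0.5 and std = 1/sqrt(12) ≈ 0.289 are properties of that distribution, not values from this answer:

import numpy as np

batch_size = 50
x_raw = np.random.rand(batch_size, 1).astype(np.float32)  # uniform on [0, 1)
y_in = np.log(x_raw)                        # compute the target from the raw x first
x_in = (x_raw - 0.5) / np.sqrt(1.0 / 12.0)  # then center and scale the network input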

However, in your particular case that alone was not enough to make it work.

I played around with it and found two ways to make it work (both also require the data normalization described above):

  1. Remove the second layer completely, or

  2. Set the number of neurons in the second layer to 50.

My guess is that 10 neurons do not have enough representational capacity to pass enough information through to the final layer (obviously, a perfectly smart NN would learn to ignore the second layer in that case, passing the answer through one of its neurons, but the theoretical possibility of that doesn't mean gradient descent will learn to do so).
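Both fixes are small edits to the question's script; a sketch against its variable names:

# Option 1: drop the second hidden layer entirely
# (weights['out'] must then have shape [n_hidden_1, 1])
def single_layer_perceptron(x, weights, biases):
    layer_1 = tf.nn.relu(tf.add(tf.matmul(x, weights['h1']), biases['b1']))
    return tf.matmul(layer_1, weights['out']) + biases['out']

# Option 2: keep both layers, but widen the second one
n_hidden_2 = 50  # was 10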


-1

I haven't looked at the code, but here is the theory. If you use an activation function like tanh, then for small weights the activation function operates in its linear region, and for large weights the activation function saturates at -1 or +1. If you are in the linear region in all layers, you cannot approximate complex functions (i.e. you have a sandwich of linear layers, so the best you can do is a linear approximation), but if you have bigger weights, the nonlinearity lets you approximate a wide range of functions. There is no free lunch: the weights need to be at the right values to avoid both overfitting and underfitting. This process is called regularization.
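The linear region and the saturation are easy to see numerically: for small inputs tanh(x) ≈ x, while large inputs are squashed toward ±1:

import numpy as np

for v in [0.01, 0.1, 1.0, 3.0]:
    print(v, np.tanh(v))
# 0.01 -> 0.00999967  (≈ x: linear region)
# 0.1  -> 0.09966799  (≈ x)
# 1.0  -> 0.76159416  (nonlinear)
# 3.0  -> 0.99505475  (saturated near +1)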

