TensorFlow or Theano: how do they know the derivative of the loss function from the neural network graph?

Question

In TensorFlow or Theano, you only tell the library how your neural network is built and how the feed-forward pass should operate.

For instance, in TensorFlow, you would write:

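import tensorflow as tf  # TensorFlow 1.x API

# X and y are NumPy arrays holding the training inputs and targets.
graph = tf.Graph()  # assumed; the original snippet uses `graph` without defining it
with graph.as_default():
    _X = tf.constant(X)
    _y = tf.constant(y)

    # Hidden layer: 20 units with randomly initialised weights and biases.
    hidden = 20
    w0 = tf.Variable(tf.truncated_normal([X.shape[1], hidden]))
    b0 = tf.Variable(tf.truncated_normal([hidden]))

    h = tf.nn.softmax(tf.matmul(_X, w0) + b0)

    # Output layer: a single unit.
    w1 = tf.Variable(tf.truncated_normal([hidden, 1]))
    b1 = tf.Variable(tf.truncated_normal([1]))

    yp = tf.nn.softmax(tf.matmul(h, w1) + b1)

    # Mean of half the squared error, minimised by plain gradient descent.
    loss = tf.reduce_mean(0.5*tf.square(yp - _y))
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)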

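I am using the L2-norm loss function, C = 0.5 * sum((y - yp)^2), and in the backpropagation step the derivative will presumably have to be computed, dC = sum(y - yp). See page 30 of this book.

My question is: how does TensorFlow (or Theano) know the analytical derivative for backpropagation? Or do they approximate it? Or do they somehow not use the derivative at all?

I have completed the Udacity deep learning course on TensorFlow, but I still do not understand how these libraries work.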

- Ricardo Magalhães Cruz

Here is a related post: https://dev59.com/glcP5IYBdhLWcg3wkKpD - Anton Codes

1 Answer

- mrry · Accepted Answer

The differentiation happens in the final line:

    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

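When you execute the minimize() method, TensorFlow identifies the set of variables on which loss depends and computes a gradient for each of them. The differentiation is implemented in ops/gradients.py, and it uses "reverse accumulation": it searches backwards from the loss tensor to the variables, applying the chain rule at each operator in the dataflow graph. The derivatives are therefore analytical, not numerical approximations. TensorFlow includes "gradient functions" for most (differentiable) operators, and you can see examples of how these are implemented in ops/math_grad.py. A gradient function can use the original op (including its inputs, outputs, and attributes) and the gradients computed for each of its outputs to produce gradients for each of its inputs. Page 7 of Ilya Sutskever's PhD thesis has an accessible explanation of how this process works in general.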
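
To make reverse accumulation concrete, here is a minimal pure-Python sketch of the idea. It is not TensorFlow's implementation: `Node`, `topo_order`, `backward`, and the op helpers (`add`, `sub`, `mul`, `square`) are hypothetical names made up for illustration, and it handles scalars only. Each op records a small gradient function, and `backward` walks the graph from the loss towards the inputs, applying the chain rule at every op.

```python
# Illustrative reverse-mode autodiff on scalars (assumed names, not TensorFlow code).

class Node:
    """A value in a dataflow graph plus a rule for its input gradients."""

    def __init__(self, value, inputs=(), grad_fn=None):
        self.value = value
        self.inputs = inputs    # nodes this value was computed from
        self.grad_fn = grad_fn  # maps d(loss)/d(output) to d(loss)/d(input)s
        self.grad = 0.0


# Each op stores its own "gradient function", written once by hand.
def add(a, b):
    return Node(a.value + b.value, (a, b), lambda g: (g, g))

def sub(a, b):
    return Node(a.value - b.value, (a, b), lambda g: (g, -g))

def mul(a, b):
    return Node(a.value * b.value, (a, b), lambda g: (g * b.value, g * a.value))

def square(a):
    return Node(a.value ** 2, (a,), lambda g: (2.0 * a.value * g,))


def topo_order(root):
    """Return the nodes reachable from root, inputs before their consumers."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.inputs:
            visit(parent)
        order.append(node)

    visit(root)
    return order


def backward(loss):
    """Accumulate d(loss)/d(node) into node.grad for every node in the graph."""
    order = topo_order(loss)
    for node in order:
        node.grad = 0.0
    loss.grad = 1.0
    # Consumers are visited before their inputs, so each node's gradient
    # is complete by the time it is pushed further back.
    for node in reversed(order):
        if node.grad_fn is None:
            continue  # leaf: a constant or a variable
        for parent, g in zip(node.inputs, node.grad_fn(node.grad)):
            parent.grad += g


# One-sample, one-weight version of the loss: 0.5 * (w * x + b - y)^2
x, y = Node(3.0), Node(2.0)
w, b = Node(0.5), Node(0.1)
yp = add(mul(w, x), b)
loss = mul(Node(0.5), square(sub(yp, y)))
backward(loss)
print(w.grad, b.grad, yp.grad)  # analytical gradients, no finite differences
```

After `backward(loss)`, `yp.grad` equals `yp.value - y.value`, which is the derivative of the squared-error loss with respect to the prediction, and `w.grad` is that quantity multiplied by `x.value`. No derivative of the whole network is ever written down by hand; it is assembled from the per-op rules.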