Softmax Jacobian in TensorFlow

Question

Suppose I have a simple single-layer neural network:

x = tf.placeholder(tf.float32, [batch_size, input_dim])
W = tf.Variable(tf.random_normal([input_dim, output_dim]))
a = tf.matmul(x, W)
y = tf.nn.softmax(a)

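So the variable y has shape batch_size by output_dim. I want to compute the Jacobian of y with respect to a for every sample in the batch, which would have shape batch_size by output_dim by output_dim. Mathematically, the Jacobian is (dy/da)_{i,j} = -y_i y_j for i ≠ j, and (dy/da)_{i,i} = y_i (1 - y_i) otherwise.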

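How can I compute the Jacobian of softmax with respect to its input in TensorFlow? I know that tf.gradients computes the gradient of a scalar with respect to a tensor, so I figure that either looping over tf.gradients or implementing the analytical expression above, or some combination of the two, should work. However, I'm not sure how to do this with TensorFlow's ops, and I would appreciate any code that helps!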
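For reference, the analytical expression above can be written per sample as diag(y) - y yᵀ. The following is a minimal NumPy sketch of that formula, not TensorFlow code; the helper names `softmax` and `softmax_jacobian` are made up for illustration, and the same broadcasting pattern is assumed to carry over to the corresponding TensorFlow ops.

```python
import numpy as np


def softmax(a):
    """Row-wise softmax of a [batch, dim] array."""
    # Subtracting the row maximum does not change the result but avoids overflow.
    shifted = a - a.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def softmax_jacobian(y):
    """Batched Jacobian dy/da from softmax outputs y of shape [batch, dim].

    Returns an array J of shape [batch, dim, dim] with
    J[b, i, j] = y[b, i] * (delta_ij - y[b, j]).
    """
    dim = y.shape[1]
    diag = y[:, :, None] * np.eye(dim)     # y_i on the diagonal, zeros elsewhere
    outer = y[:, :, None] * y[:, None, :]  # y_i * y_j for every pair (i, j)
    return diag - outer


# Hypothetical example input: 2 samples, 3 classes.
a = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
y = softmax(a)
J = softmax_jacobian(y)
```

Because the outputs of each sample sum to one, every row and column of each `J[b]` sums to zero, which is a quick sanity check on any implementation.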

- user19346

1 Answer

- DomJack · Accepted Answer

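It looks like tf.gradients sums over output_dim. Workaround: unstack y along that axis, take the gradient of each piece, and stack the results back together. I'm not sure how this affects efficiency...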

import numpy as np
import tensorflow as tf

batch_size = 3
input_dim = 10
output_dim = 20

W_vals = np.random.rand(input_dim, output_dim).astype(np.float32)

graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, [batch_size, input_dim])
    # Use a constant for easier checking
    W = tf.constant(W_vals, dtype=tf.float32)
    a = tf.matmul(x, W)
    y = a
    # remove softmax for easier checking
    # y = tf.nn.softmax(a)

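    # tf.gradients sums over every element of its first argument, so take one
    # output column at a time and stack: shape [batch_size, input_dim, output_dim]
    grads = tf.stack([tf.gradients(yi, x)[0] for yi in tf.unstack(y, axis=1)],
                     axis=2)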

with tf.Session(graph=graph) as sess:
    x_vals = np.random.rand(batch_size, input_dim).astype(np.float32)
    g_vals = sess.run(grads, feed_dict={x: x_vals})

# check gradients match
tol = 1e-10
for i in range(batch_size):
    if np.max(np.abs(g_vals[i] - W_vals)) >= tol:
        raise Exception('Gradients do not match W for sample %d' % i)
print('Gradients seem to match!')
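The check above removes the softmax and differentiates with respect to x. As a rough illustration of the same unstack-and-stack idea with the softmax restored, here is a NumPy sketch that differentiates with respect to a instead. It uses central finite differences as a stand-in for tf.gradients, so it is not the answer's code; `grad_of_component` is a hypothetical helper. It also shows why summing over the output axis is a problem for softmax: the summed gradient is zero everywhere.

```python
import numpy as np


def softmax(a):
    """Row-wise softmax of a [batch, dim] array."""
    shifted = a - a.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def grad_of_component(a, i, eps=1e-6):
    """Central-difference gradient of y[:, i] with respect to a, shape [batch, dim].

    Rows are independent under a row-wise softmax, so perturbing column j of
    every row at once still yields the per-sample derivative.
    """
    grad = np.zeros_like(a)
    for j in range(a.shape[1]):
        step = np.zeros_like(a)
        step[:, j] = eps
        diff = softmax(a + step)[:, i] - softmax(a - step)[:, i]
        grad[:, j] = diff / (2.0 * eps)
    return grad


rng = np.random.RandomState(0)
a = rng.rand(3, 5)  # float64, so the finite differences are reasonably accurate
y = softmax(a)
dim = a.shape[1]

# One gradient per output component, stacked on a new last axis:
# jac[b, j, i] = d y[b, i] / d a[b, j]
jac = np.stack([grad_of_component(a, i) for i in range(dim)], axis=2)

# Closed form diag(y) - y y^T for comparison (symmetric, so index order is moot).
analytic = y[:, :, None] * np.eye(dim) - y[:, :, None] * y[:, None, :]
max_error = np.max(np.abs(jac - analytic))

# Summing over the output axis, as a single gradient of all outputs would,
# collapses to approximately zero because each row of y sums to one.
summed = jac.sum(axis=2)
```

In this sketch `max_error` stays at the level of finite-difference noise, while `summed` is numerically zero, which is why each output component has to be differentiated separately before stacking.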