How can I use TensorFlow's sampled softmax loss function in a Keras model?

Question


tensorflow deep-learning keras loss-function


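I'm training a language model with Keras and would like to speed up training by using sampled softmax as the final activation function in my network. From the TF docs, it looks like I need to supply arguments for weights and biases, but I'm not sure what inputs these parameters expect. It seems I could write a custom function in Keras as follows: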

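import keras.backend as K

def sampled_softmax(weights, biases, y_true, y_pred, num_sampled, num_classes):
    # forward to TensorFlow's sampled softmax loss via the backend's tf module
    return K.tf.nn.sampled_softmax_loss(weights=weights, biases=biases,
                                        labels=y_true, inputs=y_pred,
                                        num_sampled=num_sampled,
                                        num_classes=num_classes)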

However, I'm not sure how to "plug" this into my existing network. The architecture of the LM is very simple:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(input_dim=len(vocab), output_dim=256))
model.add(LSTM(1024, return_sequences=True))
model.add(Dense(units=len(vocab), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

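Given this architecture, can I pass the sampled_softmax function as the loss argument when I call compile on the model? Or does it need to be written as a layer that sits after the final fully connected layer? Any guidance would be greatly appreciated. Thanks.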

- kylerthecreator

This might help: Stackoverflow question - Mudit Verma

1 Answer


- josephkibe · Accepted Answer

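The key observation is that the TensorFlow sampled softmax function returns actual losses, not a set of predictions over the possible labels that you would compare against the ground truth and then turn into a loss as a separate step. This makes the model setup a bit odd.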
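To make the "returns a loss, not predictions" point concrete, here is a minimal numpy sketch of what a sampled softmax loss computes for each example. It is not TensorFlow's implementation: the helper name sampled_softmax_loss_np is hypothetical, and negatives are drawn uniformly with replacement, whereas tf.nn.sampled_softmax_loss defaults to a log-uniform sampler. The shapes follow the TF documentation: weights is [num_classes, dim], biases is [num_classes], inputs is [batch, dim].

```python
import numpy as np


def sampled_softmax_loss_np(weights, biases, labels, inputs, num_sampled, rng):
    """Per-example sampled softmax loss (simplified reference sketch).

    weights: [num_classes, dim], biases: [num_classes],
    labels: [batch] integer class ids, inputs: [batch, dim].
    Assumption: negatives are drawn uniformly WITH replacement and are
    shared across the whole batch.
    """
    num_classes = weights.shape[0]
    sampled = rng.integers(0, num_classes, size=num_sampled)

    # Expected count of any class among num_sampled uniform draws.
    # For a uniform sampler this correction is identical for every class and
    # cancels in the softmax; it matters for non-uniform samplers.
    log_q = np.log(num_sampled / num_classes)

    # Only the true class and the sampled classes get logits, never all classes.
    true_logits = np.sum(inputs * weights[labels], axis=1) + biases[labels] - log_q
    sampled_logits = inputs @ weights[sampled].T + biases[sampled] - log_q

    # Mask "accidental hits": a sampled id that equals the true label.
    hits = sampled[None, :] == labels[:, None]
    sampled_logits = np.where(hits, -np.inf, sampled_logits)

    # Softmax cross-entropy with the true class in column 0 (log-sum-exp).
    logits = np.concatenate([true_logits[:, None], sampled_logits], axis=1)
    m = logits.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    return lse - logits[:, 0]


rng = np.random.default_rng(0)
num_classes, dim, batch = 50, 8, 4
weights = rng.normal(size=(num_classes, dim))
biases = np.zeros(num_classes)
inputs = rng.normal(size=(batch, dim))
labels = rng.integers(0, num_classes, size=batch)

loss = sampled_softmax_loss_np(weights, biases, labels, inputs, num_sampled=10, rng=rng)
print(loss.shape)  # (4,): one loss per example, no class scores
```

The output is a vector of losses, one per example, which is why it cannot simply stand in for a softmax activation at the end of the network.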

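First, we add a second input layer to the model that encodes the target (training) data as one of the inputs, in addition to it being the target output. This is used for the "labels" argument of the "sampled_softmax_loss" function. It needs to be a Keras input, because it is treated as an input when the model is instantiated and set up.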
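The labels argument must be an integer tensor of shape [batch, num_true] (here num_true = 1), which is why the one-hot target input is converted with argmax and a reshape. The same conversion in numpy, using a hypothetical helper name:

```python
import numpy as np


def one_hot_to_labels(one_hot):
    """Turn a [batch, num_classes] one-hot matrix into the [batch, 1]
    int64 column of class ids used as `labels`."""
    return np.argmax(one_hot, axis=1).astype(np.int64).reshape(-1, 1)


targets = np.array([[0, 0, 1],
                    [0, 1, 0],
                    [1, 0, 0]])
print(one_hot_to_labels(targets).ravel().tolist())  # [2, 1, 0]
```

A possible alternative (not what this answer does) is to declare the extra input as integer class ids, e.g. Input(shape=(1,), dtype='int64'), which avoids building a num_classes-wide one-hot vector for every example.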

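Second, we build a new custom Keras layer that calls the "sampled_softmax_loss" function with two Keras layers as its inputs: the output of the dense layer that predicts our classes, and the second input containing a copy of the training data. Note that we use the "_keras_history" instance variable to get from the output tensor back to the original dense layer and pull out its weight and bias tensors.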
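When pulling the weights out of the Dense layer, it is worth checking the layouts: Keras stores a Dense kernel as (input_dim, units), while the TensorFlow documentation for tf.nn.sampled_softmax_loss describes weights as [num_classes, dim] and inputs as the [batch, dim] activations that feed the output layer. This numpy sketch, with made-up sizes, shows that passing the transposed kernel together with the Dense layer's input (not its output) reproduces the logits the Dense layer itself would compute:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, num_classes, batch = 4, 6, 3

kernel = rng.normal(size=(dim, num_classes))  # Keras Dense kernel layout: (input_dim, units)
bias = rng.normal(size=num_classes)
x = rng.normal(size=(batch, dim))             # the Dense layer's *input* features

# Full logits as the Dense layer computes them.
dense_logits = x @ kernel + bias

# TF-style parameters: weights is [num_classes, dim], inputs is x.
weights = kernel.T
tf_style_logits = x @ weights.T + bias

print(np.allclose(dense_logits, tf_style_logits))  # True
```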

Finally, we have to build a new "dumb" loss function that ignores the training data and just uses the loss reported by the "sampled_softmax_loss" function.
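The "dumb" loss only has to satisfy the (y_true, y_pred) signature Keras expects while passing the layer's per-example losses through. A numpy stand-in for the same idea (not Keras code):

```python
import numpy as np


def custom_loss(y_true, y_pred):
    # y_pred already holds the per-example sampled softmax losses
    # produced by the custom layer, so y_true is deliberately ignored.
    return np.mean(y_pred)


per_example = np.array([2.0, 4.0, 6.0])
print(custom_loss(np.zeros(3), per_example))  # 4.0
print(custom_loss(None, per_example))         # 4.0, y_true is never read
```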

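Note that because the sampled softmax function returns losses rather than class predictions, you can't use this model specification for validation or inference. You'll need to reuse the trained layers from this "training version" in a new specification that applies a standard softmax to the output of the original dense layer, which has the default (linear) activation.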
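At inference time, the same trained Dense parameters are used with an ordinary full softmax over all classes. A minimal numpy sketch of that step, with hypothetical names and random stand-in parameters:

```python
import numpy as np


def full_softmax_predict(x, kernel, bias):
    """Class probabilities from the trained Dense parameters:
    plain logits followed by a standard (full) softmax."""
    logits = x @ kernel + bias                            # [batch, num_classes]
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


rng = np.random.default_rng(1)
kernel = rng.normal(size=(4, 5))   # stand-in for the trained Dense kernel
bias = np.zeros(5)
x = rng.normal(size=(2, 4))

probs = full_softmax_predict(x, kernel, bias)
print(probs.argmax(axis=1))        # predicted class id per example
```

In Keras, one way to get this is to apply a softmax Activation to the output of the already-built dense layer and create a second Model from the same input; because Keras layers are shared between models, it uses the weights trained through the sampled-loss model.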

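There is surely a more elegant way to do this, but I believe this one works, so I figured I'd post it here now rather than wait until I have something better. For example, you'd probably want to make the number of classes a parameter of the "SampledSoftmax" layer, or better yet, condense everything into the loss function as in the original question and avoid passing in the training data twice.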

from keras.models import Model
from keras.layers import Input, Dense, Layer
from keras import backend as K

class SampledSoftmax(Layer):
    def __init__(self, **kwargs):
        super(SampledSoftmax, self).__init__(**kwargs)


    def call(self, inputs):
        """
        The first input should be the output of the dense layer that predicts
        the classes, and the second the target (i.e., a repeat of the training
        data) used to compute the labels argument.

        """
        # recover the original Dense layer from its output tensor
        dense_layer = inputs[0]._keras_history[0]
        # the labels input to this function is batch size by 1, where the
        # value at position (i, 0) is the index that is true (not zero)
        # e.g., (0, 0, 1) => (2) or (0, 1, 0, 0) => (1)
        labels = K.tf.reshape(K.tf.argmax(inputs[1], 1), [-1, 1])
        # tf.nn.sampled_softmax_loss expects weights of shape [num_classes, dim]
        # and the [batch, dim] activations that feed the output layer, so pass
        # the transposed Dense kernel and the Dense layer's input, not its output
        return K.tf.nn.sampled_softmax_loss(weights=K.tf.transpose(dense_layer.kernel),
                                            biases=dense_layer.bias,
                                            labels=labels,
                                            inputs=dense_layer.input,
                                            num_sampled=1000,
                                            num_classes=200000)

def custom_loss(y_true, y_pred):
    # y_pred already holds the per-example sampled losses; y_true is ignored
    return K.tf.reduce_mean(y_pred)


num_classes = 200000
hidden_input = Input(shape=(300,))
target_input = Input(shape=(num_classes,))

dense = Dense(num_classes)

outputs = dense(hidden_input)
outputs = SampledSoftmax()([outputs, target_input])

model = Model([hidden_input, target_input], outputs)
model.compile(optimizer='adam', loss=custom_loss)
# train as desired, passing the one-hot targets both as the second input and as y