Training a TensorFlow language model with NCE or sampled softmax

Question



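I'm adapting the TensorFlow RNN tutorial to train a language model with NCE loss or sampled softmax, but I still want to report perplexity. However, the perplexities I get are very strange: with NCE I get several million (terrible!), while with sampled softmax I get a PPL of 700 after one epoch (too good to be true?!). I wonder what I'm doing wrong.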

Here is my adaptation of PTBModel:

class PTBModel(object):
  """The PTB model."""

  def __init__(self, is_training, config, loss_function="softmax"):
    ...
    w = tf.get_variable("proj_w", [size, vocab_size])
    w_t = tf.transpose(w)
    b = tf.get_variable("proj_b", [vocab_size])

    if loss_function == "softmax":
      logits = tf.matmul(output, w) + b
      loss = tf.nn.seq2seq.sequence_loss_by_example(
          [logits],
          [tf.reshape(self._targets, [-1])],
          [tf.ones([batch_size * num_steps])])
      self._cost = cost = tf.reduce_sum(loss) / batch_size
    elif loss_function == "nce":
      num_samples = 10
      labels = tf.reshape(self._targets, [-1,1])
      hidden = output
      loss = tf.nn.nce_loss(w_t, b,                           
                            hidden,
                            labels,
                            num_samples, 
                            vocab_size)
    elif loss_function == "sampled_softmax":
      num_samples = 10
      labels = tf.reshape(self._targets, [-1,1])
      hidden = output
      loss = tf.nn.sampled_softmax_loss(w_t, b,
                                        hidden, 
                                        labels, 
                                        num_samples,
                                        vocab_size)

    self._cost = cost = tf.reduce_sum(loss) / batch_size
    self._final_state = state

The model is instantiated like this:

mtrain = PTBModel(is_training=True, config=config, loss_function="nce")
mvalid = PTBModel(is_training=True, config=config)

I'm not doing anything exotic here; changing the loss function should be pretty straightforward. So why doesn't it work?

Thanks, Joris

- niefpaarschoenen

1 Answer


- Oriol Vinyals · Answer 1


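With the baseline model (softmax), you should be able to get better than 700 in one epoch. When you change the loss function, you may need to re-tune some hyperparameters, in particular the learning rate.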

Also, your evaluation model should use the full softmax to report true perplexity. Are you doing that?

- Oriol Vinyals
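
To make the evaluation point in the answer concrete, here is a minimal sketch in plain Python (not the tutorial's TensorFlow code) of how a true perplexity follows from full-softmax logits. The function names `log_softmax` and `perplexity` and the toy logits are made up for this illustration.

```python
import math

def log_softmax(logits):
    """Log-probabilities over the full vocabulary for one position."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def perplexity(all_logits, targets):
    """exp of the mean negative log-likelihood per word (full softmax)."""
    nll = 0.0
    for logits, target in zip(all_logits, targets):
        nll -= log_softmax(logits)[target]
    return math.exp(nll / len(targets))

# Toy check: uniform logits over a 4-word vocabulary (hypothetical data).
uniform_logits = [[0.0, 0.0, 0.0, 0.0]] * 3
print(perplexity(uniform_logits, [0, 1, 2]))  # close to 4.0, the vocabulary size
```

The normalizer runs over every word in the vocabulary, and the average is taken per predicted word. A number computed any other way is not comparable to published perplexities.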

It seems sampled softmax does indeed work: after 13 epochs (SmallConfig) it ends up at 129 with 20 negative samples. - niefpaarschoenen
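
For context on why the sampled training loss itself cannot be turned into a perplexity: when the normalizer covers only the true word plus a few sampled negatives, the resulting loss can never exceed the full-softmax loss. The sketch below is a simplified plain-Python illustration. It ignores any correction for the sampling distribution that a real implementation such as `tf.nn.sampled_softmax_loss` may apply, and all names and numbers are hypothetical.

```python
import math

def log_sum_exp(values):
    m = max(values)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in values))

def full_softmax_loss(logits, target):
    """Cross-entropy normalized over the whole vocabulary."""
    return log_sum_exp(logits) - logits[target]

def subset_softmax_loss(logits, target, negatives):
    """Cross-entropy normalized over the target plus sampled negatives only."""
    candidates = [logits[target]] + [logits[i] for i in negatives if i != target]
    return log_sum_exp(candidates) - logits[target]

logits = [2.0, 1.0, 0.5, 0.0, -1.0, 0.3]  # toy scores for a 6-word vocabulary
target = 1
print(full_softmax_loss(logits, target))
print(subset_softmax_loss(logits, target, negatives=[3, 4]))  # never larger than the line above
```

Exponentiating the smaller subset loss yields a number that looks like a very good perplexity but is not one.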


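NCE, on the other hand, is still failing me. The perplexity, computed with the full softmax as you suggested, reaches the millions. I agree that re-tuning is needed, but even without tuning I would expect the perplexity to drop slightly rather than rise from about 10k to 2M?! - niefpaarschoenen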

FYI: NCE actually gives reasonable values with a small number of time steps. When you increase that number, it starts going crazy. - niefpaarschoenen
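
One thing that may be worth checking here, offered as an observation rather than a diagnosis: the cost in the question is `tf.reduce_sum(loss) / batch_size`, so it grows linearly with the number of time steps even when the per-word loss is unchanged. The plain-Python sketch below mirrors that arithmetic with made-up numbers; `summed_cost` and `per_word_perplexity` are hypothetical helper names.

```python
import math

def summed_cost(per_word_losses, batch_size):
    """Same arithmetic as `tf.reduce_sum(loss) / batch_size` in the question."""
    return sum(per_word_losses) / batch_size

def per_word_perplexity(per_word_losses):
    """exp of the mean loss per word, which does not depend on num_steps."""
    return math.exp(sum(per_word_losses) / len(per_word_losses))

batch_size = 2
for num_steps in (5, 20):
    # Hypothetical constant loss of 4.0 for every word in the batch.
    losses = [4.0] * (batch_size * num_steps)
    print(num_steps, summed_cost(losses, batch_size), per_word_perplexity(losses))
```

Because the summed cost scales with `num_steps`, so do its gradients. A learning rate tuned for a short unroll therefore acts like a larger one on a long unroll, which fits the re-tuning advice in the answer.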

@niefpaarschoenen Hi, I'm working on this problem. Did you see a performance gain from using NCE, especially in terms of words processed per second? Thanks. - pltrdy