Using the binary crossentropy loss function in Keras (Tensorflow backend)

Question

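In the training example in the Keras documentation, binary_crossentropy is used and a sigmoid activation is added to the network's last layer. But is it actually necessary to add a sigmoid to the last layer? Here is what I found in the source code: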

def binary_crossentropy(output, target, from_logits=False):
  """Binary crossentropy between an output tensor and a target tensor.
  Arguments:
      output: A tensor.
      target: A tensor with the same shape as `output`.
      from_logits: Whether `output` is expected to be a logits tensor.
          By default, we consider that `output`
          encodes a probability distribution.
  Returns:
      A tensor.
  """
  # Note: nn.softmax_cross_entropy_with_logits
  # expects logits, Keras expects probabilities.
  if not from_logits:
    # transform back to logits
    epsilon = _to_tensor(_EPSILON, output.dtype.base_dtype)
    output = clip_ops.clip_by_value(output, epsilon, 1 - epsilon)
    output = math_ops.log(output / (1 - output))
  return nn.sigmoid_cross_entropy_with_logits(labels=target, logits=output)

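Keras calls sigmoid_cross_entropy_with_logits in Tensorflow, but that function computes sigmoid(logits) again (see https://www.tensorflow.org/versions/master/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logits). So adding a sigmoid at the end doesn't seem to make sense, yet seemingly every binary/multi-label classification example and tutorial I found online adds one. Besides, I don't understand the meaning of this comment: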

# Note: nn.softmax_cross_entropy_with_logits
# expects logits, Keras expects probabilities.

Why does Keras expect probabilities? Doesn't it use the nn.softmax_cross_entropy_with_logits function? Does this make sense?
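The concern in the question can be checked numerically. The sketch below is plain Python (no TensorFlow); the helper names are hypothetical. It reimplements the numerically stable formula that the TensorFlow docs give for `sigmoid_cross_entropy_with_logits`, then compares passing a sigmoid output to it directly (sigmoid applied twice) with first undoing the sigmoid via `log(p / (1 - p))`, as the backend code above does.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_ce_with_logits(z: float, x: float) -> float:
    """Stable form documented for tf.nn.sigmoid_cross_entropy_with_logits:
    max(x, 0) - x * z + log(1 + exp(-|x|)), with label z and logit x."""
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))


def bce_from_probability(z: float, p: float) -> float:
    """Textbook binary cross-entropy on a probability p."""
    return -(z * math.log(p) + (1.0 - z) * math.log(1.0 - p))


label = 1.0
logit = 3.0
p = sigmoid(logit)  # what a final sigmoid layer would output

reference = bce_from_probability(label, p)
# Wrong: treat the probability as if it were a logit, so sigmoid runs twice.
double_sigmoid = sigmoid_ce_with_logits(label, p)
# What the backend does with from_logits=False: invert the sigmoid first.
recovered_logit = math.log(p / (1.0 - p))
keras_style = sigmoid_ce_with_logits(label, recovered_logit)

print(f"reference      {reference:.4f}")
print(f"double sigmoid {double_sigmoid:.4f}")
print(f"keras style    {keras_style:.4f}")
```

The double-sigmoid value is far too large for a confident, correct prediction, while inverting the sigmoid first reproduces the textbook loss. That is why the backend converts probabilities back to logits instead of passing them straight through.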

Thanks.

- Ming

3 Answers

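In categorical crossentropy:

- If the input is a prediction (probabilities), the cross-entropy is computed directly.
- If the input is logits, softmax cross-entropy with logits is applied.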
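A minimal pure-Python sketch of those two categorical branches; the function names and the clipping epsilon are illustrative assumptions, not the Keras API. With probabilities the loss is `-sum(t * log(p))`; with logits the softmax is folded into the loss through log-sum-exp, which is what a fused softmax-cross-entropy-with-logits op computes.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_ce_from_probs(target, probs, eps=1e-7):
    # Branch 1: input already holds probabilities (e.g. after a softmax layer).
    # Clip to [eps, 1 - eps] so log() never sees 0 (eps is an assumption).
    return -sum(t * math.log(min(max(p, eps), 1.0 - eps))
                for t, p in zip(target, probs))


def categorical_ce_from_logits(target, logits):
    # Branch 2: raw logits; log(softmax(x)_i) = x_i - logsumexp(x).
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return -sum(t * (x - log_sum_exp) for t, x in zip(target, logits))


target = [0.0, 1.0, 0.0]
logits = [1.0, 2.5, -0.5]
loss_probs = categorical_ce_from_probs(target, softmax(logits))
loss_logits = categorical_ce_from_logits(target, logits)
print(loss_probs, loss_logits)
```

Both branches give the same number here; the logits branch simply never materializes the probabilities.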

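In binary crossentropy:

- If the input is a prediction, it is converted back to logits, and then sigmoid cross-entropy with logits is applied.
- If the input is logits, sigmoid cross-entropy with logits is applied directly.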
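The binary case can be sketched the same way in plain Python. `binary_crossentropy_sketch` is a hypothetical name that mirrors the argument order of the backend function quoted in the question, and the epsilon of 1e-7 is an assumed default. The prediction branch clips and converts back to a logit, then both branches share a single sigmoid-cross-entropy-with-logits computation.

```python
import math

EPSILON = 1e-7  # assumption: Keras's default fuzz factor


def sigmoid_ce_with_logits(target, logit):
    # Stable form documented for tf.nn.sigmoid_cross_entropy_with_logits
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def binary_crossentropy_sketch(output, target, from_logits=False):
    if not from_logits:
        # Prediction branch: clip, then transform the probability back to a logit
        p = min(max(output, EPSILON), 1.0 - EPSILON)
        output = math.log(p / (1.0 - p))
    # Logit branch (and the converted prediction): sigmoid CE with logits
    return sigmoid_ce_with_logits(target, output)


logit = -0.8
prob = 1.0 / (1.0 + math.exp(-logit))
loss_prob = binary_crossentropy_sketch(prob, 0.0)                    # from a probability
loss_logit = binary_crossentropy_sketch(logit, 0.0, from_logits=True)  # from the raw logit
print(loss_prob, loss_logit)
```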

- W. Sam

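In Keras, we by default use a sigmoid activation on the output layer and then the Keras binary_crossentropy loss function, regardless of whether Theano, Tensorflow or CNTK is the backend. If you look more deeply into the pure Tensorflow case, you'll find that the tensorflow backend's binary_crossentropy function (the one you pasted in your question) uses tf.nn.sigmoid_cross_entropy_with_logits, and the latter applies a sigmoid internally. To avoid a double sigmoid, the tensorflow backend's binary_crossentropy by default (with from_logits=False) computes the inverse sigmoid, logit(x) = log(x / (1 - x)), to turn the network's activated output back into its raw, pre-activation form. You can avoid the extra sigmoid activation and the inverse-sigmoid computation by not using a sigmoid activation in the last layer and calling the tensorflow backend's binary_crossentropy with from_logits=True (or by using tf.nn.sigmoid_cross_entropy_with_logits directly).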
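Besides the extra work, the round trip through a probability is lossy because of clipping. A minimal plain-Python sketch, assuming the Keras default epsilon of 1e-7 (helper names are hypothetical): for a confidently wrong prediction, the sigmoid-then-invert path caps the loss near -log(1e-7) ≈ 16.1, while computing from the logit keeps the true value.

```python
import math

EPSILON = 1e-7  # assumption: the Keras default epsilon


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_ce_with_logits(target, logit):
    # Stable formula from the tf.nn.sigmoid_cross_entropy_with_logits docs
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def loss_through_probability(target, logit):
    # Sigmoid layer, then from_logits=False: clip and invert the sigmoid again
    p = min(max(sigmoid(logit), EPSILON), 1.0 - EPSILON)
    return sigmoid_ce_with_logits(target, math.log(p / (1.0 - p)))


label, logit = 1.0, -30.0  # a confidently wrong prediction
loss_from_logit = sigmoid_ce_with_logits(label, logit)
loss_via_probability = loss_through_probability(label, logit)
print(f"from logit:      {loss_from_logit:.3f}")
print(f"via probability: {loss_via_probability:.3f}")
```

In TensorFlow, `clip_by_value` passes no gradient for values outside its range, so in the probability path such a clipped sample also stops pushing the weights, whereas the from_logits path still yields a gradient of about -1 with respect to the logit.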

- KrisR89

Content provided by Stack Overflow.

- Maxim · Accepted Answer

You're right, that's exactly what's happening. I believe this is due to historical reasons.

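Keras was created before tensorflow, as a wrapper around theano. In theano, one had to compute the sigmoid/softmax manually and then apply the cross-entropy loss function. Tensorflow does everything in one fused op, but the API with a sigmoid/softmax layer had already been adopted.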

If you want to avoid the unnecessary logit <-> probability conversions, call the binary_crossentropy loss with from_logits=True and don't add the sigmoid layer.
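A minimal sketch of that setup with `tf.keras`, assuming TensorFlow 2.x; the 4-feature input, the hidden layer and the sample tensors are made up for illustration. The last `Dense` layer has no activation, the loss is told it receives logits, and at inference time the sigmoid is applied explicitly.

```python
import tensorflow as tf

# Final layer emits raw logits: no sigmoid activation.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
)

# Sanity check: the logits-based loss matches the usual sigmoid-then-probabilities loss.
labels = tf.constant([[1.0], [0.0], [1.0]])
logits = tf.constant([[2.0], [-1.0], [0.5]])
loss_from_logits = tf.keras.losses.BinaryCrossentropy(from_logits=True)(labels, logits)
loss_from_probs = tf.keras.losses.BinaryCrossentropy()(labels, tf.sigmoid(logits))
print(float(loss_from_logits), float(loss_from_probs))

# At inference, convert the model's logits to probabilities explicitly.
probabilities = tf.sigmoid(model(tf.zeros((2, 4))))
```

Anything downstream that expects probabilities (for example, thresholding predictions at 0.5) now needs that explicit `tf.sigmoid`; on raw logits the equivalent threshold is 0.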