Keras - validation loss and accuracy stuck at 0

I am trying to train a simple two-layer fully-connected neural network for binary classification in Tensorflow keras. I split the data into training and validation sets with an 80:20 split using sklearn's train_test_split(). When I call model.fit(X_train, y_train, validation_data=[X_val, y_val]), the validation loss and accuracy show as 0 for all epochs, yet the model otherwise trains fine.

Screenshot of model.fit call and verbose log

Also, when I evaluate on the validation set, the output is non-zero.

Screenshot of model.evaluate function call

Can someone please explain why I am getting zero loss and zero accuracy during validation? Here is the complete sample code (MCVE): https://colab.research.google.com/drive/1P8iCUlnD87vqtuS5YTdoePcDOVEKpBHr?usp=sharing. Thanks for your help.

Don't be like me. I ran into this with a regression model while using classification loss and accuracy instead of the regression ones. - BSalita
2 Answers

  • If you use keras instead of tf.keras, everything works fine.

  • With tf.keras, I even tried validation_data = [X_train, y_train], and it also gives zero accuracy.

Here is a demonstration:

model.fit(X_train, y_train, validation_data=[X_train.to_numpy(), y_train.to_numpy()],
          epochs=10, batch_size=64)

Epoch 1/10
8/8 [==============================] - 0s 6ms/step - loss: 0.7898 - accuracy: 0.6087 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/10
8/8 [==============================] - 0s 6ms/step - loss: 0.6710 - accuracy: 0.6500 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 3/10
8/8 [==============================] - 0s 5ms/step - loss: 0.6748 - accuracy: 0.6500 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 4/10
8/8 [==============================] - 0s 6ms/step - loss: 0.6716 - accuracy: 0.6370 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 5/10
8/8 [==============================] - 0s 6ms/step - loss: 0.6085 - accuracy: 0.6326 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 6/10
8/8 [==============================] - 0s 6ms/step - loss: 0.6744 - accuracy: 0.6326 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 7/10
8/8 [==============================] - 0s 6ms/step - loss: 0.6102 - accuracy: 0.6522 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 8/10
8/8 [==============================] - 0s 6ms/step - loss: 0.7032 - accuracy: 0.6109 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 9/10
8/8 [==============================] - 0s 5ms/step - loss: 0.6283 - accuracy: 0.6717 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 10/10
8/8 [==============================] - 0s 5ms/step - loss: 0.6120 - accuracy: 0.6652 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00

So, there is definitely some problem with tensorflow's fit implementation.

I took a look at the source, and the validation_data part seems to be where things go wrong:

...
...
        # Run validation.
        if validation_data and self._should_eval(epoch, validation_freq):
          val_x, val_y, val_sample_weight = (
              data_adapter.unpack_x_y_sample_weight(validation_data))
          val_logs = self.evaluate(
              x=val_x,
              y=val_y,
              sample_weight=val_sample_weight,
              batch_size=validation_batch_size or batch_size,
              steps=validation_steps,
              callbacks=callbacks,
              max_queue_size=max_queue_size,
              workers=workers,
              use_multiprocessing=use_multiprocessing,
              return_dict=True)
          val_logs = {'val_' + name: val for name, val in val_logs.items()}
          epoch_logs.update(val_logs)

Since evaluate is already known to work correctly, and this path internally calls model.evaluate, I realized the only possible culprit was unpack_x_y_sample_weight.

So, I looked at its implementation:

def unpack_x_y_sample_weight(data):
  """Unpacks user-provided data tuple."""
  if not isinstance(data, tuple):
    return (data, None, None)
  elif len(data) == 1:
    return (data[0], None, None)
  elif len(data) == 2:
    return (data[0], data[1], None)
  elif len(data) == 3:
    return (data[0], data[1], data[2])

  raise ValueError("Data not understood.")

Incredibly, if you pass a tuple instead of a list, everything works fine, thanks to the isinstance check inside unpack_x_y_sample_weight. (With a list, your labels are lost after this step, yet the data somehow gets patched up inside evaluate, so you end up validating with no sensible labels. This looks like a bug, although the documentation does clearly say to pass a tuple.)
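The list-vs-tuple behavior can be checked in isolation with a standalone copy of the helper (reproduced from the source quoted above, so no TensorFlow is needed):

```python
# Standalone copy of tf.keras's unpack_x_y_sample_weight, as quoted above.
def unpack_x_y_sample_weight(data):
    """Unpacks user-provided data tuple."""
    if not isinstance(data, tuple):
        return (data, None, None)
    elif len(data) == 1:
        return (data[0], None, None)
    elif len(data) == 2:
        return (data[0], data[1], None)
    elif len(data) == 3:
        return (data[0], data[1], data[2])
    raise ValueError("Data not understood.")

X = [[0.1], [0.2]]
y = [0, 1]

# A tuple is unpacked into (x, y, sample_weight) as intended:
assert unpack_x_y_sample_weight((X, y)) == (X, y, None)

# A list fails the isinstance(data, tuple) check, so the entire [X, y]
# list is treated as "x" and the labels come back as None:
assert unpack_x_y_sample_weight([X, y]) == ([X, y], None, None)
```

This is exactly the failure mode in the logs above: with validation_data passed as a list, evaluate receives y=None, which is why the reported val_loss and val_accuracy are 0.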

The following code gives the correct validation accuracy and loss:

model.fit(X_train, y_train, validation_data=(X_train.to_numpy(), y_train.to_numpy()),
          epochs=10, batch_size=64)

Epoch 1/10
8/8 [==============================] - 0s 7ms/step - loss: 0.5832 - accuracy: 0.6696 - val_loss: 0.6892 - val_accuracy: 0.6674
Epoch 2/10
8/8 [==============================] - 0s 7ms/step - loss: 0.6385 - accuracy: 0.6804 - val_loss: 0.8984 - val_accuracy: 0.5565
Epoch 3/10
8/8 [==============================] - 0s 7ms/step - loss: 0.6822 - accuracy: 0.6391 - val_loss: 0.6556 - val_accuracy: 0.6739
Epoch 4/10
8/8 [==============================] - 0s 6ms/step - loss: 0.6276 - accuracy: 0.6609 - val_loss: 1.0691 - val_accuracy: 0.5630
Epoch 5/10
8/8 [==============================] - 0s 7ms/step - loss: 0.7048 - accuracy: 0.6239 - val_loss: 0.6474 - val_accuracy: 0.6326
Epoch 6/10
8/8 [==============================] - 0s 7ms/step - loss: 0.6545 - accuracy: 0.6500 - val_loss: 0.6659 - val_accuracy: 0.6043
Epoch 7/10
8/8 [==============================] - 0s 7ms/step - loss: 0.5796 - accuracy: 0.6913 - val_loss: 0.6891 - val_accuracy: 0.6435
Epoch 8/10
8/8 [==============================] - 0s 7ms/step - loss: 0.5915 - accuracy: 0.6891 - val_loss: 0.5307 - val_accuracy: 0.7152
Epoch 9/10
8/8 [==============================] - 0s 7ms/step - loss: 0.5571 - accuracy: 0.7000 - val_loss: 0.5465 - val_accuracy: 0.6957
Epoch 10/10
8/8 [==============================] - 0s 7ms/step - loss: 0.7133 - accuracy: 0.6283 - val_loss: 0.7046 - val_accuracy: 0.6413

So, since this seems to be a bug, I have just opened an issue on the Tensorflow Github repository:

https://github.com/tensorflow/tensorflow/issues/39370


Try changing the loss in your model from loss="categorical_crossentropy" to loss="binary_crossentropy". I ran into the same issue and tried the answer above, but what worked for me was different. The problem was that my model was a binary classifier with a single output node, not a multi-class model with several output nodes, so loss="binary_crossentropy" is the appropriate loss function in that case.
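The mismatch can be seen with a little arithmetic. In this minimal sketch (plain Python, assuming a single sigmoid output p, not the actual Keras implementation), categorical cross-entropy collapsed onto one output node keeps only the -y·log(p) term, so negative examples (y = 0) contribute zero loss no matter how wrong the prediction is, while binary cross-entropy also keeps the (1-y)·log(1-p) term and penalizes them:

```python
import math

EPS = 1e-7  # numerical floor, similar in spirit to Keras' epsilon

def binary_crossentropy(y, p):
    # Both terms present: positives AND negatives are penalized.
    p = min(max(p, EPS), 1 - EPS)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def categorical_crossentropy_single_node(y, p):
    # Categorical CE collapsed to one output node: only -y*log(p) survives.
    return -y * math.log(max(p, EPS))

# A negative example (y=0) that the model gets badly wrong (p=0.9):
bce = binary_crossentropy(0, 0.9)                   # ~2.30, a real penalty
cce = categorical_crossentropy_single_node(0, 0.9)  # 0.0, no signal at all
```

With no gradient signal from negative examples, the single-node model under categorical cross-entropy cannot learn the negative class, which is consistent with the degenerate metrics described above.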
