Why do training and validation report different accuracies on the same dataset?


I have been monitoring the loss on the training and validation sets and noticed that, even when the validation set is the very same dataset as the training set, the validation loss comes out smaller than the training loss. I would like to understand why this happens.

I am training a model with TensorFlow to predict some time-series data. Model creation and preprocessing are as follows:

import os
import datetime

import numpy as np
import tensorflow as tf
from tensorflow import keras as k   # imports inferred from the usage below

window_size = 40
batch_size  = 32
forecast_period = 6
model_name = "LSTM"
tf.keras.backend.clear_session()

_seed = 42
tf.random.set_seed(_seed)

def _sub_to_batch(sub):
    return sub.batch(window_size, drop_remainder=True)

def _return_input_output(tensor):
    _input  = tensor[:, :-forecast_period, :]
    _output = tensor[:, forecast_period:, :]
    return _input, _output

def _reshape_tensor(tensor):
    tensor = tf.expand_dims(tensor, axis=-1)
    tensor = tf.transpose(tensor, [1, 0, 2])
    return tensor


# total elements after unbatch(): 3813
train_ts_dataset = tf.data.Dataset.from_tensor_slices(train_ts)\
                            .window(window_size, shift=1)\
                            .flat_map(_sub_to_batch)\
                            .map(_reshape_tensor)\
                            .map(_return_input_output)
#                             .unbatch().shuffle(buffer_size=500, seed=_seed).batch(batch_size)\
#                             .map(_return_input_output)

valid_ts_dataset = tf.data.Dataset.from_tensor_slices(valid_ts)\
                            .window(window_size, shift=1)\
                            .flat_map(_sub_to_batch)\
                            .map(_reshape_tensor)\
                            .unbatch().shuffle(buffer_size=500, seed=_seed).batch(batch_size)\
                            .map(_return_input_output)

def _forecast_mae(y_true, y_pred):
    # Keras calls metrics as metric(y_true, y_pred);
    # only the last forecast_period steps of each window are scored
    _y_pred = y_pred[:, -forecast_period:, :]
    _y_true = y_true[:, -forecast_period:, :]
    mae = tf.losses.MAE(_y_true, _y_pred)
    return mae

def _accuracy(y_true, y_pred):
    # print(y_true) => Tensor("sequential/time_distributed/Reshape_1:0", shape=(None, 34, 1), dtype=float32)
    # y_true[-forecast_period:, :]  =>   Tensor("strided_slice_4:0", shape=(None, 34, 1), dtype=float32)
    # y_true[:, -forecast_period:, :] => Tensor("strided_slice_4:0", shape=(None, 6, 1), dtype=float32)

    _y_pred = y_pred[:, -forecast_period:, :]
    _y_pred = tf.reshape(_y_pred, shape=[-1, forecast_period])
    _y_true = y_true[:, -forecast_period:, :]
    _y_true = tf.reshape(_y_true, shape=[-1, forecast_period])

    # MAPE: Tensor("Mean_1:0", shape=(None, 1), dtype=float32)
    # (note: dividing by _y_true assumes the targets are never zero)
    MAPE = tf.math.reduce_mean(tf.math.abs((_y_pred - _y_true) / _y_true), axis=1, keepdims=True)

    accuracy = 1 - MAPE
    accuracy = tf.where(accuracy < 0, tf.zeros_like(accuracy), accuracy)
    accuracy = tf.reduce_mean(accuracy)
    return accuracy
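
As a quick eager-mode sanity check of the two metrics (a sketch with made-up constant tensors, not part of the original data):

# Toy check: a constant 10% over-prediction on targets of 1.0.
# Shapes mimic the model output, (batch, time, 1), with forecast_period = 6.
y_true = tf.ones([2, 34, 1])
y_pred = tf.ones([2, 34, 1]) * 1.1

print(_forecast_mae(y_true, y_pred))  # per-step MAE of ~0.1
print(_accuracy(y_true, y_pred))      # 1 - MAPE, i.e. ~0.9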

model = k.models.Sequential([
    k.layers.Bidirectional(k.layers.LSTM(units=100, return_sequences=True), input_shape=(None, 1)),
    k.layers.Bidirectional(k.layers.LSTM(units=100, return_sequences=True)),
    k.layers.TimeDistributed(k.layers.Dense(1))
])

model_name = []
model_name_symbols = {"bidirectional": "BILSTM_1", "bidirectional_1": "BILSTM_2", "time_distributed": "td"}
for l in model.layers:
    model_name.append(model_name_symbols.get(l.name, l.name))

model_name = "_".join(model_name)
print(model_name)

for i, (x, y) in enumerate(train_ts_dataset):
    print(i, x.numpy().shape, y.numpy().shape)

The output (the model name, followed by the dataset shapes) looks like this:
BILSTM_1_BILSTM_2_td
0 (123, 34, 1) (123, 34, 1)
1 (123, 34, 1) (123, 34, 1)
2 (123, 34, 1) (123, 34, 1)
3 (123, 34, 1) (123, 34, 1)
4 (123, 34, 1) (123, 34, 1)
5 (123, 34, 1) (123, 34, 1)
6 (123, 34, 1) (123, 34, 1)
7 (123, 34, 1) (123, 34, 1)
8 (123, 34, 1) (123, 34, 1)

Then:

_datetime = datetime.datetime.now().strftime("%Y%m%d-%H-%M-%S")
_log_dir = os.path.join(".", "logs", "fit7", model_name, _datetime)

tensorboard_cb = k.callbacks.TensorBoard(log_dir=_log_dir)

model.compile(loss="mae", optimizer=tf.optimizers.Adam(learning_rate=0.001), metrics=[_forecast_mae, _accuracy])

# note: validation_data is deliberately the training dataset here
history = model.fit(train_ts_dataset, epochs=100, validation_data=train_ts_dataset, callbacks=[tensorboard_cb])

I have been monitoring the loss on the training and validation sets, and I found that the validation loss is smaller than the training loss. One explanation could be underfitting. However, I replaced the validation set with the training set, so that the very same data is used for both training and evaluation, and I still get a validation accuracy that is greater than the training accuracy. Below are the training and validation accuracies:

[Plot: training vs. validation accuracy]

What strikes me as strange is that even though I use the same dataset for training and validation, I still get a validation accuracy that is higher than the training accuracy. And there are no dropout layers, no batch-normalization layers, etc.
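
One way to probe this is to re-evaluate the training data with the frozen end-of-epoch weights from a callback and compare that against what fit() logs. A minimal sketch (the class name EvalTrainAtEpochEnd is made up for illustration):

# Sanity-check callback: evaluate the *training* data once more at the end of
# each epoch, with the weights frozen. If the gap comes from the running-average
# bookkeeping during training, these numbers should track the logged val_*
# metrics rather than the logged training metrics.
class EvalTrainAtEpochEnd(k.callbacks.Callback):
    def __init__(self, dataset):
        super().__init__()
        self.dataset = dataset

    def on_epoch_end(self, epoch, logs=None):
        results = self.model.evaluate(self.dataset, verbose=0)
        print(f"epoch {epoch}: loss/metrics on training data with final weights: {results}")

# usage:
# model.fit(train_ts_dataset, epochs=100, validation_data=train_ts_dataset,
#           callbacks=[tensorboard_cb, EvalTrainAtEpochEnd(train_ts_dataset)])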

Is there any reason this can happen? Any help is much appreciated!

===================================================================

Here are some modifications to the code to check whether the batch size has any effect. Also, to rule out tf.data.Dataset as the culprit, numpy arrays are used as input. The new code is as follows:

custom_train_ts   = train_ts.transpose(1, 0)[..., np.newaxis]
custom_train_ts_x = custom_train_ts[:, :window_size, :] # size: 123, window_size, 1
custom_train_ts_y = custom_train_ts[:, -window_size:, :] # size: 123, window_size, 1

custom_valid_ts   = valid_ts.transpose(1, 0)[..., np.newaxis]
custom_valid_ts_x = custom_valid_ts[:, :window_size, :]
custom_valid_ts_y = custom_valid_ts[:, -window_size:, :]
custom_valid_ts   = (custom_valid_ts_x, custom_valid_ts_y)

Secondly, to make sure the accuracy is computed over the whole dataset rather than depending on the batch size, I fed the dataset into the model as-is, without batching it. In addition, I implemented a custom metric as follows:

def _accuracy(y_true, y_pred):
    # print(y_true) => Tensor("sequential/time_distributed/Reshape_1:0", shape=(None, 34, 1), dtype=float32)
    # y_true[-forecast_period:, :]  =>   Tensor("strided_slice_4:0", shape=(None, 34, 1), dtype=float32)
    # y_true[:, -forecast_period:, :] => Tensor("strided_slice_4:0", shape=(None, 6, 1), dtype=float32)

    _y_pred = y_pred[:, -forecast_period:, :]
    _y_pred = tf.reshape(_y_pred, shape=[-1, forecast_period])
    _y_true = y_true[:, -forecast_period:, :]
    _y_true = tf.reshape(_y_true, shape=[-1, forecast_period])

    # MAPE: Tensor("Mean_1:0", shape=(None, 1), dtype=float32)
    MAPE = tf.math.reduce_mean(tf.math.abs((_y_pred - _y_true) / _y_true), axis=1, keepdims=True)

    accuracy = 1 - MAPE
    accuracy = tf.where(accuracy < 0, tf.zeros_like(accuracy), accuracy)        
    accuracy = tf.reduce_mean(accuracy)
    return accuracy


class MyAccuracy(tf.keras.metrics.Metric):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.accuracy_function = _accuracy
        self.y_true_lst = []
        self.y_pred_lst = []

    def update_state(self, y_true, y_pred, sample_weight=None):
        # NOTE: appending tensors to Python lists only behaves as intended in
        # eager mode; inside a tf.function the list is only mutated at trace time
        self.y_true_lst.append(y_true)
        self.y_pred_lst.append(y_pred)

    def result(self):
        y_true_concat = tf.concat(self.y_true_lst, axis=0)
        y_pred_concat = tf.concat(self.y_pred_lst, axis=0)
        accuracy = self.accuracy_function(y_true_concat, y_pred_concat)
        self.y_true_lst = []
        self.y_pred_lst = []
        return accuracy

    def get_config(self):
        base_config = super().get_config()
        return {**base_config}
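
For reference, the more conventional way to make a Keras metric stateful is to accumulate scalars in add_weight variables and clear them in reset_state, instead of collecting tensors in Python lists. A minimal sketch along those lines (the name StreamingAccuracy and its internals are illustrative, not from the original post):

class StreamingAccuracy(tf.keras.metrics.Metric):
    """Running mean of the per-window accuracy, safe under tf.function."""
    def __init__(self, name="streaming_accuracy", **kwargs):
        super().__init__(name=name, **kwargs)
        self.total = self.add_weight(name="total", initializer="zeros")
        self.count = self.add_weight(name="count", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        # same MAPE-based accuracy as above; forecast_period is the global
        _y_pred = tf.reshape(y_pred[:, -forecast_period:, :], [-1, forecast_period])
        _y_true = tf.reshape(y_true[:, -forecast_period:, :], [-1, forecast_period])
        mape = tf.math.reduce_mean(tf.math.abs((_y_pred - _y_true) / _y_true), axis=1)
        acc = tf.maximum(1.0 - mape, 0.0)   # clip negative accuracies to 0
        self.total.assign_add(tf.reduce_sum(acc))
        self.count.assign_add(tf.cast(tf.shape(acc)[0], tf.float32))

    def result(self):
        return self.total / self.count

    def reset_state(self):   # reset_states() in TF < 2.5
        self.total.assign(0.0)
        self.count.assign(0.0)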

Finally, the model is compiled and fitted as follows:

# hparams["learning_rate"] comes from a hyperparameter setup that is not shown here
model.compile(loss="mae", optimizer=tf.optimizers.Adam(hparams["learning_rate"]), 
              metrics=[tf.metrics.MAE, MyAccuracy()])

history = model.fit(custom_train_ts_x, custom_train_ts_y, epochs=120, batch_size=123, validation_data=custom_valid_ts, 
                    callbacks=[tensorboard_cb])

Looking at the training and validation accuracy in TensorBoard, I got the following:

[Plot: training vs. validation accuracy in TensorBoard]

So this clearly makes no sense. Moreover, in this case I made sure the accuracy is only computed at the end of the epoch, when result() is called. Yet when I look at the losses, the training loss is lower than the validation loss:

[Plot: training vs. validation loss]


Hi! I made some edits to your question, because the way it was originally asked the actual question was somewhat buried in the middle. For a long question that needs a lot of setup, you usually get better mileage if you state the underlying question concisely at the start and then work back to it through the explanation. Feel free to make further edits! I mainly copied one paragraph to the beginning; there may be a better way to word it. Hope you get an answer (I look forward to seeing it), cheers! - Tadhg McDonald-Jensen
Does the difference between training and validation accuracy depend on batch_size? My guess is that the larger the batch, the smaller the difference. - fdermishin
The intuition is that the model changes from one batch to the next during training. The loss is computed for each batch, and these losses are then aggregated into the total loss for the epoch. If you set the batch size to its maximum, the training loss is computed only once per epoch, so if you validate on the training data it should be essentially identical to the validation loss. The difference appears when there are multiple batches, because the aggregation of the individual losses does not add up to the total loss. What I am still trying to work out is why the aggregated loss is noticeably higher than the total loss. - fdermishin
@user13044086, please post your insight as an answer. It helps the site to have this kind of content in an answer (not to mention the bounty :) - Tadhg McDonald-Jensen
The last chart looks very strange, because the training loss for an epoch is computed right after the previous epoch's validation loss was computed on the same data and the same model state. They should be equal. Could you add a manual loss computation (call the model and compute the loss) before and after the call to fit? - Andrey
1 Answer

The reason they differ is that the optimizer updates the parameters at the end of every batch: val_loss is computed at the end of the epoch, whereas train_loss is computed along the way.
Even if you have only one sample per batch and only one batch per epoch, they will differ, because the network does a forward pass on your sample and computes the loss (which is reported as train_loss), then updates the parameters and computes the loss again, this time reported as val_loss (in this case, the train_loss of the next epoch will equal the current val_loss).
So if you want to check whether what I just said is correct, set the learning_rate of your optimizer to 0 and you will get the same losses.

Here is the code with which I tested the same issue on MNIST (for the time being you can check the code and the results from here):

# ---------------------------------
# Importing stuff
import numpy as np
import tensorflow as tf
from tensorflow import keras

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import *

from tensorflow.keras.utils import to_categorical

# ---------------------------------
(trainX, trainy), (testX, testy) = keras.datasets.mnist.load_data()

# one-hot
trainy = to_categorical(trainy, 10)
testy = to_categorical(testy, 10)

# image should be in shape of (28, 28, 1) not (28, 28)
trainX = np.expand_dims(trainX, -1)
testX = np.expand_dims(testX, -1)

# normalize
trainX = trainX/255.
testX = testX/255.

# ---------------------------------
# Build the model
model = Sequential()
model.add(Input(trainX.shape[1:]))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))

model.summary()

Compile and fit under several scenarios:

# training on 1 sample, but with learning_rate != 0
opt = keras.optimizers.Adam(learning_rate = 0.001)
model.compile(optimizer = opt, loss='categorical_crossentropy', metrics=['categorical_accuracy'])

batchX = trainX[0].reshape(1, 28, 28, 1)
batchy = trainy[0].reshape(1, 10)

model.fit(batchX, batchy, validation_data = (batchX, batchy), batch_size = 1, 
          shuffle = False, validation_batch_size = 1, epochs = 5)

# You will see that the loss and val_loss are different and the
# next steps loss is equal to the current steps val_loss

# training on 1 sample, with learning_rate == 0
opt = keras.optimizers.Adam(learning_rate = 0)
model.compile(optimizer = opt, loss='categorical_crossentropy', metrics=['categorical_accuracy'])

batchX = trainX[0].reshape(1, 28, 28, 1)
batchy = trainy[0].reshape(1, 10)

model.fit(batchX, batchy, validation_data = (batchX, batchy), batch_size = 1, 
          shuffle = False, validation_batch_size = 1, epochs = 5)

# You will see that the loss and val_loss are equal because 
# the parameters will not change

# training on the complete dataset but with learning_rate != 0
opt = keras.optimizers.Adam(learning_rate = 0.001)
model.compile(optimizer = opt, loss='categorical_crossentropy', metrics=['categorical_accuracy'])

model.fit(trainX, trainy, validation_data = (trainX, trainy), batch_size = 32, 
          shuffle = False, validation_batch_size = 32, epochs = 5)

# this is similar to the case you asked

# training on the complete dataset and learning_rate == 0
opt = keras.optimizers.Adam(learning_rate = 0)
model.compile(optimizer = opt, loss='categorical_crossentropy', metrics=['categorical_accuracy'])

model.fit(trainX, trainy, validation_data = (trainX, trainy), batch_size = 32, 
          shuffle = False, validation_batch_size = 32, epochs = 5)

# set the learning_rate to zero and again you'll get loss == val_loss

I don't understand why this was downvoted; can someone at least explain what is wrong with my answer? :/ - amin
