Keras: how to save the training history attribute of the History object?

Question

python machine-learning neural-network deep-learning keras

118

In Keras, we can capture the return value of model.fit in a history object, as follows:

history = model.fit(X_train, y_train,
                    batch_size=batch_size,
                    epochs=nb_epoch,  # the `nb_epoch` argument was renamed to `epochs` in Keras 2
                    validation_data=(X_test, y_test))

Now, how can I save the history attribute of the History object to a file for later use (e.g. to plot accuracy or loss against epochs)?

- jwm

2

In case it helps, you can also use Keras's CSVLogger() callback, as described here: https://keras.io/callbacks/#csvlogger - swiss_knight

1

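Can anyone recommend a way to save the History object returned by fit? Its .params attribute contains useful information that I would also like to keep. Yes, I can save the params and history attributes separately, or combine them into a dict, but I am interested in a simple way to save the whole History object. - user3731622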
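The "combine them into a dict" option mentioned in this comment could look roughly like the sketch below. It assumes the History object exposes the history, params and epoch attributes; a stand-in object is used so the sketch runs without Keras, and the helper names history_to_dict, save_history and load_history are hypothetical.

```python
import pickle
from types import SimpleNamespace

def history_to_dict(history):
    """Collect the picklable parts of a Keras-style History object."""
    return {
        # cast every metric value to a plain Python float
        "history": {k: [float(v) for v in vals] for k, vals in history.history.items()},
        "params": dict(getattr(history, "params", None) or {}),
        "epoch": list(getattr(history, "epoch", None) or []),
    }

def save_history(history, path):
    with open(path, "wb") as f:
        pickle.dump(history_to_dict(history), f)

def load_history(path):
    with open(path, "rb") as f:
        return pickle.load(f)

# Stand-in for the object returned by model.fit (assumption, for illustration only)
fake_history = SimpleNamespace(
    history={"loss": [0.9, 0.6], "val_loss": [1.0, 0.7]},
    params={"epochs": 2, "steps": 10, "verbose": 1},
    epoch=[0, 1],
)
```

Calling save_history(history, 'history.pkl') after training and load_history('history.pkl') later returns a plain dict with the three keys, which avoids pickling the callback object (and its reference to the model) itself.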

10 Answers

63

Another way:

Since history.history is a dict, you can also convert it to a pandas DataFrame object and then save it however suits your needs.

Step by step:

import pandas as pd

# assuming you stored your model.fit results in a 'history' variable:
history = model.fit(x_train, y_train, epochs=10)

# convert the history.history dict to a pandas DataFrame:     
hist_df = pd.DataFrame(history.history) 

# save to json:  
hist_json_file = 'history.json' 
with open(hist_json_file, mode='w') as f:
    hist_df.to_json(f)

# or save to csv: 
hist_csv_file = 'history.csv'
with open(hist_csv_file, mode='w') as f:
    hist_df.to_csv(f)

- swiss_knight

How would you reload it? - jtlz2

You can read it back as a DataFrame with pd.read_csv('history.csv'). - Mohammed Nadeem
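For reloading without pandas, a minimal sketch using only the standard library is shown here. It assumes the file was written by DataFrame.to_csv with its default settings, so the first column is the unnamed index; the function name load_history_csv is hypothetical.

```python
import csv

def load_history_csv(path):
    """Read a CSV written by DataFrame.to_csv back into {metric: [values]}."""
    history = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            for name, value in row.items():
                if not name:  # the unnamed first column is the DataFrame index
                    continue
                history.setdefault(name, []).append(float(value))
    return history
```

With pandas itself, pd.read_csv('history.csv', index_col=0) treats that same first column as the index instead of loading it as an extra data column.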

1

I used this one because it was easier for me. - Caner Erden

1

Sounds good. .csv is more universal than .pkl. This way I can load it in R, or even open it in Excel if I just want to see what's inside. - Manuel Popp

38

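The simplest way:

Saving:

import numpy as np

np.save('my_history.npy', history.history)

Loading:

# allow_pickle takes the boolean True; the dict is stored as a pickled 0-d object array
history = np.load('my_history.npy', allow_pickle=True).item()

The history is then a dict, and you can retrieve all the values you need by key.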

- Arman

I think this should be the top answer. - Gautam Chettiar

18

The model history can be saved to a file as follows (note that model.save(filepath) saves the model itself, not its training history):

import json
hist = model.fit(X_train, y_train, epochs=5, batch_size=batch_size,validation_split=0.1)
with open('file.json', 'w') as f:
    json.dump(hist.history, f)

- Ashok Kumar Jayaraman

19

In TensorFlow Keras this no longer works. I got an error: TypeError: Object of type 'float32' is not JSON serializable. I had to use json.dump(str(hist.history), f) to work around it. - BraveDistribution

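@BraveDistribution Keep in mind that you can specify an encoder for json, as shown in this answer. So although this code does not work as written, json is still viable if you specify an encoder with the cls parameter. - Kraigolas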
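A minimal sketch of the cls approach mentioned in this comment follows. It relies on the fact that NumPy scalars and arrays expose a .tolist() method that returns plain Python values; the names HistoryEncoder and dump_history are hypothetical and not part of Keras or the json module.

```python
import json

class HistoryEncoder(json.JSONEncoder):
    """JSON encoder that falls back to .tolist() for NumPy-like values."""

    def default(self, obj):
        # default() is only called for objects json cannot serialize itself.
        # NumPy scalars (e.g. float32) and arrays both provide .tolist().
        if hasattr(obj, "tolist"):
            return obj.tolist()
        return super().default(obj)

def dump_history(history_dict, path):
    """Write a history dict (e.g. hist.history) to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(history_dict, f, cls=HistoryEncoder)
```

Usage would be dump_history(hist.history, 'file.json'). Unlike wrapping the dict in str(), the result stays valid JSON and can be read back with json.load.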

12

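A History object has a history field, which is a dictionary storing the different training metrics across every training epoch. For example, history.history['loss'][99] returns the model's loss in the 100th training epoch. To save it, you can pickle this dictionary, or simply save the individual lists from this dictionary to an appropriate file.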

- Marcin Możejko
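To illustrate the zero-based indexing described in this answer, here is a small sketch that works on a plain dict shaped like history.history. The sample values and the helper name best_epoch are made up for illustration.

```python
# A dict shaped like history.history: one list per metric, one entry per epoch
history_dict = {
    "loss": [0.9, 0.6, 0.45],
    "val_loss": [1.0, 0.7, 0.75],
}

def best_epoch(history_dict, metric="val_loss"):
    """Return (1-based epoch number, value) where the metric is lowest."""
    values = history_dict[metric]
    idx = min(range(len(values)), key=values.__getitem__)
    return idx + 1, values[idx]  # list index 0 corresponds to epoch 1
```

The same lookup works on a dict reloaded from pickle, JSON or CSV, since all of those approaches preserve the per-metric lists.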

7

I ran into the problem that the values inside the lists in Keras are not JSON serializable. So I wrote these two handy functions for my own use.

import json,codecs
import numpy as np
def saveHist(path, history):
    new_hist = {}
    for key, values in history.history.items():
        if isinstance(values, np.ndarray):
            # arrays become plain Python lists
            new_hist[key] = values.tolist()
        else:
            # NumPy scalars of any width (float32, float64, ...) become Python floats
            new_hist[key] = [float(v) for v in values]

    print(new_hist)
    with codecs.open(path, 'w', encoding='utf-8') as file:
        json.dump(new_hist, file, separators=(',', ':'), sort_keys=True, indent=4)

def loadHist(path):
    with codecs.open(path, 'r', encoding='utf-8') as file:
        n = json.loads(file.read())
    return n

Here, saveHist just needs the path where the JSON file should be saved and the History object returned by Keras's fit or fit_generator method.

- Kev1n91

1

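Thanks for providing the reloading code. It would be even better if there were a way to append additional history (i.e. from model.fit()) to the reloaded history. I'm looking into it. - Mark Cramer

@MarkCramer Shouldn't it be something like: save all the parameters of the original History object, reload the History object and use it to set up the model, fit the reloaded model and capture the results in a new History object, then concatenate the information from the new History object onto the original one? - jschabs

@jschabs, yes, just like that, but unfortunately it's complicated. I've figured it out, so I'll post an answer. - Mark Cramer

This gives me: newchars, decodedbytes = self.decode(data, self.errors) - Mubeen Khan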

3

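I'm sure there are many ways to do this, but I fiddled around and came up with a version of my own.

First, a custom callback makes it possible to grab and update the history at the end of every epoch. In there I also have a callback to save the model. Both of these are handy because if you crash, or shut down, you can pick up training from the last completed epoch.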

from tensorflow.keras.callbacks import Callback, ModelCheckpoint

class LossHistory(Callback):

    # https://dev59.com/O7Dla4cB1Zd3GeqP50sw#53653154
    def on_epoch_end(self, epoch, logs = None):
        new_history = {}
        for k, v in (logs or {}).items(): # compile new history from logs
            new_history[k] = [float(v)] # convert values into lists of JSON-serializable floats
        current_history = loadHist(history_filename) # load history from current training
        current_history = appendHist(current_history, new_history) # append the logs
        saveHist(history_filename, current_history) # save history from current training

# the deprecated `period = 1` argument is dropped; saving every epoch is the default
model_checkpoint = ModelCheckpoint(model_filename, verbose = 0)
history_checkpoint = LossHistory()
callbacks_list = [model_checkpoint, history_checkpoint]

Second, here are some "helper" functions that do exactly what they say. They are all called from the LossHistory() callback.

# https://dev59.com/hlgR5IYBdhLWcg3wp-q5#54092401
import os  # needed by loadHist
import json, codecs

def saveHist(path, history):
    with codecs.open(path, 'w', encoding='utf-8') as f:
        json.dump(history, f, separators=(',', ':'), sort_keys=True, indent=4) 

def loadHist(path):
    n = {} # set history to empty
    if os.path.exists(path): # reload history if it exists
        with codecs.open(path, 'r', encoding='utf-8') as f:
            n = json.loads(f.read())
    return n

def appendHist(h1, h2):
    if h1 == {}:
        return h2
    else:
        dest = {}
        for key, value in h1.items():
            dest[key] = value + h2[key]
        return dest

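After that, all you need is to set history_filename to something like data/model-history.json and model_filename to something like data/model.h5. One final tweak, to make sure you don't clobber your history at the end of training (assuming you stop and restart) and to plug in the callbacks: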

new_history = model.fit(X_train, y_train,
                     batch_size = batch_size,
                     epochs = nb_epoch,  # `nb_epoch` argument renamed to `epochs` in Keras 2
                     validation_data=(X_test, y_test),
                     callbacks=callbacks_list)

history = appendHist(history, new_history.history)

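Whenever you need it, history = loadHist(history_filename) loads your history back.

The awkward part of this approach is working with JSON and lists; I couldn't get it to work without converting the values by iterating over them. Anyway, I know this works because I've been cranking on it for days. The pickle.dump approach at https://dev59.com/hlgR5IYBdhLWcg3wp-q5#44674337 might be better, but I don't know what that is. If I missed anything here or you can't get it to work, let me know.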

- Mark Cramer

1

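Thanks! Very useful! It could be sped up a bit by keeping the history in memory instead of loading it from the file after every epoch, but since this loading/saving takes very little time compared to the actual training, I think it's fine to keep the code as is. - ias

1

Appending is a nice touch! - jtlz2

@ias - exactly - but how - pass the open fh around..? - jtlz2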
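One possible answer to this question, sketched under assumptions: rather than passing an open file handle around, the callback object can hold the history dict in memory and rewrite the file once per epoch. The class below is written as a plain class so it runs without Keras; for real training it would subclass the Keras Callback class (calling super().__init__() in its constructor), which provides the same on_epoch_end(epoch, logs) hook. The name InMemoryHistoryWriter is hypothetical.

```python
import json
import os

class InMemoryHistoryWriter:
    """Keeps the running history in memory and rewrites the JSON file each epoch."""

    def __init__(self, path):
        self.path = path
        self.history = {}
        if os.path.exists(path):  # resume from an earlier, interrupted run
            with open(path, "r", encoding="utf-8") as f:
                self.history = json.load(f)

    def on_epoch_end(self, epoch, logs=None):
        # Append this epoch's metrics as plain floats so json can encode them
        for key, value in (logs or {}).items():
            self.history.setdefault(key, []).append(float(value))
        # Write to a temporary file first, then swap it into place, so an
        # interruption mid-write cannot leave a truncated history file
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.history, f, indent=4, sort_keys=True)
        os.replace(tmp_path, self.path)
```

The file is read once at construction and only written afterwards, so no handle stays open between epochs.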

1

You can save the history attribute of tf.keras.callbacks.History in .txt form.

with open("./result_model.txt",'w') as f:
    for k in history.history.keys():
        print(k,file=f)
        for i in history.history[k]:
            print(i,file=f)

- Ankur

0

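Here is a callback that stores the logs in a file. Provide the model file path when instantiating the callback object; this creates an associated file. For example, given the model path '/home/user/model.h5', the pickle file path is '/home/user/model_history_pickle'. When the model is reloaded, the callback continues from the epoch where it left off.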


import os
import re
import pickle
#
from tensorflow.keras.callbacks import Callback
from tensorflow.keras import backend as K

class PickleHistoryCallback(Callback):
    def __init__(self, path_file_model, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.__path_file_model = path_file_model
        #
        self.__path_file_history_pickle = None
        self.__history = {}
        self.__epoch = 0
        #
        self.__setup()
    #
    def __setup(self):
        # Replace the model file's extension with '_history_pickle'
        self.__path_file_history_pickle = re.sub(r'\.[^\.]*$', '_history_pickle', self.__path_file_model)
        #
        if (os.path.isfile(self.__path_file_history_pickle)):
            with open(self.__path_file_history_pickle, 'rb') as fd:
                self.__history = pickle.load(fd)
                # Start from last epoch
                self.__epoch = self.__history['e'][-1]
        #
        else:
            print("Pickled history file not found; the following file will be created after the first training epoch:\n\t{}".format(
                self.__path_file_history_pickle))
    #
    def __update_history_file(self):
        with open(self.__path_file_history_pickle, 'wb') as fd:
            pickle.dump(self.__history, fd)
    #
    def on_epoch_end(self, epoch, logs=None):
        self.__epoch += 1
        logs = logs or {}
        #
        logs['e'] = self.__epoch
        logs['lr'] = K.get_value(self.model.optimizer.lr)
        #
        for k, v in logs.items():
            self.__history.setdefault(k, []).append(v)
        #
        self.__update_history_file()

- QuintoViento

pckl_hstry_c = PickleHistoryCallback(path_file_model); list_callbacks += [pckl_hstry_c]; history = model.fit( X_train, Y_train, validation_data=(X_validation, Y_validation), verbose=0, callbacks=list_callbacks ); - QuintoViento

0

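The answers above are useful for saving the history at the end of training. If you want to save the history during training, the CSVLogger callback is helpful.

The code below saves the model weights and the training history, the latter as a spreadsheet-style file named log.csv.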

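import tensorflow as tf

# checkpoint_path, x_train and y_train are placeholders that you need to define yourself
model_cb = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path)
history_cb = tf.keras.callbacks.CSVLogger('./log.csv', separator=",", append=False)

history = model.fit(x_train, y_train, callbacks=[model_cb, history_cb])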

- tngotran

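How do you reload it? - jtlz2

CSVLogger did not save the history object during training, only at the end of training. So if training is interrupted, the history is lost. Any ideas how to solve this? - Al_Mt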

- AEndrs · Accepted Answer

I use the following:

import pickle

with open('/trainHistoryDict', 'wb') as file_pi:
    pickle.dump(history.history, file_pi)

This way I save the history as a dictionary in case I want to plot the loss or accuracy later. When you want to load the history again, you can use:

with open('/trainHistoryDict', "rb") as file_pi:
    history = pickle.load(file_pi)

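Why choose pickle over json?

The comment under this answer states it accurately:

[Storing the history as json] no longer works in tensorflow keras. I got TypeError: Object of type 'float32' is not JSON serializable.

There are ways to tell json how to encode numpy objects, which you can learn about from this question, so there is nothing wrong with using json here; it is just more complicated than dumping the data to a pickle file.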