Prefetching a generator (Sequence) in TensorFlow

Question


python tensorflow neural-network dataset generator


I'm trying to implement a generator with tf.keras.utils.Sequence, following the approach on this GitHub page: https://mahmoudyusof.github.io/facial-keypoint-detection/data-generator/ So my generator has the following form:

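import numpy as np
import tensorflow as tf

class Generator(tf.keras.utils.Sequence):

  def __init__(self, files, batch_size=64, shuffle=True):
    super().__init__()
    self.files, self.batch_size, self.shuffle = files, batch_size, shuffle
    self.on_epoch_end()

  def on_epoch_end(self):
    # shuffle indices for batches
    self.indices = np.arange(len(self.files))
    if self.shuffle:
      np.random.shuffle(self.indices)

  def __len__(self):
    # number of batches per epoch
    return int(np.ceil(len(self.files) / self.batch_size))

  def __getitem__(self, idx):
    # return the idx-th batch of the shuffled dataset
    batch_files = [self.files[i] for i in self.indices[idx * self.batch_size:(idx + 1) * self.batch_size]]
    X, y = load_batch(batch_files)  # hypothetical helper: the loading code is not shown in the question
    return X, y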

Unfortunately, training my model with this generator has become very slow, so I'd like to prefetch it.

I tried:

Train_Generator = tf.data.Dataset.from_generator(Generator(Training_Files, batch_size=64, shuffle = True), output_types=(np.array, np.array))

to turn the generator into something that can be prefetched. I got the error message:

`generator` must be callable.

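I know this requires the generator to support the iterator (iter()) protocol. But how do I implement that? Or do you know any other way to improve the performance of this kind of generator?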

Thanks!

- Samuel K.
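The error comes from passing a Sequence *instance*: `from_generator` expects a callable that returns a fresh iterator each time the dataset is iterated. A minimal sketch, assuming TF 2.4+ (for `output_signature` and `tf.data.AUTOTUNE`) and using a hypothetical `ToySequence` with random data in place of the real files, wraps the Sequence in a zero-argument generator function and then prefetches:

```python
import numpy as np
import tensorflow as tf

class ToySequence(tf.keras.utils.Sequence):
    """Hypothetical stand-in for the question's Generator: 10 samples, 4 features."""

    def __init__(self, batch_size=4):
        super().__init__()
        self.x = np.random.rand(10, 4).astype(np.float32)
        self.y = np.random.rand(10).astype(np.float32)
        self.batch_size = batch_size

    def __len__(self):
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, idx):
        s = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return self.x[s], self.y[s]

seq = ToySequence()

def batches():
    # Zero-argument callable: every call returns a new generator over the Sequence.
    for i in range(len(seq)):
        yield seq[i]
    seq.on_epoch_end()  # tf.data does not call on_epoch_end for you

dataset = tf.data.Dataset.from_generator(
    batches,
    output_signature=(
        tf.TensorSpec(shape=(None, 4), dtype=tf.float32),  # X batch
        tf.TensorSpec(shape=(None,), dtype=tf.float32),    # y batch
    ),
).prefetch(tf.data.AUTOTUNE)  # prepare the next batch while the current one trains
```

The result can be passed straight to `model.fit(dataset, ...)`. Prefetching overlaps batch preparation with training, but the generator body still runs as Python code, so the gain may be limited if loading itself is the bottleneck.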


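Quick comment: you would be better off rewriting your pipeline with tf.data functions. The guide tf.data: Build TensorFlow input pipelines may help. Using from_generator has some drawbacks, because under the hood it uses tf.numpy_function. If you want to implement the iterator protocol, it is described in the documentation: Iterator Types. - Lescurel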
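For reference, a minimal sketch of the iterator protocol mentioned above, in plain Python (not a TensorFlow API): `__iter__` returns the iterator itself, and `__next__` returns the next batch or raises `StopIteration` when the data is exhausted. `BatchIterator` and the file names are hypothetical.

```python
class BatchIterator:
    """Minimal iterator over fixed-size batches of a list (illustrative only)."""

    def __init__(self, items, batch_size):
        self.items = items
        self.batch_size = batch_size
        self.pos = 0

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # Signal the end of iteration once every item has been returned.
        if self.pos >= len(self.items):
            raise StopIteration
        batch = self.items[self.pos:self.pos + self.batch_size]
        self.pos += self.batch_size
        return batch

files = ["a", "b", "c", "d", "e"]  # hypothetical file names
print(list(BatchIterator(files, 2)))  # [['a', 'b'], ['c', 'd'], ['e']]
```

An iterator is used up after one pass, so `from_generator` would still need a callable that builds a fresh one per epoch, for example `lambda: BatchIterator(files, 64)`.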

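Thanks for your reply! The reason I use a generator is that my data is stored as TFRecord files, but I need to split a single protocol buffer into several training samples (I have to split the stored time series). Apart from a generator, I couldn't find a more elegant way to do this. Or is there a way to do it directly with the .map(parse) function? Thanks a lot for your help. - Samuel K.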
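Splitting one record into several samples can be done inside tf.data without a Python generator: `.map(parse)` decodes each record, and `.flat_map` turns each parsed series into a dataset of windows. A minimal sketch, assuming a hypothetical schema (one variable-length float feature named `"series"`), a hypothetical window length of 4 with non-overlapping windows, and a temporary TFRecord file standing in for the real ones:

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

WINDOW = 4  # hypothetical window length

# Write one example holding a 12-step series (stand-in for the real TFRecord files).
path = os.path.join(tempfile.mkdtemp(), "series.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    series = np.arange(12, dtype=np.float32)
    feature = {"series": tf.train.Feature(float_list=tf.train.FloatList(value=series.tolist()))}
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    writer.write(example.SerializeToString())

def parse(record):
    # Assumed schema: a single variable-length float feature named "series".
    parsed = tf.io.parse_single_example(record, {"series": tf.io.VarLenFeature(tf.float32)})
    return tf.sparse.to_dense(parsed["series"])

def split_into_windows(series):
    # Cut the series into non-overlapping windows; a trailing remainder is dropped.
    windows = tf.signal.frame(series, frame_length=WINDOW, frame_step=WINDOW)
    return tf.data.Dataset.from_tensor_slices(windows)

dataset = (
    tf.data.TFRecordDataset([path])
    .map(parse, num_parallel_calls=tf.data.AUTOTUNE)
    .flat_map(split_into_windows)  # one record -> several samples
    .batch(2)
    .prefetch(tf.data.AUTOTUNE)
)

for batch in dataset:
    print(batch.shape)  # two batches: 2 windows, then the remaining 1
```

How each window is turned into an `(X, y)` pair depends on the task and would be one more `.map` before `.batch`. Unlike `from_generator`, this whole pipeline runs as TensorFlow ops.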

1 Answer

Content provided by Stack Overflow.

- oattao · Accepted Answer

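I'd suggest passing the class itself (which is callable) and supplying its constructor arguments through args, since from_generator calls the generator as generator(*args):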

# output_types must be TensorFlow dtypes (np.array is not a dtype); tf.float32 is assumed here
Train_Generator = tf.data.Dataset.from_generator(Generator, args=[Training_Files, 64, True], output_types=(tf.float32, tf.float32))
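One detail of the `args` approach: the arguments are converted to tensors and handed back to the callable as NumPy values, so the file names arrive as `bytes`, and the batch size and flag as NumPy scalars. A minimal sketch, assuming TF 2.4+, with hypothetical file names and dummy zero arrays standing in for the real loading code, that handles this and adds prefetching:

```python
import numpy as np
import tensorflow as tf

class Generator(tf.keras.utils.Sequence):
    """Sketch of the question's Sequence; real file loading replaced by dummy zeros."""

    def __init__(self, files, batch_size, shuffle):
        super().__init__()
        # Values passed via `args` arrive as NumPy values: strings become bytes.
        self.files = [f.decode() if isinstance(f, bytes) else f for f in files]
        self.batch_size = int(batch_size)
        self.shuffle = bool(shuffle)
        self.on_epoch_end()

    def on_epoch_end(self):
        self.indices = np.arange(len(self.files))
        if self.shuffle:
            np.random.shuffle(self.indices)

    def __len__(self):
        return int(np.ceil(len(self.files) / self.batch_size))

    def __getitem__(self, idx):
        if idx >= len(self):
            raise IndexError(idx)  # lets plain index-based iteration stop
        batch_idx = self.indices[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_files = [self.files[i] for i in batch_idx]
        # Placeholder data: 3 features and one label per file (real loading omitted).
        X = np.zeros((len(batch_files), 3), dtype=np.float32)
        y = np.zeros((len(batch_files),), dtype=np.float32)
        return X, y

Training_Files = ["f0.rec", "f1.rec", "f2.rec", "f3.rec", "f4.rec"]  # hypothetical names

Train_Generator = tf.data.Dataset.from_generator(
    Generator,  # the class is callable: TF calls Generator(*args) for each new iterator
    args=[Training_Files, 2, True],
    output_signature=(
        tf.TensorSpec(shape=(None, 3), dtype=tf.float32),
        tf.TensorSpec(shape=(None,), dtype=tf.float32),
    ),
).prefetch(tf.data.AUTOTUNE)
```

Because each pass over the dataset (for example, each epoch of `model.fit`) calls `Generator(*args)` again, `__init__` runs `on_epoch_end` every time, so the indices are reshuffled per epoch without any extra callback.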