Note: This article is reproduced from serverfault.com.

TensorFlow stuck for seconds at the end of every epoch

Posted on 2020-12-15 16:37:16

I'm training a neural network on a TFRecordDataset. However, at the end of every epoch, i.e. once the progress bar shows ETA: 0s, training gets stuck for tens of seconds. For reference, one epoch takes around a minute to complete on a dataset of around 25GB (before parsing a subset of the features).

I'm running TensorFlow 2.3.1 with an NVIDIA Titan RTX GPU. Is this the intended behavior? Could it be due to the preprocessing in the input pipeline? Is that preprocessing performed by the CPU only, or is it offloaded to the GPU? Thanks!

Questioner: Antonio Albanese
Nicolas Gervais 2020-12-16 00:46:37

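If you have a validation set and you're using model.fit(), it's probably the time it takes to calculate the validation loss and metrics, which happens after the last training batch of each epoch. With an 80/20 split, the validation set is a quarter the size of the training set, so in most cases computing the metrics should take roughly an extra 25% per epoch.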
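One way to check whether the pause really is the validation pass is to time it with a Keras callback. The following is a minimal sketch, not a definitive diagnostic: `ValidationTimer` is a hypothetical helper name, and it assumes validation runs inside `model.fit()` so that the `on_test_begin` / `on_test_end` hooks fire between the last training batch and `on_epoch_end`.

```python
import time

import tensorflow as tf


class ValidationTimer(tf.keras.callbacks.Callback):
    """Hypothetical helper: records the duration of each epoch and of its validation pass."""

    def __init__(self):
        super().__init__()
        self.epoch_seconds = []       # wall-clock time per epoch, validation included
        self.validation_seconds = []  # wall-clock time of the validation pass only
        self._epoch_start = None
        self._val_start = None
        self._val_elapsed = 0.0

    def on_epoch_begin(self, epoch, logs=None):
        self._epoch_start = time.perf_counter()
        self._val_elapsed = 0.0  # stays 0.0 if no validation runs this epoch

    def on_test_begin(self, logs=None):
        # Fired when the validation (or evaluation) loop starts.
        self._val_start = time.perf_counter()

    def on_test_end(self, logs=None):
        self._val_elapsed = time.perf_counter() - self._val_start

    def on_epoch_end(self, epoch, logs=None):
        total = time.perf_counter() - self._epoch_start
        self.epoch_seconds.append(total)
        self.validation_seconds.append(self._val_elapsed)
        print(f"epoch {epoch + 1}: total {total:.1f}s, "
              f"validation {self._val_elapsed:.1f}s")


# Usage (train_ds and val_ds stand in for your own tf.data pipelines):
# model.fit(train_ds, validation_data=val_ds, epochs=5,
#           callbacks=[ValidationTimer()])
```

If the reported validation time roughly matches the pause seen at ETA: 0s, the stall is the validation pass. If it is much shorter, the delay likely comes from somewhere else, such as the input pipeline restarting between epochs.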