May 16, 2018, midnight

TensorFlow GPU Bottlenecks

By : Eric A. Scuccimarra

I was training a model on a Google Cloud instance with a Tesla K80 GPU. This particular model had more data pre-processing required than normal. The model was training very slowly, the GPU usage was oscillating between 0% and 75-100%. I thought the CPU was the bottleneck and was trying to put as much pre-processing on the GPU as possible.

I read TensorFlow's optimization guide, which suggested forcing the pre-processing to be on the CPU by enclosing it with:

with tf.device('/cpu:0'):

Since I thought the CPU was the bottleneck I didn't think that would help, but I tried it anyway because I had no other good ideas and was surprised that it worked like magic! The GPU usage now stays constant around 95-100% while the CPU usage stays at about the same levels as before.

Labels: machine_learning , tensorflow , google_cloud