Note: This article is reproduced from serverfault.com.

Why are TensorFlow modules taking up all the GPU memory?

Posted on 2020-12-01 09:13:01

I am training a U-Net on TensorFlow 2. When I load the model, it takes up almost all of the GPU's memory (22 GB out of 26 GB), even though my model, with 190 million parameters, should need at most 1.5 GB. To understand the problem, I tried loading a model that didn't have any layers, and to my surprise it still consumed the same amount of memory. The code for my model is attached below:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, UpSampling2D

x = tf.keras.layers.Input(shape=(256,256,1))

model = Sequential(
    [
        Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        MaxPooling2D(pool_size=(2, 2)),

        Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        MaxPooling2D(pool_size=(2, 2)),

        Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        MaxPooling2D(pool_size=(2, 2)),

        Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        MaxPooling2D(pool_size=(2, 2)),

        Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        # Note: the original listed Activation('relu')(Add()([conv5_0, conv5_2])) here,
        # but conv5_0/conv5_2 are undefined, and a skip connection cannot be
        # expressed inside a Sequential list; it requires the functional API.
        MaxPooling2D(pool_size=(2, 2)),

        Conv2D(2048, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(2048, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(2048, 3, padding = 'same', kernel_initializer = 'he_normal'),

        UpSampling2D(size = (2,2)),
        Conv2D(1024, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),

        UpSampling2D(size = (2,2)),
        Conv2D(512, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),

        UpSampling2D(size = (2,2)),
        Conv2D(256, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),

        UpSampling2D(size = (2,2)),
        Conv2D(128, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),

        UpSampling2D(size = (2,2)),
        Conv2D(64, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'), 
        Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),

        Conv2D(1, 3, activation = 'linear', padding = 'same', kernel_initializer = 'he_normal')
    ])

y = model(x)

I commented out all the layers and it was still taking up 22 GB. I am running the code from a Jupyter notebook. I thought adding tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=x) at the beginning of my notebook would solve the problem, but it did not. My goal is to run multiple scripts on the GPU simultaneously to make more efficient use of my time. Any help would be much appreciated. Thank you.
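For context: by default TensorFlow maps nearly all free GPU memory as soon as the process first touches the GPU, regardless of model size, and a tf.compat.v1.GPUOptions object has no effect in TF 2 unless it is actually passed to a v1 session. A minimal sketch of the TF 2 tf.config alternatives (the 4096 MB cap is an illustrative value, not a recommendation):

```python
import tensorflow as tf

# Both settings must run before the first op touches the GPU.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Option 1: allocate GPU memory on demand instead of grabbing it all.
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)

    # Option 2 (TF 2 counterpart of per_process_gpu_memory_fraction):
    # hard-cap this process at a fixed amount on the first GPU, e.g.
    # tf.config.set_logical_device_configuration(
    #     gpus[0],
    #     [tf.config.LogicalDeviceConfiguration(memory_limit=4096)])
```

Option 2 is usually the better fit for running several scripts on one GPU, since each process gets a predictable slice.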

NB: I just noticed that this doesn't happen only with this code, but with any other TensorFlow module. For example, at one point in my code I used tf.signal.ifft2d before loading the model, and it also took up almost the same amount of memory as the model. How do I get around this problem?

Questioner: shaurov2253
DachuanZhao 2020-12-01 18:07:47

Further discussion can be found at https://www.tensorflow.org/guide/gpu ; you should read it.
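That guide also documents an environment-variable switch that enables on-demand memory growth without any code changes, which is convenient for notebooks. A sketch (train.py is a hypothetical script name):

```shell
# Enable TensorFlow's on-demand GPU memory growth for this shell session.
export TF_FORCE_GPU_ALLOW_GROWTH=true
# Then launch the script or notebook server as usual, e.g.:
# python train.py    (hypothetical script name)
```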