This article is reproduced from serverfault.com.

How to solve CNN model fitting problem in tensorflow 2.2.0?

Posted on 2020-11-30 14:36:46

I want to train a CNN model on image data. I have 2 classes (with mask and without mask). I import and save the data with the following code:

import os
import cv2
import numpy as np
from keras.utils import np_utils

data_path='/train/'
categories=os.listdir(data_path)
labels=[i for i in range(len(categories))]
label_dict=dict(zip(categories,labels))
data=[]
target=[]
for category in categories:
    folder_path=os.path.join(data_path,category)
    img_names=os.listdir(folder_path)
    for img_name in img_names:
        img_path=os.path.join(folder_path,img_name)
        img=cv2.imread(img_path)
        try:
            gray=cv2.cvtColor(img,cv2.COLOR_BGR2GRAY) 
            resized=cv2.resize(gray,(500, 500))#dataset
            data.append(resized)
            target.append(label_dict[category])
        except Exception as e:
            print('Exception:',e)
data=np.array(data)/255.0
data=np.reshape(data,(data.shape[0],500, 500,1))
target=np.array(target)
new_target=np_utils.to_categorical(target)
#np.save('data',data)
#np.save('target',new_target)
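For reference, here is a minimal sketch of what this preprocessing produces, using a few random arrays in place of the images read with cv2 (an assumption so the snippet runs without the dataset). It also shows that dividing a uint8 array by 255.0 gives float64, i.e. 8 bytes per pixel, and that casting to float32 halves that. The one-hot step uses np.eye as a stand-in for np_utils.to_categorical.

```python
import numpy as np

# Hypothetical stand-in for the images loaded with cv2: 4 grayscale
# images already resized to 500x500, stored as uint8 like cv2 returns them.
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(500, 500), dtype=np.uint8) for _ in range(4)]
labels = [0, 1, 1, 0]  # 0 and 1 stand for the two category folders

data = np.array(images) / 255.0                        # uint8 / float -> float64 in [0, 1]
data = np.reshape(data, (data.shape[0], 500, 500, 1))  # add the channel axis
print(data.dtype, data.shape)                          # float64 (4, 500, 500, 1)
print(data.nbytes // len(images))                      # 2000000 bytes per image

data32 = data.astype(np.float32)                       # same values, half the memory
print(data32.nbytes // len(images))                    # 1000000 bytes per image

# One-hot encoding, equivalent to to_categorical for labels 0 and 1
target = np.array(labels)
new_target = np.eye(2, dtype=np.float32)[target]
print(new_target.shape)                                # (4, 2)
```

With thousands of images at 500x500, the float64 array alone can take several GB before the model is even built.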

Then I build the model like this:

import tensorflow as tf
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense

model=tf.keras.models.Sequential([
    Conv2D(32, 1, activation='relu', input_shape=(500, 500, 1)),
    MaxPooling2D(2,2),
    Conv2D(64, 1, activation='relu'),
    MaxPooling2D(2,2),
    Conv2D(128, 1, padding='same', activation='relu'),
    MaxPooling2D(2,2),
    Flatten(),
    Dropout(0.5), 
    Dense(256, activation='relu'),
    Dense(2, activation='softmax') # dense layer has a shape of 2 as we have only 2 classes 
])
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])

model.summary() gives me the following:

________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 500, 500, 32)      64        
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 250, 250, 32)      0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 250, 250, 64)      2112      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 125, 125, 64)      0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 125, 125, 128)     8320      
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 62, 62, 128)       0         
_________________________________________________________________
flatten (Flatten)            (None, 492032)            0         
_________________________________________________________________
dropout (Dropout)            (None, 492032)            0         
_________________________________________________________________
dense (Dense)                (None, 256)               125960448 
_________________________________________________________________
dense_1 (Dense)              (None, 2)                 514       
=================================================================
Total params: 125,971,458
Trainable params: 125,971,458
Non-trainable params: 0
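The Param # column follows from two standard formulas: a Conv2D layer has kernel_h × kernel_w × in_channels × filters weights plus one bias per filter, and a Dense layer has in_features × units weights plus one bias per unit. A short sketch reproducing the numbers above (the helper names are hypothetical):

```python
def conv2d_params(kernel, in_ch, filters):
    # kernel*kernel weights per input channel per filter, plus one bias per filter
    return kernel * kernel * in_ch * filters + filters


def dense_params(in_features, units):
    # one weight per input per unit, plus one bias per unit
    return in_features * units + units


layers = {
    "conv2d": conv2d_params(1, 1, 32),          # 1x1 kernel on the single gray channel
    "conv2d_1": conv2d_params(1, 32, 64),
    "conv2d_2": conv2d_params(1, 64, 128),
    "dense": dense_params(62 * 62 * 128, 256),  # flattened 62x62x128 feature map
    "dense_1": dense_params(256, 2),
}
for name, n in layers.items():
    print(f"{name:10s}{n:>12,}")
print("total", sum(layers.values()))  # total 125971458
```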

Then I fit the model, but the kernel stops. My fitting code is:

history=model.fit(data, target, epochs=10, batch_size=128, validation_data=data_val)
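data_val is not defined in the snippets above. Keras' model.fit accepts validation_data as an (x_val, y_val) tuple, and the labels passed to fit should match the model's 2-unit softmax output, i.e. the one-hot new_target rather than the integer target. A minimal sketch of building such a tuple with a shuffled 80/20 split, where the split ratio and the helper name train_val_split are assumptions (model.fit(..., validation_split=0.2) is a built-in alternative):

```python
import numpy as np


def train_val_split(x, y, val_fraction=0.2, seed=0):
    """Hypothetical helper: shuffle, then hold out val_fraction of the samples."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_val = int(len(x) * val_fraction)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return (x[train_idx], y[train_idx]), (x[val_idx], y[val_idx])


# Small synthetic stand-ins for `data` (N, 500, 500, 1) and one-hot `new_target` (N, 2)
x = np.zeros((10, 500, 500, 1), dtype=np.float32)
y = np.eye(2, dtype=np.float32)[np.arange(10) % 2]

(x_train, y_train), data_val = train_val_split(x, y)
print(x_train.shape, data_val[0].shape, data_val[1].shape)
# history = model.fit(x_train, y_train, epochs=10, batch_size=128,
#                     validation_data=data_val)
```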

My TensorFlow version is 2.2.0. Why doesn't my model run?

Questioner: Jade
Answer by Akshay Sehgal, 2020-11-30 23:08:23

It seems your kernel is dying (being killed) because the process is taking too many resources. You are building an unnecessarily complex model with too many connections and trainable parameters. In fact, the single dense layer is responsible for 99.991% of all your trainable parameters (125,960,448 / 125,971,458).
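The 99.991% figure is just the ratio of the two counts from the summary; the remainder shows that every other layer combined holds only about 11k parameters:

```python
dense_params = 125_960_448   # the Dense(256) layer, from model.summary()
total_params = 125_971_458   # "Total params" from model.summary()

share = dense_params / total_params
print(f"{share:.3%}")                # 99.991%
print(total_params - dense_params)   # 11010 parameters in all other layers combined
```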

The issue is that you are running out of computation resources (primarily RAM). To give you some context, here are some of the most influential CNN-based architectures, most of which were trained for DAYS on powerful GPUs:

LeNet-5 - 60,000 parameters
AlexNet - 60M parameters
VGG-16 - 138M parameters
Inception-v1 - 5M parameters
Inception-v3 - 24M parameters
ResNet-50 - 26M parameters
Xception - 23M parameters
Inception-v4 - 43M parameters
Inception-ResNet-V2 - 56M parameters
ResNeXt-50 - 25M parameters

Your basic 3-layer CNN stack - 125M parameters!
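To put the RAM argument in numbers, here is a rough back-of-the-envelope sketch. It assumes float32 storage (the Keras default) and counts the weights, their gradients, and Adam's two moment estimates per parameter, plus the output of the first Conv2D layer for one batch of 128 images. It ignores every other activation and all framework overhead, so real usage is higher.

```python
BYTES_PER_FLOAT32 = 4
params = 125_971_458  # "Total params" of the original model

weights = params * BYTES_PER_FLOAT32
# weights + gradients + Adam's first and second moment estimates
training_state = 4 * weights

batch_size = 128
# conv2d output of shape (128, 500, 500, 32), kept for backpropagation
first_conv_activation = batch_size * 500 * 500 * 32 * BYTES_PER_FLOAT32

GB = 1e9
print(f"weights:        {weights / GB:.2f} GB")                # 0.50 GB
print(f"training state: {training_state / GB:.2f} GB")         # 2.02 GB
print(f"conv2d output:  {first_conv_activation / GB:.2f} GB")  # 4.10 GB
```

Even this partial count exceeds 6 GB, which is enough to get a notebook kernel killed on many machines.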

Here is what you can do. First, look at this part of your summary:

flatten (Flatten)            (None, 492032)            0         
_________________________________________________________________
dropout (Dropout)            (None, 492032)            0         
_________________________________________________________________
dense (Dense)                (None, 256)               125960448 <---!!!!
_________________________________________________________________

You are flattening a 62x62x128 tensor into a vector of length 492,032! Instead, try adding more convolution/pooling layers to bring the first two (spatial) dimensions down to something more manageable, AND/OR increase the kernel and pooling sizes in the earlier layers.
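The length of the flattened vector follows from how each layer changes the spatial size: a 'valid' convolution with kernel k shrinks each side by k - 1, a 'same' convolution keeps it, and MaxPooling2D(p, p) (stride p, 'valid' padding) divides it by p and rounds down. A small sketch tracing the original stack and a stack with 3x3 kernels and 3x3 pooling like the one suggested below (the trace helper is hypothetical):

```python
def trace(size, layers):
    """Return the spatial side length after each layer.

    layers: list of ("conv", kernel, padding) or ("pool", pool_size) tuples.
    """
    sides = []
    for layer in layers:
        if layer[0] == "conv":
            _, k, padding = layer
            size = size if padding == "same" else size - k + 1
        else:  # max pooling with stride equal to the pool size, 'valid' padding
            size = size // layer[1]
        sides.append(size)
    return sides


original = [("conv", 1, "valid"), ("pool", 2),
            ("conv", 1, "valid"), ("pool", 2),
            ("conv", 1, "same"), ("pool", 2)]
print(trace(500, original))  # [500, 250, 250, 125, 125, 62]
print(62 * 62 * 128)         # 492032 features after Flatten

smaller = [("conv", 3, "valid"), ("pool", 3),
           ("conv", 3, "valid"), ("pool", 3),
           ("conv", 3, "same"), ("pool", 3),
           ("conv", 3, "same"), ("pool", 3)]
print(trace(500, smaller))   # [498, 166, 164, 54, 54, 18, 18, 6]
print(6 * 6 * 256)           # 9216 features after Flatten
```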

The goal here is to have a manageably sized tensor before you hit the Dense layer. Also, try drastically reducing the number of nodes in the dense layer.

Try something like this for starters, something that your device can actually handle without killing the kernel, say with about 683k parameters (you could start even simpler and increase complexity later):

import tensorflow as tf
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense

model=tf.keras.models.Sequential([
    Conv2D(32, 3, activation='relu', input_shape=(500, 500, 1)),
    MaxPooling2D(3,3),
    Conv2D(64, 3, activation='relu'),
    MaxPooling2D(3,3),
    Conv2D(128, 3, padding='same', activation='relu'),
    MaxPooling2D(3,3),
    Conv2D(256, 3, padding='same', activation='relu'),
    MaxPooling2D(3,3),
    Flatten(),
    Dropout(0.5), 
    Dense(32, activation='relu'),
    Dense(2, activation='softmax') # dense layer has a shape of 2 as we have only 2 classes 
])
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_19 (Conv2D)           (None, 498, 498, 32)      320       
_________________________________________________________________
max_pooling2d_18 (MaxPooling (None, 166, 166, 32)      0         
_________________________________________________________________
conv2d_20 (Conv2D)           (None, 164, 164, 64)      18496     
_________________________________________________________________
max_pooling2d_19 (MaxPooling (None, 54, 54, 64)        0         
_________________________________________________________________
conv2d_21 (Conv2D)           (None, 54, 54, 128)       73856     
_________________________________________________________________
max_pooling2d_20 (MaxPooling (None, 18, 18, 128)       0         
_________________________________________________________________
conv2d_22 (Conv2D)           (None, 18, 18, 256)       295168    
_________________________________________________________________
max_pooling2d_21 (MaxPooling (None, 6, 6, 256)         0         
_________________________________________________________________
flatten_5 (Flatten)          (None, 9216)              0         
_________________________________________________________________
dropout_5 (Dropout)          (None, 9216)              0         
_________________________________________________________________
dense_10 (Dense)             (None, 32)                294944    
_________________________________________________________________
dense_11 (Dense)             (None, 2)                 66        
=================================================================
Total params: 682,850
Trainable params: 682,850
Non-trainable params: 0
_________________________________________________________________
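As a sanity check, the parameter counts above can be reproduced with the usual formulas (kernel × kernel × in_channels × filters + filters for Conv2D, in_features × units + units for Dense), which also shows how much smaller this model is than the original 125,971,458-parameter one. The helper names are hypothetical:

```python
def conv(k, cin, cout):
    # Conv2D: k*k weights per input channel per filter, plus one bias per filter
    return k * k * cin * cout + cout


def dense(n_in, units):
    # Dense: one weight per input per unit, plus one bias per unit
    return n_in * units + units


counts = [
    conv(3, 1, 32),          # conv2d_19
    conv(3, 32, 64),         # conv2d_20
    conv(3, 64, 128),        # conv2d_21
    conv(3, 128, 256),       # conv2d_22
    dense(6 * 6 * 256, 32),  # dense_10, on the 9216 flattened features
    dense(32, 2),            # dense_11
]
total = sum(counts)
print(counts)                      # [320, 18496, 73856, 295168, 294944, 66]
print(total)                       # 682850
print(round(125_971_458 / total))  # 184, i.e. roughly 184x fewer parameters
```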