I'm trying to build a VGG16 model in PyTorch and export it to ONNX. I want to set the model's weights and biases to my own values, but in the process my computer quickly runs out of memory.
Here is how I want to do it (this is only a test; in the real version I read the weights and biases from a set of files). This example just forces all values to 0.5:
# Create empty VGG16 model (random weights)
from torchvision import models
from torchsummary import summary

vgg16 = models.vgg16()
# The structure can be inspected with vgg16.__dict__
summary(vgg16, (3, 224, 224))

# Convolutional layers
for layer in vgg16.features:
    print()
    print(layer)
    if (hasattr(layer,'weight')):
        dim = layer.weight.shape
        print(dim)
        print(str(dim[0]*(dim[1]*dim[2]*dim[3]+1))+' params')
        # Replace the weights and biases
        for i in range (dim[0]):
            layer.bias[i] = 0.5
            for j in range (dim[1]):
                for k in range (dim[2]):
                    for l in range (dim[3]):
                        layer.weight[i][j][k][l] = 0.5

# Dense layers
for layer in vgg16.classifier:
    print()
    print(layer)
    if (hasattr(layer,'weight')):
        dim = layer.weight.shape
        print(str(dim)+' --> '+str(dim[0]*(dim[1]+1))+' params')
        for i in range(dim[0]):
            layer.bias[i] = 0.5
            for j in range(dim[1]):
                layer.weight[i][j] = 0.5
When I look at the computer's memory usage, it grows linearly and saturates the 16 GB of RAM while the first dense layer is being processed. Then Python crashes.
Is there a better way to do this, keeping in mind that I want to export the model to ONNX afterwards? Thanks for your help.
The memory growth comes from gradient tracking: the parameters have .requires_grad set to True, so every individual weight and bias assignment is recorded by autograd. Try setting the .requires_grad attribute to False before the update and restoring it after the update. Example:
for layer in vgg16.features:
    print()
    print(layer)
    if (hasattr(layer,'weight')):
        # suppress .requires_grad
        layer.bias.requires_grad = False
        layer.weight.requires_grad = False
        dim = layer.weight.shape
        print(dim)
        print(str(dim[0]*(dim[1]*dim[2]*dim[3]+1))+' params')
        # Replace the weights and biases
        for i in range (dim[0]):
            layer.bias[i] = 0.5
            for j in range (dim[1]):
                for k in range (dim[2]):
                    for l in range (dim[3]):
                        layer.weight[i][j][k][l] = 0.5
        # restore .requires_grad
        layer.bias.requires_grad = True
        layer.weight.requires_grad = True
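A possible variation on the same idea, offered as a sketch rather than a tested replacement for the code above: wrapping the update in torch.no_grad() also keeps autograd from recording anything, and fill_ writes a whole tensor in one call instead of one element at a time. The small nn.Sequential model and the fill_parameters helper are hypothetical stand-ins so the example runs quickly; the same loop should apply to vgg16.features and vgg16.classifier.

```python
import torch
from torch import nn

# Hypothetical small model, standing in for VGG16 to keep the sketch fast.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 4 * 4, 10),
)


def fill_parameters(module: nn.Module, value: float) -> int:
    """Set every weight and bias in `module` to `value`; return how many were set."""
    count = 0
    with torch.no_grad():  # nothing inside this block is recorded by autograd
        for param in module.parameters():
            param.fill_(value)  # one in-place write per tensor, no Python loops
            count += param.numel()
    return count


n_params = fill_parameters(model, 0.5)
print(n_params, "parameters set")
```

With this approach requires_grad never has to be switched off and back on, since the flag is left untouched.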
Thanks. I'll try this.
It's perfect! The memory usage remains steady, even for the dense layers. Thanks again.
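For the real version, where the values are read from files, a similar pattern might look like the sketch below. The file format is not described in the question, so loaded_weight and loaded_bias are hypothetical tensors standing in for whatever is read from disk (a NumPy array could be converted with torch.from_numpy first), and nn.Linear(4, 2) stands in for a single VGG16 layer.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)  # stand-in for one layer of vgg16.classifier

# Hypothetical values, standing in for data read from the weight files.
loaded_weight = torch.full((2, 4), 0.5)
loaded_bias = torch.full((2,), 0.5)

# copy_ would broadcast a smaller source, so check the shapes explicitly.
assert loaded_weight.shape == layer.weight.shape
assert loaded_bias.shape == layer.bias.shape

with torch.no_grad():  # keep autograd from recording the copies
    layer.weight.copy_(loaded_weight)  # whole-tensor copy, no element loops
    layer.bias.copy_(loaded_bias)
```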
How? (Sorry, newbie here.)
To mark an answer as accepted, click on the check mark beside the answer to toggle it from greyed out to filled in. stackoverflow.com/help/someone-answers