Note: This article is reproduced from serverfault.com.

deep-learning machine-learning neural-network python pytorch

How to prevent memory usage growth when updating weights and biases in a PyTorch model

Posted on 2020-11-25 07:54:22

I'm trying to build a VGG16 model in PyTorch and export it to ONNX. I want to overwrite the model's parameters with my own set of weights and biases, but during this process my computer quickly runs out of memory.

Here is how I want to do it (this is only a test; in the real version I read the weights and biases from a set of files). This example simply forces all values to 0.5:

# Create empty VGG16 model (random weights)
from torchvision import models
from torchsummary import summary

vgg16 = models.vgg16()
# the structure is available in vgg16.__dict__
summary(vgg16, (3, 224, 224))

# Convolutional layers
for layer in vgg16.features:
    print()
    print(layer)
    if (hasattr(layer,'weight')):
        dim = layer.weight.shape
        print(dim)
        print(str(dim[0]*(dim[1]*dim[2]*dim[3]+1))+' params')

        # Replace the weights and biases
        for i in range (dim[0]):
            layer.bias[i] = 0.5
            for j in range (dim[1]):
                for k in range (dim[2]):
                    for l in range (dim[3]):
                        layer.weight[i][j][k][l] = 0.5

# Dense layers
for layer in vgg16.classifier:
    print()
    print(layer)
    if (hasattr(layer,'weight')):
        dim = layer.weight.shape
        print(str(dim)+' --> '+str(dim[0]*(dim[1]+1))+' params')
        for i in range(dim[0]):
            layer.bias[i] = 0.5
            for j in range(dim[1]):
                layer.weight[i][j] = 0.5

When I look at the computer's memory usage, it grows linearly and saturates the 16 GB of RAM while the first dense layer is being processed. Then Python crashes.

Is there a better way to do this, keeping in mind that I want to export the model to ONNX afterwards? Thanks for your help.

Questioner

Fabrice Auzanneau

Poe Dator 2020-11-25 18:00:42

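The memory growth comes from autograd: while the parameters have requires_grad set to True, every element-wise assignment to a weight or bias is tracked for gradient computation, and that bookkeeping accumulates with each write. Try setting the .requires_grad attribute to False before the update and restoring it afterwards. Example: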

for layer in vgg16.features:
    print()
    print(layer)
    if (hasattr(layer,'weight')):
        
        # suppress gradient tracking (.requires_grad)
        layer.bias.requires_grad = False
        layer.weight.requires_grad = False
        
        dim = layer.weight.shape
        print(dim)
        print(str(dim[0]*(dim[1]*dim[2]*dim[3]+1))+' params')

        # Replace the weights and biases
        for i in range (dim[0]):
            layer.bias[i] = 0.5
            for j in range (dim[1]):
                for k in range (dim[2]):
                    for l in range (dim[3]):
                        layer.weight[i][j][k][l] = 0.5
        
        # restore .requires_grad
        layer.bias.requires_grad = True
        layer.weight.requires_grad = True
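
As a possible alternative to toggling requires_grad by hand, the update can be wrapped in torch.no_grad() and each tensor filled in a single call, which also avoids the nested Python loops. This is only a minimal sketch: the small nn.Sequential model below is a hypothetical stand-in for vgg16 so that it runs quickly, and new_bias is a placeholder for values that would be read from a file.

```python
import torch
from torch import nn

# Hypothetical small stand-in for vgg16 so the sketch runs quickly;
# the same loop would apply to vgg16.features and vgg16.classifier.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 4 * 4, 10),
)

# Inside no_grad, in-place writes to parameters are not tracked by autograd.
with torch.no_grad():
    for layer in model:
        if hasattr(layer, "weight"):
            layer.weight.fill_(0.5)  # one call per tensor, no element loops
            layer.bias.fill_(0.5)

# Values loaded elsewhere can be copied in the same way with copy_().
# new_bias is a placeholder for data read from a file (assumption).
new_bias = torch.full((10,), 0.25)
with torch.no_grad():
    model[3].bias.copy_(new_bias)
```

With this approach the parameters keep requires_grad set to True after the block, so nothing needs to be restored afterwards.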

Fabrice Auzanneau 2020-11-25 15:00:17

Thanks. I'll try this.

Fabrice Auzanneau 2020-11-25 15:16:16

It's perfect! The memory usage remains steady, even for the dense layers. Thanks again.

Fabrice Auzanneau 2020-11-27 09:16:02

How? (Sorry, newbie here.)

Poe Dator 2020-11-28 22:20:43

To mark an answer as accepted, click on the check mark beside the answer to toggle it from greyed out to filled in. stackoverflow.com/help/someone-answers
