I read line 1 (not line 0) of two files as a string (the first file is ~30MB, the second ~50MB); line 0 only holds some information that I don't need at the moment. Line 1 is a string representation of an array which contains around 1.3E6 smaller arrays like ['I1000009', 'A', '4024', 'A'] as entries:
[[['I1000009', 'A', '4024', 'A'], ['I1000009', 'A', '6734', 'G'],...],[['H1000004', 'B', '4024', 'A'], ['L1000009', 'B', '6734', 'C'],...],[and so on],...]
Both files are filled the same way; that's why they are between 30 and 50MB in size. I read those files with my .py script to get access to the individual pieces of information I need:
import sys
myID = sys.argv[1]
otherID = sys.argv[2]
samePath = '/home/srv/Dokumente/srv/'
FolderName = 'checkArrays/'
finishedFolder = samePath+'finishedAnalysis/'
myNewFile = samePath+FolderName+myID[0]+'/'+myID+'.txt'
otherFile = samePath+FolderName+otherID[0]+'/'+otherID+'.txt'
nameFileOKarray = '_array_goodData.txt'
import csv
import os
import re #for regular expressions
# Text 2 - Start
import operator # for sorting the csv files
# Text 2 - End
whereIsMyArray = 1
text_file = open(finishedFolder+myID+nameFileOKarray, "r")
line = text_file.readlines()[whereIsMyArray:]
myGoodFile = eval(line[0])
text_file.close()

text_file = open(finishedFolder+otherID+nameFileOKarray, "r")
line = text_file.readlines()[whereIsMyArray:]
otherGoodFile = eval(line[0])
text_file.close()
print(str(myGoodFile[0][0][0]))
print(str(otherGoodFile[0][0][0]))
The problem I have is that if I start my .py script from the shell:
python3 checkarr_v1.py 44 39
the RAM of my 4GB Pi server climbs to the limit of RAM plus swap and the process dies. I then started the .py script on a 32GB RAM server, and there it worked, but the RAM usage is really huge. See pics:
(slack mode) overview of normal RAM and CPU usage: slackmode
(start sequence) overview at the highest RAM usage, ~6GB, and CPU: highest point
Then it goes up and down for about a minute: 1.2GB to 3.6GB, then to 1.7GB, then to 1GB, and after ~1 min the script finishes and the right output is shown.
Can you help me understand whether there is a better way to solve this on a 4GB Raspberry Pi? Is there a better way to write the two files, since the [",] symbols also take up space in the file? Is there a better solution than the eval function for turning that string into an array? Sorry for all the questions, but I can't understand why ~80MB of files drive the RAM usage up to around 6GB. It sounds like I'm doing something wrong. Best regards and thanks!
1.3E9 arrays is going to be lots and lots of bytes if you read that into your application, no matter what you do.
I don't know if your code does what you actually want to do, but you're only ever using the first data item. If that's what you want to do, then don't read the whole file, just read that first part.
But also: I would advise against using "eval" for deserializing data. The built-in json module will give you data in almost the same format (if you control the input format).
Still, in the end: If you want to hold that much data in your program, you're looking at many GB of memory usage.
If you just want to process it, I'd take a more iterative approach and do a little at a time rather than swallow the whole files. Especially with limited resources.
Update: I see now that it's 1.3e6, not 1.3e9, entries. Big difference. :-) Then json data should be okay. On my machine, a list of 1.3M ['RlFKUCUz', 'A', '4024', 'A'] entries takes about 250MB.
1) The code does what it has to do; it's nothing spectacular. 2) What do you mean by "using the first data item"? If you mean reading just the first 200 parts of that string and converting only that part to an array, then forget it! I need the whole array to compare it with a MySQL database. 3) Thanks for the json link; I think that's the reason. I thought something was wrong with that eval function. Big thanks! 4) Yes, I think I really need many GB of RAM to do this for more than 100 users at the same time. 5) I didn't understand the iterative approach. Do you have an example?
@user2 2) The myGoodFile[0][0][0] seems to only be accessing the very first data item. 5) An iterative approach means reading a little at a time rather than loading everything into memory at once. For example, if you had a different layout of the file where each "section" was one line, you could easily process one line at a time and then throw it away. That way you would not need to hold all the data in memory at once.

You mean something like each line having its own array, like [['RlFKUCUz', 'A', '4024', 'A'], ['2', 'A', '4111', 'B'], ['bla', 'X', '4024', 'C'], ...], where only the arrays for A are on one line and those for B on another, and so on? Then I would have between 50k and 70k entries on just one single line instead of all of them. It's an opportunity, but I think I will first try that JSON trick. Maybe it works, but first I have to figure out how to do that and rewrite much of my old code. Thank you and @MauriceMeyer. Now I have a second way.