This article is reproduced from stackoverflow.com.
bash gzip

How to delete the first line from a gzip file without decompressing?

Published on 2020-05-13 13:30:43

I have a large gzip file that is slow to decompress. How do I delete the first line in-place without decompressing the entire file?

Questioner
Jase
Viewed
17
Yunnosch 2020-02-28 17:33

The compression algorithm gzip uses (DEFLATE, a Lempel-Ziv variant) uses already decompressed content as a lookup table for the content that follows. I believe this directly means that if you delete the first line, you definitely have to recompress the rest of the file, which in turn means you have to decompress it first.

So I believe the answer is: No.
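
That said, recompressing does not have to mean holding the whole decompressed file in memory or on disk. Here is a minimal sketch of a streaming rewrite, assuming Python is available and that there is room for a temporary file next to the original. The function name `drop_first_line` and the file name in the usage line are made up for illustration. It still decompresses and recompresses everything, just in small chunks.

```python
import gzip
import os
import shutil
import tempfile


def drop_first_line(path: str) -> None:
    """Rewrite the gzip file at `path` without its first line, streaming in chunks."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temporary file in the same directory, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".gz.tmp")
    os.close(fd)
    try:
        with gzip.open(path, "rb") as src, gzip.open(tmp_path, "wb") as dst:
            src.readline()                # read and discard the first line
            shutil.copyfileobj(src, dst)  # copy the rest in fixed-size chunks
        os.replace(tmp_path, path)        # swap the new file in place of the old one
    except BaseException:
        os.remove(tmp_path)               # do not leave a half-written file behind
        raise
```

Usage would be something like `drop_first_line("big_file.gz")`. Only the first line and one copy buffer are in memory at any time, but the running time is still dominated by decompressing and recompressing the full file.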

Going into the details of how the compression is actually implemented (to be precise, the Lempel-Ziv compression algorithm), you find that there are data windows of certain sizes.
There is a maximum length of upcoming data that a single reference can reproduce, determined by the size of the "ahead" window. There is also a maximum distance at which already decompressed data can be used as a lookup, the "back" window. In the DEFLATE format used by gzip, these limits are 258 bytes and 32 KiB respectively.
It might hence be possible to decompress only a part of the compressed data, large enough to make sure that the rest of the compressed data does not reference anything before it, i.e. so large that from a certain point in the compressed data no references reach back into what you are going to delete. Then you can recompress just that part without the first line you want to get rid of.
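
To make the "back" window a bit more tangible, here is a small experiment using Python's `zlib` module. It is only an illustration, not part of any solution. It measures how many extra compressed bytes a repeated 1000-byte block costs when the repeat sits close to the original versus more than 32 KiB away. The helper names and the chosen sizes are arbitrary assumptions of mine.

```python
import random
import zlib


def random_bytes(rng: random.Random, n: int) -> bytes:
    """Return n pseudo-random (practically incompressible) bytes."""
    return bytes(rng.getrandbits(8) for _ in range(n))


def deflate_size(data: bytes) -> int:
    """Size of `data` as a raw DEFLATE stream (wbits=-15: 32 KiB window, no header)."""
    comp = zlib.compressobj(level=9, wbits=-15)
    return len(comp.compress(data) + comp.flush())


def cost_of_repeat(gap: int) -> int:
    """Extra compressed bytes needed to append a copy of a block `gap` bytes later."""
    rng = random.Random(0)  # fixed seed so the run is repeatable
    block = random_bytes(rng, 1000)
    filler = random_bytes(rng, gap)
    return deflate_size(block + filler + block) - deflate_size(block + filler)


near = cost_of_repeat(1_000)   # the copy is well inside the 32 KiB window
far = cost_of_repeat(40_000)   # the copy is beyond the window
print(f"repeat inside the window: {near} extra bytes, outside: {far} extra bytes")
```

The nearby repeat should cost only a small fraction of the 1000 bytes, because it is stored as references to earlier data, while the distant one has to be stored almost in full. Those references are exactly what ties later compressed data to the content before it.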

I believe however that this approach is beyond your question. Otherwise you would have provided much more information.

So I think I will stay with: No.

Or at least:
You will have to really learn about the compression algorithm gzip uses (DEFLATE), to the point that you can implement it yourself. Then learn even more about the precise implementation of the algorithm in the file you are dealing with. Then learn about the precise configuration of the compression you are looking at (the sizes of the two windows).
Then spend a lot of effort.
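
As a very first step in that direction, a sketch like the following decodes the fixed parts of a gzip file: the 10-byte header and the 8-byte trailer. It assumes a file with a single gzip member, and the function name `inspect_gzip_member` is made up. Note that the trailer stores a CRC-32 and the length of the uncompressed data, so both would have to be recalculated after removing a line, on top of patching the compressed stream itself.

```python
import gzip
import struct
import zlib


def inspect_gzip_member(raw: bytes) -> dict:
    """Decode the fixed 10-byte header and 8-byte trailer of a single-member gzip file."""
    # Header: two magic bytes, compression method, flags, mtime (little-endian), XFL, OS.
    magic1, magic2, method, flags, mtime, xfl, os_code = struct.unpack("<BBBBIBB", raw[:10])
    if (magic1, magic2) != (0x1F, 0x8B):
        raise ValueError("not a gzip stream")
    # Trailer: CRC-32 of the uncompressed data, then its length modulo 2**32.
    crc32, isize = struct.unpack("<II", raw[-8:])
    return {"method": method, "flags": flags, "mtime": mtime,
            "crc32": crc32, "isize": isize}


payload = b"first line\nsecond line\n"
info = inspect_gzip_member(gzip.compress(payload))
print(info["method"])                         # 8 means DEFLATE
print(info["isize"] == len(payload))          # trailer length matches the payload
print(info["crc32"] == zlib.crc32(payload))   # trailer checksum matches the payload
```

For a large file you would read only the first 10 and the last 8 bytes instead of loading everything. The hard part, locating a safe point inside the bit-packed DEFLATE data and splicing there, is not covered by this sketch.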

Going into the details of how exactly to do that is beyond an answer here.