Note: This article is reproduced from stackoverflow.com.
c++ gzip linux signals stdout

Capturing stdout to zip and interrupting using CTRL-C gives a corrupted zip file

Published 2020-05-03 13:46:19

I am developing a C++ program that can run all day. It outputs to stdout and I want to compress this output. The uncompressed output can be many GB. A startup Bourne shell script compiles the C++ code and starts up the program like so:

./prog | gzip > output.gz

When I interrupt the script using CTRL-C, the .gz file is always corrupted. When I start the program from a terminal and interrupt it using CTRL-C, the .gz file is also always corrupted. When I start the program from a terminal and terminate it using Linux killall, the .gz file is fine.

On the other hand, on a terminal cat <large_file> | gzip > cat.gz can be interrupted using CTRL-C and cat.gz is always fine. So I suspected that cat has a signal handler of some sort that I have to implement in my C++ program as well, but looking at a cat implementation online, I found nothing like it. Nevertheless, I implemented this:

#include <csignal>
#include <cstdlib>

void SignalHandler(int aSignum)
{
  exit(0);
}

void Signals()
{
  signal(SIGINT,  SignalHandler);
  signal(SIGKILL, SignalHandler);  // has no effect: SIGKILL cannot be caught
  signal(SIGTERM, SignalHandler);
}

...and even something in the Bourne shell script, but nothing helps. After CTRL-C, the .gz file is still corrupted.

Questions:

  • What does cat have that my program does not?
  • How can I terminate my script/program using CTRL-C and still end up with a valid .gz file?

Edit 1

Opening the resulting file using zcat gives some output, but then: gzip: file.gz: unexpected end of file. Opening it in Ubuntu's Archive Manager just gives a popup saying An error occurred while extracting files.

Edit 2

Tried flushing; no change in the problem was observed.

Edit 3

More info about the issue: Missing end (EOCDR) signature

Fix archive (-F) - assume mostly intact archive
    zip warning: bad archive - missing end signature
    zip warning: (If downloaded, was binary mode used?  If not, the
    zip warning:  archive may be scrambled and not recoverable)
    zip warning: Can't use -F to fix (try -FF)

zip error: Zip file structure invalid (file.gz)
maot@HP-Pavilion-dv7:~/temp$ zip -FF file.gz --out file2.gz
Fix archive (-FF) - salvage what can
    zip warning: Missing end (EOCDR) signature - either this archive
                     is not readable or the end is damaged
Is this a single-disk archive?  (y/n): y
  Assuming single-disk archive
Scanning for entries...
    zip warning: zip file empty
maot@HP-Pavilion-dv7:~/temp$ ls -lh file2.gz
-rw------- 1 maot maot 22 feb 15 15:18 file2.gz
maot@HP-Pavilion-dv7:~/temp$ 

Edit 4

Thanks @Maxim Egorushkin, but it does not work. Interrupting the script with CTRL-C kills prog before the script's signal handler is executed. Hence, I cannot send it a signal; it is already gone, and SignalHandler produces no output. When prog is started from the command line, the output of SignalHandler is observed. Prog:

#include <iostream>
#include <unistd.h>
#include <csignal>
#include <cstdio>   // fflush
#include <cstdlib>  // exit

void SignalHandler(int aSignum)
{
  std::cout << "prog: Interrupt signal " << aSignum << " received.\n";
  fflush(nullptr);
  exit(0);
}

int main()
{
  for (int sig = 1; sig <=31; sig++)
  {
    std::cout << " sig " << sig;
    signal(sig,  SignalHandler);
  }

  while (true)
  {
    std::cout << "prog: Sleep ";
    fflush(nullptr);
    usleep(1e4);
  }
}

Script:

#!/bin/sh

onerror()
{
  echo "onerror(): Started."
  ps -jef | grep prog
  killall -s SIGINT prog
  exit
}

g++ -Wall prog.cpp -o prog

trap onerror 2

prog | gzip > file.gz

Result:

maot@HP-Pavilion-dv7:~/temp$ test.sh 
^Conerror(): Started.
maot     16733 16721 16721  5781  0 16:17 pts/1    00:00:00 grep prog
prog: no process found
maot@HP-Pavilion-dv7:~/temp$ 

Edit 5 minimal working solution

Implementation of the answer of Maxim Egorushkin. Script:

#!/bin/sh
g++ -Wall prog.cpp -o prog
prog | setsid gzip > file.gz & wait

Prog:

#include <iostream>
#include <unistd.h>
#include <csignal>
#include <cstdlib>  // exit

void SignalHandler(int aSignum)
{
  std::cout << "prog: Interrupt signal " << aSignum << " received.\n";
  exit(0);
}

int main()
{
  signal(SIGINT,  SignalHandler);

  while (true)
  {
    std::cout << "prog: Sleep ";
    usleep(1e4);
  }
}
Questioner: TradingDerivatives.eu
Maxim Egorushkin 2020-02-17 01:17

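When you press Ctrl+C, the terminal sends SIGINT to every process in the foreground process group, which here includes gzip as well as prog. gzip terminates immediately, without writing the rest of the compressed stream, and if prog is still running the next time it writes to stdout, it receives SIGPIPE.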
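
The SIGPIPE behaviour mentioned above can be reproduced in isolation. The following is only a minimal sketch, not part of the original answer: WriteToClosedPipe is a hypothetical helper, and it ignores SIGPIPE so that the failed write can be observed as an EPIPE error instead of killing the process, which is what the default action would do.

```cpp
#include <cerrno>
#include <csignal>
#include <unistd.h>

// Hypothetical helper (not from the original post).
// Writes one byte into a pipe whose read end is already closed and returns
// the resulting errno, or 0 if the write unexpectedly succeeds.
int WriteToClosedPipe()
{
  // With the default action, SIGPIPE would terminate the process here.
  // Ignoring it turns the condition into a visible EPIPE error.
  std::signal(SIGPIPE, SIG_IGN);

  int fds[2];
  if (pipe(fds) != 0)
    return errno;

  close(fds[0]);  // the reader (gzip in the pipeline) is gone

  const char byte = 'x';
  const ssize_t written = write(fds[1], &byte, 1);
  const int error = (written == -1) ? errno : 0;

  close(fds[1]);
  return error;
}
```

In the pipeline from the question, prog plays the role of the writer and gzip the role of the closed read end.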

You need to send SIGINT to prog for it to flush its stdout and exit (provided you installed the signal handler as you did), so that gzip receives all of its output and then terminates.
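
One way to make prog flush and exit on SIGINT is sketched below. It is an alternative to the handlers shown in the question, which call exit() and write to std::cout directly from the handler; neither of those is async-signal-safe. Here the handler only sets a flag and the main loop does the flushing. All names (RequestStop, InstallStopHandlers, StopRequested, WriteUntilStopped) and the maxRecords limit are assumptions made for illustration.

```cpp
#include <csignal>
#include <iostream>

// Set by the handler, polled by the main loop. volatile std::sig_atomic_t
// is a type that a signal handler is allowed to write.
static volatile std::sig_atomic_t gStopRequested = 0;

// The handler does nothing except record that a stop was requested.
extern "C" void RequestStop(int)
{
  gStopRequested = 1;
}

void InstallStopHandlers()
{
  std::signal(SIGINT, RequestStop);
  std::signal(SIGTERM, RequestStop);
}

bool StopRequested()
{
  return gStopRequested != 0;
}

// Writes records until a stop is requested or maxRecords is reached, then
// flushes the stream so a downstream reader sees everything written so far.
// Returns the number of records written.
int WriteUntilStopped(std::ostream& out, int maxRecords)
{
  int written = 0;
  while (!StopRequested() && written < maxRecords)
  {
    out << "prog: record " << written << '\n';
    ++written;
  }
  out.flush();
  return written;
}
```

A main function would call InstallStopHandlers() once, then WriteUntilStopped(std::cout, ...) and return normally, which closes stdout and lets the reader on the other end of the pipe see end-of-file.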


You can run your pipeline as follows:

prog | setsid gzip > file.gz & wait

This uses the shell's & operator to start the pipeline in the background and then waits for it to terminate. On Ctrl+C, SIGINT is sent to all processes in the terminal's foreground process group, which contains the shell blocked in wait as well as prog. gzip, however, is started with setsid, which places it in a separate session and process group, so it does not receive SIGINT; it terminates when its stdin is closed after prog has exited.
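
What setsid does to a process can be checked with the underlying system call. The sketch below is illustrative only: ChildLeavesParentGroup is a hypothetical helper that forks a child, calls setsid() in it, and reports whether the child ended up in its own session and process group, which is the property that shields gzip from the terminal's SIGINT.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not from the original post).
// Forks a child that calls setsid() and returns true if the child became the
// leader of a new session and of a process group distinct from the parent's.
bool ChildLeavesParentGroup()
{
  const pid_t parentGroup = getpgrp();

  const pid_t child = fork();
  if (child == -1)
    return false;

  if (child == 0)
  {
    // setsid() can succeed here because a freshly forked child is not a
    // process group leader.
    const bool detached = setsid() != -1
                          && getsid(0) == getpid()
                          && getpgrp() == getpid()
                          && getpgrp() != parentGroup;
    _exit(detached ? 0 : 1);
  }

  int status = 0;
  if (waitpid(child, &status, 0) != child)
    return false;
  return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

A process detached this way is no longer in the terminal's foreground process group, so Ctrl+C does not reach it.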