Note: This article is reproduced from stackoverflow.com.
gcloud python

What is the easiest way to determine all forks are done in Python

Posted on 2020-04-16 11:55:41

Sorry for the butchered title. I did not have enough space to ask the following question: Suppose you run a Python program and fork the process using os.fork(). Once the process has forked, it is not easy to determine when all the forks have completed. I need to be able to tell the computer that all forks have completed, because I need to shut the computer down with sudo shutdown -h now. This is because I will be using gcloud, which costs $5 an hour. If the process completes while I'm asleep, I don't want to waste $35.

I tried using a global variable: after each fork was done, Python would record in a dictionary that that fork was in fact done. Even though the variable was global, I still could not get its value to be the same in each fork, so I think global variables are out. The workaround I am now considering is to periodically read the output of the top command in a terminal. After a fork completes, its process should terminate and no longer show up in top, so once only one Python process is running, all forks should be complete. It should take about 30 minutes to figure out how to do that, so I'd rather not use that method if someone already knows something easier.

I should also note that this will be done on a Google gcloud virtual instance. So if gcloud already has something set up that can determine whether a forked process has completed, please let me know.
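The global-variable behaviour described above can be reproduced with a short sketch. After os.fork(), the child works on its own copy of the parent's memory, so a write to a global in the child never reaches the parent. The names `done` and `run_child_and_report` are made up for this illustration, and the code assumes a Unix-like system where os.fork() is available.

```python
import os

done = {}  # module-level dict, like the global described in the question


def run_child_and_report():
    """Fork once; the child records completion, the parent inspects its own copy."""
    pid = os.fork()
    if pid == 0:
        # Child: this write lands in the child's private copy of `done`.
        done["child"] = True
        os._exit(0)  # leave the child immediately
    os.waitpid(pid, 0)  # parent: block until the child has exited
    return dict(done)   # the parent's copy was never modified


print(run_child_and_report())  # prints {}
```

The parent prints an empty dictionary even though the child set a key, which is why a plain global cannot be used to report completion back to the parent.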

Questioner: logic1976
Viewed: 29
Amadan 2020-02-04 16:09

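If you insist on using os.fork, I believe the best way is to use signal to trap SIGCHLD (the signal sent to the parent process whenever a child process terminates). Increment a counter every time you fork, then decrement it and check for zero every time you catch SIGCHLD. Since the variable lives only in the main thread of the parent process, there are no synchronisation problems. One caveat: several SIGCHLDs that arrive close together may be delivered as a single signal, so a robust handler should reap children with os.waitpid(-1, os.WNOHANG) in a loop rather than assume one signal per child.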

A better way, if it works for you, is to use the multiprocessing module instead of os.fork, as it will do a lot of bookkeeping for you (as well as solve a lot of other recurring problems, like interprocess communication).
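As a rough sketch of the multiprocessing route, the parent can start each worker as a Process and then join() every one of them; once the last join() returns, all workers have exited. The function `run_all` and the short sleep range are assumptions made for this illustration. The "fork" start method is requested explicitly to stay close to the os.fork setup in the question, so this sketch is Unix-only.

```python
import multiprocessing
import random
import time


def do_something(i):
    print(f"Process {i} starting")
    time.sleep(random.uniform(0.1, 0.5))  # stand-in for the real work
    print(f"Process {i} ending")


def run_all(n):
    """Start n worker processes and block until every one has exited."""
    ctx = multiprocessing.get_context("fork")  # assumption: Unix-like system
    procs = [ctx.Process(target=do_something, args=(i,)) for i in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()  # returns once that worker has terminated
    return [p.exitcode for p in procs]


if __name__ == "__main__":
    exit_codes = run_all(10)
    print("All done:", exit_codes)
    # The shutdown command from the question could be run at this point.
```

The returned exit codes also let the parent notice whether any worker failed before it decides to shut the machine down.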

EDIT: Here's a toy example for the os.fork case.

import os
import sys
import random
import signal
import time

def do_something(i):
    print(f"Process {i} starting")
    time.sleep(random.uniform(1, 10))
    print(f"Process {i} ending")
    sys.exit()

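running = 0

def chld_handler(signo, frame):
    global running
    # Several SIGCHLDs can be delivered as one, so reap every child
    # that has already exited instead of assuming one signal per child.
    while True:
        try:
            pid, _ = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:  # no children left at all
            break
        if pid == 0:               # children remain, but none has exited yet
            break
        running -= 1
        print(f"A child died. {running} remain.")

# Install the handler before forking so that no SIGCHLD is missed.
signal.signal(signal.SIGCHLD, chld_handler)

for i in range(10):
    if os.fork():          # in parent (we got the child PID)
        running += 1
    else:                  # in child (we got zero)
        do_something(i)

while running:             # wait until every child has been reaped
    time.sleep(1)

print("All dead. Exiting.")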
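If the parent has nothing else to do while the children run, a simpler sketch is to skip the signal handler and block in os.wait() until it raises ChildProcessError, which means no children are left. The helper names and the `shutdown_cmd` parameter below are hypothetical and exist only for this illustration; the default command is the one from the question. The code assumes a Unix-like system.

```python
import os
import subprocess
import time


def spawn_children(n, work):
    """Fork n children; each runs work(i) and then exits. Returns the child PIDs."""
    pids = []
    for i in range(n):
        pid = os.fork()
        if pid == 0:
            status = 1
            try:
                work(i)
                status = 0
            finally:
                # Exit the child here so it never continues the parent's loop.
                # Note that os._exit() does not flush buffered output.
                os._exit(status)
        pids.append(pid)
    return pids


def wait_for_all_children():
    """Block until every child has exited; return how many were reaped."""
    reaped = 0
    while True:
        try:
            os.wait()  # blocks until any one child exits
        except ChildProcessError:
            return reaped  # no children remain
        reaped += 1


def run_then_shutdown(n, work, shutdown_cmd=("sudo", "shutdown", "-h", "now")):
    """Run n forked workers, wait for all of them, then run shutdown_cmd."""
    spawn_children(n, work)
    reaped = wait_for_all_children()
    subprocess.run(list(shutdown_cmd), check=False)
    return reaped
```

For a dry run, a harmless command can be passed instead of the real one, for example run_then_shutdown(3, lambda i: time.sleep(1), shutdown_cmd=("echo", "would shut down now")).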