Safe use of Unix signals with the multiprocessing module in Python
""
Dear reader, since you are interested in this blog post, I am assuming
you are familiar with the signal
module, which lets you catch Unix signals
from within your Python script. I am also assuming you are an enthusiastic
parallel-processing fanboy like me and try to make the best use of Python's
parallel processing framework - multiprocessing
- in day to day
programming. :)
""
The conflict
signal
is a great module! multiprocessing
is too. But together they
work against each other if you are not careful about their interaction.
If you have ever tried to use both of them in a single script, you know what
I mean.
In a long-running multiprocessing application, safe and clean exit
of the child processes is a must. There are a lot of ways to achieve this.
One common way is to send an Event
object to the child process. The
child will occasionally check whether that exit event is set and
then exit if necessary. Here's a demo script implementing this pattern,
where the parent process feeds some data to a child worker process -
#!/usr/bin/env python3
import time
import queue
import signal
from multiprocessing import Process, Event, Queue

# The worker is intentionally lazy!
def lazy_ass_worker(exit_event, work_queue):
    while not exit_event.is_set():
        try:
            work = work_queue.get(timeout=1.0)
        except queue.Empty:
            continue
        print("I did job {} already! :)".format(work))
        print("A small nap won't hurt anyone!")
        time.sleep(1.0)
    print("Doing cleanup before leaving ...")

exit_event = Event()
work_queue = Queue()

# Spawn the worker process.
cp = Process(target=lazy_ass_worker, args=(exit_event, work_queue))
cp.start()

# Send some integers to the worker process.
for x in range(100):
    work_queue.put(x)

# We wait for CTRL+C from the user.
try:
    signal.pause()
except KeyboardInterrupt:
    # Since our worker is too delicate, we should notify it with the
    # exit event and then wait for its safe arrival / joining.
    exit_event.set()
    cp.join()
At the end of the script, I naively tried to catch the KeyboardInterrupt
exception (which is raised by default in response to the SIGINT
signal)
in the parent process and then tried to notify the child about the exit condition
so that it can do all the cleanup before exiting. But if you actually run
the above script and press CTRL+C
during the execution, the line
Doing cleanup before leaving ...
is never printed. Here's what happens
on my computer -
oscar@notebook ~ % python3 demo.py
I did job 0 already! :)
A small nap won't hurt anyone!
I did job 1 already! :)
A small nap won't hurt anyone!
^CProcess Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.4/multiprocessing/process.py", line 254, in _bootstrap
self.run()
File "/usr/lib/python3.4/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "demo.py", line 16, in lazy_ass_worker
time.sleep(1.0)
KeyboardInterrupt
oscar@notebook ~ %
What just happened?
Signals are delivered to the whole foreground process group - that is what has
happened! When you press CTRL+C, the terminal sends SIGINT not just to the
parent but to every process in its foreground process group. So even if you
catch the signal in the parent process, child processes
still receive and handle that signal. This comes in conflict with the kind of
pattern we used in the above demo, where synchronization primitives from the
multiprocessing
module are used for safe cleanup.
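You can check this yourself. Here is a minimal sketch of my own (not part of
the demo above) that prints the process group id of both the parent and the
child - they are the same, which is why a terminal-generated SIGINT reaches
both of them -
#!/usr/bin/env python3
import os
from multiprocessing import Process

def report():
    # os.getpgid(0) returns the process group id of the calling process.
    print("child  pid={} pgid={}".format(os.getpid(), os.getpgid(0)))

p = Process(target=report)
p.start()
p.join()
print("parent pid={} pgid={}".format(os.getpid(), os.getpgid(0)))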
The workaround
Every child process spawned by the multiprocessing module inherits signal
handlers from the parent process. If we set the signal handlers to SIG_IGN
for our target signals before spawning new processes, the child processes
will ignore the signals. With this strategy, our demo needs some minor
modifications -
# Save a reference to the original signal handler for SIGINT.
default_handler = signal.getsignal(signal.SIGINT)
# Set signal handling of SIGINT to ignore mode.
signal.signal(signal.SIGINT, signal.SIG_IGN)
exit_event = Event()
work_queue = Queue()
# Spawn the worker process.
cp = Process(target=lazy_ass_worker, args=(exit_event, work_queue))
cp.start()
# Since we spawned all the necessary processes already,
# restore default signal handling for the parent process.
signal.signal(signal.SIGINT, default_handler)
In the above code, the default signal handler is restored after all the
necessary processes have been spawned. If you use custom signal handlers, they
should be defined at this stage. Note that some facilities of the
multiprocessing
module, such as Queue
, Manager
etc. implicitly spawn
additional threads or processes. They should be taken care of in a similar manner.
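For example, suppose you also want a clean shutdown on SIGTERM. A custom
handler installed at this stage can simply set the exit event - the worker
never sees the signal itself, only the event. The handler below is a sketch
of my own, not part of the original demo -
def request_exit(signum, frame):
    # Translate the signal into the multiprocessing-friendly exit flag.
    exit_event.set()

# Installed only after all processes have been spawned, so the worker
# never inherits these handlers.
signal.signal(signal.SIGINT, request_exit)
signal.signal(signal.SIGTERM, request_exit)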
Caveats
Blocking important termination signals like SIGINT
, SIGTERM
etc. in the
child process is problematic at early stages of development.
Programming errors or runtime errors in the code can leave your
development system dirty with lots of stray processes. In that case,
just kill them with a SIGKILL
, as it cannot be blocked or ignored.
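During development I find it convenient to give the worker a grace period and
escalate only if it hangs. Here is a rough sketch (the 5 second timeout is an
arbitrary choice of mine) -
exit_event.set()
# Wait politely for the worker to finish its cleanup.
cp.join(timeout=5.0)
if cp.is_alive():
    # The worker did not exit in time; terminate() sends SIGTERM. If even
    # that is ignored, Process.kill() (Python 3.7+) sends the unblockable
    # SIGKILL.
    cp.terminate()
    cp.join()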