threads vs. processes

It's been a few weeks since my last post. My excuse is that I've been busy - my job is always busy around this time of year due to the IBC trade show. Thankfully most of the work is now done. In my own time I've been working on a number of projects (to be unveiled shortly, once they're usable).

In the last few weeks we've seen the launch of the google chrome browser. I won't discuss it here directly (plenty of other people have reviewed it separately). I will, however mention the one feature that has converted me away from firefox and towards chrome:

Google Chrome has a multi-process architecture, meaning tabs can run in separate processes from each other, and from the main browser process.


What this means in theory is that a misbehaving website cannot bring down the entire browser session - you can just close / kill the offending tab / window.

This is pretty cool, but how does it relate to my own work?

I'm writing an application that makes extensive input of python scripts to customize the input and output of the application. I could have spend a while embedding the python scripts using the python API, and then running those separate scripts in a thread, but instead I chose to run them as separate processes spawned by a central application. The way I see it, this approach has several advantages:
  • Crashing child processes (in this case python scripts) are unlikely to bring down the entire application - you still need to be careful since there's likely to be communication between child and parent process, and the parent needs to be able to hand corrupted data in that communication. Other than that, if one of my scripts breaks, I can easily carry on; no error handling or cleanup needs to be done.
  • In my application, I need to repeatedly call the script files - probably thousands of times for a single application run. In a traditional threaded environment the possibility for memory leaks is huge. This way, since each sub-process is short lived, the operating system takes charge of cleaning up any memory not deallocated by the child process.
  • It's a damn site easier to program too. No thread synchronization required!

It's not all roses however.


From my experiences the main drawback is that it's much harder to pass data between parent and child processes. For simple communication it may be enough to use stdin and stdout, but for anything more complicated you'll want to use some form of proper IPC (sockets, named pipes, DBus etc). Even with proper IPC it's still harder to pass custom data structures between processes, since you'll need some sort of data serialization.

Think this all sounds obvious?

That's because it really is! I can't believe how much simpler this approach is, especially if your requirements are for simple delegation to scripts or sub-processes. If you require more complex interaction you might find this approach more trouble than it's worth.