Jun. 8th, 2011

Having been asked to re-run an extremely slow piece of data analysis, I decided that I couldn't face waiting a week for my data and opted instead to parallelise my program. After investigating and rejecting Python threads on the grounds of GIL contention, I eventually settled on the multiprocessing module, distributing the work over a number of separate OS processes and using a pair of Queue objects to assign work to each of the worker processes and to combine the results in the main process.

The structure of the final program was something a bit like this:

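A minimal sketch of that structure, with a hypothetical analyse() function standing in for the real per-task work and made-up work units; the shape of the queues and the sentinel-based shutdown are the important parts:

import multiprocessing as mp

def analyse(task):
    # Stand-in for the real, slow per-task analysis.
    return task * task

def worker(task_queue, result_queue):
    # Pull units of work off the input queue until a None sentinel
    # arrives, pushing each result onto the output queue.
    for task in iter(task_queue.get, None):
        result_queue.put(analyse(task))

if __name__ == '__main__':
    tasks = range(100)              # hypothetical units of work
    nworkers = mp.cpu_count()

    task_queue = mp.Queue()
    result_queue = mp.Queue()

    workers = [mp.Process(target=worker, args=(task_queue, result_queue))
               for _ in range(nworkers)]
    for w in workers:
        w.start()

    # Enqueue the work, keeping a count so the results can be drained
    # later, then add one sentinel per worker to shut them down.
    njobs = 0
    for task in tasks:
        task_queue.put(task)
        njobs += 1
    for _ in range(nworkers):
        task_queue.put(None)

    # Drain the output queue *before* joining the workers: a counted
    # get() loop blocks only as long as work remains outstanding.
    results = [result_queue.get() for _ in range(njobs)]

    for w in workers:
        w.join()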

Along the way I discovered a couple of potential gotchas. I found that unless I emptied both Queue data structures before I attempted to join() the worker processes, the program would deadlock. I also found that I needed to explicitly count the number of remaining results in order to determine whether the output queue was empty; otherwise I'd have needed to use a timeout longer than the longest possible elapsed time of a unit of work, something that would have had a significant impact on the overall run time of the program.
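A minimal illustration of both points, reusing the names from the sketch above:

# Joining before draining can deadlock: a worker blocks while flushing
# its buffered results into the queue's underlying pipe, so it never
# exits and join() hangs.
#
# for w in workers:
#     w.join()        # may hang while result_queue is still full
#
# Draining with a counted get() loop avoids both the deadlock and the
# need for a worst-case timeout such as
# result_queue.get(timeout=WORST_CASE_TASK_SECONDS):
results = [result_queue.get() for _ in range(njobs)]
for w in workers:
    w.join()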

The eventual performance of the program was extremely satisfactory. I managed to parallelise the code in about a quarter of the elapsed time of the shortest serial analysis and, because the scalability of the program was almost linear, I got a 60x improvement on a single run and a 200x improvement on the reanalysis of all my data sets.
