Tuning a Python script
Jul. 15th, 2008 08:58 pm
Sure enough, when I profiled the base version using a fixed configuration, I found that it was losing most of its time to a series of deepcopy() calls. I changed the internal data structures to use a dictionary instead of a more complex set of arrays and, presto, the run time went down from 7.55s to 1.76s and the function call count went down from 398,359 to 53,678.
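The script itself isn't reproduced here, so the following is only a rough sketch of the shape of that change, with invented names (simulate_with_arrays, simulate_with_dict) and a toy workload: nested lists that get deepcopy()ed on every trip around the loop versus a flat dictionary that never needs copying, with cProfile.run() showing where the time goes.

```python
import cProfile
from copy import deepcopy

# Hypothetical "before": working state kept in nested lists and cloned with
# deepcopy() on every trip around the main loop.
def simulate_with_arrays(n_items, n_steps):
    state = [[0] * n_items for _ in range(n_items)]
    for _ in range(n_steps):
        snapshot = deepcopy(state)          # this is where the time goes
        for i in range(n_items):
            state[i][i] = snapshot[i][i] + 1
    return state

# Hypothetical "after": a flat dict keyed on (row, col), so only populated
# cells exist and nothing ever needs a deep copy.
def simulate_with_dict(n_items, n_steps):
    state = {}
    for _ in range(n_steps):
        for i in range(n_items):
            state[(i, i)] = state.get((i, i), 0) + 1
    return state

if __name__ == "__main__":
    cProfile.run("simulate_with_arrays(50, 200)", sort="cumulative")
    cProfile.run("simulate_with_dict(50, 200)", sort="cumulative")
```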
But when I re-profiled, I found a couple of other nasties: an N by N array that was being needlessly regenerated for every loop trip, and a whole host of calls to has_key(). After adding caching to the main loop to prevent the recalculation and replacing all the has_key() calls with try ... except KeyError ... constructs, I managed to get the time down to 0.97s and a mere 11,236 function calls.
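Again as a hypothetical sketch rather than the real code, the two fixes look something like this (build_matrix() and count_items() are made-up stand-ins):

```python
def build_matrix(n):
    """Stand-in for the N by N array that was being rebuilt on every loop trip."""
    return [[i * j for j in range(n)] for i in range(n)]

def count_items(items, n):
    matrix = build_matrix(n)    # built once, up front, instead of once per trip
    counts = {}
    for item in items:
        # Old Python 2 idiom:
        #     if counts.has_key(item):
        #         counts[item] += 1
        #     else:
        #         counts[item] = 1
        # Replacement: assume the key is there and handle the rare miss, which
        # skips the extra method call on every iteration.
        try:
            counts[item] += 1
        except KeyError:
            counts[item] = 1
    return matrix, counts

if __name__ == "__main__":
    print(count_items(["a", "b", "a", "c", "a"], 3)[1])    # e.g. {'a': 3, 'b': 1, 'c': 1}
```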
And here's proof of my success: a plot showing how the raw and optimised versions of the program scale as the number of items increases.
So it looks like the big wins were, in order of effectiveness: (a) to use a more efficient algorithm and data structures; (b) to replace has_key() with exceptions in order to avoid the overhead of the extra subroutine calls and to exploit the (relatively) fast performance of exception handling; (c) to avoid unnecessary work by caching results, trading memory for CPU. Of these, (a) and (c) were obvious, while (b) was only apparent after a little bit of experimentation with the profiler.
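Point (b) is easy to sanity-check with timeit: a try block costs almost nothing when the key turns out to be present, whereas actually raising and catching a KeyError is comparatively expensive, so the trade-off assumes misses are rare. A rough sketch (the commented-out has_key() line needs Python 2, where the method still exists):

```python
import timeit

setup = "d = dict.fromkeys(range(1000), 0); k = 500; missing = -1"

tests = [
    ("'in' guard, key present", "if k in d: x = d[k]"),
    ("try/except, key present", "try:\n    x = d[k]\nexcept KeyError:\n    x = 0"),
    ("try/except, key missing", "try:\n    x = d[missing]\nexcept KeyError:\n    x = 0"),
    # Python 2 only -- the original has_key() idiom:
    # ("has_key() guard",       "if d.has_key(k): x = d[k]"),
]

for label, stmt in tests:
    print("%-26s %.3fs" % (label, timeit.timeit(stmt, setup=setup, number=1000000)))
```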
no subject
Date: 2008-07-16 10:24 am (UTC)

no subject
Date: 2008-07-16 08:32 pm (UTC)
I can't say I'll be sorry to see has_key() go. I only ever seem to use it on occasions when I'm forced to switch back and forth between python and perl, so it's probably a cargo cult carry-over from perl...