sawyl | Tuning a mod_python script

After playing around with my job accounting interface for a week or so, my test users came back with a few suggested improvements. Most were easy to implement, but as I was testing the ability to execute queries across multiple data sets, I noticed that the performance wasn't all that it might be, with ten-day queries taking five seconds to execute. Clearly something needed to be done.

The profiler confirmed my suspicions: the code was extremely IO bound and a lot of time was being lost to the re.split() used to split each line of input into fields. My initial workaround for the problem was simply to run the input data set through tr -s " " to remove duplicate spaces, allowing me to replace my slow RE with a much faster string split.
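As a rough illustration of that swap, here is a minimal sketch. The sample record, the field layout and the r" +" pattern are all assumptions, since the post shows neither the real data nor the original expression. squeeze_spaces() is only a Python stand-in for the one-off tr -s " " pass.

```python
import re

# Hypothetical accounting record; the real field layout is not given in the post.
RAW_LINE = "job1234   alice    batch   3600  16"

def split_with_regex(line):
    # Assumed form of the original approach: split on runs of spaces at query time.
    return re.split(r" +", line.strip())

def squeeze_spaces(line):
    # Stand-in for `tr -s " "`: collapse each run of spaces to a single space.
    # This is done once, ahead of time, rather than on every query.
    return re.sub(r" +", " ", line)

def split_preprocessed(line):
    # With the runs already squeezed, a plain string split is sufficient.
    return line.strip().split(" ")

fields_regex = split_with_regex(RAW_LINE)
fields_plain = split_preprocessed(squeeze_spaces(RAW_LINE))
```

Both paths should yield the same list of fields; the difference is that the second one leaves only a cheap str.split() in the hot path.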

Then it occurred to me that, having committed myself to preprocessing the input data, I might as well push the idea to its logical conclusion. Thus, I reprocessed all the input data, stripping out the unwanted fields and replacing them with things that had previously been dynamically created at run-time by the interface. Having done this, I decided to optimise the IO path by dumping the data out using the Python marshal module, removing the need to do any processing on the input beyond a simple load.
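The dump-and-load step might look something like the sketch below. The record contents, the helper names and the file name are hypothetical; only marshal.dump() and marshal.load() come from the post. It is worth remembering that the marshal format is not guaranteed to be stable between Python versions, so the files would need regenerating after an upgrade.

```python
import marshal
import os
import tempfile

# Hypothetical preprocessed records: unwanted fields already stripped,
# values already converted to the types the interface wants.
records = [
    ("job1234", "alice", 3600, 16),
    ("job1235", "bob", 120, 1),
]

def dump_records(path, data):
    # Write the whole data set out in one go, in binary mode.
    with open(path, "wb") as fh:
        marshal.dump(data, fh)

def load_records(path):
    # Reading it back is a single call, with no per-line parsing.
    with open(path, "rb") as fh:
        return marshal.load(fh)

# Illustrative file name only.
path = os.path.join(tempfile.mkdtemp(), "day.marshal")
dump_records(path, records)
restored = load_records(path)
```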

When I considered the access patterns of the data, another idea suggested itself. Most of the queries were being made against the same core data sets, either because multiple users were querying information for the same day or because a single user was sorting the results of their previous query into a different order. So, given that mod_python makes it possible for data structures to persist in memory (one of the principal reasons it out-performs simple CGI), it occurred to me that I might buffer recently used data sets in memory. So I added a brief bit of wrapper code around the marshal.load() call and, presto, the most common subset of queries ran twenty times faster.

All in all, a most satisfying way to spend an otherwise idle afternoon...


From:

doctor-squale.livejournal.com

I still haven't found a way to port the marshal module of Python to the SX.

Hopefully, the world does not care.

sawyl.livejournal.com

I've always found that tools like python and perl, with their heritage of dynamic linking, never really work all that well on machines that only support static linking. Even when you can get the wretched things to build, a non-trivial feat in itself, you often find that a decent chunk of the most useful features have been disabled because of library problems...
