sawyl: (Default)
[personal profile] sawyl
After playing around with my job accounting interface for a week or so, my test users came back with a few suggested improvements. Most were easy to implement, but as I was testing the ability to execute queries across multiple data sets, I noticed that the performance wasn't all that it might be, with ten-day queries taking 5 seconds to execute. Clearly something needed to be done.

The profiler confirmed my suspicions: the code was extremely IO bound and a lot of time was being lost to the re.split() used to split each line of input into fields. My initial workaround for the problem was simply to run the input data set through tr -s " " to squeeze each run of repeated spaces down to a single space, allowing me to replace my slow RE with a much faster string split.
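For illustration, here is a minimal sketch of the difference between the two approaches. It is not the original code: the field layout, the sample line and the function names are all made up, and squeeze_spaces() merely stands in for the tr -s " " preprocessing step.

```python
import re

# Hypothetical pattern: one or more spaces between fields.
_SPACES = re.compile(r" +")

def split_with_regex(line):
    """Slow path: split a raw line on runs of spaces using a regex."""
    return _SPACES.split(line.rstrip("\n"))

def squeeze_spaces(line):
    """Stand-in for `tr -s " "`: collapse each run of spaces to one space."""
    return re.sub(r" +", " ", line)

def split_preprocessed(line):
    """Fast path: valid only once repeated spaces have been squeezed."""
    return line.rstrip("\n").split(" ")

if __name__ == "__main__":
    raw = "job42   alice  128    3600\n"  # made-up accounting record
    print(split_with_regex(raw))
    print(split_preprocessed(squeeze_spaces(raw)))
```

The squeeze is paid once, when the data is preprocessed, so each query only pays for the plain string split.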

Then it occurred to me that, having committed myself to preprocessing the input data, I might as well push the idea to its logical conclusion. Thus, I reprocessed all the input data, stripping out the unwanted fields and replacing them with things that had previously been dynamically created at run-time by the interface. Having done this, I decided to optimise the IO path by dumping the data out using the Python marshal module, removing the need to do any processing on the input beyond a simple load.
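A rough sketch of what that dump and load might look like, assuming each data set is a list of tuples. The record layout, the file name and the helper names are hypothetical, not taken from the real interface.

```python
import marshal
import os
import tempfile

def dump_records(records, path):
    """Write preprocessed records to disk in marshal format."""
    with open(path, "wb") as fh:
        marshal.dump(records, fh)

def load_records(path):
    """Read the records back with a single marshal.load() call."""
    with open(path, "rb") as fh:
        return marshal.load(fh)

if __name__ == "__main__":
    # Made-up records: (job id, user, cpus, seconds).
    records = [("job42", "alice", 128, 3600), ("job43", "bob", 64, 120)]
    path = os.path.join(tempfile.mkdtemp(), "2008-08-12.marshal")
    dump_records(records, path)
    print(load_records(path) == records)
```

One caveat: the marshal format is not guaranteed to be stable across Python versions, so dumps like these are best treated as a cache that can be regenerated from the original data.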

When I considered the access patterns of the data, another idea suggested itself. Most of the queries were being made against the same core data sets, either because multiple users were querying information for the same day or because a single user was sorting the results of their previous query into a different order. So, given that mod_python makes it possible for data structures to persist in memory (one of the principal reasons it out-performs simple CGI), it occurred to me that I might buffer recently used data sets in memory. So I added a brief bit of wrapper code around the marshal.load() call and, presto, the most common subset of queries ran twenty times faster.

All in all, a most satisfying way to spend an otherwise idle afternoon...

Date: 2008-08-13 11:39 am (UTC)
From: [identity profile] doctor-squale.livejournal.com
I still haven't found a way to port the marshal module of Python to the SX.

Hopefully, the world does not care.

Date: 2008-08-13 08:07 pm (UTC)
From: [identity profile] sawyl.livejournal.com
I've always found that tools like Python and Perl, with their heritage of dynamic linking, never really work all that well on machines that only support static linking. Even when you can get the wretched things to build (a non-trivial feat in itself), you often find that a decent chunk of the most useful features have been disabled because of library problems...
