Farewell brute force and ignorance
Mar. 22nd, 2012 09:33 pm
Winnowing through logs in search of useful information, I've had a bit of a brainwave: instead of running a linear search through a file, looking for and ripping out the sections I want with
sed pattern matches, I can take advantage of the ordered nature of the file and use a binary search. This should massively reduce the number of calls to the computationally expensive regular expression engine, from N to roughly log2(N), and should let me extract the interesting sections reasonably quickly without having to resort to threading (although I reserve the right to go parallel if impatience demands it).

ETA: In the process of coding up, I noticed another neat hack. For the case where I'm winnowing through a log of date-stamped entries, if I replace my custom parser with
dateutil.parser, not only does the cost drop by a factor of 10-20, but the code becomes sufficiently general to work on any file where the first part of each line contains a date stamp, e.g. both syslog and LoadLeveler. Excelsior!
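
For illustration, here is a minimal sketch of the binary-search idea in Python. It is not the code described above: the function names (`parse_stamp`, `line_start_at_or_after`, `find_first_at_or_after`, `extract_range`), the fixed `YYYY-MM-DD HH:MM:SS` prefix, and the half-open time range are all assumptions made for the example. It sticks to the standard library so it runs anywhere; the `key` argument is where a parser built on `dateutil.parser.parse` could be swapped in.

```python
import io
from datetime import datetime


def parse_stamp(line: bytes) -> datetime:
    # Assumed layout: every line starts with "YYYY-MM-DD HH:MM:SS".
    return datetime.strptime(line[:19].decode("ascii"), "%Y-%m-%d %H:%M:%S")


def line_start_at_or_after(f, pos: int) -> int:
    """Seek to, and return, the first line start at byte offset >= pos."""
    if pos == 0:
        f.seek(0)
        return 0
    f.seek(pos - 1)
    f.readline()  # consume up to the newline ending the line that holds byte pos-1
    return f.tell()


def find_first_at_or_after(f, target, key=parse_stamp) -> int:
    """Byte offset of the first line whose timestamp is >= target.

    f must be a seekable binary file with lines sorted by timestamp.
    Returns the file size if every line is earlier than target.
    """
    size = f.seek(0, io.SEEK_END)
    lo, hi = 0, size
    while lo < hi:
        mid = (lo + hi) // 2
        line_start_at_or_after(f, mid)
        line = f.readline()  # the one timestamp parse for this probe
        if not line or key(line) >= target:
            hi = mid
        else:
            lo = mid + 1
    return line_start_at_or_after(f, lo)


def extract_range(f, begin, end, key=parse_stamp) -> bytes:
    """Return all lines with begin <= timestamp < end."""
    start = find_first_at_or_after(f, begin, key)
    stop = find_first_at_or_after(f, end, key)
    f.seek(start)
    return f.read(max(0, stop - start))


# Usage on an in-memory sample; a real log would be opened with open(path, "rb").
sample = io.BytesIO(
    b"2012-03-22 09:00:00 boot\n"
    b"2012-03-22 10:15:00 job 1 start\n"
    b"2012-03-22 10:45:00 job 1 end\n"
    b"2012-03-22 21:33:00 job 2 start\n"
)
morning = extract_range(
    sample, datetime(2012, 3, 22, 10, 0), datetime(2012, 3, 22, 11, 0)
)
```

Each probe parses exactly one timestamp, so the number of parses grows with the logarithm of the file size in bytes rather than with the number of lines. Searching by byte offset also means the file never has to be read into memory or split into lines first.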