Lumping versus splitting
Aug. 24th, 2012 02:20 pm

Got caught in a conversational maelstrom with a colleague this morning; one of those discussions where you can't quite manage to create a semi-polite opportunity to leave, despite trying everything from ostentatiously checking the time to standing up as if you're about to walk away. Even so, a couple of interesting topics came up during the course of our chat, the most interesting of which concerned the degree to which programs should be functionalised.
I'm a splitter and prefer a functional model whereby all non-trivial code is split out into functions, even where the routine is only ever called once. This, it seems to me, can be justified on a number of grounds. Firstly, moving sections of code out into functions leads to relatively narrow namespaces, minimising variable clashes and making scoping explicit. Secondly, it makes it easy to apply common troubleshooting tools: it's easier to tell a debugger to stop at the start of a particular function than it is to add a breakpoint at a specific line; it means exceptions and errors generate meaningful backtraces, making them much easier to track down; and it makes it possible to profile the performance of particular subsections of the code, rather than simply learning that 100 per cent of the time has been spent in the program's main function. Thirdly, and most obviously, it makes it easy to re-use the code if you realise part way through that the function you'd assumed would only do one job can actually be generalised to solve a whole range of problems.
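To make that concrete, here's a minimal sketch of the splitter style in Python (the choice of language and all of the names, including the records.txt input file, are hypothetical illustrations rather than anything from our actual code):

```python
# Splitter style: each stage is a named function, so a debugger can break
# on parse_records, a traceback names the stage that failed, and a profiler
# attributes time to summarise rather than lumping everything under main.

def parse_records(path):
    """Read one whitespace-separated record per non-blank line."""
    with open(path) as handle:
        return [line.split() for line in handle if line.strip()]

def summarise(records):
    """Count how often each value of the first field occurs."""
    counts = {}
    for record in records:
        counts[record[0]] = counts.get(record[0], 0) + 1
    return counts

def report(counts):
    """Print the counts, one key per line, in sorted order."""
    for key in sorted(counts):
        print(key, counts[key], sep="\t")

def main():
    report(summarise(parse_records("records.txt")))

if __name__ == "__main__":
    main()
```

Run it under `python -m cProfile script.py` and you learn how much time went into parsing versus summarising, which is exactly the profiling point above.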
Against this, the person I was talking to was a lumper who advocated what someone once called pachinko code: a single giant function with minimal looping and as little branching as possible. Their argument was that in a lot of cases it didn't make sense to use functions for code that is only called once, because it forces you to jump back and forth in the script to trace the flow of execution and adds unnecessary complexity. This is almost certainly valid if the code is trivial, but as soon as it becomes anything beyond a few screens' worth, it becomes counterproductive. Further, it's easy to start out with something that you think is trivial and will only ever be used for a single purpose, only to discover that it needs to be extended, and that the code to cover the additional cases will cause the main routine to bloat out to the point where it becomes unmanageable and requires refactoring to bring it back under control.
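For contrast, here's the same hypothetical job written in the lumper style (again just an illustrative sketch, not my colleague's actual code):

```python
# Lumper style: one flat block. A profiler can only report that all the
# time went into main, and a traceback from a malformed input line points
# at this one function rather than at a named parsing stage.

def main():
    counts = {}
    with open("records.txt") as handle:  # same hypothetical input file
        for line in handle:
            if not line.strip():
                continue
            fields = line.split()
            counts[fields[0]] = counts.get(fields[0], 0) + 1
    for key in sorted(counts):
        print(key, counts[key], sep="\t")

if __name__ == "__main__":
    main()
```

At this size the flat version is perfectly readable; the trouble starts when each of those inline steps grows its own special cases.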
Not only do our positions reflect fundamentally different approaches to programming (something that mirrors our hedgehogish and foxish attitudes), but I think they may also be down to the amount of programming that we each do (I do a lot, my colleague doesn't) and to our general levels of laziness: I feel it's best to Do Things Right from the outset, putting in more work early on to avoid the tedium of having to redo things later when the specifications change.