sawyl | A curious performance problem

I've been working on a curious problem. I've got an MPI program that makes a whole load of system calls. In some cases, it scales perfectly and the elapsed time remains constant as the problem size increases; but in other cases, the elapsed time increases linearly with the problem size.

What makes the problem interesting is that there don't seem to be any obvious differences between the cases that scale and the cases that don't. The underlying platform seems to play some role (there are hints that the problem may occur repeatably on particular machines), but the configuration and OS of each platform are, apparently, identical across all of them. So it seems as though I'm going to have to brute-force the problem, running a large number of very short test cases in a series of different environments to see if I can locate a common factor.
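
A brute-force sweep of that kind might look roughly like the sketch below. It is only an illustration, not the harness actually used: the function names, the `EXAMPLE_VAR_A` and `EXAMPLE_VAR_B` environment variables, and the trivial stand-in command are all hypothetical. In practice the command would be the real MPI launch line and the factors would be whichever settings are under suspicion.

```python
import itertools
import os
import socket
import subprocess
import sys
import time


def environment_grid(factors):
    """Yield one dict of overrides per combination of candidate settings."""
    names = sorted(factors)
    for values in itertools.product(*(factors[name] for name in names)):
        yield dict(zip(names, values))


def time_case(command, overrides):
    """Run the command once with the overrides applied; return elapsed seconds."""
    env = dict(os.environ, **overrides)
    start = time.perf_counter()
    subprocess.run(command, env=env, check=True)
    return time.perf_counter() - start


def sweep(command, factors, repeats=3):
    """Time every combination, keeping the best of a few repeats, slowest first."""
    results = []
    for overrides in environment_grid(factors):
        best = min(time_case(command, overrides) for _ in range(repeats))
        results.append((best, overrides))
    return sorted(results, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical stand-in for the real short test case.
    command = [sys.executable, "-c", "pass"]
    # Hypothetical factors; replace with the settings actually being varied.
    factors = {"EXAMPLE_VAR_A": ["0", "1"], "EXAMPLE_VAR_B": ["x", "y"]}
    host = socket.gethostname()  # recorded because the machine may matter
    for elapsed, overrides in sweep(command, factors):
        print(f"{host}  {elapsed:8.3f}s  {overrides}")
```

Sorting the results slowest first puts the bad combinations at the top, and recording the hostname makes it possible to compare runs from different machines side by side.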

What fun.