My decision to reconfigure NIS has been comprehensively vindicated. I decided to up the number of slaves from a paltry one to six and to change the default bindings to distribute the load evenly across the servers.
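The binding change amounts to pointing each client at a particular slave rather than letting everything pile onto one server. As a rough illustration of the sort of spread involved, here's a minimal Python sketch that deals each client a slave in round-robin fashion and emits the corresponding /etc/yp.conf line; the domain name, slave names and client list are all made up for the example:

#!/usr/bin/env python3
"""Sketch of spreading NIS client bindings across a pool of slaves.

Emits the /etc/yp.conf line each client would need for a simple
round-robin assignment.  The domain name, slave names and client
list are purely illustrative.
"""

NIS_DOMAIN = "example.nis"                                   # assumed domain name
SLAVES = [f"nis-slave{n}.example.com" for n in range(1, 7)]  # six hypothetical slaves

def yp_conf_line(slave: str) -> str:
    """Return the yp.conf binding line for a given slave."""
    return f"domain {NIS_DOMAIN} server {slave}"

if __name__ == "__main__":
    clients = [f"node{n:03d}" for n in range(1, 35)]         # hypothetical client list
    for i, client in enumerate(clients):
        slave = SLAVES[i % len(SLAVES)]                      # round-robin spread
        print(f"{client}: {yp_conf_line(slave)}")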

When I moved the first cluster of systems off the master, I noticed that the parallel ypwhich I issued came back rather more swiftly than before, but I assumed that it was load related and thought no more of it. When the same thing happened after I moved the second cluster, I decided to run a couple of experiments to confirm my intuitive feeling that things were running with greater celerity.
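The experiment itself is nothing more sophisticated than timing a parallel ypwhich across a set of hosts. Something along the lines of the following Python sketch would do the job; it isn't the exact test I ran, and it assumes passwordless ssh to each host, with the host names purely illustrative:

#!/usr/bin/env python3
"""Rough harness for timing a parallel ypwhich across a set of hosts.

Assumes passwordless ssh to each host; the host list is illustrative.
"""

import subprocess
import time
from concurrent.futures import ThreadPoolExecutor

HOSTS = [f"node{n:03d}" for n in range(1, 24)]   # hypothetical host list

def ypwhich(host: str) -> str:
    """Run ypwhich on one host over ssh and return the bound NIS server."""
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, "ypwhich"],
        capture_output=True, text=True, timeout=60,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(HOSTS)) as pool:
        bindings = dict(zip(HOSTS, pool.map(ypwhich, HOSTS)))
    elapsed = time.monotonic() - start

    for host, server in sorted(bindings.items()):
        print(f"{host:>10}  ->  {server or '(no reply)'}")
    print(f"completed in {elapsed:.1f}s across {len(HOSTS)} hosts")

The wall-clock time reported at the end is the number quoted in the comparisons below, and the per-host output shows which slave each system has bound to.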

To establish a baseline, I ran a parallel ypwhich across 23 systems connected to the original NIS master. The systems were not under load — all the work had been checkpointed to allow for maintenance. The command took some 16 seconds to complete.

I then ran the command against 34 systems, split between two different NIS slaves. Each system was running a full workload. The command took 0.6 seconds to complete, some 26 times faster than the baseline case!

My test, which relies on logging in to systems in parallel and running a series of commands, is likely to be extremely sensitive to poor NIS latencies. But it also seems to me to be indicative of the sorts of delays a large-scale parallel application might see when launching across a large number of nodes. In most cases, a 15–20 second delay would be negligible. In the case of a real-time application, especially one that depends on running a number of large MPI executables within a single job, the cumulative start-up delays might well have a significant impact: a job that launched ten such executables in sequence would accumulate two or three minutes of delay.
