Oct. 1st, 2009

sawyl: (Default)
There's something to be said for supercomputers that use commodity interconnects. Not least, that they generally come with a set of decent online diagnostic tools.

The Cray T3E, with its custom interconnect, would crash with R-option errors at the hint of corrupt data passing through the torus, and the only method for diagnosing the source of the problem involved churning through a complete system dump using crashmk. The NEC, while substantially better in reliability terms and often able to ride out interconnect faults provided they occurred on the node side of the connection, also suffered from a complete lack of online interconnect diagnostics: it was impossible to tell whether errors were occurring or what the traffic flow was across the system.

Thus, it's rather a nice change to have a system based on a commodity network, InfiniBand, which comes complete with error and traffic counters on the switches and the ability to report both bad tokens being passed across the network and failed CRC checks on the data packets. It's also nice to have an interconnect that uses completely independent planes, allowing components of the system to fail without causing all the nodes to crash.
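To give a flavour of what per-port error counters make possible, here is a minimal sketch that compares two counter snapshots and flags the ports whose error counts have grown. The snapshot layout, the port names and the `rising_errors` helper are all hypothetical; the counter names are borrowed from the InfiniBand port counters `SymbolErrorCounter` and `PortRcvErrors`, and treating them as stand-ins for bad tokens and failed CRC checks is an assumption made for illustration, not a description of any particular switch's tooling.

```python
# Hypothetical snapshot layout: {port_name: {counter_name: value}}.
# Counter names are assumed stand-ins for the error counters a switch exposes.
ERROR_COUNTERS = ("SymbolErrorCounter", "PortRcvErrors")


def rising_errors(before, after, counters=ERROR_COUNTERS):
    """Return {port: {counter: increase}} for error counters that grew."""
    flagged = {}
    for port, now in after.items():
        then = before.get(port, {})
        grown = {}
        for name in counters:
            # Missing counters are treated as zero in either snapshot.
            delta = now.get(name, 0) - then.get(name, 0)
            if delta > 0:
                grown[name] = delta
        if grown:
            flagged[port] = grown
    return flagged


# Made-up example data: only port3 shows new symbol errors.
before = {
    "switch1/port3": {"SymbolErrorCounter": 10, "PortRcvErrors": 0},
    "switch1/port4": {"SymbolErrorCounter": 2, "PortRcvErrors": 1},
}
after = {
    "switch1/port3": {"SymbolErrorCounter": 57, "PortRcvErrors": 0},
    "switch1/port4": {"SymbolErrorCounter": 2, "PortRcvErrors": 1},
}

print(rising_errors(before, after))
```

Comparing deltas rather than absolute values matters because counters accumulate from whenever they were last reset, so a large but static number says little about what the link is doing now.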

It's all most unsupercomputerish.
