[personal profile] sawyl
Following an unexpected loss of power last week, one of the AIX systems has consistently failed to recognise its InfiniBand adapter. After trying the usual tricks to get the OS to reconfigure the card — removing and reinstalling fixes, deleting and re-creating the devices in the ODM — the problem was passed over to the hardware guys who replaced the card and found that it made no difference either way.

Today, the engineering bods finally found the source of the problem: the hardware configuration on the HMC was inconsistent. By creating a new partition with an explicit hardware configuration (the duff partition had simply been defined to use all available resources) and booting AIX, they were able to confirm that the system could talk to the card. Knowing this, they simply deleted and recreated the broken partition definition and, presto-chango, the system was back with all interfaces present and correct.

With the system back up, we then uncovered a whole series of problems with the routing configuration. Essentially, the static routes were correctly defined in the inet0 entries of CuAt but they had not been installed at boot time. We suspect that the problem may be due to a race condition in the network startup scripts: the routes are defined to a multi-link pseudo device, and if the route commands are run before the creation of the ml0 interface has completed, the routes aren't created. Interestingly, we don't see the same problem on an otherwise identical system configured with a much larger number of IB cards. Although the problem is trivial to fix (simply running cfgmgr -l inet0 at the end of the boot does the trick), it's a nasty little gotcha in an area that shouldn't really require manual intervention to get going.
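
For what it's worth, the manual fix could be wrapped in something that waits for ml0 before re-running cfgmgr, rather than relying on someone remembering to do it. What follows is only a rough sketch, not what we actually run: the helper names (interface_names, wait_for_interface, run_ifconfig, main), the retry count and the delay are all made up for illustration, and it assumes that ifconfig -a starts each interface block at column 0 with the interface name followed by a colon.

```python
import subprocess
import time


def interface_names(ifconfig_output):
    """Return interface names found in `ifconfig -a` style output.

    Assumption: each interface block starts at column 0 with `name:`,
    and continuation lines are indented.
    """
    names = []
    for line in ifconfig_output.splitlines():
        if line and not line[0].isspace() and ":" in line:
            names.append(line.split(":", 1)[0])
    return names


def wait_for_interface(name, list_output, attempts=30, delay=2.0,
                       sleep=time.sleep):
    """Poll until `name` appears; return True if it showed up in time.

    `list_output` is any callable returning ifconfig-style text, which
    keeps the polling logic testable away from a real AIX box.
    """
    for attempt in range(attempts):
        if name in interface_names(list_output()):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


def run_ifconfig():
    """Hypothetical helper: capture the output of `ifconfig -a`."""
    result = subprocess.run(["ifconfig", "-a"], capture_output=True,
                            text=True, check=True)
    return result.stdout


def main():
    """Wait for ml0, then ask cfgmgr to reconfigure inet0."""
    if not wait_for_interface("ml0", run_ifconfig):
        raise SystemExit("ml0 never appeared; not re-running cfgmgr")
    # Same command as the manual fix: re-apply the inet0 configuration,
    # including the static routes held in CuAt.
    subprocess.run(["cfgmgr", "-l", "inet0"], check=True)
```

Calling main() from whatever runs last in the boot sequence would be the obvious place for it, though where exactly that hook lives is site-specific and left open here. It only papers over the suspected race; it doesn't prove that the race is the real cause.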

But, grousing aside, I think I've learnt more about AIX and GPFS troubleshooting in the last week than I picked up on a month's worth of courses, which is surely a good thing...