Yet another bump in the road...
Aug. 10th, 2012 04:46 pm
Having worked late yesterday to create my OS images with a view to distributing them overnight, I got in this morning to discover a handful of failures. I tracked these to a couple of missing NIM resources, recreated the objects and reran the fan-out. Everything appeared to work, so I flashed the firmware (the point of no return) and rebooted, only for the machines to come up with the previous OS instance.
After digging through the output, I noticed an apparently innocuous message. When I investigated more closely, it became clear that the innocent little message was actually a sign that the fan-out had failed, although there were no indications as to why it should have done so. Frustrated and out of time, I decided to ponder things over the weekend and come back to them fresh on Monday.
ETA: The problem turned out to be trivial: the gateway for the machines' install network was missing from the xCAT networks database table. When I added this in and told xCAT to update the NIM information for the machines, the nodes switched over to the latest version of the OS without any trouble. I'm not entirely sure why the problem started happening on Friday because, according to the database backup logs, the gateway was removed from the networks table back in February...
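A check along the following lines might flag a missing gateway before a fan-out is attempted. It is only a sketch: it assumes the networks table has been dumped to CSV text (for example with tabdump networks) with a '#'-prefixed header line that includes netname and gateway columns. The function name and the sample rows are hypothetical, so the column layout should be checked against a real dump before relying on it.

```python
import csv
import io


def networks_missing_gateway(tabdump_text):
    """Return the names of networks whose gateway column is empty.

    Assumes CSV text whose first line is a header prefixed with '#'
    and which contains 'netname' and 'gateway' columns.
    """
    lines = tabdump_text.strip().splitlines()
    # Strip the leading '#' so the header parses as ordinary CSV.
    header = next(csv.reader([lines[0].lstrip("#")]))
    rows = csv.DictReader(io.StringIO("\n".join(lines[1:])), fieldnames=header)
    # Treat an absent or whitespace-only gateway as missing.
    return [row["netname"] for row in rows
            if not (row.get("gateway") or "").strip()]


# Hypothetical sample data, not taken from a real system.
sample = '''#netname,net,mask,mgtifname,gateway
"install_net","10.0.0.0","255.255.255.0","en0",""
"public_net","192.168.1.0","255.255.255.0","en1","192.168.1.1"
'''

print(networks_missing_gateway(sample))
```

Keeping the parsing in a function that takes plain text means it can be run against a saved dump or a database backup as easily as against live output.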