sawyl: (Default)
Having worked late yesterday to create my OS images with a view to distributing them overnight, I got in this morning to discover a handful of failures. I tracked these to a couple of missing NIM resources, recreated the objects and reran the fan-out. Everything appeared to work, so I flashed the firmware (the point of no return) and rebooted, only for the machines to come up running the previous OS instance.

After digging through the output, I noticed an apparently innocuous message. When I investigated more closely, it became clear that the innocent little message was actually a sign that the fan-out had failed, although there were no indications as to why it should have done so. Frustrated and out of time, I decided to ponder things over the weekend and come back to them fresh on Monday.

ETA: The problem turned out to be trivial: the gateway for the machines' install network was missing from the xCAT networks database table. When I added this in and told xCAT to update the NIM information for the machines, the nodes switched over to the latest version of the OS without any trouble. I'm not entirely sure why the problem started happening on Friday because, according to the database backup logs, the gateway was removed from the networks table back in February...
sawyl: (Default)
Defaults can be dangerous: options that can safely be assumed in one context can be lethal in another. For example, most of the xCAT commands support a -t option that can be used to specify the type of target object, e.g. groups, nodes, etc. If the type is left unspecified, the command assumes that the target is a node. That is a sensible assumption for harmless query commands like lsdef, but a foolish one for potentially damaging commands like rmdef.

Thus, after creating a series of dynamic groups in a failed attempt to work around the lack of sensible booleans in the node selection criteria, I found myself with a series of groups I no longer needed. Naively, I tried to delete one of them using rmdef. But because I'd failed to specify -t group, the command removed the contents of the group, i.e. my nodes, instead of the group itself. Luckily I noticed the problem almost immediately because the number of deleted objects was larger than expected, but I still faced the boring and slightly embarrassing task of recreating the deleted objects (because, of course, the daily database backups hadn't been run for some weeks).

The behaviour of rmdef is clearly wrong, given the way that it violates the principle of least surprise. I know, for it was certainly a surprise to me.

If the user is trying to delete something and there is insufficient information on the command line to determine the target of the operation unambiguously, the command should refuse to run until enough arguments have been specified to identify the target. It should not, pace xCAT, attempt to guess at the target by applying default option values, even when this sort of guesswork identification is applied to other similar but non-destructive commands.
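To make the proposed rule concrete, here is a minimal toy sketch in Python. It is not xCAT's code and does not reproduce its real behaviour or options; the object store (`OBJECTS`), the helper names (`resolve`, `run`), the program name `defcmd` and the `DESTRUCTIVE` set are all hypothetical, and `lsdef`/`rmdef` are borrowed only as labels. It assumes, as in the accident above, that a group name given as a node target expands to the group's members. Query commands keep the convenient default of `node`, while destructive commands exit with an error unless `-t` is given.

```python
"""Hypothetical toy model of an object-definition CLI.

NOT xCAT's implementation: the command names merely mirror lsdef/rmdef.
Non-destructive commands default the object type to 'node'; destructive
commands refuse to guess and demand an explicit -t.
"""
import argparse

# Hypothetical in-memory object store: object type -> name -> attributes.
OBJECTS = {
    "node": {"n1": {"groups": "compute"}, "n2": {"groups": "compute"}},
    "group": {"compute": {"members": ["n1", "n2"]}},
}

# Commands that modify or delete objects (hypothetical set).
DESTRUCTIVE = {"rmdef"}


def resolve(objtype, name):
    """Return the object names a target refers to.

    Treated as a node target, a group name expands to the group's
    members, which is what turns a mistyped delete into a mass delete.
    """
    if objtype == "node" and name in OBJECTS["group"]:
        return list(OBJECTS["group"][name]["members"])
    return [name] if name in OBJECTS[objtype] else []


def run(argv):
    parser = argparse.ArgumentParser(prog="defcmd")
    parser.add_argument("command", choices=["lsdef", "rmdef"])
    parser.add_argument("-t", dest="objtype", choices=sorted(OBJECTS))
    parser.add_argument("name")
    args = parser.parse_args(argv)

    if args.objtype is None:
        if args.command in DESTRUCTIVE:
            # Refuse to guess the target of a destructive operation;
            # parser.error prints a message and exits with status 2.
            parser.error(f"{args.command}: object type is ambiguous; specify -t")
        args.objtype = "node"  # harmless default for query commands

    targets = resolve(args.objtype, args.name)
    if args.command == "lsdef":
        return targets
    for target in targets:
        del OBJECTS[args.objtype][target]
    return targets
```

In this model, `run(["lsdef", "compute"])` still lists the group's member nodes, `run(["rmdef", "compute"])` stops with an error instead of deleting anything, and `run(["rmdef", "-t", "group", "compute"])` removes only the group definition, leaving the nodes in place. The design choice is simply that the cost of a wrong guess decides whether a default is allowed at all.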
