First steps with Nagios XI
Jun. 27th, 2017 08:25 pm
An unexpectedly productive day after one of my colleagues finished provisioning the new Nagios XI server and passed it over for configuration. After investigating the bulk import feature, I decided it would probably make more sense to recreate the hosts, groups, and services using the database.
Fortunately, thanks to the way the systems were templated (with the exception of the Lustre appliance, which we didn't configure), it was a relatively simple task to get everything defined, and within a couple of hours I had all the TDS monitoring up and running. Because the systems are almost identical, all I need to do to duplicate the monitoring on the production systems is duplicate the host entries. And while there are a lot of production nodes, it shouldn't be too formidable: we can either use the new ability to clone systems or possibly even create a custom wizard to set them up.
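To illustrate why templating makes the duplication cheap, here is a rough Python sketch that renders one host entry per node in the standard Nagios object-definition syntax, with each host inheriting from a shared template via `use`. It is only an illustration, not the method I used on the XI server: the template name, node names, and addresses are all made up, and the helper functions are hypothetical.

```python
def render_host(host_name, address, template):
    """Return one Nagios-style host definition that inherits from a template."""
    return (
        "define host {\n"
        f"    use {template}\n"          # inherit checks and settings from the template
        f"    host_name {host_name}\n"
        f"    address {address}\n"
        "}\n"
    )


def render_hosts(nodes, template):
    """Render a definition for every (host_name, address) pair in nodes."""
    return "\n".join(render_host(name, addr, template) for name, addr in nodes)


if __name__ == "__main__":
    # Hypothetical production nodes and template name, for illustration only.
    nodes = [("prod-node001", "10.0.0.1"), ("prod-node002", "10.0.0.2")]
    print(render_hosts(nodes, "example-node-template"))
```

Since everything of substance lives in the template, each extra node costs only a name and an address.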
The process of mirroring the definitions over from the Lustre appliances — they run Icinga v1 under the covers — is likely to require more effort, but at the same time, it's also less urgent because the storage is already being monitored and because it requires me to re-write the code used to push alerts on to the XI server.