Recently (in September 2023) NetEye integrated version 8.8 of the Elastic Stack, which is just one of many Elastic updates brought into NetEye 4.
Since this Elastic update was a major upgrade (from version 7.17) that came with many breaking changes, we, as the NetEye R&D team, wanted to make it as safe and smooth as possible for our users.
In particular, our main goal was to minimize downtime across all components of the Elastic Stack, which should translate to no (or minimal) service disruption for NetEye users, and less stress for the NetEye administrators who perform the upgrade and fix any problems that arise.
To achieve minimal downtime in the components of the Elastic Stack, a precise procedure must be followed as explained in the official Elastic documentation. Our problem was that this procedure could not be supported with the way NetEye 4 was handling updates until now, which can be summed up in 2 steps:
So why is this procedure unsuitable for updating the Elastic Stack? Well, for 2 main reasons:
To overcome the problems we just discussed, we decided to introduce the concept of “Configurators” for the components of NetEye. The configurator of a NetEye component is in charge of updating the packages of that component and configuring it. This way the update of each component is handled separately, and each component can define the exact procedure by which it should be updated.
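To make the idea concrete, here is a minimal Python sketch of what "one configurator per component" could look like. It is only an illustration under our own assumptions: the `Configurator` class, the `run_all` helper, and the step descriptions are hypothetical, and this is not how the NetEye configurators are actually implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Configurator:
    """Hypothetical model of a per-component configurator."""

    component: str
    # Ordered steps owned by this component; each returns a short description.
    steps: List[Callable[[], str]] = field(default_factory=list)

    def run(self) -> List[str]:
        # Execute the component's own steps in the order it defined them.
        return [f"{self.component}: {step()}" for step in self.steps]


def run_all(configurators: List[Configurator]) -> List[str]:
    """Run every configurator in turn and collect what each one did."""
    log: List[str] = []
    for configurator in configurators:
        log.extend(configurator.run())
    return log


# Illustrative step lists only; real procedures would be far more detailed.
elastic = Configurator(
    "elastic-stack", [lambda: "update packages", lambda: "configure"]
)
icinga = Configurator(
    "icinga2", [lambda: "update packages", lambda: "configure"]
)

log = run_all([elastic, icinga])
for line in log:
    print(line)
```

The point of the sketch is that each component owns its list of steps, so changing how one component is updated does not touch the procedure of any other.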
So we introduced an “Elastic Stack Configurator” (and later an “Icinga 2 Configurator”), which is an Ansible procedure that has complete control over how the Elastic Stack components must be updated and configured on the NetEye Nodes. This configurator is integrated into the NetEye Update and Upgrade commands, which run it first, right after updating the NetEye repository definitions, as you can see below:
Writing the configurator in Ansible gives us a simple way to define how the components must be configured on the various nodes, thanks to the declarative nature of the language and to how easily it lets us manage the multiple nodes of our NetEye Clusters.
At the time of writing, the configurator procedures are still run sequentially, one after the other, which means that using configurators does not yet shorten updates/upgrades.
Nonetheless, this was a first step on the path to making the whole update/upgrade procedure faster, because we now have isolated update procedures for each NetEye component, where previously they were all entangled together. This means that, in order to make NetEye updates/upgrades faster, we are now able to run the configurators in parallel when there are no dependencies between components. But this is an improvement that will be evaluated in the future, so stay tuned!
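As a rough idea of how such parallel scheduling could work, the Python sketch below groups components into batches that have no unmet dependencies and runs each batch concurrently. Everything in it is an assumption made for illustration: the dependency map, the `dashboards` component, and the `run_one` callback are hypothetical and do not describe a planned NetEye implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # standard library since Python 3.9


def parallel_batches(dependencies):
    """Group components into batches whose members can run at the same time.

    `dependencies` maps each component to the set of components it waits for.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches


def run_configurators(dependencies, run_one):
    """Run each batch in parallel; a batch starts once the previous one ends."""
    results = []
    with ThreadPoolExecutor() as pool:
        for batch in parallel_batches(dependencies):
            results.extend(pool.map(run_one, batch))
    return results


# Hypothetical dependency map, for illustration only.
deps = {
    "elastic-stack": set(),
    "icinga2": set(),
    "dashboards": {"elastic-stack"},
}

batches = parallel_batches(deps)
results = run_configurators(deps, lambda name: f"{name} done")
print(batches)
```

With this map, the two independent components land in the first batch and the dependent one in the second, so total duration is bounded by the slowest member of each batch rather than by the sum of all configurators.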