Upgrading a NetEye 4 installation, either as a cluster or a single instance, is not always a painless activity. NetEye 4 is, in fact, a very sophisticated product that offers its customers a very large number of features, and operates in complex and business-critical environments.
From version to version, the upgrade procedure may change: for example, the recently released NetEye 4.12 requires executing a series of manual steps to migrate from Search Guard to Elastic X-Pack security whenever the Logmanagement or SIEM feature modules are installed.
The NetEye 4 User Guide is always a good read before upgrading a NetEye installation. In fact, any extra steps required for a safe upgrade are carefully explained on a dedicated page. But following such steps is still a manual activity, which is time-consuming and error-prone.
Being able to perform automatic tasks or processes can make a difference in terms of saving time and money, and minimizing any potential manual intervention.
How can we automate the NetEye 4 upgrade procedure, or at least some parts of it?
I’ve mentioned many times that we really do love Ansible. Ansible is our everyday sidekick for automating regular and repetitive tasks, for provisioning and configuring NetEye 4 clusters, for deploying NetEye 4 VMs in Microsoft Azure, and for interacting with and configuring hosts and systems. Thus using Ansible to start automating procedures within NetEye 4 was a logical choice.
To support the automation of the NetEye upgrade procedure, we extended the existing neteye utility with a new command, upgrade, which performs a set of checks to determine the current status of the NetEye 4 installation. If the system is healthy and ready to be upgraded, the command proceeds with the installation of the repository definition for the next available NetEye version. From this step on, the installation of new packages and the configuration of the new version must still be performed manually, but our goal is to eventually provide users with a safe, fully automatic upgrade procedure.
Imagine that we have a NetEye 4.11 cluster consisting of 3 nodes. We read on the NetEye blog that the new NetEye 4.12 is now available, and we would like to upgrade our installation to the new version. Since the upgrade command is brand new, let’s see what it can do for us and how.
The basic syntax of the command is pretty simple:
[root@neteye-cluster1 ~]# neteye upgrade
This will trigger the execution of an Ansible playbook that will perform the necessary set of checks and, if everything is okay, will install the new repository definitions for NetEye 4.12.
One potential problem here is that, by default, Ansible does not know which machines are part of the overall system. We cannot resort to static lists of machines and hosts: they would not reflect the topology of our system, and they would require manual intervention whenever the environment changes. How can we produce a valid Ansible inventory that correctly represents our system?
We solved the problem by using a dynamic inventory script, which allows Ansible to know on the fly which hosts to check and to configure, without the need to resort to static inventories.
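To give an idea of how a dynamic inventory works, here is a minimal sketch of such a script. The node discovery is a placeholder: a real NetEye inventory script would derive the topology from the system itself (e.g. from the cluster configuration), and the group name used here is an assumption for illustration.

```python
#!/usr/bin/env python3
"""Minimal sketch of an Ansible dynamic inventory script."""
import json
import sys


def discover_cluster_nodes():
    # Placeholder: in practice the node list would be discovered at
    # runtime from the cluster configuration, not hardcoded.
    return ["neteye-cluster1", "neteye-cluster2", "neteye-cluster3"]


def build_inventory():
    # Ansible expects a JSON object mapping group names to hosts,
    # plus an optional _meta section with per-host variables.
    nodes = discover_cluster_nodes()
    return {
        "neteye_cluster": {"hosts": nodes},
        "_meta": {"hostvars": {node: {} for node in nodes}},
    }


if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "--list":
        # Ansible invokes the script with --list to get the whole inventory.
        print(json.dumps(build_inventory()))
    else:
        # Ansible may also call it with --host <name>; an empty object
        # is a valid answer when _meta already carries the hostvars.
        print(json.dumps({}))
```

Ansible calls such a script with the --list option and consumes the JSON it prints, so the playbook always sees the current cluster topology.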
Logically, the Ansible playbook executed by the neteye upgrade command can be broken down into the following steps:
The upgrade command retrieves the installed NetEye version and the installation status by calling the new NetEye agent, a daemon that supervises the NetEye installation (look here for more information about the agent).
Should there be any issues with the installation (for example, an installation that has not been correctly finalized), the upgrade command stops and triggers an error.
Remember that the failure of any check terminates the automatic upgrade procedure, and the upgrade command will keep failing until all errors are manually fixed.
If the check is successful, the command moves on to the next step.
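The fail-fast behavior of this first check can be sketched as follows. The status dictionary and its field name mimic a response from the NetEye agent; both are assumptions for illustration, not the agent's actual interface.

```python
def ensure_installation_finalized(status):
    """Abort the upgrade early if the agent reports an unfinished installation.

    `status` stands in for the data returned by the NetEye agent;
    the 'install_finalized' field name is a hypothetical example.
    """
    if not status.get("install_finalized", False):
        raise RuntimeError(
            "NetEye installation not correctly finalized; "
            "fix it manually before running the upgrade again"
        )
```

Raising an error here mirrors how the playbook stops: nothing after this point runs until the installation status is fixed by hand.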
The health status is verified by automatically executing the NetEye health checks on each node of the cluster. The upgrade command triggers the automatic execution of this command:
[root@neteye-clusterX ~]# neteye health deep
and collects the results for each node in the cluster.
If the health checks executed successfully, the health status of our cluster is considered adequate for performing the upgrade.
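The aggregation logic is simple: the cluster is considered healthy only if the health check succeeded on every node. A minimal sketch, where the per-node results are modeled as exit codes of the neteye health deep command (an assumption about how the playbook collects them):

```python
def all_nodes_healthy(results):
    """Return True only if 'neteye health deep' succeeded on every node.

    `results` maps each node name to the exit code of its health check
    run; 0 conventionally means success.
    """
    return all(code == 0 for code in results.values())


# Hypothetical example: one node failing its health check blocks the upgrade.
healthy = {"neteye-cluster1": 0, "neteye-cluster2": 0, "neteye-cluster3": 0}
degraded = {"neteye-cluster1": 0, "neteye-cluster2": 1, "neteye-cluster3": 0}
```

A single failing node is enough to stop the procedure, which matches the all-or-nothing behavior of the sanity checks described above.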
The neteye upgrade command now verifies that all patches and bug fixes for the currently installed NetEye version (here, 4.11) have been installed on each node of our cluster. If updates are available, execution stops and the user must install the missing packages before running the upgrade command again.
As an example, let’s suppose that the latest version (1.6.2) of the packages httpd-neteye-config and httpd-neteye-config-autosetup is not installed on one of the nodes of our cluster: on node neteye-cluster1 we have version 1.6.1 of both packages.
The output of the upgrade command is shown below:
TASK [upgrade | verify if bugfixes and updates have been installed]
fatal: [neteye-cluster1]: FAILED! => {
"changed": false
}
MSG:
"Found updates not installed"
"Example: httpd-neteye-config-autosetup, version 1.6.2"
By looking at the output, we can see that the sanity check performed by the upgrade command failed on node neteye-cluster1, as expected. The reason for the failure is reported in the MSG section:
MSG:
"Found updates not installed"
"Example: httpd-neteye-config-autosetup, version 1.6.2"
The command has indeed identified that, for at least one package (httpd-neteye-config-autosetup), a newer version (1.6.2) is available in the NetEye repository. After installing the missing updates, the neteye upgrade command can move on to the next check.
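The core of such a check is a comparison between the installed and the latest available version of each package. Here is a simplified sketch: versions are compared as dotted integer tuples, which is a simplification of full RPM version comparison (release tags, epochs, and alphanumeric segments are ignored).

```python
def find_missing_updates(installed, available):
    """Return packages whose installed version lags behind the repository.

    `installed` and `available` map package names to simple dotted
    version strings, e.g. "1.6.1". This naive tuple comparison is a
    stand-in for proper RPM version comparison.
    """
    def as_tuple(version):
        return tuple(int(part) for part in version.split("."))

    return sorted(
        pkg
        for pkg, ver in installed.items()
        if pkg in available and as_tuple(available[pkg]) > as_tuple(ver)
    )
```

With the example above, where both packages are at 1.6.1 on neteye-cluster1 while 1.6.2 is in the repository, both would be reported as missing updates.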
The command now verifies various properties of the NetEye cluster. It starts by disabling the fencing property, if enabled, and then checks that all cluster nodes are online. This is, in fact, a mandatory condition before proceeding with the upgrade procedure.
For example, it might be that we put one of the nodes in standby, and then forgot to bring it back online, as shown here:
[root@neteye-cluster1 ~]# pcs status nodes
Pacemaker Nodes:
Online: neteye-cluster1 neteye-cluster3
Standby: neteye-cluster2
Standby with resource(s) running:
Maintenance:
Offline:
As you can see, node neteye-cluster2 appears in standby, while the other nodes are correctly online.
If we run the neteye upgrade command, we obtain the following output:
TASK [upgrade | check NetEye nodes status (cluster only)]
failed: [neteye-cluster1] (item=neteye-cluster2) => {
"changed": false,
"item": "neteye-cluster2"
}
MSG:
NetEye cluster node neteye-cluster2 is not online
In this case too, the command stops with an error because one of our cluster nodes is not online. We can manually fix this, run the command again, and move on to the next step.
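To illustrate this node-status check, here is a sketch that parses text shaped like the pcs status nodes output above and returns every node listed in a section other than Online. A real check would query Pacemaker directly rather than scrape text, so treat this purely as an illustration of the logic.

```python
def nodes_not_online(pcs_output):
    """Return cluster nodes that are not in the 'Online' section.

    Parses text shaped like the output of 'pcs status nodes':
    each section is a label, a colon, and an optional list of hosts.
    """
    not_online = []
    for line in pcs_output.splitlines():
        section, sep, names = line.strip().partition(":")
        hosts = names.split()
        # Any populated section other than 'Online' (Standby,
        # Maintenance, Offline, ...) means the node blocks the upgrade.
        if sep and section != "Online" and hosts:
            not_online.extend(hosts)
    return not_online
```

Applied to the example above, this would report neteye-cluster2 (in standby) as the node preventing the upgrade from proceeding.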
Having passed all the sanity checks, our cluster is now ready to be upgraded. The upgrade command checks whether a new NetEye version is available (4.12 in our running example) and installs the rpm package that contains the repository definitions for the new version.
Once the new repository definitions are installed, or if no new version is found, the command concludes its execution. From here on, the upgrade procedure requires fully manual intervention.
The underlying logic of the neteye upgrade command is written in Ansible, so Ansible command line options can also be passed to the command. For example, to increase the verbosity of the output of the upgrade command, we can type:
[root@neteye-cluster1 ~]# neteye upgrade -v
Increasing the number of v‘s will further increase the verbosity level (e.g., -vvv).
Correct execution of the neteye upgrade command will produce the following output:
PLAY RECAP ****************************************************************************************************************************************************************************************************************************
localhost : ok=1 changed=0 unreachable=0 failed=0
neteye-cluster1 : ok=14 changed=4 unreachable=0 failed=0
neteye-cluster2 : ok=10 changed=2 unreachable=0 failed=0
neteye-cluster3 : ok=10 changed=2 unreachable=0 failed=0
We can see a summary of the tasks performed, divided by host and by task outcome. For example, for node neteye-cluster1, we have 14 tasks whose outcome was ok, 4 with changed, and 0 with unreachable or failed.
To be able to proceed with the upgrade, be sure that you don’t see any unreachable or failed items in the PLAY RECAP section of the command output!
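That final verification can itself be automated. Here is a minimal sketch that scans a PLAY RECAP in the textual form shown above and reports whether any host had unreachable or failed tasks; the exact recap format is taken from the output above.

```python
import re


def recap_is_clean(recap):
    """Return True if no host in a PLAY RECAP has unreachable or failed tasks.

    Scans lines like:
        neteye-cluster1 : ok=14 changed=4 unreachable=0 failed=0
    """
    for line in recap.splitlines():
        match = re.search(r"unreachable=(\d+)\s+failed=(\d+)", line)
        if match and (int(match.group(1)) > 0 or int(match.group(2)) > 0):
            return False
    return True
```

Running this over the recap shown above would confirm the cluster is clear to proceed, since every host reports unreachable=0 and failed=0.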