A NetEye installation's topology can change over time: hosts of various types may be added to or removed from a cluster in response to changing business demands or customer requirements.
In a cluster environment, hosts are manually mapped in a file called /etc/neteye-cluster, a sort of static inventory that serves as the single source of truth for the NetEye cluster topology and is synchronized across all the hosts belonging to the cluster.
An example /etc/neteye-cluster file is the following:
{
    "Hostname" : "neteye-cluster.myneteye.cluster",
    "Nodes" : [
        {
            "addr" : "192.168.47.1",
            "hostname" : "neteye01.neteyelocal",
            "hostname_ext" : "neteye-cluster01.myneteye.cluster",
            "id" : 1
        },
        {
            "addr" : "192.168.47.2",
            "hostname" : "neteye02.neteyelocal",
            "hostname_ext" : "neteye-cluster02.myneteye.cluster",
            "id" : 2
        }
    ],
    "ElasticOnlyNodes": [
        {
            "addr" : "192.168.47.3",
            "hostname" : "neteye03.neteyelocal",
            "hostname_ext" : "neteye-cluster03.myneteye.cluster"
        }
    ],
    "VotingOnlyNode" : {
        "addr" : "192.168.47.4",
        "hostname" : "neteye04.neteyelocal",
        "hostname_ext" : "neteye-cluster04.myneteye.cluster",
        "id" : 3
    }
}
As you can see, the file describes the NetEye cluster and the attributes of its hosts, dividing them by type. Available types are:
Nodes, which are standard NetEye nodes forming the backbone of the Red Hat cluster architecture of NetEye 4;
ElasticOnlyNodes, which are nodes that run Elasticsearch only and form the Elastic cluster;
VotingOnlyNode, a single node whose sole purpose is to provide quorum for DRBD, PCS, and Elasticsearch.
As you can imagine, by default the neteye upgrade and neteye update commands do not know which hosts are part of the NetEye environment, so how can we identify which hosts to upgrade, distinguishing them according to their type? We can resort to the /etc/neteye-cluster file to map the hosts and their types to host groups, which our commands can then use to manage the update/upgrade procedures.
Ansible, in fact, works against multiple nodes at the same time, using lists of nodes or groups of nodes known as an inventory. We can then use so-called patterns to select the hosts or host groups to run our commands against.
We use a dynamic inventory script to map hosts from the /etc/neteye-cluster file to host groups. The script is invoked at run time when calling the Ansible playbooks that perform the update/upgrade operations, and it can also be executed from the command line on any NetEye installation:
[root@neteye01 ~]# python /usr/share/neteye/scripts/upgrade/dynamic-inventory.py --list | jq
The output of the command, considering our example, is the following:
{
    "_meta": {
        "hostvars": {
            "neteye03.neteyelocal": {
                "internal_node_addr": "192.168.47.3"
            },
            "neteye02.neteyelocal": {
                "internal_node_addr": "192.168.47.2"
            },
            "neteye01.neteyelocal": {
                "internal_node_addr": "192.168.47.1"
            },
            "neteye04.neteyelocal": {
                "internal_node_addr": "192.168.47.4"
            }
        }
    },
    "all": {
        "vars": {
            "cluster": "true",
            "ansible_ssh_user": "root"
        }
    },
    "voting_nodes": {
        "hosts": [
            "neteye04.neteyelocal"
        ]
    },
    "nodes": {
        "hosts": [
            "neteye04.neteyelocal",
            "neteye01.neteyelocal",
            "neteye02.neteyelocal"
        ]
    },
    "es_nodes": {
        "hosts": [
            "neteye03.neteyelocal"
        ]
    }
}
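To make the mapping concrete, here is a minimal Python sketch of how /etc/neteye-cluster node types could be turned into the inventory groups shown above. This is an illustration of the behavior, not the actual dynamic-inventory.py implementation; note how the voting-only node appears both in voting_nodes and in nodes, matching the output.

```python
# Illustration only: map /etc/neteye-cluster node types to Ansible host
# groups, mimicking the dynamic inventory output shown above.
def build_inventory(cluster):
    standard = [n["hostname"] for n in cluster.get("Nodes", [])]
    elastic = [n["hostname"] for n in cluster.get("ElasticOnlyNodes", [])]
    voting = []
    all_nodes = cluster.get("Nodes", []) + cluster.get("ElasticOnlyNodes", [])
    if "VotingOnlyNode" in cluster:
        voting = [cluster["VotingOnlyNode"]["hostname"]]
        all_nodes.append(cluster["VotingOnlyNode"])
    return {
        # per-host variables, keyed by hostname
        "_meta": {"hostvars": {n["hostname"]: {"internal_node_addr": n["addr"]}
                               for n in all_nodes}},
        "all": {"vars": {"cluster": "true", "ansible_ssh_user": "root"}},
        "voting_nodes": {"hosts": voting},
        # the voting-only node is also a member of the "nodes" group
        "nodes": {"hosts": voting + standard},
        "es_nodes": {"hosts": elastic},
    }
```

Feeding this function the example /etc/neteye-cluster document from earlier would reproduce the group layout printed by the script.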
We can use the patterns in our dynamic inventory to target specific host groups to run ad-hoc commands on them.
For example, let's imagine that we want to know which Linux distribution the ElasticOnly nodes of our NetEye cluster are currently running. We can write a very simple Ansible playbook, called es_only_nodes.yml, like this one:
- hosts: es_nodes
  any_errors_fatal: true
  gather_facts: true
  tasks:
    - name: print the Linux distro on es nodes only
      debug:
        msg: "{{ ansible_distribution }}-{{ ansible_distribution_version }}"
You can notice from the hosts directive that we target a specific pattern, es_nodes, to execute the tasks in our playbook on the ElasticOnly nodes only.
In fact, if we execute the playbook, this is what we obtain:
[root@neteye01 ~]# ansible-playbook -i /usr/share/neteye/scripts/upgrade/dynamic-inventory.py es_only_nodes.yml
PLAY [es_nodes] ***********************************************************************************************************************************************************************************************************************
TASK [Gathering Facts] ****************************************************************************************************************************************************************************************************************
ok: [neteye03.neteyelocal]
TASK [print the Linux distro on es nodes only] ****************************************************************************************************************************************************************************************
ok: [neteye03.neteyelocal] => {
"msg": "CentOS-7.9"
}
PLAY RECAP ****************************************************************************************************************************************************************************************************************************
neteye03.neteyelocal : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
As you can see, the playbook was run against neteye03.neteyelocal, our only ElasticOnly node, leaving the other nodes untouched.
The neteye upgrade command, in particular, uses this very same mechanism to operate on the NetEye cluster during the upgrade procedure. We use patterns like nodes,!voting_nodes,!es_nodes to identify which hosts we can run PCS on.
In our example, the pattern nodes identifies the hosts [neteye04.neteyelocal, neteye01.neteyelocal, neteye02.neteyelocal], but we also state that we want to explicitly exclude the patterns voting_nodes and es_nodes (note the ! in front of those two patterns) from playbook execution.
This means that the resulting list of hosts is composed of all the hosts in nodes except those in voting_nodes and es_nodes. Since neteye04.neteyelocal is also a member of voting_nodes, it is excluded from the execution of the playbook, which is therefore limited to [neteye01.neteyelocal, neteye02.neteyelocal].
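The exclusion logic of such a pattern boils down to a set difference over host groups. The following Python sketch mirrors the behavior in a simplified form (it is not Ansible's actual resolver); the group contents are taken from the example dynamic inventory above:

```python
# Simplified illustration of the Ansible pattern "nodes,!voting_nodes,!es_nodes":
# start from the hosts selected by the plain terms, then subtract every host
# belonging to a "!"-prefixed group, preserving the original order.
inventory = {
    "nodes": ["neteye04.neteyelocal", "neteye01.neteyelocal", "neteye02.neteyelocal"],
    "voting_nodes": ["neteye04.neteyelocal"],
    "es_nodes": ["neteye03.neteyelocal"],
}

def resolve_pattern(inventory, pattern):
    """Resolve a comma-separated pattern with '!' exclusions."""
    selected = []
    for term in pattern.split(","):
        if term.startswith("!"):
            excluded = set(inventory.get(term[1:], []))
            selected = [h for h in selected if h not in excluded]
        else:
            selected.extend(h for h in inventory.get(term, []) if h not in selected)
    return selected

print(resolve_pattern(inventory, "nodes,!voting_nodes,!es_nodes"))
# → ['neteye01.neteyelocal', 'neteye02.neteyelocal']
```

The voting-only node neteye04.neteyelocal is picked up by the nodes term and then dropped again by !voting_nodes, leaving only the two standard nodes.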
Imagine having a NetEye cluster as in the following /etc/neteye-cluster file:
{
    "Hostname" : "neteye-cluster.myneteye.cluster",
    "Nodes" : [
        {
            "addr" : "192.168.47.1",
            "hostname" : "neteye01.neteyelocal",
            "hostname_ext" : "neteye-cluster01.myneteye.cluster",
            "id" : 1
        },
        {
            "addr" : "192.168.47.2",
            "hostname" : "neteye02.neteyelocal",
            "hostname_ext" : "neteye-cluster02.myneteye.cluster",
            "id" : 2
        },
        {
            "addr" : "192.168.47.3",
            "hostname" : "neteye03.neteyelocal",
            "hostname_ext" : "neteye-cluster03.myneteye.cluster",
            "id": 3
        }
    ]
}
For some reason, node neteye01.neteyelocal must be kept on standby to avoid allocating cluster resources on it, with the other two nodes sharing the entire workload.
During the upgrade to NetEye 4.20, however, all nodes are put on standby by the neteye upgrade command, thus disrupting regular NetEye operation.
Why?
This can happen because of a peculiarity of this cluster: the first node of the host group nodes is on standby. The neteye upgrade command, in fact, elects the very first node of that host group as the always-active node during the upgrade procedure and tries to put the other nodes on standby.
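A simplified sketch of this election logic makes the problem evident. This is only an illustration of the behavior described above, not the actual neteye upgrade code:

```python
# Illustration only: the upgrade procedure keeps the first host of the
# "nodes" group active and puts all the remaining hosts on standby.
def plan_standby(nodes_group):
    """Return (always_active, hosts_to_put_on_standby) for an upgrade run."""
    always_active = nodes_group[0]   # the first entry wins the election
    standby = nodes_group[1:]        # everyone else goes on standby
    return always_active, standby

# With neteye01 listed first, it is elected as the always-active node.
# But in our scenario neteye01 is already on standby, so once neteye02
# and neteye03 are put on standby too, no node is left serving resources.
active, standby = plan_standby(
    ["neteye01.neteyelocal", "neteye02.neteyelocal", "neteye03.neteyelocal"]
)
print(active, standby)
# → neteye01.neteyelocal ['neteye02.neteyelocal', 'neteye03.neteyelocal']
```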
To solve this issue, you can reorder your nodes in /etc/neteye-cluster as follows:
{
    "Hostname" : "neteye-cluster.myneteye.cluster",
    "Nodes" : [
        {
            "addr" : "192.168.47.2",
            "hostname" : "neteye02.neteyelocal",
            "hostname_ext" : "neteye-cluster02.myneteye.cluster",
            "id" : 2
        },
        {
            "addr" : "192.168.47.3",
            "hostname" : "neteye03.neteyelocal",
            "hostname_ext" : "neteye-cluster03.myneteye.cluster",
            "id": 3
        },
        {
            "addr" : "192.168.47.1",
            "hostname" : "neteye01.neteyelocal",
            "hostname_ext" : "neteye-cluster01.myneteye.cluster",
            "id" : 1
        }
    ]
}
This way, the neteye upgrade command will elect node neteye02.neteyelocal as the always-active node, preserving the functionality of the NetEye cluster during the upgrade.