During NetEye Cloud updates we typically have to handle 25+ nodes, updating both the OS and firmware and subsequently rebooting all the servers, all without causing downtime.
We could of course reboot one node at a time, but this would be really time-consuming. The main constraints on reboots come from the PCS nodes and the Elastic layers. In particular, we must never reboot more than one node of the same group (the PCS cluster, or an Elastic hot/warm/cold layer) at the same time.
PCS nodes are also dedicated to Hot data, so we can handle them as plain Elastic nodes. We also have other nodes dedicated to InfluxDB, which can be handled in parallel with the Elastic-only nodes and PCS, so we organize the inventory accordingly:
all:
  children:
    pcs:
      hosts:
        pc1.ne.cloud:
        pc2.ne.cloud:
    hot:
      hosts:
        eh1.ne.cloud:
        eh2.ne.cloud:
        eh3.ne.cloud:
      children:
        pcs:
    warm:
      hosts:
        ew1.ne.cloud:
        ew2.ne.cloud:
        ew3.ne.cloud:
    cold:
      hosts:
        ec1.ne.cloud:
        ec2.ne.cloud:
    influx:
      hosts:
        id1.ne.cloud:
        id2.ne.cloud:
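As a quick sanity check, assuming the inventory above is saved as inventory.yml (a filename chosen here just for illustration), we can verify that the hot group really includes the PCS nodes too:

$ ansible hot -i inventory.yml --list-hosts
  hosts (5):
    eh1.ne.cloud
    eh2.ne.cloud
    eh3.ne.cloud
    pc1.ne.cloud
    pc2.ne.cloud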
At this point the solution seems pretty straightforward: pick one node from each group (except pcs, which is a child of hot) and reboot it… however, Ansible doesn't have a built-in method for doing that!
After some investigation we settled on the add_host module, which is the approach we'll follow in this blog post.
To achieve our goal we need to use two playbooks. The first playbook builds the inventory and runs on localhost:
- hosts: localhost
  gather_facts: false
  vars:
    _inventory_groups:
      - hot
      - warm
      - cold
      - influx
    _groups_len_max: "{{ _inventory_groups | map('extract', groups) | map('length') | max }}"
    _hosts_index_list: "{{ range(_groups_len_max | int) }}"
    _parallelizable_grouped_hosts: "{{ query('cartesian', _hosts_index_list, _inventory_groups) }}"
  tasks:
    - name: Inventory | Create inventory with parallelizable groups
      ansible.builtin.add_host:
        name: "{{ groups[item.1][item.0] | default('dummy-' ~ ansible_loop.index) }}"
        groups: parallelizable_group
      loop: "{{ _parallelizable_grouped_hosts }}"
      loop_control:
        extended: true
    - name: Inventory | Calculate the length of _inventory_groups
      ansible.builtin.set_fact:
        inventory_groups_length: "{{ _inventory_groups | length }}"
      run_once: true
Here we have several variables that do most of the work:
- _inventory_groups is the list of groups which can be safely handled in parallel
- _groups_len_max is the size of the biggest group among those in _inventory_groups
- _hosts_index_list is a simple list of indices from 0 to _groups_len_max - 1
- _parallelizable_grouped_hosts is the Cartesian product of indices and groups

The add_host task picks one host from each group, or a dummy host when no further host is available in that group, and adds it to parallelizable_group, generating a list like the following:
[eh1, ew1, ec1, id1, eh2, ew2, ec2, id2, eh3, ew3, dummy-11, dummy-12, pc1, dummy-14, dummy-15, dummy-16, pc2, dummy-18, dummy-19, dummy-20]
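If you want to inspect the intermediate Cartesian product while developing, a throwaway debug task (not part of the actual flow) can be added to the first play:

    - name: Inventory | Show the Cartesian product (debugging aid only)
      ansible.builtin.debug:
        var: _parallelizable_grouped_hosts

For our inventory this renders as [[0, 'hot'], [0, 'warm'], [0, 'cold'], [0, 'influx'], [1, 'hot'], …]: index/group pairs in exactly the order consumed by the add_host loop.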
Finally, inventory_groups_length just contains the number of parallelizable groups (4 in our case), which we'll use to automatically set the parallelism level in the next play.
In the same file we need a second play to actually run the tasks in parallel on one machine from each group:
- hosts: parallelizable_group
  serial: "{{ hostvars.localhost.inventory_groups_length }}"
  order: inventory
  gather_facts: false
  tasks:
    - name: Update real hosts
      block:
        ...
      when: inventory_hostname is not match('^dummy-\d*$')
In this second play we run the tasks in the block on the hosts in parallelizable_group created by the first play.
The parallelization trick is done by serial, set to the number of groups we stored earlier, while the order is enforced by order: inventory. In this way Ansible processes batches of 4 hosts (because we want to run one host each from the hot, warm, cold and influx groups in parallel), and we can safely assume that the list is ordered.
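The actual tasks inside the block are elided above. Purely as an illustrative sketch (the module choices and the timeout are assumptions on my part, not the original tasks, and they presume dnf-based nodes), an update-and-reboot block might contain something like:

      - name: Update | Upgrade all OS packages   # hypothetical task, not from the original playbook
        ansible.builtin.dnf:
          name: "*"
          state: latest

      - name: Update | Reboot and wait for the node to come back   # reboot_timeout value is arbitrary
        ansible.builtin.reboot:
          reboot_timeout: 1800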
The result will be the following, where at each round 4 hosts (including dummies) will be handled:

        | hot | warm     | cold     | influx   |
Round 1 | eh1 | ew1      | ec1      | id1      |
Round 2 | eh2 | ew2      | ec2      | id2      |
Round 3 | eh3 | ew3      | dummy-11 | dummy-12 |
Round 4 | pc1 | dummy-14 | dummy-15 | dummy-16 |
Round 5 | pc2 | dummy-18 | dummy-19 | dummy-20 |
Since not all groups have the same size, we need to add dummy hosts in order to ensure that we never pick more than one machine belonging to the same group at the same time. Without them, Round 3 would be:

Round 3 | eh3 | ew3 | pc1 | pc2 |

which would cause 3 hot nodes (remember that pcs is a child of hot) to reboot at the same moment, which is exactly what we want to avoid.
Since those dummy machines don't exist, we have to skip all tasks for the dummy hosts. This also applies to fact gathering: gather_facts must be set to false at the play level, and facts must instead be collected inside the block with a dedicated task:
- name: Prepare | Collect facts
  ansible.builtin.setup:
    gather_subset:
      - "all"
The solution described above is admittedly not very elegant, and it's certainly a bit complex to read. Still, for the moment I haven't found a better way to achieve this kind of selective parallelization.
It’s important to note that the same solution applies to all use cases in which we have a redundant workload distributed across multiple machines without an orchestration solution.
Did you find this article interesting? Are you an “under the hood” kind of person? We’re really big on automation and we’re always looking for people in a similar vein to fill roles like this one as well as other roles here at Würth Phoenix.