During NetEye Cloud updates we typically have to handle 25+ nodes, updating both the OS and firmware and subsequently rebooting all the servers, all without causing downtime.
We could of course reboot one node at a time, but this would be really time-consuming. The main constraints on reboots come from the PCS nodes and the Elastic layers. In particular, we must never reboot more than one node of the same group (the PCS cluster, or an Elastic hot/warm/cold layer) at the same time.
PCS nodes are also dedicated to Hot data, so we can handle them as plain Elastic nodes. We also have other nodes dedicated to InfluxDB, which can be handled in parallel with the Elastic-only nodes and PCS, so we organize the inventory accordingly:
all:
  children:
    pcs:
      hosts:
        pc1.ne.cloud:
        pc2.ne.cloud:
    hot:
      hosts:
        eh1.ne.cloud:
        eh2.ne.cloud:
        eh3.ne.cloud:
      children:
        pcs:
    warm:
      hosts:
        ew1.ne.cloud:
        ew2.ne.cloud:
        ew3.ne.cloud:
    cold:
      hosts:
        ec1.ne.cloud:
        ec2.ne.cloud:
    influx:
      hosts:
        id1.ne.cloud:
        id2.ne.cloud:
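As a quick sanity check, assuming the inventory above is saved as inventory.yml (a filename chosen here just for illustration), we can verify that the hot group really includes the PCS nodes too:

$ ansible hot -i inventory.yml --list-hosts
  hosts (5):
    eh1.ne.cloud
    eh2.ne.cloud
    eh3.ne.cloud
    pc1.ne.cloud
    pc2.ne.cloud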
At this point the solution seems pretty straightforward: pick one node from each group (except pcs, which is a child of hot) and reboot it… however, Ansible doesn't have a built-in method for doing that!
After some investigation we settled on the add_host module, which is the approach we'll follow in this blog post.
To achieve our goal we need to use two playbooks. The first playbook builds the inventory and runs on localhost:
- hosts: localhost
  gather_facts: false
  vars:
    _inventory_groups:
      - hot
      - warm
      - cold
      - influx
    _groups_len_max: "{{ _inventory_groups | map('extract', groups) | map('length') | max }}"
    _hosts_index_list: "{{ range(_groups_len_max | int) }}"
    _parallelizable_grouped_hosts: "{{ query('cartesian', _hosts_index_list, _inventory_groups) }}"
  tasks:
    - name: Inventory | Create inventory with parallelizable groups
      ansible.builtin.add_host:
        name: "{{ groups[item.1][item.0] | default('dummy-' ~ ansible_loop.index) }}"
        groups: parallelizable_group
      loop: "{{ _parallelizable_grouped_hosts }}"
      loop_control:
        extended: true
    - name: Inventory | Calculate the length of _inventory_groups
      ansible.builtin.set_fact:
        inventory_groups_length: "{{ _inventory_groups | length }}"
      run_once: true
Here we have several variables that do most of the work:
- _inventory_groups is the list of groups which can be safely handled in parallel
- _groups_len_max is the size of the biggest group among those in _inventory_groups
- _hosts_index_list is a simple list of indices from 0 to _groups_len_max - 1
- _parallelizable_grouped_hosts is the Cartesian product of indices and groups

The add_host task picks one host from each group, or a dummy host when no further host is available in that group, and adds it to parallelizable_group, generating a list like the following:
[eh1, ew1, ec1, id1, eh2, ew2, ec2, id2, eh3, ew3, dummy-11, dummy-12, pc1, dummy-14, dummy-15, dummy-16, pc2, dummy-18, dummy-19, dummy-20]
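If you want to inspect the intermediate Cartesian product while developing, a throwaway debug task (not part of the actual flow) can be added to the first play:

    - name: Inventory | Show the Cartesian product (debugging aid only)
      ansible.builtin.debug:
        var: _parallelizable_grouped_hosts

For our inventory this renders as [[0, 'hot'], [0, 'warm'], [0, 'cold'], [0, 'influx'], [1, 'hot'], …]: index/group pairs in exactly the order consumed by the add_host loop.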
Finally, inventory_groups_length just contains the number of parallelizable groups (4 in our case), which we'll use to automatically set the parallelism level in the next play.
In the same file we need a second play to actually run the tasks in parallel on one machine from each group:
- hosts: parallelizable_group
  serial: "{{ hostvars.localhost.inventory_groups_length }}"
  order: inventory
  gather_facts: false
  tasks:
    - name: Update real hosts
      block:
        ...
      when: inventory_hostname is not match('^dummy-\d*$')
In this second play we run the tasks in the block on the hosts in parallelizable_group created by the first play.
The parallelization trick is done by serial, set to the number of groups we stored earlier, while the order is enforced by order: inventory. In this way Ansible processes batches of 4 hosts (because we want to run one host each from the hot, warm, cold and influx groups in parallel), and we can safely assume that the list is ordered.
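The actual tasks inside the block are elided above. Purely as an illustrative sketch (the module choices and the timeout are assumptions on my part, not the original tasks, and they presume dnf-based nodes), an update-and-reboot block might contain something like:

      - name: Update | Upgrade all OS packages   # hypothetical task, not from the original playbook
        ansible.builtin.dnf:
          name: "*"
          state: latest

      - name: Update | Reboot and wait for the node to come back   # reboot_timeout value is arbitrary
        ansible.builtin.reboot:
          reboot_timeout: 1800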
The result will be the following, where at each round 4 hosts (including dummies) will be handled:

        | hot | warm     | cold     | influx   |
Round 1 | eh1 | ew1      | ec1      | id1      |
Round 2 | eh2 | ew2      | ec2      | id2      |
Round 3 | eh3 | ew3      | dummy-11 | dummy-12 |
Round 4 | pc1 | dummy-14 | dummy-15 | dummy-16 |
Round 5 | pc2 | dummy-18 | dummy-19 | dummy-20 |
Since not all groups have the same size, we need to add dummy hosts in order to ensure that we never pick more than one machine belonging to the same group at the same time. Without them, Round 3 would be:

Round 3 | eh3 | ew3 | pc1 | pc2 |

which would cause 3 hot nodes (remember that pcs is a child of hot) to reboot at the same moment, which is exactly what we want to avoid.
Since those dummy machines don't exist, we have to skip all tasks for the dummy hosts. This also applies to fact gathering: gather_facts must be set to false at the play level, and facts must instead be collected inside the block with a dedicated task:
- name: Prepare | Collect facts
  ansible.builtin.setup:
    gather_subset:
      - "all"
The solution described above is admittedly not very elegant, and it's certainly a bit complex to read. Still, for the moment I haven't found a better way to achieve this kind of selective parallelization.
It’s important to note that the same solution applies to all use cases in which we have a redundant workload distributed across multiple machines without an orchestration solution.
Did you find this article interesting? Are you an “under the hood” kind of person? We’re really big on automation and we’re always looking for people in a similar vein to fill roles like this one as well as other roles here at Würth Phoenix.