30. 04. 2024 Franco Federico Unified Monitoring

Reacting with Remediation after a Service Goes Down

Last week a customer asked me to implement the following use case:

When a production PLC (programmable logic controller) goes offline, any service that depends on it must be stopped automatically: the service exchanges data between the source server and the PLC, so it can only work while the PLC is reachable. As soon as the PLC becomes active again, the service must be restarted immediately so that the data exchange between the server and the PLC can resume.

This is a typical case where event commands can be very useful (you can read the official documentation for event commands here: Monitoring Basics – Icinga 2).

To start off I created a ping service that allows me to enter the target PLC’s IP address.

Then I created a new event command like this:

object EventCommand "Command_name_powershell_stop_start" {
    import "plugin-event-command"
    command = [
        "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe",
        "-ExecutionPolicy",
        "ByPass"
    ]
    timeout = 1m
    arguments += {
        "(no key)" = {
            skip_key = true
            value = "; exit ($$lastexitcode)"
        }
        "-command" = {
            description = "Check PLC service state and state type. If (service.state = CRITICAL AND service.state_type != SOFT), then stop the service SERVICE_NAME; if (service.state = OK), then start the service SERVICE_NAME"
            order = -2
            skip_key = true
            value = "try { if ('$service.state$' -eq 'CRITICAL' -And '$service.state_type$' -ne 'SOFT') { Get-Service -Name 'SERVICE_NAME' | Stop-Service; exit 0; } if ('$service.state$' -eq 'OK') { Get-Service -Name 'SERVICE_NAME' | Start-Service; exit 0; }} catch { echo $$_.Exception; exit 3; };"
        }
    }
}
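
If several services need the same remediation, one possible variation (not part of the original setup) is to read the Windows service name from a custom variable instead of hard-coding it. The sketch below assumes a hypothetical custom variable called remediation_service, with a fallback value set on the command itself; the object name is hypothetical as well.

```icinga2
object EventCommand "powershell_stop_start_by_var" {    // hypothetical object name
    import "plugin-event-command"
    command = [
        "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe",
        "-ExecutionPolicy",
        "ByPass"
    ]
    timeout = 1m

    // Hypothetical custom variable: default used when neither the service
    // nor the host defines vars.remediation_service
    vars.remediation_service = "SERVICE_NAME"

    arguments += {
        "(no key)" = {
            skip_key = true
            value = "; exit ($$lastexitcode)"
        }
        "-command" = {
            order = -2
            skip_key = true
            // Same logic as the command above; only the service name is now
            // taken from the $remediation_service$ macro
            value = "try { if ('$service.state$' -eq 'CRITICAL' -And '$service.state_type$' -ne 'SOFT') { Get-Service -Name '$remediation_service$' | Stop-Service; exit 0; } if ('$service.state$' -eq 'OK') { Get-Service -Name '$remediation_service$' | Start-Service; exit 0; }} catch { echo $$_.Exception; exit 3; };"
        }
    }
}
```

Since Icinga 2 looks up an unprefixed macro on the service first, then on the host, and only then on the command, a single event command defined this way could serve several ping services that each set their own vars.remediation_service.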

Next we link the service to the event command. Here’s the service template:

template Service "nx-st-agent-windows-ping-with-stop-SERVICE_NAME" {
    import "nx-st-agent-windows"
    check_command = "ping-windows"
    max_check_attempts = 3
    check_interval = 5m
    retry_interval = 1m
    check_timeout = 1m
    enable_notifications = true
    enable_event_handler = true
    enable_flapping = true
    event_command = "Command_name_powershell_stop_start"
}
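
To let each host supply its own PLC address, the template could be applied with a rule like the one below. This is only a sketch: it assumes that ping-windows is the Icinga Template Library command of that name (which reads its target from ping_win_address) and that the hosts carry a hypothetical custom variable plc_address. The service name is hypothetical too.

```icinga2
apply Service "plc-ping" {                                  // hypothetical service name
    import "nx-st-agent-windows-ping-with-stop-SERVICE_NAME"

    // Assumption: the ITL ping-windows command pings the address in ping_win_address
    vars.ping_win_address = host.vars.plc_address

    // Only create the service on hosts that define the (hypothetical) plc_address variable
    assign where host.vars.plc_address
}
```

With a rule like this, setting vars.plc_address on the Windows server's host object (for example to "192.0.2.10") would be enough to get both the ping check and the remediation.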

To be on the safe side, I’d also advise checking the logs on the client where the command runs, to verify that there are no errors. Pay particular attention to special characters such as the single quotes (' ') in the command string.

Once that’s done and you’ve verified that the deployment has succeeded, you can test the setup by manually submitting a check result that makes the ping check appear to have failed (first make sure the service accepts passive check results). This puts the service into a HARD CRITICAL state, at which point the event command stops the dependent service.

Afterwards you can re-run the ping check right away, which resets everything, or simply wait for the next scheduled check execution. When I compare the history of the two services (the ping to the PLC and the main service), I see that the timestamps of their state changes are very close to each other.

I think this use case will soon be added to our NEP collection.

Enjoy automatic remediation!


Franco Federico

Hi, I’m Franco and I was born in Monza. For 20 years I worked for IBM in various roles. I started as a customer service representative (help desk operator), then I was promoted to Windows expert. In 2004 I changed again and was promoted to consultant, business analyst, then Java developer, and finally technical support and system integrator for Enterprise Content Management (FileNet). Several years ago I became fascinated by the Open Source world, the GNU/Linux operating system, and security in general. So for 4 years I spent my free time studying security systems and computer networks in order to extend my knowledge. I came across several open source technologies including the Elastic stack (formerly ELK), and started to explore them and other similar ones like Grafana, Graylog, Snort, Grok, etc. I like to script in Python, too. Then I started to work at Würth Phoenix as a consultant. Two years ago I moved with my family to Berlin to work for a fintech startup (Nuri), but the startup went bankrupt. No problem, Berlin offered many other opportunities and I started working for Helios IT Service as an infrastructure monitoring expert with Icinga and Elastic, but after another year I preferred to return to Italy for various reasons that we can go into in person 🙂 In my free time I continue to dedicate myself to my family (especially my daughter), and I like walking, reading, dancing and making pizza for friends and relatives.

