Reacting with Remediation after a Service Goes Down
Last week a customer asked me to implement the following use case:
When a production PLC device (programmable logic controller) goes offline, any associated service must be automatically switched off, since it needs to connect to the PLC in order to exchange data from the source server, and for that the PLC must be active. When the PLC device becomes active again, the service must be reactivated immediately so that it can continue exchanging data between the server and the PLC.
This is a typical case where event commands can be very useful (you can read the official documentation for event commands here: Monitoring Basics – Icinga 2).
To start off I created a ping service that allows me to enter the target PLC’s IP address.
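As a rough illustration, a ping service with a configurable target address could look like the following in Icinga 2 configuration syntax. This is only a sketch: the host name, the addresses and the custom variable plc_address are hypothetical, and using the ITL ping4 command with its ping_address variable is an assumption, since the original service may well have been built differently (for example in Director).

```
// Hypothetical host that carries the PLC's IP address as a custom variable
object Host "plc-gateway-01" {
  check_command = "hostalive"
  address = "192.0.2.20"            // placeholder address of the host itself
  vars.plc_address = "192.0.2.10"   // placeholder address of the target PLC
}

// Ping service that reads the target address from the host's custom variable
apply Service "plc-ping" {
  check_command = "ping4"
  vars.ping_address = host.vars.plc_address
  check_interval = 1m
  retry_interval = 30s
  assign where host.vars.plc_address
}
```

Keeping the PLC address in a custom variable means the same service definition can be reused for every host that talks to a PLC.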
Then I created my new event command like this:
object EventCommand "Command_name_powershell_stop_start" {
    import "plugin-event-command"

    // Run the script below through Windows PowerShell
    command = [
        "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe",
        "-ExecutionPolicy",
        "ByPass"
    ]
    timeout = 1m

    arguments += {
        // Appended after the script: pass PowerShell's exit code back to Icinga
        "(no key)" = {
            skip_key = true
            value = "; exit ($$lastexitcode)"
        }
        // The script itself; SERVICE_NAME is a placeholder for the Windows service
        "-command" = {
            description = "Check PLC service state and check_attempt. If (service.state = CRITICAL AND service.state_type != SOFT), then stop the service SERVICE_NAME; if (service.state = OK), then start the service SERVICE_NAME"
            order = -2
            skip_key = true
            value = "try { if ('$service.state$' -eq 'CRITICAL' -And '$service.state_type$' -ne 'SOFT') { Get-Service -Name 'SERVICE_NAME' | Stop-Service; exit 0; } if ('$service.state$' -eq 'OK') { Get-Service -Name 'SERVICE_NAME' | Start-Service; exit 0; } } catch { echo $$_.Exception; exit 3; };"
        }
    }
}
At this point we take the service and link it to the event command. Here’s a preview of our service:
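A minimal sketch of how the event command might be attached to the ping service in Icinga 2 configuration syntax follows. It rests on several assumptions that are not in the original setup: the host and endpoint name windows-agent-01 and the PLC address are hypothetical, the check is assumed to run on the Windows agent through command_endpoint (so that the PowerShell event command runs there too), and for that reason the ITL ping4-windows command is used rather than ping4. The interval and attempt values are illustrative only.

```
object Service "plc-ping" {
  host_name = "windows-agent-01"            // hypothetical Windows host
  check_command = "ping4-windows"           // assumption: ping executed by the Windows agent
  vars.ping_win_address = "192.0.2.10"      // placeholder address of the PLC

  // Assumption: an Endpoint object with this name exists for the agent
  command_endpoint = "windows-agent-01"

  // Link the service to the event command defined above
  event_command = "Command_name_powershell_stop_start"

  // Illustrative timing: HARD state is reached on the third failed attempt
  max_check_attempts = 3
  check_interval = 1m
  retry_interval = 30s
}
```

With values like these, the service reaches a HARD CRITICAL state only after three consecutive failed checks. Icinga 2 also calls event commands while the service is in a SOFT state, which is why the script checks $service.state_type$ and does nothing until the state is HARD.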
To be safe, I’d also advise checking the logs on the client where the command has to run in order to verify that there aren’t any errors. Pay attention to special characters such as quotation marks (‘’).
Once that’s done and you’ve verified that the deployment has succeeded, you can test it: after making sure that the check accepts passive results, submit a check result that makes the check appear to have failed, so that the service goes into a HARD state and triggers the event command.
Then you can run the ping check again, which resets everything, or else just wait for the next scheduled check execution. When I compare the history of the two services (the ping to the PLC and the main service), I see that their state changes occur at nearly the same times.
I think this use case will soon be added to our NEP (NetEye Extension Packs) collection.
Enjoy automatic remediation!
Hi, I’m Franco and I was born in Monza. For 20 years I worked for IBM in various roles. I started as a customer service representative (help desk operator), then I was promoted to Windows expert. In 2004 I changed again and was promoted to consultant, business analyst, then Java developer, and finally technical support and system integrator for Enterprise Content Management (FileNet). Several years ago I became fascinated by the Open Source world, the GNU/Linux operating system, and security in general. So for 4 years during my free time I studied security systems and computer networks in order to extend my knowledge. I came across several open source technologies including the Elastic stack (formerly ELK), and started to explore them and other similar ones like Grafana, Graylog, Snort, Grok, etc. I like to script in Python, too. Then I started to work at Würth Phoenix as a consultant. Two years ago I moved with my family to Berlin to work for a fintech startup (Nuri), but the startup went bankrupt. No problem, Berlin offered many other opportunities and I started working for Helios IT Service as an infrastructure monitoring expert with Icinga and Elastic, but after another year I preferred to return to Italy for various reasons that we can go into in person 🙂 In my free time I continue to dedicate myself to my family (especially my daughter) and I like walking, reading, dancing and making pizza for friends and relatives.
Author
Franco Federico