How to Monitor a Complex Veeam-based Backup System
Veeam is a widely used and well-known backup system.
A customer recently asked me whether he could check the operation of his Veeam-based backup system by inspecting the Windows event log, since the standard checks used within the community did not give him the current status of his Veeam infrastructure.
This use case requires checking whether, over the last X hours, a specific event with a particular event ID and message was logged. The presence of this event tells the customer that the backup was successful.
If the event is not present in the last X hours, then the backup is taking too long, and the customer wants to be informed with a critical message, since he has to adjust the system manually to prevent data corruption or overloading.
Design
To implement this use case we decided to use the icinga-powershell-plugins collection made available by Icinga, which is described in the Icinga for Windows documentation.
Among the various plugins it contains, Invoke-IcingaCheckEventlog turned out to be the right fit for our use case; it has its own page in the same documentation.
In order to use it, we installed the Icinga agent and then the PowerShell plugins from a PowerShell session started with administrative rights.
Once that was done, we were ready to use our Windows client-side script.
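Before moving on, it can be worth confirming that the check command is actually visible in the PowerShell session. The following is only a minimal sketch of such a sanity check, not the installation procedure itself; the helper name Test-CommandAvailable is hypothetical and not part of the Icinga plugins.

```powershell
# Hypothetical helper (not part of the Icinga plugins): returns $true when
# a command with the given name can be resolved in the current session.
function Test-CommandAvailable {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Name
    )
    return [bool](Get-Command -Name $Name -ErrorAction SilentlyContinue)
}

if (Test-CommandAvailable -Name 'Invoke-IcingaCheckEventlog') {
    Write-Output 'Invoke-IcingaCheckEventlog is available.'
} else {
    Write-Output 'Invoke-IcingaCheckEventlog was not found; check the plugin installation.'
}
```

If the command is not found, the plugins may simply not be loaded in that session yet, so this result alone does not prove the installation failed.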
I ran the command by hand to work out which options to use. After settling on the command, I imported the newly available basket so we could use it.
We then created the host, configured the service linked to the Invoke-IcingaCheckEventlog command, and got a result.
Event Monitoring
We looked at that command's documentation and saw that event 190 can be monitored. We also verified that the name of the job appears inside the event message, so we can check whether in the last 24 hours there were any instances of event 190 containing the name of the job we scheduled. If there is no such event, the job is running long and we need to intervene manually, so we raise an alarm.
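The filtering described above can be sketched in plain PowerShell. This is an illustration of the idea only, not the plugin's implementation: the helper name Select-RecentJobEvent, the job name 'Nightly', and the sample message texts are all made up, and the objects merely mimic the Id, TimeCreated, and Message properties that Windows event records expose.

```powershell
# Hypothetical helper: keeps only events with the given ID whose message
# matches a wildcard pattern and that fall inside the time window.
function Select-RecentJobEvent {
    param(
        [object[]]$Events = @(),
        [int]$EventId = 190,
        [Parameter(Mandatory = $true)]
        [string]$MessagePattern,
        [int]$Hours = 24,
        [datetime]$Now = (Get-Date)
    )
    $cutoff = $Now.AddHours(-$Hours)
    return @($Events | Where-Object {
        $_.Id -eq $EventId -and
        $_.TimeCreated -ge $cutoff -and
        $_.Message -like $MessagePattern
    })
}

# Made-up sample events; real Veeam message texts may differ.
$now = Get-Date
$sample = @(
    [pscustomobject]@{ Id = 190; TimeCreated = $now.AddHours(-2);  Message = "Backup job 'Nightly' finished with Success." }
    [pscustomobject]@{ Id = 190; TimeCreated = $now.AddHours(-30); Message = "Backup job 'Nightly' finished with Success." }
    [pscustomobject]@{ Id = 150; TimeCreated = $now.AddHours(-1);  Message = "Some other event." }
)

# Wrap the call in @() so that .Count works for zero or one result as well.
$found = @(Select-RecentJobEvent -Events $sample -MessagePattern "*'Nightly'*Success*" -Now $now)
Write-Output "Matching events in the last 24 hours: $($found.Count)"
```

On a Windows host, real events could be fed into the same function from Get-WinEvent with a -FilterHashtable on the event ID. The log name to query (presumably something like 'Veeam Backup') is an assumption that should be verified on the system in question.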
The standard command code doesn't allow an alarm to be raised when 0 events are found, so we changed the PowerShell code to add this capability.
We set the various variables as needed before testing the command from the PowerShell command line. Here's an example command we used:
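The logic behind such a change is a lower-bound threshold: too few events is the problem, not too many. The sketch below shows only that idea under stated assumptions; it is not the modification we made to the plugin, and the helper name Get-EventCountState is hypothetical. The exit codes follow the usual monitoring plugin convention (0 for OK, 2 for CRITICAL).

```powershell
# Hypothetical helper: maps an event count to a monitoring state.
# ExitCode 0 = OK, 2 = CRITICAL (common plugin exit code convention).
function Get-EventCountState {
    param(
        [Parameter(Mandatory = $true)]
        [int]$Count,
        [int]$MinimumExpected = 1
    )
    if ($Count -lt $MinimumExpected) {
        return [pscustomobject]@{
            ExitCode = 2
            Text     = "CRITICAL: $Count matching events (expected at least $MinimumExpected)"
        }
    }
    return [pscustomobject]@{
        ExitCode = 0
        Text     = "OK: $Count matching events"
    }
}

Write-Output (Get-EventCountState -Count 0).Text
Write-Output (Get-EventCountState -Count 3).Text
```

With the default minimum of 1, a count of 0 is reported as critical and any positive count as OK, which is the behavior the use case asks for.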
Invoke-IcingaCheckEventlog -IncludeEventId 190 -Verbosity 3 -IncludeMessage "*'NAME OF THE JOB'*'STATUS OF THE JOB'*" -After 24h -Critical 1:
We can also verify that the backup files themselves are present, using icinga-powershell-check-directory.
Again we set the various variables as needed before we test the command in PowerShell. Here’s an example command:
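As a plain-PowerShell illustration of what such a directory check verifies (not the plugin's own command or parameters), a sketch might look like the following. The helper name Test-RecentBackupFile, the default *.vbk filter, and the example path are assumptions chosen for illustration.

```powershell
# Hypothetical helper: returns $true when at least one file matching the
# filter was written within MaxAgeHours and is at least MinSizeMB in size.
function Test-RecentBackupFile {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Path,
        [string]$Filter = '*.vbk',
        [int]$MaxAgeHours = 24,
        [double]$MinSizeMB = 1
    )
    # A missing directory counts as "no backup file found".
    if (-not (Test-Path -LiteralPath $Path -PathType Container)) {
        return $false
    }
    $cutoff   = (Get-Date).AddHours(-$MaxAgeHours)
    $minBytes = $MinSizeMB * 1MB
    $hits = @(Get-ChildItem -LiteralPath $Path -Filter $Filter -File |
        Where-Object { $_.LastWriteTime -ge $cutoff -and $_.Length -ge $minBytes })
    return ($hits.Count -gt 0)
}

# Example call; the path is a placeholder, not a real Veeam repository.
Test-RecentBackupFile -Path 'D:\Backups\ExampleJob' -MaxAgeHours 24 -MinSizeMB 100
```

A $false result covers both a missing directory and a directory without a sufficiently recent and large file, which keeps the calling logic simple at the cost of less detailed diagnostics.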
In the end we have two commands, along with the service templates that use them on the Veeam host, and can therefore satisfy our customer's use case: monitoring whether in the last X hours a Veeam job has run correctly and has produced a file of at least Y megabytes in a particular path.
Author
Franco Federico
Hi, I’m Franco and I was born in Monza. For 20 years I worked for IBM in various roles. I started as a customer service representative (help desk operator), then I was promoted to Windows expert. In 2004 I changed again and was promoted to consultant, business analyst, then Java developer, and finally technical support and system integrator for Enterprise Content Management (FileNet). Several years ago I became fascinated by the Open Source world, the GNU/Linux operating system, and security in general. So for 4 years during my free time I studied security systems and computer networks in order to extend my knowledge. I came across several open source technologies including the Elastic stack (formerly ELK), and started to explore them and other similar ones like Grafana, Graylog, Snort, Grok, etc. I like to script in Python, too. Then I started to work at Würth Phoenix as a consultant. Two years ago I moved with my family to Berlin to work for a fintech startup (Nuri), but the startup went bankrupt. No problem, Berlin offered many other opportunities and I started working for Helios IT Service as an infrastructure monitoring expert with Icinga and Elastic, but after another year I preferred to return to Italy for various reasons that we can go into in person :) In my free time I continue to dedicate myself to my family (especially my daughter) and I like walking, reading, dancing and making pizza for friends and relatives.