How to use anomaly detection to create smarter alerts
Alarms and monitoring go hand in hand. Whenever an algorithm or threshold is used to decide whether the current value of a monitored KPI should raise an alarm or not, the result can be a hit, a correct reject, a miss, or a false alarm.
The standard way to raise alarms is to study standard traffic – which should not raise alarms – and to set a static threshold based on that historic standard traffic (see Figure 1 for an example) and on experience. Everything below the threshold is then considered standard traffic, and everything above it raises an alarm. This kind of threshold-based alarm creation is robust to many outliers, and it may be sufficient as long as the mean of the standard traffic does not change dynamically (otherwise the threshold needs to be adapted dynamically, too). However, signals can also contain anomalies that are quite useful for problem detection yet look very different from classic (more or less extreme) outliers. A change in the distribution, for example (see Figure 2, red area on the right), can be a first sign of instability, and taking an immediate counter-action can prevent the anomaly from turning into a real problem.
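The classic static-threshold approach described above can be sketched in a few lines. This is a minimal illustration, not any specific product's implementation; the function name, the choice of "mean plus k standard deviations", and the sample numbers are all assumptions made for the example.

```python
import numpy as np

def threshold_alarms(values, history, k=3.0):
    """Flag samples that exceed a static threshold derived from
    historic standard traffic (here: mean + k standard deviations).
    Illustrative sketch; a real KPI may need a baseline per hour/weekday."""
    threshold = np.mean(history) + k * np.std(history)
    return [i for i, v in enumerate(values) if v > threshold]

history = [10, 12, 11, 9, 10, 11, 12, 10]   # historic standard traffic
current = [11, 10, 35, 12]                  # new samples to check
print(threshold_alarms(current, history))   # the spike at index 2 is flagged
```

Note that the threshold is computed once from history: if the traffic's mean drifts, the threshold must be recomputed, which is exactly the limitation discussed above.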
For this reason, the study of alternative, more sophisticated alerting mechanisms is a useful addition to current common practice. Being able to differentiate between different types of anomalies, and to detect those that traditional methods could not have found, is a real step forward when monitoring KPIs from increasingly complex network traffic or performance counters. Würth Phoenix is currently putting effort into the analysis of methods from statistics and machine learning for alarm generation, in order to guarantee sound alarm quality to its customers.
Methods based on signal decomposition, for example – where the signal is first split into a trend and seasonal components that repeat periodically, followed by a close study of the residual activity – have already shown promising preliminary results (see below).
How can such a more sophisticated analysis of the traffic help to create smarter alarms?
Especially when configuring “unknown” systems – such as new applications, or networks from new customers that your solution is expected to monitor – it is not always easy to learn what standard behaviour should look like. You need time to build up experience, whereas automatic recognition based on anomaly detection “only” needs data (although here, too, alarm quality can be expected to improve with more historical data).
The combination of anomaly detection and traditional methods is especially interesting. As a first step in a promising direction, the goal here is to use anomaly detection to filter the most relevant alarms from those detected by traditional methods, in order to avoid false alarms.
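Such a combination step could look like the following sketch: the traditional threshold produces candidate alarms, and an anomaly score (from whatever detector is in use) is then required to confirm each one. The function name, the score representation, and the cutoff value are assumptions for illustration only.

```python
def filter_alarms(alarm_indices, anomaly_scores, min_score=0.8):
    """Keep only those threshold-based alarms whose anomaly score
    also exceeds min_score, suppressing likely false alarms.
    Illustrative sketch; scores are assumed to be normalized to [0, 1]."""
    return [i for i in alarm_indices if anomaly_scores[i] >= min_score]

# three candidate alarms from a traditional threshold check,
# with per-sample anomaly scores from a separate detector
scores = [0.0, 0.0, 0.9, 0.0, 0.0, 0.3, 0.0, 0.85]
print(filter_alarms([2, 5, 7], scores))  # the alarm at index 5 is suppressed
```

The design choice here is conservative: the anomaly detector never creates alarms on its own, it only vetoes the traditional ones, so the combined system can never be noisier than the existing setup.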
Hi there! My name is Susanne and I joined Würth-Phoenix early in 2015. Ever since I can remember, computers and the perfection that can be reached with them have fascinated me. I built my first personal PC using components from about 20 broken ones at the age of 11, and fell in love with open source, visualization, and data analysis shortly afterwards. I hold a master's degree in experimental physics (University of Erlangen, Germany) and a PhD in computer science (University of Trento, Italy); my main interests are machine learning, visualization techniques, statistics, and optimization. As long as an algorithm of mine runs at night and I get interesting new results the morning after, I am able to sleep well. Besides computers, I also like music, inline skating, and skiing.
Author
Susanne Greiner