More than 100 teams competed, more than 25 submitted a solution, and the best achieved a macro-F1 score higher than 0.88.
Last Friday, after six long weeks, the time had finally come. During the ECML-PKDD conference in Riva del Garda, the best of the competing approaches were described and discussed. The participants had the opportunity to get answers directly from the organizers and, last but not least, Iryna Haponchyk, leader of the winning team, was awarded 1000 Euro for the solution with the highest macro-F1 score, or rather for having created a model capable of producing such a score. Here you can see the beaming winner during the discovery challenge prize ceremony.
Iryna explained that her team trained a standard multi-class linear SVM classifier, after first enriching the provided attribute set with features generated using a random forest and with features encoding the interdependency between examples that occur close to each other in time.
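To give an idea of what "features encoding the interdependency between examples close in time" could look like, here is a minimal sketch in Python. It is not the winning team's actual method, which is not detailed here: the function name `add_temporal_context`, the `window` parameter and the choice of averaging the preceding requests are all assumptions made for illustration.

```python
def add_temporal_context(rows, window=2):
    """Append to each row the per-feature mean of the preceding `window` rows.

    `rows` is a list of numeric feature lists, assumed to be sorted by time.
    This is an illustrative assumption, not the winners' feature definition.
    """
    n_features = len(rows[0]) if rows else 0
    enriched = []
    for i, row in enumerate(rows):
        previous = rows[max(0, i - window):i]
        if previous:
            context = [
                sum(r[j] for r in previous) / len(previous)
                for j in range(n_features)
            ]
        else:
            # The first row has no history, so it falls back to its own values.
            context = list(row)
        enriched.append(list(row) + context)
    return enriched


# Hypothetical requests with two metrics each (e.g. latency, throughput).
requests = [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]]
enriched = add_temporal_context(requests, window=2)
```

Each enriched row now carries its own metrics plus a summary of what happened just before it, which is the kind of extra information a linear classifier could then exploit.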
Lei Xu, leader of the second-placed team, submitted a solution based on an ensemble approach consisting of one main model, which performs a sequence of binary and multi-class classifications to deal with the imbalance of the target classes, and a couple of add-on models, which further increase the accuracy.
Martin Wistuba, leader of the third-placed team, presented yet another solution. Their approach is based on state-of-the-art methods for automatic machine learning, so no human intervention took place for preprocessing, feature engineering or model parameter optimization.
So far so good, but what is the concrete achievement of the winners and are there any future prospects of the challenge outcomes?
Let me briefly summarize the task of the challenge:
Würth Phoenix S.r.l. continuously collects network traffic data at the request level. These data can be used for several tasks:
just to mention a few.
The task of the NetCla challenge was to train a model on part of the data of a single average working day to be able to predict the remaining data of the very same working day.
Each request is characterized by various metrics, e.g. latencies and throughput measured by the probe. The objective of the challenge was, given a transmission in the network, to predict the application that is transmitting the data (a single-label, multi-class classification task).
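Solutions to this multi-class task were ranked by macro-F1, i.e. the unweighted mean of the per-class F1 scores, so that rarely seen applications count as much as frequent ones. The following Python sketch shows how such a score can be computed; the application labels in the example are made up for illustration and are not taken from the challenge data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gets a false positive
            fn[truth] += 1  # true class gets a false negative
    classes = set(y_true) | set(y_pred)
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 if the class never occurs.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical labels: one frequent application and two rare ones.
y_true = ["http", "http", "http", "dns", "ssh"]
y_pred = ["http", "http", "dns", "dns", "http"]
score = macro_f1(y_true, y_pred)
```

In this toy example three of five requests are predicted correctly, yet the macro-F1 is lower than the plain accuracy because the single "ssh" request is missed entirely and that class contributes an F1 of zero.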
So now we know that it is possible to predict the application responsible for a request with high accuracy. What can this knowledge be used for? One example is checking for potential changes within the network: as soon as new traffic gets notably harder to predict, there might be a reason for that. Methods based on machine learning can be expected to be more sensitive to such changes than common practice methods that are merely based on the expected signal mean.
The challenge data were taken from a single working day only. Analogous data from different days should be used for training and testing respectively, to gain a first estimate of how day-specific the methods are.
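The idea of flagging traffic that becomes "notably harder to predict" can be sketched as a rolling accuracy check. This is only an illustration under stated assumptions: the class name `AccuracyDriftMonitor`, the window size and the tolerance are hypothetical, and the sketch assumes that the true application of each request becomes known at some point after the prediction.

```python
from collections import deque


class AccuracyDriftMonitor:
    """Flag a possible network change when rolling accuracy drops.

    `baseline_accuracy` would be the accuracy measured on reference traffic;
    `window` and `tolerance` are illustrative defaults, not tuned values.
    """

    def __init__(self, baseline_accuracy, window=100, tolerance=0.1):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.hits = deque(maxlen=window)  # keeps only the latest outcomes

    def update(self, predicted, actual):
        """Record one prediction outcome and return the current drift flag."""
        self.hits.append(predicted == actual)
        return self.drift_suspected()

    def rolling_accuracy(self):
        return sum(self.hits) / len(self.hits) if self.hits else None

    def drift_suspected(self):
        # Wait until the window is full before judging.
        if len(self.hits) < self.hits.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline - self.tolerance


monitor = AccuracyDriftMonitor(baseline_accuracy=0.9, window=4, tolerance=0.1)
flags = [monitor.update("http", "http") for _ in range(4)]
flag_after_miss = monitor.update("http", "dns")
```

With a window of four requests, four correct predictions raise no flag, while a single miss afterwards pulls the rolling accuracy to 0.75 and trips the check. In practice the window would be far larger, so that isolated misclassifications do not raise alarms.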
Herewith Würth-Phoenix S.r.l. congratulates the NetCla challenge winners, thanking them for their enthusiastic participation, and wishing them all the best for their future.