Tornado is a Complex Event Processor (CEP) that receives reports of events from data sources such as monitoring and email, matches them against preconfigured rules, and executes the actions associated with those rules. Some vendors provide static notification systems that cannot be customized. For example, during one project we were faced with a tool that could only send notifications to NetEye via email in HTML format.
REGEXes can provide the required parsing flexibility, but some “special” delimiters such as newlines and tabs can prove to be a source of problems. Tornado should be able to handle an email like this one:
{"body":"\n<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\">\n<html>\n<head>\n <title>Mail Title</title>\n</head>\n<body bgcolor='#eceded'>\n\t
...
<span class=\"row\">Field1</span></td>\n\t\t\t<td><span class=\"row\">17,20 MWh</span>
..."}
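To make the delimiters in this payload concrete, here is a minimal Python sketch (Python is used only for illustration; it is not part of Tornado). It assumes a shortened, hypothetical payload that keeps only the fragment around “Field1”, and shows which characters actually sit between the two table cells once the JSON is decoded.

```python
import json

# Hypothetical, shortened payload: only the fragment around "Field1" is kept.
# The raw string preserves the JSON escapes (\" \n \t) exactly as they appear in the event.
raw_event = r'{"body":"<span class=\"row\">Field1</span></td>\n\t\t\t<td><span class=\"row\">17,20 MWh</span>"}'

body = json.loads(raw_event)["body"]

# Slice out whatever lies between the closing </td> and the next <td>.
start = body.index("</td>") + len("</td>")
end = body.index("<td>")
separator = body[start:end]

# After JSON decoding, the escapes are real control characters.
print([hex(ord(ch)) for ch in separator])  # ['0xa', '0x9', '0x9', '0x9']
```

The separator is one line feed (0x0a) followed by three tabs (0x09), which is the sequence the REGEX has to get past.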
The value associated with “Field1” (here 17,20 MWh) is inside the SPAN that follows the label, and should be extracted using a REGEX like:
<span class="row">Field1</span></td>\n\t\t\t<td><span class="row">(.+?)\s
However, this syntax is not handled properly by the Rust engine in Tornado: the \n and \t are not interpreted correctly, so matching the \n\t\t\t sequence fails. To work around this issue, we use hexadecimal notation in the match filter, replacing \n with \x0a and the run of tabs with \s+.
The final result is a match filter that can handle these special characters and provide us with the correct value.
The REGEX used to extract this value is:
<span class="row">Field1</span></td>\x0a\s+<td><span class="row">(.+?)\s
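As a quick sanity check of the pattern itself, the following Python sketch applies it to the sample fragment. This only assumes Python's standard `re` module and a hand-written copy of the fragment; it does not reproduce Tornado's Rust engine or its rule configuration, so it shows what the pattern is meant to capture rather than how Tornado evaluates it.

```python
import re

# Sample fragment from the email above, with a real newline and real tab characters.
body = '<span class="row">Field1</span></td>\n\t\t\t<td><span class="row">17,20 MWh</span>'

# Pattern from the article; a raw string so the regex engine receives \x0a and \s unchanged.
pattern = re.compile(
    r'<span class="row">Field1</span></td>\x0a\s+<td><span class="row">(.+?)\s'
)

match = pattern.search(body)
value = match.group(1) if match else None
print(value)  # 17,20
```

Because the capture group is lazy and is followed by `\s`, it stops at the first whitespace, so only the number is extracted and the unit (MWh) is left out.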
So by combining Tornado with a REGEX that uses hexadecimal notation, we can also parse and extract fields from HTML emails that contain special characters.