30. 08. 2024 Daniel Degasperi Blue Team, SEC4U

A Concrete Example of ES|QL and SOC Detection Rules

The purpose of this article is to show a real-life case study of the integration of the new Elastic ES|QL language within the detection rules used by the SOC to detect cyber threats.

Overview

ES|QL (Elasticsearch Query Language) is a piped query language developed by Elastic for filtering, transforming and analyzing data stored in Elasticsearch, including time series and event data. Its commands will look familiar to anyone who’s used SQL, but a query is written as a chain of processing commands separated by the pipe character (|), each one working on the output of the previous one (https://www.elastic.co/guide/en/elasticsearch/reference/current/esql.html).

Here’s the query structure:

FROM index_name
| WHERE condition
| KEEP field1, field2
| SORT field1 DESC
| LIMIT 10

The case proposed

In this blog post I’d like to show the remarkable improvement in one particular case using the ES|QL language.

Suppose we want to detect the creation of a considerable number of files on a Windows system over a very short period of time. This could certainly indicate very normal, routine activity on a system (e.g., updates/upgrades), but there’s also the (not so remote) possibility you’re detecting a mass encryption action, a typical behavior of ransomware.

The idea is therefore to define a rule that triggers an alert when the total number of files created on the same machine by the same process exceeds a certain threshold.

The Threshold Rules

Before the introduction of the new ES|QL language, if we wanted to create a rule like that based on a threshold, we had to use a rule of type Threshold.

Here’s a simplified basic example:

event.category:file 
and event.action: ("file created (rule: filecreate)" or "creation" or "filecreate" or "overwrite") 

Results aggregated by process.name >= 3000

The concept of the threshold rules is very simple and easy to understand. It’s necessary to define which fields to aggregate, and a useful threshold.
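The grouping-and-counting idea behind a threshold rule can be sketched in a few lines of Python. This is only an illustration of the concept, not how Elastic implements it: the event dictionaries, the `over_threshold` helper and the tiny threshold in the usage example are assumptions made for the sketch.

```python
from collections import Counter

def over_threshold(events, group_field, threshold):
    """Return {group value: count} for groups with at least `threshold` events."""
    counts = Counter(e[group_field] for e in events if group_field in e)
    return {value: n for value, n in counts.items() if n >= threshold}

# Hypothetical file-creation events that already matched the rule query
events = (
    [{"process.name": "evil.exe"}] * 5
    + [{"process.name": "notepad.exe"}] * 2
)

alerts = over_threshold(events, "process.name", threshold=3)
print(alerts)  # {'evil.exe': 5}
```

In a real rule the threshold would be something like the 3000 used above; the small value here just keeps the example readable.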

There is, however, one strong limitation. Exceptions to the rule can only be based on the fields used for the aggregation, so it is not possible to whitelist certain patterns or certain locations that are particularly prone to file creation by legitimate processes.

Fields such as file.path are not taken into account.

Let’s Try Using ES|QL

The great advantage of ES|QL in these cases, in addition to the ability to aggregate data, enrich it, and so on, is that exceptions can be based on fields that are not part of the aggregation: they only need to be present in the analyzed logs.
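As a rough illustration of why this matters, the following Python sketch filters events on a field that is not a grouping key (file.path) before counting them per host and process. The excluded location, the sample events and the `count_by_process` helper are hypothetical and only serve to show the idea.

```python
from collections import Counter

# Hypothetical location where legitimate processes create many files
EXCLUDED_PREFIXES = ("c:\\windows\\softwaredistribution\\",)

def count_by_process(events):
    """Drop events in excluded locations, then count per (host, process)."""
    counts = Counter()
    for e in events:
        path = e.get("file.path", "").lower()
        if path.startswith(EXCLUDED_PREFIXES):
            continue  # exception based on a field that is not a grouping key
        counts[(e["host.hostname"], e["process.name"])] += 1
    return counts

events = [
    {"host.hostname": "pc1", "process.name": "svchost.exe",
     "file.path": "C:\\Windows\\SoftwareDistribution\\Download\\a.tmp"},
    {"host.hostname": "pc1", "process.name": "svchost.exe",
     "file.path": "C:\\Users\\bob\\Documents\\report.docx"},
]

counts = count_by_process(events)
print(counts[("pc1", "svchost.exe")])  # 1
```

Only the file outside the excluded location is counted, which is exactly the kind of exception a threshold rule cannot express.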

Below is the ES|QL rule I used to detect the creation of a large number of files, and thus potentially encryption by ransomware.

FROM logs-endpoint.events.file-*, winlogbeat-*-sysmon-*, logs-windows.sysmon_operational-*
| EVAL file_entropy = COALESCE(file.Ext.entropy, 6.0)
| EVAL proc_trusted = COALESCE(process.code_signature.trusted, false)
| EVAL proc_signature = COALESCE(process.code_signature.subject_name, "notsigned")
| WHERE event.action IN ("file created (rule: filecreate)", "creation", "filecreate", "overwrite") 
and event.category == "file"
// filter for high values of entropy
and file_entropy >= 6.0
// ignore some trusted and signed software 
// but never ignore PowerShell and cmd, even when they are signed and trusted
and not (proc_trusted == true and proc_signature in ("Microsoft Corporation", "Microsoft Windows", "Microsoft Windows Publisher", "Microsoft Dynamic Code Publisher", "Adobe Inc.", "Adobe Systems, Incorporated") and process.name != "powershell.exe" and process.name != "cmd.exe")
//count the number of created files
| STATS file_count = COUNT(file.name) by host.hostname, process.name, event.action, event.category
| WHERE file_count > 3000
| KEEP host.hostname, process.name, event.action, event.category, file_count

Looking at the query, the two factors that decide whether an event is kept or filtered out are the entropy of the file and the signature of the process.

For entropy, we consider values greater than or equal to 6 (on a scale from 0 to 8), while for the signature the idea is to exclude processes signed by well-known and widespread software vendors.

In this case we still need to consider trusted processes like PowerShell and cmd, as they could be used to run the actual malware.
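The intended exclusion logic can be written down as a small Python predicate. It is a sketch of the reasoning only: the `is_ignored` helper and the sample events are hypothetical, and the signer list is copied from the rule above. An event is ignored only when the process is trusted, signed by one of the listed vendors, and is neither PowerShell nor cmd.

```python
TRUSTED_SIGNERS = {
    "Microsoft Corporation", "Microsoft Windows", "Microsoft Windows Publisher",
    "Microsoft Dynamic Code Publisher", "Adobe Inc.", "Adobe Systems, Incorporated",
}
# Signed and trusted, but still able to run the actual malware
NEVER_IGNORED = {"powershell.exe", "cmd.exe"}

def is_ignored(event):
    """True when the event comes from trusted, signed software we filter out."""
    trusted = event.get("process.code_signature.trusted", False)
    signer = event.get("process.code_signature.subject_name", "notsigned")
    name = event.get("process.name", "")
    return bool(trusted and signer in TRUSTED_SIGNERS and name not in NEVER_IGNORED)

word = {"process.name": "winword.exe",
        "process.code_signature.trusted": True,
        "process.code_signature.subject_name": "Microsoft Corporation"}
posh = dict(word, **{"process.name": "powershell.exe"})
unsigned = {"process.name": "evil.exe"}

print(is_ignored(word), is_ignored(posh), is_ignored(unsigned))  # True False False
```

Note that the two name checks must both hold. Joining them with "or" would make the condition true for every process, since no name can equal both values at once.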

But What If the Logs You’re Considering Don’t Have the Necessary Fields?

It’s fundamental to take into account that the data received is collected by several integrations, and not all of them provide fields such as entropy, hash or a process signature.

Combining the EVAL command with the COALESCE function allows us to define default values to be used when the logs do not provide these fields.

So, in this specific case, in order to not lose any evidence, entropy is set to 6 for those logs that don’t have the required fields and the process is considered not trusted.
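The effect of these defaults can be mimicked in Python. The `coalesce` and `with_defaults` helpers and the sample event below are hypothetical; they only mirror the three EVAL lines of the rule to show which values an event ends up with when the optional fields are missing.

```python
def coalesce(*values):
    """Return the first value that is not None, similar to COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None

def with_defaults(event):
    """Mirror the three EVAL lines of the rule for a single event."""
    return {
        "file_entropy": coalesce(event.get("file.Ext.entropy"), 6.0),
        "proc_trusted": coalesce(event.get("process.code_signature.trusted"), False),
        "proc_signature": coalesce(
            event.get("process.code_signature.subject_name"), "notsigned"
        ),
    }

# Hypothetical event from an integration without entropy or signature fields
bare_event = {"process.name": "evil.exe"}
print(with_defaults(bare_event))
# {'file_entropy': 6.0, 'proc_trusted': False, 'proc_signature': 'notsigned'}
```

An event that does carry a low entropy value, for example 3.2, keeps that value and is filtered out by the entropy condition, while an event without the field passes it thanks to the default of 6.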

About Entropy

Entropy in the context of information theory refers to the measure of uncertainty or randomness in a set of data. It quantifies the unpredictability of information content, where higher entropy means more randomness, and lower entropy indicates more predictability.

Essentially, well-encrypted data is practically indistinguishable from random data, so files encrypted by ransomware tend to have an entropy close to the maximum of 8 bits per byte.

These are the values considered by Elastic in the EDR rule:

https://github.com/elastic/protections-artifacts/blob/main/ransomware/artifact.lua

Certainly, taking entropy into account is not a foolproof method, but it’s necessary to introduce some filters for such rules, otherwise they would be obscured by continuous noise.

On the other hand, two ransomware simulators, one provided by Elastic (https://github.com/elastic/protections-artifacts/tree/main/ransomware/testing) and the other being QuickBuck (https://github.com/NextronSystems/ransomware-simulator), encrypt data with an entropy greater than 7 and very close to 8.
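To make the entropy figures above more concrete, here is a minimal Python sketch of Shannon entropy measured in bits per byte. It assumes this is the measure behind the 0 to 8 scale; how Elastic actually computes file.Ext.entropy is not covered here. Random bytes from `os.urandom` are used as a stand-in for encrypted content.

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum of p * log2(1/p) over every byte value that occurs
    return sum((n / total) * math.log2(total / n) for n in counts.values())

repetitive = b"A" * 16                 # a single repeated byte value
uniform = bytes(range(256)) * 16       # every byte value equally frequent
random_like = os.urandom(65536)        # stand-in for encrypted content

print(shannon_entropy(repetitive))         # 0.0
print(shannon_entropy(uniform))            # 8.0
print(shannon_entropy(random_like) > 7.9)  # True
```

Plain text and many document formats usually sit well below these values, which is why a threshold of 6 already removes a good part of the noise.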

Final Considerations

It’s important to understand that rules like this are not substitutes for the rules built into an EDR/XDR, which consider a series of factors, evidence and techniques and correlate them with each other.

Below are the techniques used for example by Elastic’s EDR to detect ransomware behavior.

https://github.com/elastic/protections-artifacts/tree/main/ransomware

But when an EDR system cannot be installed due either to commercial issues (cost) or architectural/technological issues, rules of this type can help in the detection of potential threats such as ransomware.

The challenge is to try to reduce the noise, isolate false positives as much as possible, and continuously improve the detection rules.
