Detecting Threats with Graylog Pipelines - Part 3

Now that we've normalized and enriched our events, let's get into the actual threat detection logic that brings SIEM-like features to open source Graylog.

[Figure: Graylog pipeline methodology diagram]

As you can see in the diagram above, we perform normalization, enrichment, and threat detection in standalone pipelines. This allows far more granularity and control over how the messages flow through many stages of rules.

For instance, it would not make sense to send Linux syslog events through a pipeline containing Windows threat signatures.

However, since Linux syslog may also contain network information (UFW, iptables, etc.), we can choose to send those messages through more than one applicable pipeline.

We handle this routing logic in Stage 20 of our hypothetical example.


DETECTING THREATS IN WINDOWS PROCESS CREATION EVENTS

Let's say we wanted to develop a pipeline for analyzing Windows Process Creation events for threats or anomalies.

The first stage of this pipeline should come after the normalization and enrichment stages--let's call it stage 20. Stage 20 would contain only a single rule which serves to pass only relevant events to the rest of the stages of this pipeline. Here's an example of such a rule.

rule "pass_win_process_creation_only"
// to ensure that all subsequent stages only process relevant events
when
	( // native event ID 4688
		has_field("event_type") AND 
		to_string($message.event_type) == "wineventlog" AND
		has_field("event_id") AND
		to_string($message.event_id) == "4688"
	) 
	OR
	( // sysmon event ID 1 
		has_field("event_type") AND 
		to_string($message.event_type) == "sysmon" AND
		has_field("event_id") AND
		to_string($message.event_id) == "1"
	)
then
	// pass this message to subsequent stages
end

Notice that this rule takes no action inside the then block. It exists purely as a flow controller for the stages that come after it. Provided the stage is configured to continue only when a rule matches, any message that does not match this rule will not be passed to subsequent stages.
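
The same gate pattern is what lets one message travel through more than one pipeline, as with the Linux syslog example earlier. Here is a minimal sketch of a gate for a separate network threat pipeline. The rule name and the src_ip and dst_ip field names are hypothetical and assume your normalization stages produce them.

```
rule "pass_network_events_only"
// hypothetical gate for a separate network threat pipeline
// src_ip and dst_ip are assumed field names from normalization
when
	has_field("src_ip") AND
	has_field("dst_ip")
then
	// pass this message to subsequent stages
end
```

A UFW or iptables message carrying both fields would pass this gate as well as the gate of a Linux pipeline, while a Windows Process Creation event would be stopped here.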

Now that we know that stages 21 and beyond will only ever see Windows Process Creation events, we can begin crafting threat signatures that are applicable to these events without worrying about wasting CPU cycles.

Great, so where can I find some solid inspiration for threat rules to build into my pipeline?

I'm so glad you asked! You can find nearly everything you need in the Sigma project created by Florian Roth. Even more specific to this example, look at the many existing rules for Windows Process Creation.

Let's use the win_susp_schtask_creation rule as our example. It looks for the creation of scheduled tasks that do not match known common tasks. To be clear, this would be considered a very low-risk, high false-positive signature, but we'll address that later in this post.

We must examine the logic of this Sigma rule and convert it into a Graylog pipeline rule like this (I added some additional exclusions as an example):

rule "persistence_susp_schtask_creation"
// https://github.com/Neo23x0/sigma/blob/master/rules/windows/process_creation/win_susp_schtask_creation.yml
// https://attack.mitre.org/techniques/T1053/
when
	(
		has_field("process_name") AND
		has_field("command_line") AND
		ends_with(to_string($message.process_name), "schtasks.exe", true) AND
		contains(to_string($message.command_line), "/create", true)
	)
	AND NOT
	// Events to filter out
	(
		( // ignore SYSTEM created tasks
			has_field("domain_username")
			AND
			to_string($message.domain_username) == "NT AUTHORITY\\SYSTEM"
		)
		OR
		( // Dell Command Update
			has_field("process_parent") 
			AND
			ends_with(to_string($message.process_parent), "DellCommandUpdate.exe", true)
		)
		OR
		( // Microsoft Office ClickToRun
			has_field("process_parent") 
			AND
			ends_with(to_string($message.process_parent), "\\Program Files\\Common Files\\microsoft shared\\ClickToRun\\OfficeClickToRun.exe", true)
		)
		// etc, etc
	)
then
	set_field("threat_detected", true);
	set_field("threat_name", "persistence_susp_schtask_creation");
	set_field("threat_desc", "Detects the creation of scheduled tasks in user session");
	set_field("threat_tactic", "persistence");
	set_field("threat_technique", "scheduled_task");
	set_field("threat_score", 2);
	set_field("threat_id", "T1053");
	set_field("threat_count", to_long($message.threat_count) + 1);
end

Notice all of the fields we are setting if the conditions are true:

  • threat_detected:true allows analysts to easily query for all threats over a period of time and allows subsequent pipeline rules to treat this message differently because a potential threat is present (e.g., expensive enrichment)
  • threat_name tells the analyst the name of the threat rule that identified this event as a threat, allowing for quick rule analysis and tuning if needed.
  • threat_desc adds a friendly description of the threat to the event itself.
  • threat_tactic / threat_technique / threat_id all provide MITRE ATT&CK context to the signature.
  • threat_score is arguably most important here because this is how we later decide the alert action that should take place for this event.
  • threat_count provides a running count of how many threat signatures fired against this event. This is helpful because an event firing on a single low-risk threat is no big deal, but an event that matches many low-risk threats might suddenly be more important.
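
As a rough sketch of the "expensive enrichment" idea, a rule in a later stage could run a lookup only on messages already flagged as threats. The lookup table name known_bad_hashes and the process_hash and threat_intel_match fields are hypothetical. You would need to create the lookup table in Graylog yourself and adjust the field names to your own schema.

```
rule "enrich_flagged_threats"
// only messages already flagged by a threat rule pay for this lookup
when
	to_bool($message.threat_detected) AND
	has_field("process_hash")
then
	// "known_bad_hashes" is a hypothetical lookup table; "none" is the default when no entry exists
	let intel = lookup_value("known_bad_hashes", to_string($message.process_hash), "none");
	set_field("threat_intel_match", intel);
end
```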

ASSIGNING AND ELEVATING THREAT SCORES

We adopted a threat scoring system that accounts for severity of a threat as well as prevalence of false positives.

For instance, a threat score of 1 is assigned to a low threat that is known to have a high false positive rate. A threat score of 9 is a high threat with virtually zero false positives.

[Figure: threat score matrix]

Now let's discuss how we distribute our threat rules to guarantee that we prioritize high-risk threats over lower risk threats. The challenge here is that the threat_score field will be overwritten if another rule matches the event in a later stage, so we want to ensure that the higher threat_score is written later in the pipeline.

Our pipeline and stage layout for this would resemble the diagram below.

[Figure: Windows threats pipeline stage layout]

This means that even if a "low" threat is detected in an early stage, the final threat_score will be assigned by the highest threat which matched the event.

  • Stage 21 - threat_score:1 (low severity, high false-positive)
  • Stage 22 - threat_score:2 (low severity, medium false-positive)
  • and so on...

This is why the example rule above was given threat_score:2: scheduled tasks are created all the time, and the rule would generate a lot of alerts if not kept in check.
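
To make the stage layout concrete, here is a sketch of what the pipeline source could look like. The pipeline name is made up, and only a few stages are shown. One thing to keep in mind: with match either, a message only continues to the next stage if at least one rule in the current stage matched. A message that matches no low-severity rule would therefore never reach the high-severity stages, so this sketch assumes a hypothetical always-true rule in each threat stage to keep messages flowing.

```
pipeline "Windows Process Creation Threats"
stage 20 match either
	rule "pass_win_process_creation_only";
stage 21 match either
	rule "continue_to_next_stage";
stage 22 match either
	rule "continue_to_next_stage";
	rule "persistence_susp_schtask_creation";
stage 29 match either
	rule "continue_to_next_stage";
	rule "persistence_APT_winnie_the_pooh";
end
```

The pass-through rule is saved as its own rule and might look like this:

```
rule "continue_to_next_stage"
// hypothetical no-op rule that always matches so the stage never blocks a message
when
	true
then
end
```

Stage 20 deliberately has no pass-through rule, since its whole job is to stop irrelevant messages.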


Let's say we had threat intel on APT group "Winnie the Pooh" which revealed a piece of malware that creates a Scheduled Task named tigger on the infected endpoint by shelling out to schtasks.exe. This would present a very high confidence threat signature as /TN "tigger" is not something we'd expect to see in a healthy environment. We'd put this higher threat rule in a later stage, such as stage 29.

So let's produce a quick Process Creation threat signature for this:

rule "persistence_APT_winnie_the_pooh"
// high-confidence signature: scheduled task name "tigger" tied to APT threat intel
when
	(
		has_field("process_name") AND
		has_field("command_line") AND
		ends_with(to_string($message.process_name), "schtasks.exe", true) AND
		contains(to_string($message.command_line), "/create", true) AND
		contains(to_string($message.command_line), "tigger", true)
	)
then
	set_field("threat_detected", true);
	set_field("threat_name", "persistence_APT_winnie_the_pooh");
	set_field("threat_desc", "Scheduled Task Name associated with APT");
	set_field("threat_tactic", "persistence");
	set_field("threat_technique", "scheduled_task");
	set_field("threat_score", 9);
	set_field("threat_id", "T1053");
	set_field("threat_count", to_long($message.threat_count) + 1);
end

Even though this message would've first matched our earlier rule titled persistence_susp_schtask_creation and been given a threat_score:2, this later rule would overwrite all threat_* fields with this higher severity threat. However, we would still know that a previous rule matched due to threat_count being 2 in this instance. Most importantly, we now know we have a threat_score:9 on our hands which would trigger immediate alerts/SMS/phone calls/etc to our SOC crew.

As I am sure you can imagine, these rules can become very complex and numerous.


UPGRADING THREAT SCORES WITH CONTEXT

What if we wanted to elevate a threat score according to asset value or some other data point?

Let's say all Domain Controllers in our environment have predictable hostnames such as dc-1 and so on. This allows us to easily isolate these systems in a pipeline rule and elevate the threat score of any event that impacts them. Scheduled tasks are a good example here: while they are common across most environments, they shouldn't change very often on key terrain such as a Domain Controller.

We could simply add a rule in a later pipeline stage to upgrade the threat score on messages where hostname:dc*.

rule "upgrade_threats_on_domain_controllers"
when
	to_bool($message.threat_detected) AND
	has_field("hostname") AND
	starts_with(to_string($message.hostname), "dc", true)
then
	// raise the threat score by 3
	set_field("threat_score", to_long($message.threat_score) + 3);
end
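
One side effect of adding to the score is that an already high threat can end up above the top of the scale. A minimal sketch of a guard, assuming 9 is the maximum score and that this rule sits in a stage after the upgrade rule (the rule name is hypothetical):

```
rule "cap_threat_score"
// assumes 9 is the top of the scoring scale
when
	has_field("threat_score") AND
	to_long($message.threat_score) > 9
then
	// clamp the upgraded score back to the maximum
	set_field("threat_score", 9);
end
```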

DOWNGRADING THREAT SCORES WITH CONTEXT

What about the many situations where a threat exists, but can be downgraded based on additional context? A good example is when you detect suspicious web requests being made against a web server, but the server is responding with non-2xx responses. While it's still noteworthy that the server is being scanned, the threat is minimal until the requests start succeeding.

For context, we have a threat rule that detects vulnerability scanners and offensive tools by matching many known malicious strings in the user-agent, URI, etc. This threat rule lives in stages 21-30. In stage 31, we have various "threat handling" rules which account for known risk-reducing situations such as the following:

rule "http_non_200_response"
// this rule is meant to downgrade a threat if related to an unsuccessful HTTP request
// such as a request to a non-existent resource which could be a precursor to attack,
// but not a threat by itself.
when
	(
		has_field("threat_detected") 
		AND
		has_field("http_response") 
		AND
		to_long($message.http_response) > 299
	)
	AND NOT
	( // Do not downgrade nmap
		has_field("request") 
		AND
		contains(to_string($message.request), "nmap", true)
	)
then
	set_field("threat_score", 1);
	set_field("threat_downgraded", true);
	set_field("threat_handling_rule", "http_non_200_response");
	set_field("threat_handling_desc", "This HTTP request was likely not successful due to a non-200 response code. A threat may still be present.");
end

Notice how we've added a few new fields to the event.

  • threat_downgraded to inform the analyst this threat was originally higher, but was intentionally reduced
  • threat_handling_rule to inform the analyst which rule performed the downgrade action
  • threat_handling_desc provides a brief description of why the event was downgraded

As you can imagine, there are many different ways you can upgrade or downgrade a threat based on additional information. This step becomes critical before you start generating alerts that will flood the SOC and cause late night PagerDuty calls.
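
Once the scores have settled, one possible way to feed alerting is to route only the high scores into a dedicated stream. This is just a sketch: the stream name and the threshold of 7 are assumptions, the stream must already exist in Graylog, and the rule would need to sit in a stage after all upgrade and downgrade rules.

```
rule "route_high_threats_for_alerting"
// hypothetical final-stage rule; threshold and stream name are assumptions
when
	has_field("threat_score") AND
	to_long($message.threat_score) >= 7
then
	// copy the message into a stream that alert definitions can watch
	route_to_stream(name: "High Severity Threats");
end
```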


[Figure: Graylog pipeline methodology diagram]

WHAT NEXT?

Now that you've leveraged pipeline rules to look for known threats:

