When considering KPIs for your SOC, mean time to detect, contain, and remediate (MTTD, MTTC, and MTTR); incident and alert volumes; and false positive rates get most of the attention. Regularly monitoring these higher-level metrics is a best practice for good reason. They give security leadership and business stakeholders insight into the overall effectiveness of your SOC. For practitioners, keeping an eye on these KPIs ensures you don’t lose the forest for the trees.
But let’s flip that old adage on its head: the best park rangers spend plenty of time walking through the trees to make sure they really know the terrain. They get intimately familiar with weather patterns and how conditions change with the seasons. So as a security practitioner, digging into lower-level metrics can be incredibly valuable for deepening your understanding of your team’s overarching strategy and streamlining daily tasks and ad hoc tactics.
So which metrics should you focus on? This very much depends on your organization’s priorities and threat models. The possibilities are endless.
CardinalOps Security Researcher Daniel Koifman recently covered how to create two such lower-level metrics: one that pinpoints the day of peak network activity in a given month, and another that highlights initial login times for a given user. His post details how to create these metrics in Google Security Operations SIEM, or the Artist SIEM formerly known as Prince Chronicle, but they’re relevant to any SIEM.
Here we’ll pick up where Daniel left off and pivot to outlining sample use cases for these metrics. Let’s dive in!
Monitoring Peak Network Activity
The rule below sums outbound bytes per day for an IP address over a 30-day window, then returns the largest of those daily sums. While it’s in the YARA-L language, it can be translated to any SIEM’s detection logic.
rule metric_examples_network {
  meta:
    author = "Google Cloud Security"
  events:
    $net.metadata.event_type = "NETWORK_CONNECTION"
    $net.network.sent_bytes > 0
    $net.principal.ip = $ip
    $net.principal.ip = "10.128.0.21"
  match:
    $ip over 1d
  outcome:
    $max_byte_count_window =
      max(metrics.network_bytes_outbound(
        period:1d,
        window:30d,
        metric:value_sum,
        agg:max,
        principal.asset.ip: $ip
      ))
  condition:
    $net and $max_byte_count_window > 0
}
This rule creates a metric that identifies which day saw peak network traffic within a given month. For an in-depth code-level analysis, check out Daniel’s full article. Great, but how would your team use this as part of its detection posture management workflows?
Here are a few sample scenarios:
Threat Hunting
Time is of the essence when threat hunting for incident response workflows. When you have little to go on, wildcard searches without any time windows are better than nothing but can take minutes (or even hours) to return results. And there’s nothing worse than a query stopping your investigation in its tracks.
For these scenarios, it’s helpful to have a specific time window to narrow down your search. Focusing initial queries on the day with peak network traffic improves search performance and surfaces insights that can serve as a great jumping-off point. You can explore data points connected to the systems or applications that saw the peak and identify potentially correlated attacker activity.
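To make the idea concrete, here is a minimal Python sketch of turning the peak day into a bounded search window. It assumes network events have already been exported from your SIEM as dictionaries. The field names (`timestamp`, `principal_ip`, `sent_bytes`) and the `peak_day_window` helper are hypothetical, chosen only to mirror the rule above. They are not part of any SIEM API.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def peak_day_window(events, ip):
    """Return (start, end, total_bytes) for the UTC day on which `ip`
    sent the most outbound bytes, or None if it has no matching events."""
    daily = defaultdict(int)
    for ev in events:
        if ev["principal_ip"] == ip and ev["sent_bytes"] > 0:
            day = ev["timestamp"].astimezone(timezone.utc).date()
            daily[day] += ev["sent_bytes"]
    if not daily:
        return None
    # Pick the day with the largest daily sum.
    day, total = max(daily.items(), key=lambda item: item[1])
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return start, start + timedelta(days=1), total


# Hypothetical sample data; timestamps must be timezone-aware.
events = [
    {"timestamp": datetime(2025, 3, 3, 9, 0, tzinfo=timezone.utc),
     "principal_ip": "10.128.0.21", "sent_bytes": 1200},
    {"timestamp": datetime(2025, 3, 4, 10, 0, tzinfo=timezone.utc),
     "principal_ip": "10.128.0.21", "sent_bytes": 4000},
    {"timestamp": datetime(2025, 3, 4, 23, 30, tzinfo=timezone.utc),
     "principal_ip": "10.128.0.21", "sent_bytes": 5000},
    {"timestamp": datetime(2025, 3, 5, 1, 0, tzinfo=timezone.utc),
     "principal_ip": "10.128.0.21", "sent_bytes": 3000},
    {"timestamp": datetime(2025, 3, 4, 12, 0, tzinfo=timezone.utc),
     "principal_ip": "10.128.0.99", "sent_bytes": 99999},
]

window = peak_day_window(events, "10.128.0.21")
# window covers 2025-03-04 00:00 UTC up to 2025-03-05 00:00 UTC, 9000 bytes total
```

The returned start and end can then be used as the time bounds on your first hunting queries instead of an open-ended search.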
Identifying Data Exfiltration
An unusually high volume of network traffic may indicate that data exfiltration occurred. And because the logic in the above rule focuses specifically on outbound traffic, it’s an even stronger indicator of this activity. This metric allows you to focus your investigation on events from that specific day.
Once you understand which user caused the traffic spike, you can pivot to other potentially correlated data: what other applications did the user access on or immediately before that day? Did they connect to any ports or external IPs that were out of the ordinary? Were there any anomalous protocols or encrypted channels used without business justification? Do you see any endpoint anomalies like first-time execution of FTP clients or cloud sync utilities?
Exploring these questions helps you identify the root cause and understand the attack path, but getting the answers starts with knowing when peak network traffic occurred.
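One simple way to decide whether a peak is unusual enough to investigate is to compare each day’s outbound total against the median for the window. The Python sketch below is an illustration under assumptions: the `outbound_spikes` helper, the dictionary of daily totals, and the 5x ratio are all hypothetical, and the right threshold depends on your environment.

```python
from statistics import median


def outbound_spikes(daily_totals, ratio=5.0):
    """Return the days whose outbound bytes exceed `ratio` times the
    median daily total. `ratio` is an arbitrary starting point to tune."""
    if not daily_totals:
        return []
    baseline = median(daily_totals.values())
    if baseline <= 0:
        # No meaningful baseline to compare against.
        return []
    return sorted(day for day, total in daily_totals.items()
                  if total > ratio * baseline)


# Hypothetical daily outbound byte totals for one IP address.
daily_totals = {
    "2025-03-01": 1000,
    "2025-03-02": 1200,
    "2025-03-03": 900,
    "2025-03-04": 50000,
    "2025-03-05": 1100,
}

spikes = outbound_spikes(daily_totals)
# spikes == ["2025-03-04"]
```

The median is used here rather than the mean so that a single very large day does not inflate the baseline it is being compared against.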
Malware Detection
Malware consumes system and network resources, and communication with command and control servers in particular creates unexpected traffic patterns. This means unusually high traffic volumes can indicate the presence of malware in your environment.
Like the above examples, knowing the date on which peak traffic occurred allows you to pinpoint investigations and identify which entities are connected to the malware attack, understand its blast radius, and scope how to respond.
Monitoring Initial User Logins
This second rule identifies when a specific user has logged into a given system or application for the first time.
rule metric_examples_success_authentication {
  meta:
    author = "Google Cloud Security"
  events:
    $login.metadata.event_type = "USER_LOGIN"
    $login.security_result.action = "ALLOW"
    $login.target.user.userid != /\$$/
    strings.to_lower($login.target.user.userid) = $userid
  match:
    $userid over 1d
  outcome:
    $first_seen_login_window = max(metrics.auth_attempts_success(
      period:1d,
      window:30d,
      metric:first_seen,
      agg:min,
      target.user.userid: $userid
    ))
    $systems_accessed = array_distinct($login.principal.hostname)
  condition:
    $login and $first_seen_login_window = 0
}
Daniel’s post includes an in-depth explanation of the code. Again, it’s coded in YARA-L for Google SecOps SIEM but can be mapped to another language with relative ease.
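As a rough illustration of how that mapping might look outside of YARA-L, the Python sketch below approximates the same idea over exported login events: keep successful logins, skip account names ending in `$`, lowercase the user ID, and report users who logged in during the most recent day but have no successful login in the preceding 30 days. The event fields (`timestamp`, `userid`, `action`, `hostname`) and the `first_seen_logins` helper are hypothetical, and the window handling is a simplification rather than a faithful reimplementation of the SIEM’s metric function.

```python
from datetime import datetime, timedelta, timezone


def first_seen_logins(events, as_of, window_days=30):
    """Return {userid: sorted hostnames} for users with a successful login
    in the day before `as_of` and none in the `window_days` before that."""
    day_start = as_of - timedelta(days=1)
    window_start = day_start - timedelta(days=window_days)
    seen_before = set()
    current_day = {}
    for ev in events:
        user = ev["userid"].lower()
        # Keep successful logins only; skip account names ending in "$".
        if ev["action"] != "ALLOW" or user.endswith("$"):
            continue
        ts = ev["timestamp"]
        if window_start <= ts < day_start:
            seen_before.add(user)
        elif day_start <= ts < as_of:
            current_day.setdefault(user, set()).add(ev["hostname"])
    return {user: sorted(hosts) for user, hosts in current_day.items()
            if user not in seen_before}


# Hypothetical sample data.
as_of = datetime(2025, 3, 31, tzinfo=timezone.utc)
logins = [
    {"timestamp": as_of - timedelta(days=10), "userid": "alice",
     "action": "ALLOW", "hostname": "web01"},
    {"timestamp": as_of - timedelta(hours=5), "userid": "alice",
     "action": "ALLOW", "hostname": "web01"},
    {"timestamp": as_of - timedelta(hours=4), "userid": "Bob",
     "action": "ALLOW", "hostname": "web01"},
    {"timestamp": as_of - timedelta(hours=3), "userid": "bob",
     "action": "ALLOW", "hostname": "db01"},
    {"timestamp": as_of - timedelta(hours=2), "userid": "HOST01$",
     "action": "ALLOW", "hostname": "host01"},
    {"timestamp": as_of - timedelta(hours=1), "userid": "carol",
     "action": "BLOCK", "hostname": "db01"},
]

new_users = first_seen_logins(logins, as_of)
# new_users == {"bob": ["db01", "web01"]}
```

Collecting the hostnames alongside each user plays the same role as `$systems_accessed` in the rule: it tells you where the first-seen login landed.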
Let’s cover a couple of use cases where this metric can help your team’s daily tasks and ad hoc SOC tactics.
Insider Threat Monitoring
First-seen logins can be standard activity, especially when onboarding new employees or rolling out new systems or applications. But a new login to a system outside of the employee’s typical work patterns can be cause for concern. Someone in HR or marketing suddenly logging into accounts related to cloud infrastructure or containing sensitive data should raise concerns. This kind of activity could be an insider threat where a disgruntled employee has gained unauthorized access to systems with the goal of disrupting business operations or stealing IP or customer data.
With this metric, you know when the employee first gained access to the systems or applications in question. From there you can dive into connected events to get the full picture of what the insider’s target is, and what you need to lock down to contain the threat.
Impossible Travel Detection
Knowing when a user first logged into a system or application provides a baseline for comparison with subsequent login attempts, a crucial data point for detecting impossible travel scenarios. Tweaking the rule logic to provide a specific timestamp for the initial login and incorporating geolocation data helps establish a starting point for tracking user movement. From there you can monitor subsequent logins and build follow-up detection rules that flag impossibly short travel times.
For example, if a user initially logs in from New York City, then that same user logs into applications or accesses resources from Europe just minutes later, it’s likely their account has been compromised. For these scenarios, the first-seen login metric helps detect potentially compromised accounts and streamline response workflows.
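The check behind that example can be sketched in a few lines of Python: compute the great-circle distance between two login locations with the haversine formula, divide by the elapsed time, and flag the pair if the implied speed is faster than a commercial flight. The login fields (`timestamp`, `lat`, `lon`), the `is_impossible_travel` helper, and the 900 km/h threshold are assumptions for illustration. London stands in for "Europe" here, and real geolocation data from IP addresses is considerably noisier than this.

```python
from datetime import datetime, timedelta, timezone
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def is_impossible_travel(first, later, max_speed_kmh=900.0):
    """Return True if moving between the two logins would require a
    speed above `max_speed_kmh` (an assumed, tunable threshold)."""
    elapsed = later["timestamp"] - first["timestamp"]
    hours = abs(elapsed.total_seconds()) / 3600
    distance = haversine_km(first["lat"], first["lon"],
                            later["lat"], later["lon"])
    if hours == 0:
        # Simultaneous logins: any separation at all is impossible.
        return distance > 0
    return distance / hours > max_speed_kmh


# Hypothetical logins: New York City, then London ten minutes later.
t0 = datetime(2025, 3, 4, 14, 0, tzinfo=timezone.utc)
nyc = {"timestamp": t0, "lat": 40.7128, "lon": -74.0060}
london = {"timestamp": t0 + timedelta(minutes=10),
          "lat": 51.5074, "lon": -0.1278}

suspicious = is_impossible_travel(nyc, london)
# suspicious is True: roughly 5,570 km in ten minutes
```

In practice you would likely add allowances for VPN egress points and shared corporate proxies, since both can make legitimate logins appear to jump between locations.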
See The Forest AND The Trees Using Granular Metrics
These two metrics are just starting points to get your team thinking about how to incorporate more granular metrics into your threat monitoring workflows. While peak network traffic and initial logins are relevant metrics for many SOC teams, your organization might be better served by focusing on metrics related to events like authentication attempts, DNS queries, system configuration changes, or other telemetry. What’s most relevant will depend on your organization’s priority attack surfaces, threat models, and risk profile.
Back to our forest analogy: true mastery of the landscape requires a firm grasp of both the map’s general layout AND the specific features of the terrain in different areas. Similarly, monitoring lower-level metrics in tandem with higher-level metrics like MTTR helps level up your SOC programs and continuously strengthen your organization’s security posture.