The Analyst Who Cried Malware: Rethinking False Positives and Alert Fatigue

False positives aren’t just annoying. They’re corrosive.

Every unnecessary alert chips away at the analyst’s attention span. Every poorly designed rule teaches the SOC to distrust its own tools. Every noisy detection makes it harder to recognize when something actually matters.

In most environments, false positives aren’t the exception—they’re the rule. Some organizations see over 95% of their alerts turn out to be benign. That’s not just noise—that’s a failure in detection design.

We don’t talk about this enough. Because we’ve accepted alert fatigue as “just part of the job.” We act like it’s an analyst problem. Like people just need to “get better at triage.” But alert fatigue isn’t a Tier 1 issue—it’s a detection engineering issue.

This post is about fixing that.

We’ll break down:

  • What actually makes a rule noisy.
  • Why false positives happen even with “good logic.”
  • How to design rules that understand behavior, not just match strings.
  • And how to bring sanity back to the SOC by treating detection like a craft, not a checklist.

What Actually Is a False Positive?

At first glance, a false positive seems simple:

An alert fired, but nothing bad actually happened.

But in practice, it’s more complicated than that. False positives aren’t just the result of bad rules—they’re the result of detection logic that doesn’t understand its context.

Take this example:

CommandLine=*whoami*

That line might catch attackers performing basic recon. But it will also catch:

  • Sysadmins testing access.
  • Scripts checking permissions.
  • Debugging tools running in dev environments.

The logic isn’t wrong. The problem is: the rule assumes that *whoami* is always suspicious—everywhere, for everyone, at all times.

That’s the real root of a false positive:

A detection rule fired in the right way, at the wrong time, for the wrong reason.

A false positive isn’t just a flaw in detection—it’s a misalignment between what a rule sees and what the SOC actually cares about. And that misalignment is the first domino in the chain that leads to:

  • Alert fatigue
  • Missed incidents
  • Erosion of trust in the detection system

The rest of this post is about preventing that from happening in the first place.

Why Are So Many Rules Problematic?

Welcome to the part of the post where every detection engineer reading it silently mutters: “…yeah, I’ve written that rule.” 

Let’s be honest: most false positives don’t come from exotic edge cases. They come from rules that were never designed to understand context.

They match text. They match tools. They match syntax. But they don’t match intent. And when a detection rule can’t tell the difference between a red teamer, a sysadmin, and a confused intern… That’s not a “detection.” That’s a glorified grep.

Common Reasons Detection Rules Cause Noise

Overbroad Logic

If your rule treats any encoded PowerShell command as suspicious, you’re going to have a really bad time. PowerShell is used by attackers—but it’s also used by admins, backup scripts, config managers, and your own security tools. 

Lack of Contextual Anchoring

CommandLine="*Invoke-WebRequest*"

That might catch malware downloading payloads. But if you don’t ask:

  • Who ran it?
  • From where?
  • With what parent process?
  • At what time?
  • On what kind of machine?

Then your rule has no anchor. It floats. It alerts whenever and wherever. And no one trusts it.

Indicator of Compromise (IOC) Addiction

Yes, known bad indicators are useful. But IOC-based rules don’t age well.

Attackers pivot domains, change hashes, and refresh infrastructure faster than you can push your content pack update. Good detection design should outlive the indicators it was built around.

Static Thresholds Without Behavior Profiles

Ten failed logins in a row might be normal for your help desk during a migration—or a brute-force attack. Without behavioral baselining, static thresholds are guesswork.

At best, they’re noisy. At worst, they’re misleading.
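One way to replace a static threshold with a behavior profile is to score each entity's current activity against its own history. The sketch below is illustrative only: the counts, the three-sigma cutoff, and the idea of keeping per-account hourly failed-login histories are assumptions, not a specific product's API.

```python
from statistics import mean, stdev

def is_anomalous(history, current, min_sigma=3.0):
    """Flag `current` only if it deviates from this entity's own baseline.

    `history` is a list of past per-hour failed-login counts for one
    account; a static threshold would ignore it entirely.
    """
    if len(history) < 2:
        return False  # not enough data to baseline; stay quiet
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current > mu  # flat history: any increase is a deviation
    return (current - mu) / sigma >= min_sigma

# A help-desk account that routinely sees bursts: 10 failures is normal
print(is_anomalous([8, 12, 9, 11, 10, 13], 10))  # False
# A quiet service account: the same 10 failures is a strong anomaly
print(is_anomalous([0, 1, 0, 0, 1, 0], 10))      # True
```

The same count produces opposite verdicts depending on whose baseline it is measured against, which is exactly what a static threshold cannot express.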

How to Design Detections to Avoid Alert Fatigue

Reducing alert fatigue doesn’t mean alerting less—it means alerting better.

You don’t need a giant wall of rules that fire constantly. You need a few solid ones that fire when behavior breaks its pattern—or when context says, “That doesn’t belong.”

Here are battle-tested tactics you can use in every detection rule to reduce noise while increasing confidence.

Add Context Anchors

CommandLine="*net user*"

Add:

  • Is the user in IT?
  • Is this on a known admin box?
  • Is this during a change window?
  • What was the parent process?

Why: “net user” is harmless unless it’s run where it shouldn’t be. Anchor it to the context that matters.
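The anchors above can be sketched as an ordered set of context checks that run before the alert fires. Everything here is hypothetical: the field names (`command_line`, `user`, `host`) and the particular anchor set are illustrative, not from any specific SIEM.

```python
def should_alert(event, admin_users, admin_hosts, change_window_active):
    """Anchor a 'net user' match to context before alerting.

    `event` is a hypothetical dict of SIEM fields; the field names
    and anchors are illustrative.
    """
    if "net user" not in event.get("command_line", ""):
        return False
    # Each anchor removes a class of benign matches the bare
    # string match would have alerted on.
    if event.get("user") in admin_users and event.get("host") in admin_hosts:
        return False      # expected: an admin on a known admin box
    if change_window_active:
        return False      # expected: scheduled account maintenance
    return True           # 'net user' where it shouldn't be

# An account created from a finance workstation, outside any change window
event = {"command_line": "net user eviladmin P@ssw0rd /add",
         "user": "jdoe", "host": "FINANCE-WS-042"}
print(should_alert(event, admin_users={"it-admin"},
                   admin_hosts={"ADMIN-JUMP-01"},
                   change_window_active=False))   # True
```

The string match is unchanged; the anchors are what separate "admin doing admin things" from "account creation on a box that should never see it."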

Apply Time-based Logic

Don’t just match behavior—ask when it happened.

Examples:

  • PowerShell script executed outside of business hours
  • Scheduled task created after a user logs off
  • Registry key dropped, but payload doesn’t run until 5 days later

Why: Time-based anomalies surface intent.
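A minimal version of the first example, an out-of-hours check, looks like this. The 08:00–18:00 weekday window is an assumption for illustration; in practice it should come from a per-host or per-role profile, not a hard-coded constant.

```python
from datetime import datetime

def outside_business_hours(ts, start_hour=8, end_hour=18):
    """True if `ts` falls outside an assumed 08:00-18:00 weekday window."""
    if ts.weekday() >= 5:            # Saturday or Sunday
        return True
    return not (start_hour <= ts.hour < end_hour)

# PowerShell at 03:12 on a Tuesday: the timing alone raises suspicion
print(outside_business_hours(datetime(2024, 5, 14, 3, 12)))   # True
# The same script at 10:30 the same day is unremarkable
print(outside_business_hours(datetime(2024, 5, 14, 10, 30)))  # False
```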

Use Suppression, Without Ignoring

Instead of disabling noisy rules, suppress specific cases that are:

  • Always benign
  • Tied to known automation
  • Already covered by another control

Tag them, monitor them, and re-evaluate over time.

| where user != "backup" AND not src_ip IN [jump box range]

Why: Suppression is smarter than silence. It lets you focus without being blind.
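One way to keep suppressions tagged and re-evaluable, rather than silent, is to store each one with a reason and a review date, and have the pipeline return the matching entry instead of a bare true/false. The entries and field names below are hypothetical examples.

```python
import ipaddress

SUPPRESSIONS = [
    # Each entry is tagged and dated so it can be re-evaluated, not forgotten.
    {"field": "user", "value": "backup",
     "reason": "nightly backup automation", "review_by": "2024-09-01"},
    {"field": "src_ip", "value": "10.0.50.0/24",
     "reason": "jump box range", "review_by": "2024-09-01"},
]

def suppressed(event):
    """Return the matching suppression entry, or None.

    Returning the entry (not just True) lets the pipeline count and
    report suppressed hits instead of silently dropping them.
    """
    for rule in SUPPRESSIONS:
        value = event.get(rule["field"], "")
        if "/" in rule["value"]:   # CIDR-style entry
            try:
                if ipaddress.ip_address(value) in ipaddress.ip_network(rule["value"]):
                    return rule
            except ValueError:
                continue
        elif value == rule["value"]:
            return rule
    return None

hit = suppressed({"user": "backup", "src_ip": "192.0.2.7"})
print(hit["reason"] if hit else "alert")   # nightly backup automation
```

Because every suppressed hit carries its reason and review date, "monitor and re-evaluate over time" becomes a query over suppression tags instead of archaeology.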

Enrich Before You Alert

Every alert should include:

  • Who ran it
  • From where
  • What parent/child processes were involved
  • Time of execution
  • Known behavior score (rarity, role, past context)

Why: Analysts shouldn’t have to investigate to decide if they should investigate. Give them the story with the signal.
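Enrichment can be as simple as joining the raw event against context lookups before it ships. The lookups below are stubbed as dicts; in a real pipeline they would hit an identity provider, a CMDB, and a process-rarity store. All field names are illustrative.

```python
def enrich_alert(raw, user_directory, host_inventory, rarity):
    """Attach the context an analyst needs before the alert ships."""
    return {
        **raw,
        "user_role": user_directory.get(raw["user"], "unknown"),
        "host_role": host_inventory.get(raw["host"], "unknown"),
        "process_rarity": rarity.get(raw["process"], "never seen"),
    }

raw = {"user": "jdoe", "host": "FINANCE-WS-042",
       "process": "whoami.exe", "time": "2024-05-14T03:12:00Z"}
alert = enrich_alert(raw,
                     user_directory={"jdoe": "accounting"},
                     host_inventory={"FINANCE-WS-042": "workstation"},
                     rarity={"whoami.exe": "rare on this host"})
print(alert["user_role"], "|", alert["process_rarity"])
```

The triage question changes from "who is jdoe and is this normal?" to "an accounting user ran a process that is rare on this host at 03:12," which is a decision, not an investigation.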

Score Behavior, Don’t Just Match

Instead of:

Alert if whoami.exe is run

Try: 

+1 = whoami.exe

+2 = ran from unknown user

+3 = launched from signed Office macro

+2 = encoded PowerShell runs within 1 minute

IF behavior_score > 6 → alert.

Why: Binary logic is brittle. Scored logic gives you flexibility, tuning, and multi-factor detection without needing 14 separate rules.
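The weighted signals above translate directly into a small scoring function. The weights mirror the example; the field names (`user_known`, `parent`, `encoded_ps_within_60s`) are illustrative stand-ins for whatever your pipeline actually exposes.

```python
def behavior_score(event):
    """Sum the weighted signals instead of alerting on any single one."""
    score = 0
    if event.get("process") == "whoami.exe":
        score += 1
    if event.get("user_known") is False:
        score += 2
    if event.get("parent") == "office_macro":
        score += 3
    if event.get("encoded_ps_within_60s"):
        score += 2
    return score

event = {"process": "whoami.exe", "user_known": False,
         "parent": "office_macro", "encoded_ps_within_60s": True}
score = behavior_score(event)
print(score, "-> alert" if score > 6 else "-> observe")   # 8 -> alert
```

A lone `whoami.exe` scores 1 and stays quiet; the same process from an unknown user, launched by an Office macro and followed by encoded PowerShell, crosses the threshold. Tuning becomes adjusting weights, not maintaining 14 near-duplicate rules.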

Use Feedback Loops

Talk to your analysts. What rules do they:

  • Always ignore?
  • Mute in their queue?
  • Trust implicitly?

Ask “why?” If this rule fires, will the analyst care? That answer is worth more than another 50 lines of detection logic.

  • If the answer is “maybe,” rewrite it.
  • If the answer is “only if,” enrich it.
  • If the answer is “no,” delete it.

Why Most “Tuning” Efforts Fail

Ask almost any SOC: “That rule’s noisy—why is it still enabled?”

The answer is always the same: “We’re still tuning it.”

Translation:

  • We shipped it too early.
  • We didn’t think through the context.
  • We’re stuck reacting to its false positives instead of fixing the logic.

Typical tuning flow:

  1. Write over-broad rule
  2. Alert goes wild
  3. Add exceptions
  4. Add more exceptions
  5. Disable it “temporarily”
  6. Forget it exists

That’s not tuning. That’s defeat in slow motion.

Instead, better tuning would look something like:

  • Identify why an alert fired
  • Ask: was it logically correct, but contextually meaningless?
  • Check:
    • Who did it?
    • Where?
    • Is this common?
    • Is this already covered elsewhere?
  • Then decide:
    • Suppress it (safely)
    • Adjust it (smarter logic)
    • Kill it (bad rule)
    • Replace it (scored detection or correlation)

Tuning isn’t cleaning up alerts. It’s teaching your rules how to see. If a rule isn’t useful without tuning, it wasn’t ready to deploy.

What Detection Engineers Should Ask About Every Rule

Every detection rule you write is a promise: “If this fires, it’s worth the analyst’s time.”

That’s a heavy promise. So before you drop that rule into the wild, stop and ask a series of questions:

“What Threat Does This Actually Detect?”

Not just what tool—but what tactic? Is it privilege escalation? Reconnaissance? C2? Persistence?

Bad sign: “It just seemed suspicious.”
Good sign: “This maps to a known technique, and it’s relevant to our threat model.”

“What Assumption Is This Rule Making?”

Every rule has hidden assumptions:

  • That certain users don’t run scripts…
  • That this tool is rare…
  • That this time of day is unusual…

If you don’t name those assumptions, your rule will fire constantly, and no one will know why.

Ask: “In what situations would this fire correctly but unnecessarily?”

“Who Is Likely to Cause False Positives?”

If you already know the rule is likely to false-positive on:

  • Admins
  • Red teams
  • Service accounts
  • Monitoring tools

…then bake in suppression, context checks, or labels. Don’t leave it to Tier 1 to figure out what you already know.

“Will The Analyst Know What to Do with It?”

Every alert should answer these two questions:

  • Why did this fire?
  • What should I check next?

If your rule creates questions instead of answers, it’s not ready. Give context:

  • What behavior triggered this?
  • Is this rare?
  • What’s the next step in validation?

An alert without actionability is just noise with better branding. Every detection rule is like a new employee. If it can’t explain what it does, why it matters, or how it adds value—don’t hire it.

Rule Quality Over Quantity

Most detection teams are measured by how much they build:

  • How many rules they wrote
  • How many alerts they generated
  • How many IOCs they blocked

But here’s the uncomfortable truth:

  • More detections ≠ better detections.
  • More alerts ≠ more security.
  • More noise ≠ more coverage.

In fact, it’s often the opposite. The best detection engineers don’t write more rules. They write fewer rules with more intent. They treat each one like a product:

  • Designed for clarity
  • Tuned for precision
  • Measured for impact
  • Trusted by analysts

It’s not about catching everything. It’s about removing everything that doesn’t matter—so what’s left is an actual signal:

  • Fewer but more expressive rules
  • Smaller but better-scoped alert pipelines
  • Narrower scopes with broader behavioral coverage

A good detection doesn’t just say something happened. It says this happened, here’s why, and you should care. Our job isn’t just to catch badness. It’s to build a system that amplifies clarity, filters noise, and respects human attention.

Attention is our most valuable resource – and once the SOC stops trusting alerts, it stops responding to them. That’s when threats win.

False positives aren’t just wasted time. They’re stories told badly—and every bad story trains the analyst to stop reading. The job isn’t just to detect attacks. It’s to make the truth so clear, so sharp, so undeniably real, that no one can miss it.

Detection Rule Review Checklist

Purpose & Threat Mapping

  • Does this rule map to a real threat or technique (e.g., MITRE TTP)?
  • Do we know which phase of the attack this detects (initial access, recon, etc.)?
  • Is this rule aligned with our threat model / environment?

Logic & Assumptions

  • What assumptions is this rule making (about users, tools, context)?
  • Is it overbroad or under-scoped?
  • Will this rule function the same in production as it did in test?

Context & Anchoring

  • Does the rule include any of the following filters?
    • User-based logic
    • Host role or tagging
    • Time of day / business hours
    • Parent-child process awareness
  • Does it avoid alerting on common admin or automation activity?

Alert Quality

  • Will an analyst understand why this alert fired?
  • Will they know what to do next?
  • Does the alert include enough enrichment (user, time, process, path, reputation)?
  • Is this alert actionable—or just informational?

Noise & Suppression

  • Have we tested this on live data?
  • Do we know the baseline false positive rate?
  • Are known noisy cases suppressed or whitelisted safely?
  • Do we know how to tune this without destroying its value?

Tuning Plan

  • Is there a defined feedback loop for improving this rule post-deployment?
  • Is this rule part of a correlation or scoring system (if low-confidence)?
  • Can it be retired or replaced by a better behavior-based rule in the future?

Maintenance & Ownership

  • Who owns this rule long-term?
  • How often is it reviewed?
  • What happens if it breaks?

Final Sanity Check

  • If this rule fires, will anyone care?
  • If this rule fires, will anyone know what to do next?

Concluding Pro Tip

For every new detection, write one sentence:
“This rule is valuable because it detects [behavior] in [context], and alerts when [anomaly] occurs.”
If you can’t do that, you don’t understand your rule well enough to deploy it.

About the Author

Daniel Koifman is a Security Researcher currently working at CardinalOps. Previously, he worked as a Detection Engineer for a Fortune 500 financial institution and a top Israeli MSSP. He is also an active contributor to various open-source repositories and loves to participate in capture-the-flag (CTF) competitions.