False Positives in OT Security: Why Context Beats Signatures (and How to Fix Alert Fatigue)

False positives in OT security happen because signature-based detections often lack industrial context—such as asset roles (PLC vs HMI vs engineering workstation), allowed communications by zone/conduit, change windows, and process criticality. In OT/ICS, the same “suspicious” pattern (like scanning, SMB traffic, or new connections) can be normal during maintenance, vendor support, or automation workflows. Context-driven detection reduces noise by combining protocol-aware events (e.g., controller writes, logic downloads) with where the activity occurs, who initiated it, and whether it’s expected, making alerts both safer and more actionable.

Why false positives are worse in OT than IT

False positives are annoying in any environment. In OT, they can become dangerous.

In IT, the cost of a false positive is usually time

A SOC analyst loses time triaging. Maybe an employee loses access briefly. The business impact is real—but generally recoverable.

In OT, false positives can create operational risk

If a “security response” is triggered by a false positive, you can end up with:

  • Unplanned downtime (disconnecting an HMI or historian mid-shift)
  • Safety risk (interrupting communications to safety-related systems)
  • Quality losses (scrap, batch failures, rework)
  • Loss of trust (operations stop taking security alerts seriously)

This is why OT organizations often treat alerting as a safety-adjacent discipline, not just a cyber one.

OT reality: you can’t “block first and ask questions later”

In enterprise IT, aggressive prevention can be acceptable. In OT, prevention must be carefully scoped because the priority is typically:

  1. Safety
  2. Availability (uptime)
  3. Integrity (correct control)
  4. Confidentiality

False positives collide with this priority order—and that’s why context becomes the difference between “monitoring” and “noise.”


What “false positive” means in an industrial environment

In OT security, “false positive” has multiple flavors. If you treat them all the same, you’ll tune out the wrong things.

Three kinds of “false positives” you must distinguish

1) Technically false

The detection is simply wrong. Example: an alert claims “PLC write” but the traffic was read-only polling.

Fix: improve parsing, protocol decoding, or detection logic.

2) Technically true but operationally expected

The activity happened, but it was legitimate. Example: engineering workstation downloads logic during a scheduled maintenance window.

Fix: add context (maintenance schedules, approved work orders, asset roles) and build “expected change” workflows.

3) Technically true but operationally acceptable risk

The behavior is not ideal, but the plant accepts it (for now). Example: legacy SMB traffic between two Level 3 systems that can’t be upgraded quickly.

Fix: document exceptions, add compensating controls, and track a remediation roadmap.

Why this matters

If you label all three as “false positives,” you’ll end up suppressing signals that should become:

  • a change-management improvement,
  • a segmentation project,
  • or a targeted hardening plan.

The goal isn’t “zero alerts.” The goal is high signal, correct routing, and safe response.


Why signatures struggle in OT/ICS

Signature-based detection (classic IDS/IPS thinking) can still be useful. But in OT, signatures alone fail for structural reasons.

1) OT networks are full of “weird but normal”

Industrial environments contain behaviors that look suspicious in IT:

  • Broadcast and discovery chatter (from industrial tooling)
  • Legacy protocols, plaintext auth, unusual ports
  • Shared accounts and vendor tools
  • One-to-many polling patterns that resemble scanning

Signatures will flag these repeatedly unless you add context.

2) High diversity of devices and long lifecycles

OT endpoints include devices designed decades apart. Many cannot be patched quickly or speak modern security telemetry. Signatures often assume “normal” OS behavior, but OT endpoints may behave differently:

  • Proprietary stacks
  • Limited TCP/IP implementations
  • Rare protocol features that trigger IDS heuristics

3) The same pattern has different meaning depending on asset role

A signature might flag “new connection to port 502 (Modbus).”
But context determines whether that’s serious:

  • If the destination is a PLC in Level 2, and the source is an unknown workstation, that can be a big deal.
  • If the destination is a protocol gateway and the source is a known HMI, it’s probably normal.

Without asset roles and zones, signatures can’t tell.

4) Maintenance windows and vendor support create “burst anomalies”

Plants change state. Vendors connect. Engineers run scans. Projects are deployed. Backups happen. During these windows, signatures light up—unless the detection system knows “this is expected, now.”

5) Advanced attacks don’t always match known signatures

Many high-impact OT incidents don’t look like classic malware at first. They look like:

  • legitimate remote access used unusually,
  • engineering tools used at odd times,
  • controller writes from the wrong host,
  • configuration drift across zones.

These are behavioral and contextual problems, not signature problems.


Context beats signatures: the OT context model

Context-driven detection answers a simple question signatures can’t:

Is this activity meaningful and risky in this plant, on this asset, in this zone, at this time, under these operational conditions?

The OT context model (five pillars)

Here are the five pillars that turn raw detections into actionable OT alerts:

  1. Asset context
    • Role: PLC, Safety PLC, HMI, historian, engineering workstation
    • Vendor/model/firmware family
    • Ownership: controls team, operations, vendor-managed
    • Criticality: safety/production impact
  2. Network context (zones and conduits)
    • Site, line, cell/area
    • Purdue-inspired levels (L3, L2, etc.)
    • Allowed paths (conduit policy) vs observed paths (drift)
  3. Protocol and operation context
    • Not just “Modbus traffic,” but “Modbus function = write”
    • Not just “CIP,” but “download/program mode” events
    • Read vs write vs configuration vs firmware changes
  4. Temporal context (time and change state)
    • Maintenance windows
    • Approved work orders or planned vendor sessions
    • Shift patterns, batch runs, changeovers
  5. Threat context (intent indicators)
    • Known bad infrastructure (when available)
    • Repeated failed auth attempts
    • Pivot patterns: IT → DMZ → OT
    • Combined telemetry: EDR + remote access + OT protocol operations
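
To make the pillars concrete, here is a minimal sketch in Python of what an enriched alert record could carry once all five are attached. Every field name is an illustrative assumption rather than a product schema; real deployments map these to whatever their sensors, CMDB, and CMMS actually provide.

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Illustrative record; every field name here is an assumption, not a product schema.
@dataclass
class EnrichedOTAlert:
    # 1) Asset context
    asset_role: str                # "PLC", "SafetyPLC", "HMI", "Historian", "EWS"
    asset_criticality: int         # 1 (low) .. 5 (safety/production critical)
    # 2) Network context (zones and conduits)
    site: str
    zone: str                      # e.g. "SiteA/Line3/L2-cell-07"
    path_allowed_by_conduit: bool  # does conduit policy allow the observed path?
    # 3) Protocol and operation context
    protocol: str                  # "Modbus", "CIP", "S7", ...
    operation: str                 # "read", "write", "logic_download", "config_change"
    # 4) Temporal context
    timestamp: datetime
    in_maintenance_window: bool
    work_order: Optional[str]      # approved work order or vendor session, if any
    # 5) Threat context
    source_is_known_asset: bool
    failed_auth_count: int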

A simple risk scoring approach

You can formalize context into a score that helps routing and prioritization.

One practical model is:

OT_Alert_Priority = Consequence × Confidence × Exposure × Unexpectedness

Where:

  • Consequence: how bad it is if the target asset is affected (safety/production criticality)
  • Confidence: how sure you are the event represents what you think it does
  • Exposure: how reachable/connected the asset is (e.g., remote access pathways, cross-zone access)
  • Unexpectedness: whether this action violates baseline, policy, or the change window

This doesn’t need to be perfect math. It needs to be consistent and explainable.
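
A minimal sketch of that model in Python, assuming your enrichment pipeline has already normalized each factor to a 0 to 1 scale; the severity thresholds are placeholders you would calibrate per site:

def ot_alert_priority(consequence: float,
                      confidence: float,
                      exposure: float,
                      unexpectedness: float) -> float:
    """Multiply the four context factors (each expected in the range 0.0 to 1.0).

    Any factor near zero pulls the priority down: a fully expected event
    (unexpectedness close to 0) stays low even if the target asset is critical.
    """
    for name, value in [("consequence", consequence), ("confidence", confidence),
                        ("exposure", exposure), ("unexpectedness", unexpectedness)]:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return consequence * confidence * exposure * unexpectedness

def to_severity(priority: float) -> str:
    """Map the numeric priority to a routing bucket (thresholds are placeholders)."""
    if priority >= 0.5:
        return "critical"
    if priority >= 0.2:
        return "high"
    if priority >= 0.05:
        return "medium"
    return "low"

# Example: controller write from an unknown host, outside any change window.
print(to_severity(ot_alert_priority(0.9, 0.8, 0.7, 0.9)))  # -> "high"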


The “context layers” that reduce OT alert noise fast

If your OT alerts are noisy today, you don’t need a full transformation to start improving. These context layers are the highest leverage.

Layer 1: Asset role classification (PLC vs HMI vs EWS)

If you do only one thing, do this.

Why it works:
Many alerts become instantly triageable when you know whether the target is a controller, an operator station, or a server.

Examples of how it changes severity:

  • “New SMB session to Safety PLC” → likely a decoding error or misclassification (most PLCs don’t even run SMB); investigate
  • “New SMB session to historian server” → maybe normal (backup, patching)
  • “New write operation to PLC” → likely high consequence

Layer 2: Zone/cell mapping (where the event happened)

Route alerts by operational boundaries:

  • Which plant?
  • Which line/cell?
  • Which zone and conduit?

Why it works:
It prevents “central SOC blindness” and enables correct escalation to site owners.

Layer 3: Maintenance windows and approved changes

Tag events as:

  • planned,
  • unplanned,
  • unknown.

Why it works:
Most OT “noise spikes” are tied to legitimate work. Tagging doesn’t suppress events—it relabels them so the SOC doesn’t panic and OT doesn’t start ignoring alerts.
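
A minimal sketch of that tagging logic, assuming maintenance windows are exported per zone from a calendar or CMMS (the data source and tuple layout below are assumptions):

from datetime import datetime
from typing import List, Tuple

# Hypothetical calendar export: (zone, start, end) per planned maintenance window.
MaintenanceWindow = Tuple[str, datetime, datetime]

def tag_change_state(zone: str, event_time: datetime,
                     windows: List[MaintenanceWindow],
                     calendar_is_trusted: bool = True) -> str:
    """Label an event as 'planned', 'unplanned', or 'unknown'; never drop it."""
    if not calendar_is_trusted:
        return "unknown"          # no reliable schedule data for this site yet
    for win_zone, start, end in windows:
        if win_zone == zone and start <= event_time <= end:
            return "planned"
    return "unplanned"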

Layer 4: Protocol operation awareness (read vs write vs download)

Protocol-level “operation context” is the difference between “traffic exists” and “control behavior changed.”

Layer 5: Baselines and drift control (“new talker,” “new path,” “new function”)

OT environments are repetitive. Use that to your advantage:

  • new talker to controller,
  • new cross-zone path,
  • new function code used.

Baselining alone can create noise if you don’t pair it with Layers 1–4.
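
For illustration, assuming a baseline of (source, destination, function) tuples learned during a stable period, drift splits into exactly the three questions above; the severity you attach should then come from Layers 1–4 (asset role, zone, window, operation):

from typing import Dict, Set, Tuple

# A flow observed on the wire: (source host, destination host, protocol function).
Flow = Tuple[str, str, str]

def drift_findings(observed: Set[Flow], baseline: Set[Flow]) -> Dict[str, set]:
    """Split drift into new talkers, new paths, and new functions on known paths."""
    new_flows = observed - baseline
    baseline_pairs = {(s, d) for s, d, _ in baseline}
    baseline_sources = {s for s, _, _ in baseline}
    return {
        "new_talkers": {s for s, d, f in new_flows if s not in baseline_sources},
        "new_paths": {(s, d) for s, d, f in new_flows if (s, d) not in baseline_pairs},
        "new_functions": {(s, d, f) for s, d, f in new_flows if (s, d) in baseline_pairs},
    }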


OT examples: alerts that look malicious but aren’t

These are common OT false-positive scenarios that signatures flag loudly.

Example 1: “Port scan detected” during commissioning

What signatures see:
A host connects to many IPs/ports → “scan.”

What context reveals:
A vendor laptop is commissioning a skid, enumerating PLCs/HMIs as part of startup.

How to fix without hiding real scans:

  • Tag the vendor laptop as “commissioning host”
  • Limit the scope by zone/cell and time
  • Create a rule: scanning is medium if inside commissioning window; high if outside and targeting controllers
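
A sketch of that last rule in Python, assuming the source host carries tags (the commissioning_host tag is hypothetical) and the commissioning window is known:

from datetime import datetime

def scan_alert_severity(source_tags: set, target_roles: set,
                        event_time: datetime,
                        window_start: datetime, window_end: datetime) -> str:
    """Severity for a 'scan detected' event; the buckets are illustrative."""
    in_window = window_start <= event_time <= window_end
    if "commissioning_host" in source_tags and in_window:
        return "medium"       # expected startup enumeration, still logged and reviewable
    if {"PLC", "SafetyPLC"} & target_roles:
        return "high"         # scanning controllers outside an approved window
    return "medium"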

Example 2: “New device discovered” every week (it’s not new)

What signatures see:
A device appears with a new IP/MAC → “new asset.”

What context reveals:
It’s the same mobile maintenance workstation picking up a new DHCP lease, or NAT is masking its identity.

Fix:

  • Improve asset identity with stable attributes (certs where possible, switchport, device fingerprinting, protocol identity)
  • Track “asset identity confidence” separately from “asset exists”

Example 3: “Unauthorized RDP” that’s actually a jump host workflow

What signatures see:
RDP session to an OT server.

What context reveals:
This is the approved remote access path via the OT DMZ jump host.

Fix:

  • Model remote access pathways explicitly (approved jump hosts, approved accounts, MFA)
  • Alert on deviations: direct RDP bypassing jump hosts, or new sources reaching Level 3/2 directly
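
As an illustration, assuming you keep a list of approved jump hosts and remote support accounts (the addresses and account names below are invented), the deviation check can be a few lines:

# Invented placeholders; replace with your modeled remote access pathway.
APPROVED_JUMP_HOSTS = {"10.10.20.5", "10.10.20.6"}
APPROVED_REMOTE_ACCOUNTS = {"vendor-acme-01", "ot-support"}

def rdp_session_verdict(source_ip: str, account: str, target_zone: str) -> str:
    """Classify an RDP session to an OT server against the approved remote access path."""
    if source_ip in APPROVED_JUMP_HOSTS and account in APPROVED_REMOTE_ACCOUNTS:
        return "expected_remote_access"       # approved path via the OT DMZ jump host
    if target_zone.startswith(("L2", "L3")):
        return "alert_jump_host_bypass"       # direct access into Level 3/2; escalate
    return "review"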

Example 4: “SMB lateral movement” that’s a patch or backup job

What signatures see:
SMB file transfers between servers.

What context reveals:
Scheduled backup, patch distribution, or historian data movement.

Fix:

  • Tag scheduled jobs and maintenance accounts
  • Alert only when SMB appears in prohibited zones (e.g., Level 2 cell network) or from unusual sources

Example 5: “Industrial protocol anomaly” caused by flaky links

What signatures see:
Retries, malformed frames, weird timing → “anomaly.”

What context reveals:
Poor cabling, overloaded switch, or serial-to-IP gateway issues.

Fix:

  • Add an “OT reliability lens”: correlate with network health metrics
  • Route to operations as “network reliability incident” rather than “cyber incident”

OT examples: real attacks that signatures often miss

Context doesn’t only reduce false positives—it increases true positives where signatures are blind.

Example 1: Legit credentials used at the wrong time

Attack pattern:
An attacker uses valid remote access credentials to reach a jump host.

Why signatures miss it:
Nothing matches known malware. Login looks “successful.”

Context catches it:

  • user logs in outside normal shift pattern
  • session originates from unusual geography or device
  • followed by new talker to controller or new cross-zone path

Example 2: “Living off the land” in OT

Attack pattern:
Use built-in tools, scripts, or engineering software to interact with PLCs.

Why signatures miss it:
No known malicious payload; it’s “normal tooling.”

Context catches it:

  • engineering tool used on unusual controller group
  • logic download outside change window
  • write operations from non-engineering host

Example 3: Slow segmentation drift creating an open pathway

Attack pattern:
Over months, exceptions accumulate. A path from IT to Level 2 becomes possible.

Why signatures miss it:
No single event screams “attack.”

Context catches it:

  • policy drift: new cross-zone communications appear
  • exposure scoring: new remote access route to lower levels
  • “should vs is” analysis flags conduit violations

Example 4: Targeting the process, not the OS

Attack pattern:
Manipulating setpoints, mode changes, or controller logic.

Why signatures miss it:
It’s not “malware.” It’s misuse of control operations.

Context catches it:
Protocol operation context + asset criticality + baseline drift.


A practical framework: detection engineering for OT

“Detection engineering” sounds like a big-company discipline. In OT, it can be lightweight—but it must be explicit.

Step 1: Define the few events that should always be high priority

Your “always-care” list varies by plant, but typically includes:

  • controller write operations from unapproved sources
  • logic downloads / program mode changes
  • new remote access path into Level 2
  • new talker to safety systems
  • cross-zone communications that bypass the DMZ

Step 2: Create an OT detection catalog (vendor-neutral)

Build a simple catalog that defines:

  • detection name (stable identifier)
  • description (in OT language)
  • required fields (asset role, zone, protocol operation)
  • severity logic
  • owner and escalation path
  • OT-safe containment options
  • tuning notes and known false-positive scenarios

This becomes the shared contract between SOC and OT.
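
One catalog entry might look like the sketch below. It is expressed here as a Python dictionary, but a YAML file in a shared repository works just as well; every value shown is an invented example.

# One illustrative catalog entry; field names mirror the list above.
DETECTION_CATALOG = {
    "ot-det-001": {
        "name": "Controller write from unapproved source",
        "description": "A write operation reached a PLC from a host that is not an approved engineering workstation.",
        "required_fields": ["asset_role", "zone", "protocol_operation", "source_host"],
        "severity_logic": "critical if the target is a safety controller, high otherwise; medium inside an approved window",
        "owner": "OT security lead, Site A",
        "escalation": "controls engineer on shift, then site OT manager",
        "safe_containment": ["restrict remote session", "block at zone boundary", "isolate source workstation only if safe"],
        "tuning_notes": "Known false positive: historian collector performing vendor-documented writes during backfill.",
    },
}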

Step 3: Use “context gates” before raising severity

Instead of firing high severity immediately, use gating conditions:

  • Is the target asset a PLC or safety controller?
  • Is the source an engineering workstation or an unknown host?
  • Is it inside a maintenance window?
  • Is the path cross-zone or within the same cell?
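
A minimal sketch of those gates, assuming the alert has already been enriched with asset roles, zone information, and maintenance window status:

def gated_severity(target_role: str, source_role: str,
                   in_maintenance_window: bool, cross_zone: bool) -> str:
    """Walk the gating questions before committing to a high severity."""
    controller = target_role in {"PLC", "SafetyPLC"}
    trusted_source = source_role == "EWS"     # known engineering workstation
    if controller and not trusted_source and cross_zone and not in_maintenance_window:
        return "high"
    if controller and (cross_zone or not trusted_source):
        return "medium"
    return "low"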

Step 4: Track exceptions as managed risk, not “silenced alerts”

Exceptions should have:

  • an owner,
  • an expiration date,
  • a compensating control,
  • and a remediation plan.

Otherwise, “tuning” becomes “forgetting.”
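
A lightweight way to keep that honest is to store each exception as a structured record and review the expired ones every week. The sketch below assumes a simple in-memory list; a ticketing system or spreadsheet export serves the same purpose.

from dataclasses import dataclass
from datetime import date
from typing import List

@dataclass
class RiskException:
    """An accepted-risk exception carrying the four attributes listed above."""
    rule_id: str
    owner: str
    expires: date
    compensating_control: str
    remediation_plan: str

def expired_exceptions(exceptions: List[RiskException], today: date) -> List[RiskException]:
    """Surface expired exceptions for review instead of letting 'tuning' become 'forgetting'."""
    return [e for e in exceptions if today >= e.expires]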


Tuning playbook: how to cut OT false positives without going blind

This section is designed as an operational playbook your team can apply.

Phase 1 (Weeks 1–2): Stop the bleeding (without suppressing signal)

Goal: Reduce noise by routing and context, not by muting.

Do:

  • Turn on asset role classification and site/zone tags
  • Add maintenance window tagging (even manual at first)
  • Implement deduplication and aggregation in your SIEM/SOAR
  • Separate “cyber” alerts from “reliability” anomalies

Don’t:

  • Disable whole categories like “scan detected”
  • Blanket-suppress industrial protocol alerts
  • “Tune” by excluding entire subnets without understanding coverage loss

Phase 2 (Weeks 3–6): Build allowlists where they actually work

OT allowlisting works best on conduits, not flat networks.

Do:

  • Create “allowed communications” models by zone/cell
  • Start at key conduits: IT/OT boundary, OT DMZ to Level 3, Level 3 to Level 2
  • Flag drift as medium severity; escalate only if it touches controllers/safety or occurs outside planned work

Don’t:

  • Attempt perfect allowlists for every endpoint on day one
  • Assume the baseline is “policy”; baseline is only an observation

Phase 3 (Weeks 6–12): Convert recurring false positives into structured context

Take the top 20 repeating alerts and classify them:

  • technically false → fix parsing/data quality
  • expected → add maintenance/work-order integration
  • acceptable risk → document exception + mitigation plan

This is how you turn alert noise into program maturity.


Metrics that prove improvement (without gaming the numbers)

If you only track “alert count,” you’ll optimize for silence. Track metrics that reflect operational value.

Recommended OT alert quality metrics

1) Actionability rate

What percentage of alerts lead to a meaningful action (ticket, validated change, containment decision)?

  • Define “meaningful” upfront.
  • Track by alert type.

2) Time to validate operational context

How long does it take to determine “expected vs unexpected”?

This is a core bottleneck in OT.

3) True positive yield for top detections

For the top 10 detection types, track:

  • % true security issue
  • % expected change
  • % data quality issue

4) Drift reduction

Track cross-zone communication drift over time:

  • number of new cross-zone paths per week
  • number of expired exceptions removed
  • number of conduits moved to allowlist enforcement

5) Coverage and visibility confidence

Track what portion of critical conduits/cells are monitored and with what fidelity:

  • SPAN vs TAP
  • packet loss estimates
  • sensor uptime

A simple “alert health” scorecard

You can build a weekly scorecard like:

  • Total OT alerts ingested
  • Alerts by severity (Critical/High/Medium/Low)
  • % tagged with site/zone/asset role
  • % tagged with maintenance window status
  • Top 5 noisy rules and tuning actions taken
  • Incidents escalated to OT and their outcomes

The goal is to make “tuning” measurable and continuous.
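
For illustration, if your pipeline already emits enriched alerts as dictionaries with the tags described earlier (the key names below are assumptions), the weekly scorecard can be computed in a few lines:

from collections import Counter
from typing import Dict, List

def weekly_scorecard(alerts: List[dict]) -> Dict[str, object]:
    """Compute the scorecard fields from one week of enriched alert dicts."""
    total = len(alerts)
    def pct(predicate) -> float:
        return round(100 * sum(1 for a in alerts if predicate(a)) / total, 1) if total else 0.0
    return {
        "total_alerts": total,
        "by_severity": Counter(a.get("severity", "unknown") for a in alerts),
        "pct_with_asset_context": pct(lambda a: a.get("site") and a.get("zone") and a.get("asset_role")),
        "pct_with_window_status": pct(lambda a: a.get("change_state") in {"planned", "unplanned", "unknown"}),
        "top_noisy_rules": Counter(a.get("rule", "unmapped") for a in alerts).most_common(5),
    }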


SOC + OT workflow: making triage operationally safe

False positives aren’t only a tooling problem. They’re a workflow problem.

Define what the SOC is allowed to do

In many OT programs, the SOC can:

  • triage and enrich
  • open cases and notify OT
  • recommend containment actions

But the SOC should not:

  • isolate OT endpoints without OT approval
  • push firewall blocks that could interrupt process traffic
  • disable vendor accounts mid-support without a safety/operations check

Use an OT escalation template (copy/paste ready)

Subject: OT Alert – [Severity] – [Site/Zone] – [Detection Name]
What happened: [protocol operation / behavior]
Where: [plant, zone/cell, asset role]
Source: [host/IP/user/session]
Target: [asset ID, role, criticality]
Why it matters: [consequence in OT terms]
Expected? [maintenance window/work order status]
Recommended next steps: [verify with controls engineer; review jump host session; preserve evidence]
Safe containment options: [ranked list: restrict remote session, block at boundary, isolate workstation only if safe]

This reduces back-and-forth and speeds validation.

Create joint “OT-safe response” runbooks

For top scenarios (controller write, logic download, new remote access path), define:

  • verification steps
  • who must approve containment
  • what “containment” means safely
  • what evidence to capture
  • recovery and lessons learned steps

Context reduces false positives; runbooks reduce the risk of mishandling true positives.


Reference patterns: allowlists, baselines, and maintenance windows

Pattern 1: Baseline → allowlist (but only on conduits)

Use baselines to learn normal traffic, then convert stable paths into allowlists at choke points.

Why it works:
It shrinks the attack surface while keeping operations stable.

Pattern 2: Maintenance-aware alerting

Tag planned work using:

  • calendar windows (simple start)
  • work orders (better)
  • remote access session approvals (best)

Then apply logic:

  • Planned + known engineering workstation + expected PLC operations → informational or medium
  • Unplanned + unknown host + controller write → high or critical

Pattern 3: “Two-person rule” for high-consequence actions

If an alert would trigger a potentially disruptive response:

  • require approval from OT operations or controls lead
  • log the approval for audit and learning

This preserves safety and trust.


Vendor-neutral checklist: what to demand from OT security tools

Whether you use industrial IDS, OT NDR, or SIEM correlation, demand these capabilities to reduce false positives.

Must-have capabilities for low-noise OT detection

  • Asset role identification (PLC/HMI/EWS/historian, not just IP/MAC)
  • Zone and conduit mapping with exportable tags
  • Protocol decoding that distinguishes reads vs writes vs downloads
  • Baselining with drift detection and change approval workflows
  • Maintenance window tagging and suppression controls
  • Evidence links (clear “why” behind alerts)
  • Integrations (SIEM/ticketing) that preserve context, not just message text
  • Deduplication/aggregation controls to avoid alert storms

Red flags during demos

  • “We detect scans” but cannot show which assets are controllers vs workstations
  • “We support Modbus” but cannot show operation-level context (read vs write)
  • “We integrate with SIEM” but only via unstructured message strings
  • “Just tune it out” advice without exception tracking and expiry dates

FAQ

Why are OT security false positives so common?

Because many detections rely on signatures or generic IT heuristics that don’t understand OT context—asset roles, industrial protocol operations, zone boundaries, and maintenance windows. OT environments also contain legacy and specialized behaviors that look suspicious in IT but are normal in plants.

Are signatures useless in OT?

No. Signatures can still catch known malware, exploit patterns, and suspicious scanning. But in OT they should be supplemented by context so alerts reflect operational reality and consequence.

What context reduces OT alert fatigue the most?

The highest-impact context layers are: asset role classification, site/zone/cell mapping, maintenance windows and approved changes, and protocol operation awareness (read vs write vs logic download).

How do we tune OT alerts without missing real incidents?

Avoid blanket suppression. Use context gating (asset criticality, zones, change windows), deduplicate repeated alerts, document exceptions with expiry dates, and focus on a small set of high-consequence detections first.

Who should own OT alert triage: SOC or OT?

It should be a shared model. The SOC can triage and correlate across IT/OT, but OT teams must validate operational context and approve disruptive containment actions. Clear runbooks and escalation templates are essential.
