False Positives in OT Security: Why Context Beats Signatures (and How to Fix Alert Fatigue)

False positives in OT security happen because signature-based detections often lack industrial context—such as asset roles (PLC vs HMI vs engineering workstation), allowed communications by zone/conduit, change windows, and process criticality. In OT/ICS, the same “suspicious” pattern (like scanning, SMB traffic, or new connections) can be normal during maintenance, vendor support, or automation workflows. Context-driven detection reduces noise by combining protocol-aware events (e.g., controller writes, logic downloads) with where the activity occurs, who initiated it, and whether it’s expected, making alerts both safer and more actionable.

Why false positives are worse in OT than IT

False positives are annoying in any environment. In OT, they can become dangerous.

In IT, the cost of a false positive is usually time

A SOC analyst loses time triaging. Maybe an employee loses access briefly. The business impact is real—but generally recoverable.

In OT, false positives can create operational risk

If a “security response” is triggered by a false positive, you can end up with:

  • Unplanned downtime (disconnecting an HMI or historian mid-shift)
  • Safety risk (interrupting communications to safety-related systems)
  • Quality losses (scrap, batch failures, rework)
  • Loss of trust (operations stop taking security alerts seriously)

This is why OT organizations often treat alerting as a safety-adjacent discipline, not just a cyber one.

OT reality: you can’t “block first and ask questions later”

In enterprise IT, aggressive prevention can be acceptable. In OT, prevention must be carefully scoped because the priority is typically:

  1. Safety
  2. Availability (uptime)
  3. Integrity (correct control)
  4. Confidentiality

False positives collide with this priority order—and that’s why context becomes the difference between “monitoring” and “noise.”


What “false positive” means in an industrial environment

In OT security, “false positive” has multiple flavors. If you treat them all the same, you’ll tune out the wrong things.

Three kinds of “false positives” you must distinguish

1) Technically false

The detection is simply wrong. Example: an alert claims “PLC write” but the traffic was read-only polling.

Fix: improve parsing, protocol decoding, or detection logic.

2) Technically true but operationally expected

The activity happened, but it was legitimate. Example: engineering workstation downloads logic during a scheduled maintenance window.

Fix: add context (maintenance schedules, approved work orders, asset roles) and build “expected change” workflows.

3) Technically true but operationally acceptable risk

The behavior is not ideal, but the plant accepts it (for now). Example: legacy SMB traffic between two Level 3 systems that can’t be upgraded quickly.

Fix: document exceptions, add compensating controls, and track a remediation roadmap.

Why this matters

If you label all three as “false positives,” you’ll end up suppressing signals that should become:

  • a change-management improvement,
  • a segmentation project,
  • or a targeted hardening plan.

The goal isn’t “zero alerts.” The goal is high signal, correct routing, and safe response.


Why signatures struggle in OT/ICS

Signature-based detection (classic IDS/IPS thinking) can still be useful. But in OT, signatures alone fail for structural reasons.

1) OT networks are full of “weird but normal”

Industrial environments contain behaviors that look suspicious in IT:

  • Broadcast and discovery chatter (from industrial tooling)
  • Legacy protocols, plaintext auth, unusual ports
  • Shared accounts and vendor tools
  • One-to-many polling patterns that resemble scanning

Signatures will flag these repeatedly unless you add context.

2) High diversity of devices and long lifecycles

OT endpoints include devices designed decades apart. Many cannot be patched quickly or speak modern security telemetry. Signatures often assume “normal” OS behavior, but OT endpoints may behave differently:

  • Proprietary stacks
  • Limited TCP/IP implementations
  • Rare protocol features that trigger IDS heuristics

3) The same pattern has different meaning depending on asset role

A signature might flag “new connection to port 502 (Modbus).”
But context determines whether that’s serious:

  • If the destination is a PLC in Level 2, and the source is an unknown workstation, that can be a big deal.
  • If the destination is a protocol gateway and the source is a known HMI, it’s probably normal.

Without asset roles and zones, signatures can’t tell.

4) Maintenance windows and vendor support create “burst anomalies”

Plants change state. Vendors connect. Engineers run scans. Projects are deployed. Backups happen. During these windows, signatures light up—unless the detection system knows “this is expected, now.”

5) Advanced attacks don’t always match known signatures

Many high-impact OT incidents don’t look like classic malware at first. They look like:

  • legitimate remote access used unusually,
  • engineering tools used at odd times,
  • controller writes from the wrong host,
  • configuration drift across zones.

These are behavioral and contextual problems, not signature problems.


Context beats signatures: the OT context model

Context-driven detection answers a simple question signatures can’t:

Is this activity meaningful and risky in this plant, on this asset, in this zone, at this time, under these operational conditions?

The OT context model (five pillars)

Here are the five pillars that turn raw detections into actionable OT alerts:

  1. Asset context
    • Role: PLC, Safety PLC, HMI, historian, engineering workstation
    • Vendor/model/firmware family
    • Ownership: controls team, operations, vendor-managed
    • Criticality: safety/production impact
  2. Network context (zones and conduits)
    • Site, line, cell/area
    • Purdue-inspired levels (L3, L2, etc.)
    • Allowed paths (conduit policy) vs observed paths (drift)
  3. Protocol and operation context
    • Not just “Modbus traffic,” but “Modbus function = write”
    • Not just “CIP,” but “download/program mode” events
    • Read vs write vs configuration vs firmware changes
  4. Temporal context (time and change state)
    • Maintenance windows
    • Approved work orders or planned vendor sessions
    • Shift patterns, batch runs, changeovers
  5. Threat context (intent indicators)
    • Known bad infrastructure (when available)
    • Repeated failed auth attempts
    • Pivot patterns: IT → DMZ → OT
    • Combined telemetry: EDR + remote access + OT protocol operations
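
To make the pillars concrete, here is a minimal sketch in Python of what an enriched alert record could carry once all five are attached. Every field name is an illustrative assumption rather than a product schema; real deployments map these to whatever their sensors, CMDB, and CMMS actually provide.

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Illustrative record; every field name here is an assumption, not a product schema.
@dataclass
class EnrichedOTAlert:
    # 1) Asset context
    asset_role: str                # "PLC", "SafetyPLC", "HMI", "Historian", "EWS"
    asset_criticality: int         # 1 (low) .. 5 (safety/production critical)
    # 2) Network context (zones and conduits)
    site: str
    zone: str                      # e.g. "SiteA/Line3/L2-cell-07"
    path_allowed_by_conduit: bool  # does conduit policy allow the observed path?
    # 3) Protocol and operation context
    protocol: str                  # "Modbus", "CIP", "S7", ...
    operation: str                 # "read", "write", "logic_download", "config_change"
    # 4) Temporal context
    timestamp: datetime
    in_maintenance_window: bool
    work_order: Optional[str]      # approved work order or vendor session, if any
    # 5) Threat context
    source_is_known_asset: bool
    failed_auth_count: int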

A simple risk scoring approach

You can formalize context into a score that helps routing and prioritization.

One practical model is:

OT_Alert_Priority = Consequence × Confidence × Exposure × Unexpectedness

Where:

  • Consequence: how bad it is if the target asset is affected (safety/production criticality)
  • Confidence: how sure you are the event represents what you think it does
  • Exposure: how reachable/connected the asset is (e.g., remote access pathways, cross-zone access)
  • Unexpectedness: whether this action violates baseline, policy, or the change window

This doesn’t need to be perfect math. It needs to be consistent and explainable.
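
A minimal sketch of that model in Python, assuming your enrichment pipeline has already normalized each factor to a 0 to 1 scale; the severity thresholds are placeholders you would calibrate per site:

def ot_alert_priority(consequence: float,
                      confidence: float,
                      exposure: float,
                      unexpectedness: float) -> float:
    """Multiply the four context factors (each expected in the range 0.0 to 1.0).

    Any factor near zero pulls the priority down: a fully expected event
    (unexpectedness close to 0) stays low even if the target asset is critical.
    """
    for name, value in [("consequence", consequence), ("confidence", confidence),
                        ("exposure", exposure), ("unexpectedness", unexpectedness)]:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return consequence * confidence * exposure * unexpectedness

def to_severity(priority: float) -> str:
    """Map the numeric priority to a routing bucket (thresholds are placeholders)."""
    if priority >= 0.5:
        return "critical"
    if priority >= 0.2:
        return "high"
    if priority >= 0.05:
        return "medium"
    return "low"

# Example: controller write from an unknown host, outside any change window.
print(to_severity(ot_alert_priority(0.9, 0.8, 0.7, 0.9)))  # -> "high"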


The “context layers” that reduce OT alert noise fast

If your OT alerts are noisy today, you don’t need a full transformation to start improving. These context layers are the highest leverage.

Layer 1: Asset role classification (PLC vs HMI vs EWS)

If you do only one thing, do this.

Why it works:
Many alerts become instantly triageable when you know whether the target is a controller, an operator station, or a server.

Examples of how it changes severity:

  • “New SMB session to Safety PLC” → likely a decoding error or misclassification (most PLCs don’t even run SMB); investigate
  • “New SMB session to historian server” → maybe normal (backup, patching)
  • “New write operation to PLC” → likely high consequence

Layer 2: Zone/cell mapping (where the event happened)

Route alerts by operational boundaries:

  • Which plant?
  • Which line/cell?
  • Which zone and conduit?

Why it works:
It prevents “central SOC blindness” and enables correct escalation to site owners.

Layer 3: Maintenance windows and approved changes

Tag events as:

  • planned,
  • unplanned,
  • unknown.

Why it works:
Most OT “noise spikes” are tied to legitimate work. Tagging doesn’t suppress events—it relabels them so the SOC doesn’t panic and OT doesn’t start ignoring alerts.
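
A minimal sketch of that tagging logic, assuming maintenance windows are exported per zone from a calendar or CMMS (the data source and tuple layout below are assumptions):

from datetime import datetime
from typing import List, Tuple

# Hypothetical calendar export: (zone, start, end) per planned maintenance window.
MaintenanceWindow = Tuple[str, datetime, datetime]

def tag_change_state(zone: str, event_time: datetime,
                     windows: List[MaintenanceWindow],
                     calendar_is_trusted: bool = True) -> str:
    """Label an event as 'planned', 'unplanned', or 'unknown'; never drop it."""
    if not calendar_is_trusted:
        return "unknown"          # no reliable schedule data for this site yet
    for win_zone, start, end in windows:
        if win_zone == zone and start <= event_time <= end:
            return "planned"
    return "unplanned"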

Layer 4: Protocol operation awareness (read vs write vs download)

Protocol-level “operation context” is the difference between “traffic exists” and “control behavior changed.”

Layer 5: Baselines and drift control (“new talker,” “new path,” “new function”)

OT environments are repetitive. Use that to your advantage:

  • new talker to controller,
  • new cross-zone path,
  • new function code used.

Baselining alone can create noise if you don’t pair it with Layers 1–4.
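
For illustration, assuming a baseline of (source, destination, function) tuples learned during a stable period, drift splits into exactly the three questions above; the severity you attach should then come from Layers 1–4 (asset role, zone, window, operation):

from typing import Dict, Set, Tuple

# A flow observed on the wire: (source host, destination host, protocol function).
Flow = Tuple[str, str, str]

def drift_findings(observed: Set[Flow], baseline: Set[Flow]) -> Dict[str, set]:
    """Split drift into new talkers, new paths, and new functions on known paths."""
    new_flows = observed - baseline
    baseline_pairs = {(s, d) for s, d, _ in baseline}
    baseline_sources = {s for s, _, _ in baseline}
    return {
        "new_talkers": {s for s, d, f in new_flows if s not in baseline_sources},
        "new_paths": {(s, d) for s, d, f in new_flows if (s, d) not in baseline_pairs},
        "new_functions": {(s, d, f) for s, d, f in new_flows if (s, d) in baseline_pairs},
    }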


OT examples: alerts that look malicious but aren’t

These are common OT false-positive scenarios that signatures flag loudly.

Example 1: “Port scan detected” during commissioning

What signatures see:
A host connects to many IPs/ports → “scan.”

What context reveals:
A vendor laptop is commissioning a skid, enumerating PLCs/HMIs as part of startup.

How to fix without hiding real scans:

  • Tag the vendor laptop as “commissioning host”
  • Limit the scope by zone/cell and time
  • Create a rule: scanning is medium if inside commissioning window; high if outside and targeting controllers
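
A sketch of that last rule in Python, assuming the source host carries tags (the commissioning_host tag is hypothetical) and the commissioning window is known:

from datetime import datetime

def scan_alert_severity(source_tags: set, target_roles: set,
                        event_time: datetime,
                        window_start: datetime, window_end: datetime) -> str:
    """Severity for a 'scan detected' event; the buckets are illustrative."""
    in_window = window_start <= event_time <= window_end
    if "commissioning_host" in source_tags and in_window:
        return "medium"       # expected startup enumeration, still logged and reviewable
    if {"PLC", "SafetyPLC"} & target_roles:
        return "high"         # scanning controllers outside an approved window
    return "medium"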

Example 2: “New device discovered” every week (it’s not new)

What signatures see:
A device appears with a new IP/MAC → “new asset.”

What context reveals:
It’s the same mobile maintenance workstation picking up a new DHCP lease, or NAT is masking its identity.

Fix:

  • Improve asset identity with stable attributes (certs where possible, switchport, device fingerprinting, protocol identity)
  • Track “asset identity confidence” separately from “asset exists”

Example 3: “Unauthorized RDP” that’s actually a jump host workflow

What signatures see:
RDP session to an OT server.

What context reveals:
This is the approved remote access path via the OT DMZ jump host.

Fix:

  • Model remote access pathways explicitly (approved jump hosts, approved accounts, MFA)
  • Alert on deviations: direct RDP bypassing jump hosts, or new sources reaching Level 3/2 directly
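
As an illustration, assuming you keep a list of approved jump hosts and remote support accounts (the addresses and account names below are invented), the deviation check can be a few lines:

# Invented placeholders; replace with your modeled remote access pathway.
APPROVED_JUMP_HOSTS = {"10.10.20.5", "10.10.20.6"}
APPROVED_REMOTE_ACCOUNTS = {"vendor-acme-01", "ot-support"}

def rdp_session_verdict(source_ip: str, account: str, target_zone: str) -> str:
    """Classify an RDP session to an OT server against the approved remote access path."""
    if source_ip in APPROVED_JUMP_HOSTS and account in APPROVED_REMOTE_ACCOUNTS:
        return "expected_remote_access"       # approved path via the OT DMZ jump host
    if target_zone.startswith(("L2", "L3")):
        return "alert_jump_host_bypass"       # direct access into Level 3/2; escalate
    return "review"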

Example 4: “SMB lateral movement” that’s a patch or backup job

What signatures see:
SMB file transfers between servers.

What context reveals:
Scheduled backup, patch distribution, or historian data movement.

Fix:

  • Tag scheduled jobs and maintenance accounts
  • Alert only when SMB appears in prohibited zones (e.g., Level 2 cell network) or from unusual sources

Example 5: “Industrial protocol anomaly” caused by flaky links

What signatures see:
Retries, malformed frames, weird timing → “anomaly.”

What context reveals:
Poor cabling, overloaded switch, or serial-to-IP gateway issues.

Fix:

  • Add an “OT reliability lens”: correlate with network health metrics
  • Route to operations as “network reliability incident” rather than “cyber incident”

OT examples: real attacks that signatures often miss

Context doesn’t only reduce false positives—it increases true positives where signatures are blind.

Example 1: Legit credentials used at the wrong time

Attack pattern:
An attacker uses valid remote access credentials to reach a jump host.

Why signatures miss it:
Nothing matches known malware. Login looks “successful.”

Context catches it:

  • user logs in outside normal shift pattern
  • session originates from unusual geography or device
  • followed by new talker to controller or new cross-zone path

Example 2: “Living off the land” in OT

Attack pattern:
Use built-in tools, scripts, or engineering software to interact with PLCs.

Why signatures miss it:
No known malicious payload; it’s “normal tooling.”

Context catches it:

  • engineering tool used on unusual controller group
  • logic download outside change window
  • write operations from non-engineering host

Example 3: Slow segmentation drift creating an open pathway

Attack pattern:
Over months, exceptions accumulate. A path from IT to Level 2 becomes possible.

Why signatures miss it:
No single event screams “attack.”

Context catches it:

  • policy drift: new cross-zone communications appear
  • exposure scoring: new remote access route to lower levels
  • “should vs is” analysis flags conduit violations

Example 4: Targeting the process, not the OS

Attack pattern:
Manipulating setpoints, mode changes, or controller logic.

Why signatures miss it:
It’s not “malware.” It’s misuse of control operations.

Context catches it:
Protocol operation context + asset criticality + baseline drift.


A practical framework: detection engineering for OT

“Detection engineering” sounds like a big-company discipline. In OT, it can be lightweight—but it must be explicit.

Step 1: Define the few events that should always be high priority

Your “always-care” list varies by plant, but typically includes:

  • controller write operations from unapproved sources
  • logic downloads / program mode changes
  • new remote access path into Level 2
  • new talker to safety systems
  • cross-zone communications that bypass the DMZ

Step 2: Create an OT detection catalog (vendor-neutral)

Build a simple catalog that defines:

  • detection name (stable identifier)
  • description (in OT language)
  • required fields (asset role, zone, protocol operation)
  • severity logic
  • owner and escalation path
  • OT-safe containment options
  • tuning notes and known false-positive scenarios

This becomes the shared contract between SOC and OT.
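
One catalog entry might look like the sketch below. It is expressed here as a Python dictionary, but a YAML file in a shared repository works just as well; every value shown is an invented example.

# One illustrative catalog entry; field names mirror the list above.
DETECTION_CATALOG = {
    "ot-det-001": {
        "name": "Controller write from unapproved source",
        "description": "A write operation reached a PLC from a host that is not an approved engineering workstation.",
        "required_fields": ["asset_role", "zone", "protocol_operation", "source_host"],
        "severity_logic": "critical if the target is a safety controller, high otherwise; medium inside an approved window",
        "owner": "OT security lead, Site A",
        "escalation": "controls engineer on shift, then site OT manager",
        "safe_containment": ["restrict remote session", "block at zone boundary", "isolate source workstation only if safe"],
        "tuning_notes": "Known false positive: historian collector performing vendor-documented writes during backfill.",
    },
}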

Step 3: Use “context gates” before raising severity

Instead of firing high severity immediately, use gating conditions:

  • Is the target asset a PLC or safety controller?
  • Is the source an engineering workstation or an unknown host?
  • Is it inside a maintenance window?
  • Is the path cross-zone or within the same cell?
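
A minimal sketch of those gates, assuming the alert has already been enriched with asset roles, zone information, and maintenance window status:

def gated_severity(target_role: str, source_role: str,
                   in_maintenance_window: bool, cross_zone: bool) -> str:
    """Walk the gating questions before committing to a high severity."""
    controller = target_role in {"PLC", "SafetyPLC"}
    trusted_source = source_role == "EWS"     # known engineering workstation
    if controller and not trusted_source and cross_zone and not in_maintenance_window:
        return "high"
    if controller and (cross_zone or not trusted_source):
        return "medium"
    return "low"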

Step 4: Track exceptions as managed risk, not “silenced alerts”

Exceptions should have:

  • an owner,
  • an expiration date,
  • a compensating control,
  • and a remediation plan.

Otherwise, “tuning” becomes “forgetting.”
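
A lightweight way to keep that honest is to store each exception as a structured record and review the expired ones every week. The sketch below assumes a simple in-memory list; a ticketing system or spreadsheet export serves the same purpose.

from dataclasses import dataclass
from datetime import date
from typing import List

@dataclass
class RiskException:
    """An accepted-risk exception carrying the four attributes listed above."""
    rule_id: str
    owner: str
    expires: date
    compensating_control: str
    remediation_plan: str

def expired_exceptions(exceptions: List[RiskException], today: date) -> List[RiskException]:
    """Surface expired exceptions for review instead of letting 'tuning' become 'forgetting'."""
    return [e for e in exceptions if today >= e.expires]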


Tuning playbook: how to cut OT false positives without going blind

This section is designed as an operational playbook your team can apply.

Phase 1 (Weeks 1–2): Stop the bleeding (without suppressing signal)

Goal: Reduce noise by routing and context, not by muting.

Do:

  • Turn on asset role classification and site/zone tags
  • Add maintenance window tagging (even manual at first)
  • Implement deduplication and aggregation in your SIEM/SOAR
  • Separate “cyber” alerts from “reliability” anomalies

Don’t:

  • Disable whole categories like “scan detected”
  • Blanket-suppress industrial protocol alerts
  • “Tune” by excluding entire subnets without understanding coverage loss

Phase 2 (Weeks 3–6): Build allowlists where they actually work

OT allowlisting works best on conduits, not flat networks.

Do:

  • Create “allowed communications” models by zone/cell
  • Start at key conduits: IT/OT boundary, OT DMZ to Level 3, Level 3 to Level 2
  • Flag drift as medium severity; escalate only if it touches controllers/safety or occurs outside planned work

Don’t:

  • Attempt perfect allowlists for every endpoint on day one
  • Assume the baseline is “policy”; baseline is only an observation

Phase 3 (Weeks 6–12): Convert recurring false positives into structured context

Take the top 20 repeating alerts and classify them:

  • technically false → fix parsing/data quality
  • expected → add maintenance/work-order integration
  • acceptable risk → document exception + mitigation plan

This is how you turn alert noise into program maturity.


Metrics that prove improvement (without gaming the numbers)

If you only track “alert count,” you’ll optimize for silence. Track metrics that reflect operational value.

Recommended OT alert quality metrics

1) Actionability rate

What percentage of alerts lead to a meaningful action (ticket, validated change, containment decision)?

  • Define “meaningful” upfront.
  • Track by alert type.

2) Time to validate operational context

How long does it take to determine “expected vs unexpected”?

This is a core bottleneck in OT.

3) True positive yield for top detections

For the top 10 detection types, track:

  • % true security issue
  • % expected change
  • % data quality issue

4) Drift reduction

Track cross-zone communication drift over time:

  • number of new cross-zone paths per week
  • number of expired exceptions removed
  • number of conduits moved to allowlist enforcement

5) Coverage and visibility confidence

Track what portion of critical conduits/cells are monitored and with what fidelity:

  • SPAN vs TAP
  • packet loss estimates
  • sensor uptime

A simple “alert health” scorecard

You can build a weekly scorecard like:

  • Total OT alerts ingested
  • Alerts by severity (Critical/High/Medium/Low)
  • % tagged with site/zone/asset role
  • % tagged with maintenance window status
  • Top 5 noisy rules and tuning actions taken
  • Incidents escalated to OT and their outcomes

The goal is to make “tuning” measurable and continuous.
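
For illustration, if your pipeline already emits enriched alerts as dictionaries with the tags described earlier (the key names below are assumptions), the weekly scorecard can be computed in a few lines:

from collections import Counter
from typing import Dict, List

def weekly_scorecard(alerts: List[dict]) -> Dict[str, object]:
    """Compute the scorecard fields from one week of enriched alert dicts."""
    total = len(alerts)
    def pct(predicate) -> float:
        return round(100 * sum(1 for a in alerts if predicate(a)) / total, 1) if total else 0.0
    return {
        "total_alerts": total,
        "by_severity": Counter(a.get("severity", "unknown") for a in alerts),
        "pct_with_asset_context": pct(lambda a: a.get("site") and a.get("zone") and a.get("asset_role")),
        "pct_with_window_status": pct(lambda a: a.get("change_state") in {"planned", "unplanned", "unknown"}),
        "top_noisy_rules": Counter(a.get("rule", "unmapped") for a in alerts).most_common(5),
    }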


SOC + OT workflow: making triage operationally safe

False positives aren’t only a tooling problem. They’re a workflow problem.

Define what the SOC is allowed to do

In many OT programs, the SOC can:

  • triage and enrich
  • open cases and notify OT
  • recommend containment actions

But the SOC should not:

  • isolate OT endpoints without OT approval
  • push firewall blocks that could interrupt process traffic
  • disable vendor accounts mid-support without a safety/operations check

Use an OT escalation template (copy/paste ready)

Subject: OT Alert – [Severity] – [Site/Zone] – [Detection Name]
What happened: [protocol operation / behavior]
Where: [plant, zone/cell, asset role]
Source: [host/IP/user/session]
Target: [asset ID, role, criticality]
Why it matters: [consequence in OT terms]
Expected? [maintenance window/work order status]
Recommended next steps: [verify with controls engineer; review jump host session; preserve evidence]
Safe containment options: [ranked list: restrict remote session, block at boundary, isolate workstation only if safe]

This reduces back-and-forth and speeds validation.

Create joint “OT-safe response” runbooks

For top scenarios (controller write, logic download, new remote access path), define:

  • verification steps
  • who must approve containment
  • what “containment” means safely
  • what evidence to capture
  • recovery and lessons learned steps

Context reduces false positives; runbooks reduce the risk of mishandling true positives.


Reference patterns: allowlists, baselines, and maintenance windows

Pattern 1: Baseline → allowlist (but only on conduits)

Use baselines to learn normal traffic, then convert stable paths into allowlists at choke points.

Why it works:
It shrinks the attack surface while keeping operations stable.

Pattern 2: Maintenance-aware alerting

Tag planned work using:

  • calendar windows (simple start)
  • work orders (better)
  • remote access session approvals (best)

Then apply logic:

  • Planned + known engineering workstation + expected PLC operations → informational or medium
  • Unplanned + unknown host + controller write → high or critical

Pattern 3: “Two-person rule” for high-consequence actions

If an alert would trigger a potentially disruptive response:

  • require approval from OT operations or controls lead
  • log the approval for audit and learning

This preserves safety and trust.


Vendor-neutral checklist: what to demand from OT security tools

Whether you use industrial IDS, OT NDR, or SIEM correlation, demand these capabilities to reduce false positives.

Must-have capabilities for low-noise OT detection

  • Asset role identification (PLC/HMI/EWS/historian, not just IP/MAC)
  • Zone and conduit mapping with exportable tags
  • Protocol decoding that distinguishes reads vs writes vs downloads
  • Baselining with drift detection and change approval workflows
  • Maintenance window tagging and suppression controls
  • Evidence links (clear “why” behind alerts)
  • Integrations (SIEM/ticketing) that preserve context, not just message text
  • Deduplication/aggregation controls to avoid alert storms

Red flags during demos

  • “We detect scans” but cannot show which assets are controllers vs workstations
  • “We support Modbus” but cannot show operation-level context (read vs write)
  • “We integrate with SIEM” but only via unstructured message strings
  • “Just tune it out” advice without exception tracking and expiry dates

FAQ

Why are OT security false positives so common?

Because many detections rely on signatures or generic IT heuristics that don’t understand OT context—asset roles, industrial protocol operations, zone boundaries, and maintenance windows. OT environments also contain legacy and specialized behaviors that look suspicious in IT but are normal in plants.

Are signatures useless in OT?

No. Signatures can still catch known malware, exploit patterns, and suspicious scanning. But in OT they should be supplemented by context so alerts reflect operational reality and consequence.

What context reduces OT alert fatigue the most?

The highest-impact context layers are: asset role classification, site/zone/cell mapping, maintenance windows and approved changes, and protocol operation awareness (read vs write vs logic download).

How do we tune OT alerts without missing real incidents?

Avoid blanket suppression. Use context gating (asset criticality, zones, change windows), deduplicate repeated alerts, document exceptions with expiry dates, and focus on a small set of high-consequence detections first.

Who should own OT alert triage: SOC or OT?

It should be a shared model. The SOC can triage and correlate across IT/OT, but OT teams must validate operational context and approve disruptive containment actions. Clear runbooks and escalation templates are essential.
