Newsletter · Industrial Data · Machine Learning · 10 May 2026

Why Contextualising Data Is the Foundation for Effective Machine Learning on the Plant Floor

Introduction

Simply having millions of data points is not enough for machine learning to work well. This is the starting point most industrial ML projects get wrong — and it explains why so many fail silently after deployment.

The core problem: using uncontextualised data causes models to learn false patterns, leading to unreliable outputs and, ultimately, diminished trust in the technology. Once an operator learns to distrust the system, recovery is very difficult.

The Technical Concept

Contextualising industrial data means understanding where, when, how, and under what conditions measurements were taken. Metadata — such as timestamps, process location, sensor type, or whether data was manually collected — allows you to filter measurements and link numbers to real events.
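
As a minimal sketch of what "filtering by metadata" can look like in practice, the Python below attaches the four kinds of context named above to each measurement and then selects only the readings that fit a given training purpose. The `Reading` class, the `select_readings` helper, and all field names and values are hypothetical and chosen for illustration; they do not come from any particular standard or product.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Reading:
    """One measurement plus the context needed to interpret it (hypothetical schema)."""
    value: float
    timestamp: datetime
    location: str      # process location, e.g. a line or station name
    sensor_type: str
    manual: bool       # True if a person entered the value by hand


def select_readings(readings, location, sensor_type, include_manual=False):
    """Keep only readings that match the requested context."""
    return [
        r for r in readings
        if r.location == location
        and r.sensor_type == sensor_type
        and (include_manual or not r.manual)
    ]


# Illustrative data only.
readings = [
    Reading(71.2, datetime(2026, 5, 4, 6, 15), "line_1", "thermocouple", False),
    Reading(70.8, datetime(2026, 5, 4, 6, 20), "line_1", "thermocouple", True),
    Reading(3.4, datetime(2026, 5, 4, 6, 20), "line_1", "pressure", False),
    Reading(68.9, datetime(2026, 5, 4, 6, 25), "line_2", "thermocouple", False),
]

# Automatically logged thermocouple readings from line_1 only.
training_set = select_readings(readings, "line_1", "thermocouple")
print([r.value for r in training_set])
```

Without the metadata fields, all four values would land in the same column and the model would have no way to tell a pressure reading from a temperature, or a typed-in value from a logged one.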

Industrial data standards like OPC UA and i3X provide frameworks for unifying this contextual information across heterogeneous systems. These aren't just IT infrastructure choices — they directly determine whether your training data is trustworthy.

Lean Six Sigma practices reinforce the same principle from the operational side: measurements must be traceable and collected following defined procedures. An MSA (Measurement System Analysis) that attributes 30% of the observed variation to the measurement system is telling you that a substantial share of what your model will treat as signal comes from the gauge rather than the process.
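
How much of the data that 30% represents depends on how the MSA reports it, which the figure above does not specify. The short Python sketch below assumes the number is a ratio of standard deviations (often reported as a percentage of study variation) and converts it to a share of variance; the function name and the input values are illustrative assumptions, not results from a real study.

```python
def measurement_share(sd_measurement, sd_total):
    """Return the measurement system's share of total variation.

    First value: ratio of standard deviations.
    Second value: ratio of variances (the square of the first).
    """
    sd_share = sd_measurement / sd_total
    var_share = sd_share ** 2
    return sd_share, var_share


# Assumed example: measurement std dev is 0.3 units, total observed std dev is 1.0.
sd_share, var_share = measurement_share(sd_measurement=0.3, sd_total=1.0)

print(f"share of standard deviation: {sd_share:.0%}")
print(f"share of variance:           {var_share:.0%}")
print(f"variance left for process:   {1 - var_share:.0%}")
```

Under that assumption, 30% on the standard-deviation scale corresponds to 9% on the variance scale. If the MSA instead reports 30% as a variance contribution, the measurement system accounts for close to a third of the variance. Either way, it is worth checking which scale a report uses before deciding how much of a training set to trust.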

"Data traceability and context matter as much as the algorithm itself. Without them, machine learning on the plant floor is a gamble."

The Real Problem

Feeding models unvalidated or context-free data is the single biggest mistake in industrial ML implementations. The model trains efficiently, metrics look acceptable in the lab, and then it fails in production — because it learned patterns tied to shift schedules, sensor drift, or manual data entry artefacts rather than actual process physics.
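
One way to surface this kind of failure before deployment is to evaluate on context groups the model has never seen, for example by holding out an entire shift instead of splitting rows at random. The sketch below shows only the splitting step in plain Python. It is a common general technique rather than a method this article prescribes, and the `split_by_group` helper, the column names, and the data are hypothetical.

```python
def split_by_group(rows, group_key, held_out):
    """Send every row whose group is in `held_out` to the test set, the rest to train."""
    train = [r for r in rows if r[group_key] not in held_out]
    test = [r for r in rows if r[group_key] in held_out]
    return train, test


# Illustrative rows: each record carries its shift as context.
rows = [
    {"shift": "A", "temp": 71.2, "defect": 0},
    {"shift": "A", "temp": 73.5, "defect": 1},
    {"shift": "B", "temp": 70.9, "defect": 0},
    {"shift": "C", "temp": 74.1, "defect": 1},
]

train, test = split_by_group(rows, "shift", held_out={"C"})

print("train shifts:", sorted({r["shift"] for r in train}))
print("test shifts: ", sorted({r["shift"] for r in test}))
```

If accuracy drops sharply on the held-out shift compared with a random split, that suggests the model has picked up on the shift itself rather than on the process. The split is only possible when the shift label was recorded in the first place.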

Practical Implications

A minimum viable data strategy for industrial ML requires four elements:

  • Source and timing identification — for each data point, know which sensor, which line, which shift, under what process conditions.
  • Contextual integration — link process data with operational metadata: product changeovers, maintenance events, operator shifts, environmental conditions.
  • Sampling protocol — define how, when, and by whom data is collected to minimise systematic bias before it enters the training pipeline.
  • Industrial standards exploration — evaluate OPC UA or equivalent frameworks for unified context data across your sensor and system landscape.
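
The contextual integration element above can be sketched as a simple time-interval lookup: each process reading is labelled with whatever operational events were active when it was taken. This is a minimal illustration in plain Python under assumed data structures. The `active_events` helper, the dictionary keys, and the event types are hypothetical, and a production pipeline would more likely do this join in a historian or database.

```python
from datetime import datetime


def active_events(timestamp, events):
    """Return the kinds of all events whose [start, end) interval covers the timestamp."""
    return [e["kind"] for e in events if e["start"] <= timestamp < e["end"]]


# Illustrative operational metadata.
events = [
    {"kind": "changeover",
     "start": datetime(2026, 5, 4, 8, 0), "end": datetime(2026, 5, 4, 8, 30)},
    {"kind": "maintenance",
     "start": datetime(2026, 5, 4, 13, 0), "end": datetime(2026, 5, 4, 14, 0)},
]

# Illustrative process data.
readings = [
    {"timestamp": datetime(2026, 5, 4, 7, 55), "value": 71.2},
    {"timestamp": datetime(2026, 5, 4, 8, 10), "value": 64.0},
    {"timestamp": datetime(2026, 5, 4, 13, 30), "value": 58.3},
]

# Attach the event context to every reading.
labelled = [{**r, "events": active_events(r["timestamp"], events)} for r in readings]

# Keep only readings taken with no changeover or maintenance in progress.
steady_state = [r for r in labelled if not r["events"]]

for r in labelled:
    print(r["timestamp"].time(), r["value"], r["events"])
```

Whether readings taken during a changeover or maintenance window should be excluded, or kept as a labelled feature, depends on the question the model is meant to answer; the point of the join is that the choice becomes explicit.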

Want the next edition in your inbox?

Subscribe to the SAIKARIS newsletter: one operational topic, in depth, every week.