Why AI architecture matters for optical inspection on the plant floor: CNNs and the challenge of local vision
Understanding how convolutional neural networks work helps engineers recognize real limits in defect detection and opens the door for new architectures.
Introduction
Engineers working in industrial visual inspection often hear that AI can solve everything, but plant realities tell a more nuanced story. Most deployed models use convolutional neural networks (CNNs), like YOLO, which analyze images in small local regions to find defects. This works well for clear, isolated defects but struggles with complex or distributed patterns. Understanding how CNNs operate helps avoid blind trust and enables better testing and validation plans.
The Technical Concept
Industrial AI vision is built on machine learning (ML), where deep learning (DL) automatically extracts features. Within DL, CNNs are the go-to for image inspection and the backbone of models like YOLO.
CNNs apply multiple filters (kernels) over the input image. For example, a 640 x 640 pixel image processed with a 3x3 kernel and stride 2 is progressively downsampled: each stride-2 layer roughly halves the spatial resolution (640 → 320 → 160 → 80), so after three such layers the internal feature map shrinks to 80 x 80 pixels, while depth increases with multiple channels representing distinct detected patterns.
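The shrinking sequence above follows the standard convolution output-size formula. A minimal sketch, assuming "same"-style padding of 1 pixel (a common choice for 3x3 kernels; the exact padding is an assumption, not stated in the text):

```python
def conv_out_size(n, kernel=3, stride=2, padding=1):
    """Spatial size after one conv layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

size = 640
for layer in range(1, 4):
    size = conv_out_size(size)
    print(f"after layer {layer}: {size} x {size}")
# 640 -> 320 -> 160 -> 80
```

With these parameters each layer halves the resolution exactly, which is why three layers take 640 down to 80.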
Each kernel extracts local information — neighboring pixels — creating depth channels that highlight different features. This is like inspecting a tray divided into small compartments, each assessed independently.
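The "small compartments" idea can be made concrete with a hand-rolled convolution: each output value is computed only from the 3x3 neighborhood the kernel currently covers. This is an illustrative sketch (the kernel values and toy image are invented for demonstration, not taken from any real model):

```python
import numpy as np

def apply_kernel(image, kernel):
    """Slide a kernel over the image (stride 1, no padding); each
    output value depends only on its local 3x3 neighborhood."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge kernel: responds where intensity changes left-to-right
edge_kernel = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]])

image = np.zeros((5, 5))
image[:, 3:] = 1.0  # bright region on the right half

print(apply_kernel(image, edge_kernel))
# Non-zero responses appear only where a window straddles the edge
```

A real CNN learns many such kernels, and each one produces a separate depth channel highlighting its own pattern.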
Ultimately, the network merges these feature maps to decide if a defect exists, its type, and position. But prediction scope is mostly limited to local patterns within those compartments.
The Real Problem
The problem arises when defects aren't isolated points but show complex patterns spanning larger areas — for example, two small defects close together, or shapes whose meaning depends on overall image context. To relate features across those larger distances, CNNs must downsample heavily, which risks losing fine detail or blending nearby defects into one.
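The blending risk is easy to demonstrate with downsampling alone. In this toy sketch (sizes and positions are invented for illustration), two distinct single-pixel defects fall into the same cell after a 4x downsampling step and become indistinguishable:

```python
import numpy as np

def max_pool(image, size):
    """Downsample by taking the max of each size x size block."""
    h, w = image.shape
    return (image[:h - h % size, :w - w % size]
            .reshape(h // size, size, w // size, size)
            .max(axis=(1, 3)))

# Two separate 1-pixel defects, 3 pixels apart, in a 16x16 image
image = np.zeros((16, 16))
image[8, 4] = 1.0
image[8, 7] = 1.0

pooled = max_pool(image, 4)  # 4x4 feature map
print(np.count_nonzero(image), "defects in the image")
print(np.count_nonzero(pooled), "defect response(s) after pooling")
```

After pooling, the two defects occupy a single active cell: downstream layers can no longer tell whether there was one defect or two.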
Practical Implications
This means engineers can't rely solely on model architecture or vendor quality claims. It's crucial to review the scope of the training data: volume, defect diversity, class balance, and especially the confusion matrix, which summarizes which defects the model detects well versus which it confuses. Reviewing it in advance anticipates real plant-floor behavior.
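Building and reading a confusion matrix is straightforward. A minimal sketch with hypothetical inspection results (the class names and labels below are invented for illustration):

```python
import numpy as np

# Hypothetical inspection results: ground-truth vs. predicted class per part
classes = ["ok", "scratch", "dent"]
truth = ["ok", "ok", "scratch", "scratch", "dent", "dent", "dent", "scratch"]
pred  = ["ok", "ok", "scratch", "dent",    "dent", "dent", "scratch", "scratch"]

idx = {c: i for i, c in enumerate(classes)}
cm = np.zeros((len(classes), len(classes)), dtype=int)
for t, p in zip(truth, pred):
    cm[idx[t], idx[p]] += 1  # rows = ground truth, columns = prediction

print(cm)
```

The diagonal counts correct detections per class; off-diagonal cells reveal systematic confusions — here, scratches occasionally mistaken for dents and vice versa — which is exactly the behavior to probe before deployment.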
Not all defect types or factory setups are alike, so this technical grounding shapes project success from the start. The next newsletter will explore Transformers — architectures tackling local vision limits with global attention — and their benefits for the plant floor.