Machine Vision from a Frame Latency Perspective: Transmission Speed, Interface Selection, and Continuous Optimization

Created on: 2026-06-18 09:31

1. Why Frame Latency Becomes a Core Metric

Modern machine vision is fully transitioning from "offline inspection" to "online real-time decision-making", which fundamentally changes the requirements for frame latency:

1.1 Continuous Increase in Production Line Speed

Era	Typical Production Line Speed	Allowable Frame Latency	Typical Applications
2010s	1-3 m/s	10-30 ms	Label inspection, counting
2018-2022	3-8 m/s	3-10 ms	Precision dimension measurement, defect detection
2023-2026	8-20 m/s	<3 ms	Semiconductor wafer inspection, high-speed sorting

When the production line speed reaches 10 m/s, 1 ms of latency means a positional offset of 10 mm. For semiconductor packaging (feature size at μm level), this directly determines the yield rate.
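The offset figure above is simply line speed multiplied by latency. A minimal sketch of that arithmetic, where the function name and the sample speeds are illustrative choices rather than anything prescribed by a standard:

```python
def positional_offset_mm(line_speed_m_s: float, latency_ms: float) -> float:
    """Distance a part travels during the latency window, in millimetres."""
    # m/s * ms = mm: the x1000 (m -> mm) cancels the /1000 (ms -> s).
    return line_speed_m_s * latency_ms


# Sample speeds taken from the ranges in the table above.
for speed in (3, 8, 10, 20):
    offset = positional_offset_mm(speed, 1.0)
    print(f"{speed} m/s with 1 ms latency -> {offset:.0f} mm offset")
```

At 10 m/s this reproduces the 10 mm offset quoted above, and it makes clear that the offset scales linearly with both line speed and latency.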

1.2 Latency Budget for AI Inference Closed Loop

After the introduction of deep learning inference, the end-to-end latency budget for a frame of data is significantly compressed:

┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌───────────┐   ┌─────────────┐
│ Exposure &   │──▶│ Transmission│──▶│ Preprocessing│──▶│ AI Inference│──▶│ Decision &   │
│ Acquisition │   │ to Host     │   │             │   │           │   │ Execution   │
│  ~50 μs     │   │  ? ms       │   │  ~0.5 ms    │   │ 2-5 ms    │   │  ~0.1 ms    │
└─────────────┘   └─────────────┘   └─────────────┘   └───────────┘   └─────────────┘
                                                    Total Budget: <8 ms

The window left for "Transmission to Host" is compressed to 1-2 ms, which requires the interface to complete the delivery of a full frame of data within this extremely short time window.
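The transmission window can be estimated by subtracting the other stages from the total budget. The sketch below uses the stage values from the diagram; the dictionary layout and function name are assumptions made for illustration:

```python
TOTAL_BUDGET_MS = 8.0  # "Total Budget: <8 ms" from the diagram

# Fixed stages from the diagram, in milliseconds.
fixed_stages_ms = {
    "exposure_acquisition": 0.05,  # ~50 us
    "preprocessing": 0.5,
    "decision_execution": 0.1,
}


def transmission_window_ms(inference_ms: float) -> float:
    """Time left for 'Transmission to Host' after all other stages."""
    return TOTAL_BUDGET_MS - sum(fixed_stages_ms.values()) - inference_ms


for inference_ms in (2.0, 5.0):  # the 2-5 ms inference range
    window = transmission_window_ms(inference_ms)
    print(f"inference {inference_ms} ms -> {window:.2f} ms left for transmission")
```

With inference at its 5 ms upper bound, a little over 2 ms remains before any safety margin is reserved, which is consistent with the 1-2 ms window stated above.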

1.3 Cascade Effect of Multi-Camera Synchronization

3C electronics and new energy battery production lines often deploy 8-16 cameras with synchronous triggering. The system frame rate depends on the slowest link:

System Frame Rate = min(Camera Frame Rate_i)  ,  i = 1..N
If 1 out of 16 cameras is backlogged due to insufficient interface bandwidth:
  → Frame latency accumulates for this camera
  → Synchronous triggering fails
  → The cycle rate (takt) of the entire production line is forced down
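The min() rule and the backlog effect can be sketched as follows. The frame rates are hypothetical values chosen only to illustrate one slow camera among sixteen:

```python
def system_frame_rate(camera_fps: list[float]) -> float:
    """With synchronous triggering, the slowest camera sets the system rate."""
    return min(camera_fps)


def backlog_frames(trigger_fps: float, sustainable_fps: float, seconds: float) -> float:
    """Frames queued at a camera whose interface cannot keep up with the trigger."""
    return max(0.0, (trigger_fps - sustainable_fps) * seconds)


# Hypothetical setup: 15 cameras sustain 120 fps, one is limited to 85 fps.
camera_fps = [120.0] * 15 + [85.0]

print("system frame rate:", system_frame_rate(camera_fps), "fps")
print("backlog after 10 s at a 120 fps trigger:",
      backlog_frames(120.0, 85.0, 10.0), "frames")
```

The backlog grows without bound as long as the trigger rate exceeds what the slowest link can sustain, so the only stable operating point is to trigger at the minimum rate.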

2. Decomposition of Frame Latency Sources

The complete latency chain of a frame of image from photons to decision:

T_total = T_exposure + T_readout + T_transmit + T_process + T_decide

Stage	Latency Magnitude	Optimizable	Bottleneck Factors
Exposure	1-100 μs	Limited by luminous flux	Light source brightness, sensor sensitivity
Sensor Readout	10 μs - 5 ms	Limited by sensor architecture	Global shutter vs Rolling shutter, ADC rate
Interface Transmission	0.1 - 50 ms	Highly Optimizable	Interface bandwidth, encoding overhead, cable length
Host Processing	0.5 - 5 ms	Partially Optimizable	CPU/GPU performance, DMA efficiency, driver latency
Decision Execution	0.05 - 1 ms	Fixed by hardware	Actuator response time

Interface transmission is the stage with the most room for optimization in the latency chain, and it is the core focus of this article.

2.1 Fine Decomposition of Transmission Latency

T_transmit = S_frame / Bandwidth_eff + T_protocol + T_driver + T_DMA

S_frame / Bandwidth_eff: Raw frame data volume (S_frame, in bytes) ÷ Effective bandwidth (nominal bandwidth net of encoding overhead)
T_protocol: Protocol layer overhead (header/trailer, ACK/NAK, flow control)
T_driver: Driver layer copy and scheduling (user mode/kernel mode switching)
T_DMA: PCIe DMA transfer latency

Taking a 4096×3072 Mono8 image (12,582,912 bytes, rounded to 12 MB in the table below) as an example:

Interface	Nominal Bandwidth	Effective Bandwidth	T_transmit	Remarks
GigE Vision	125 MB/s	~110 MB/s	109 ms	Severely insufficient bandwidth
USB 3.0	500 MB/s	~350 MB/s	34 ms	High protocol overhead
Camera Link Full (80-bit)	850 MB/s	~830 MB/s	14.5 ms	Bandwidth bottleneck
10GigE	1.25 GB/s	~1.0 GB/s	12 ms	Still not fast enough
CXP-12 (4-link)	6.25 GB/s	~5.0 GB/s	2.4 ms	First choice for low latency
CoF 100G	12.5 GB/s	~11.9 GB/s	1.0 ms	Next-generation solution
CLHS SFP+ (4-cable)	6.0 GB/s	~5.8 GB/s	2.1 ms	Long-distance fiber optic
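The T_transmit column can be reproduced from the frame size and the effective bandwidth alone, which suggests the table reflects only the serialization term and leaves out T_protocol, T_driver and T_DMA. A minimal sketch using the table's own figures (the function and variable names are illustrative):

```python
FRAME_MB = 12.0  # 4096 x 3072 Mono8, rounded to 12 MB as in the table

# Effective bandwidth per interface in MB/s, copied from the table above.
effective_mb_s = {
    "GigE Vision": 110,
    "USB 3.0": 350,
    "Camera Link Full": 830,
    "10GigE": 1000,
    "CXP-12 (4-link)": 5000,
    "CoF 100G": 11900,
    "CLHS SFP+ (4-cable)": 5800,
}


def serialization_ms(frame_mb: float, bandwidth_mb_s: float) -> float:
    """Time to push one frame through the link at its effective bandwidth."""
    return frame_mb / bandwidth_mb_s * 1000.0


for name, bandwidth in effective_mb_s.items():
    print(f"{name:22s} {serialization_ms(FRAME_MB, bandwidth):7.1f} ms")
```

Changing FRAME_MB to the size of a different sensor format gives a quick first-pass check of whether an interface fits a given transmission window.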

3. In-depth Comparison of Latency Characteristics of Various Interfaces

3.1 Protocol Layer Latency

GigE Vision:    UDP/IP encapsulation (GVSP stream packets) → Protocol stack processing ~50-200 μs
                + Network congestion/packet resend → Unpredictable jitter
USB3 Vision:    Bulk transfers → Host polling+ACK → ~20-50 μs
                + Host Controller scheduling latency
Camera Link:    Packetless protocol, pixel direct transmission → ~0 μs (pure parallel)
                But the video path cannot carry control commands/metadata (control uses a separate low-speed serial channel)
CoaXPress:      8B/10B encoding, packet protocol → ~1-5 μs
                Header SOP(4B) + Trailer EOP(8B) → Extremely low overhead
CLHS:           Packet protocol + Hardware CRC → ~2-5 μs

Key Insight: The latency of packet protocols comes not from the "packet" itself but from the depth of the protocol stack. GigE Vision must traverse the host's UDP/IP stack in software, while CXP/CLHS protocol processing is done entirely in FPGA hardware, which makes its latency highly deterministic.

3.2 Latency Jitter: More Critical Than Average Latency

            Average Latency    Maximum Latency    Jitter(σ)     Determinism
GigE Vision   109 ms           250 ms+           20-50 ms      Extremely poor (network congestion)
USB3 Vision    34 ms            80 ms            10-15 ms      Poor (bus contention)
Camera Link    14.5 ms          15 ms            <0.1 ms       Excellent (fixed clock)
CXP-12         2.4 ms           2.5 ms           <0.05 ms      Excellent (hardware determined)
CLHS           2.1 ms           2.2 ms           <0.05 ms      Excellent

In high-speed production lines, jitter determines the "safety margin" of the system. If the maximum latency is unpredictable, system designers must reserve latency budget for the worst-case scenario, which directly reduces the production line's cycle rate.
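To make the "safety margin" concrete, the sketch below takes the average and maximum latencies from the table and computes how much budget must be reserved beyond the average. The GigE Vision maximum is listed as "250 ms+", so 250 ms is used here as a lower bound; the function names are illustrative:

```python
# (average_ms, maximum_ms) per interface, from the table above.
latency_ms = {
    "GigE Vision": (109.0, 250.0),  # maximum is a lower bound ("250 ms+")
    "USB3 Vision": (34.0, 80.0),
    "Camera Link": (14.5, 15.0),
    "CXP-12": (2.4, 2.5),
    "CLHS": (2.1, 2.2),
}


def reserved_margin_ms(average: float, maximum: float) -> float:
    """Extra budget a designer must reserve beyond the average latency."""
    return maximum - average


def worst_case_ratio(average: float, maximum: float) -> float:
    """How many times larger the worst case is than the average."""
    return maximum / average


for name, (avg, worst) in latency_ms.items():
    print(f"{name:12s} margin {reserved_margin_ms(avg, worst):6.1f} ms, "
          f"worst/avg {worst_case_ratio(avg, worst):.2f}x")
```

For the hardware-determined interfaces the worst case stays within a few percent of the average, while for GigE Vision and USB3 Vision the reserved margin exceeds the average latency itself.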

3.3 Impact of Transmission Distance on Latency

Interface	Maximum Distance	Additional Latency from Distance	Relay Requirement
Camera Link	10 m	None (electrical signal propagation)	Non-relayable
USB 3.0	5 m	None	Hub increases latency
CXP-12	40-100 m	~5 μs/km (<0.5 μs at 100 m)	Non-relayable
10GigE	100 m (copper)	~5 μs/km	Switch increases latency
CLHS	10 km+ (fiber)	~5 μs/km	Optoelectronic relay available
CoF	10-40 km (fiber)	~5 μs/km	Ethernet PHY relay
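The distance penalty is small compared with the serialization time. A rough sketch using the ~5 μs/km figure from the table, with the 2.1 ms CLHS transfer time from Section 2.1 as the comparison point (the names are illustrative):

```python
PROPAGATION_US_PER_KM = 5.0  # approximate figure from the table above


def propagation_delay_us(distance_m: float,
                         us_per_km: float = PROPAGATION_US_PER_KM) -> float:
    """One-way propagation delay over the cable, in microseconds."""
    return distance_m / 1000.0 * us_per_km


def share_of_transfer(distance_m: float, transfer_ms: float) -> float:
    """Propagation delay as a fraction of the frame transfer time."""
    return propagation_delay_us(distance_m) / (transfer_ms * 1000.0)


for distance_m in (100, 1_000, 10_000):
    delay = propagation_delay_us(distance_m)
    share = share_of_transfer(distance_m, transfer_ms=2.1)
    print(f"{distance_m:6d} m -> {delay:5.1f} us ({share:.2%} of a 2.1 ms transfer)")
```

Even at 10 km the propagation delay is only a few percent of the frame transfer time, so distance mainly constrains the choice of medium rather than the latency budget.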

4. Layered Strategies for Continuous Optimization

4.1 Physical Layer Optimization

Increase single-channel rate:

CXP 1.0 → 2.0: 6.25 → 12.5 Gbps (2x improvement)
CXP v3.0 (planned): 25 Gbps (another 2x improvement, 8B/10B encoding, line rate 31.25 Gbps)
Cost: 20% overhead of 8B/10B encoding, but backward compatible with existing cameras
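The 20% cost of 8B/10B encoding explains the gap between nominal and effective bandwidth in the CXP-12 row of the Section 2.1 table. A minimal sketch of the conversion from line rate to payload bandwidth (function and constant names are illustrative):

```python
ENCODING_EFFICIENCY_8B10B = 8 / 10  # 8 payload bits per 10 line bits


def payload_gb_s(line_rate_gbps: float, links: int = 1) -> float:
    """Payload bandwidth in GB/s for a given per-link line rate in Gbps."""
    payload_gbps = line_rate_gbps * ENCODING_EFFICIENCY_8B10B
    return payload_gbps / 8 * links  # bits -> bytes, then aggregate links


print("CXP-6,  1 link :", payload_gb_s(6.25), "GB/s")
print("CXP-12, 1 link :", payload_gb_s(12.5), "GB/s")
print("CXP-12, 4 links:", payload_gb_s(12.5, links=4), "GB/s")
```

Four CXP-12 links carry 50 Gbps on the wire (6.25 GB/s nominal) and about 5.0 GB/s of payload, matching the effective bandwidth used earlier.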

Increase number of channels:

CXP 1-link → 4-link → 8-link
CLHS 1-cable → 8-cable SFP+
Cost: Cable cost, FPGA resources, number of DMA channels

Switch to fiber optics:

CoaXPress over Fiber (CoF): Utilizes Ethernet PHY, 10G/25G/100G
CLHS F1/F2 fiber options
Advantages: Long distance, EMI resistance, high bandwidth

4.2 Protocol Layer Optimization

Reduce header/trailer overhead:

CXP Header: SOP (4B) + HDP (4B) = 8B
CXP Trailer: EOP (8B)
Total Overhead: 16B / Packet
For 4096×1 line (4096B):
  Overhead Ratio = 16 / (4096+16) = 0.39%  ← Negligible
For 64B small packets (IO control packets):
  Overhead Ratio = 16 / (64+16) = 20%  ← Need optimization
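The same ratio can be evaluated for any payload size to see where the fixed overhead stops mattering. A short sketch using the 16-byte per-packet overhead stated above (names are illustrative):

```python
CXP_OVERHEAD_BYTES = 16  # header (8 B) + trailer (8 B), as listed above


def overhead_ratio(payload_bytes: int,
                   overhead_bytes: int = CXP_OVERHEAD_BYTES) -> float:
    """Fraction of each packet spent on header and trailer."""
    return overhead_bytes / (payload_bytes + overhead_bytes)


for payload in (64, 256, 1024, 4096):
    print(f"{payload:5d} B payload -> {overhead_ratio(payload):.2%} overhead")
```

The ratio falls quickly as the payload grows, which is why image lines are essentially unaffected while small control packets pay a noticeable cost.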

Optimize CRC processing (reduce critical path latency):

Separate CRC from EOP word into independent cycles to avoid decoder waiting
Hardware pipelined CRC calculation, parallel with data transmission
This is the key issue to be solved by the `pkt_align` module in CoF bridge
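As a software analogy for a CRC computed in parallel with transmission, the sketch below updates a running CRC as each word is emitted and sends the checksum as its own trailing word. It uses the standard CRC-32 from Python's zlib purely for illustration; the actual CRC definition and word layout used by CXP or by the `pkt_align` module are not given in this article and are not implied here.

```python
import zlib
from typing import Iterator


def stream_with_running_crc(payload: bytes, word_bytes: int = 4) -> Iterator[bytes]:
    """Yield payload words, then one trailing word holding the CRC-32."""
    crc = 0
    for offset in range(0, len(payload), word_bytes):
        word = payload[offset:offset + word_bytes]
        crc = zlib.crc32(word, crc)  # updated as each word "leaves"
        yield word
    # The CRC is already complete when the last data word has gone out,
    # so it can be sent immediately as a separate word.
    yield crc.to_bytes(4, "big")


payload = bytes(range(64))
words = list(stream_with_running_crc(payload))
print(len(words) - 1, "data words, CRC word =", words[-1].hex())
```

Because the checksum is accumulated word by word, the receiver never has to buffer the whole packet before checking it, which is the property the hardware pipeline is after.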

Hardware Offload:

Header parsing → FPGA state machine, no CPU involvement
CRC verification → Dedicated hardware, wire-speed processing
DMA descriptor management → Scatter-Gather DMA, reduce interrupt frequency

4.3 Driver and System Layer Optimization

Zero-Copy:

Traditional Path:  NIC → Kernel Buffer → User Space Copy → Application Processing
           T_driver ≈ 50-200 μs (12MB frame)
Zero-Copy:    NIC → DMA Direct to User Space → Application Processing
           T_driver ≈ 1-5 μs

Implementation methods:

VFIO / UIO user-space drivers
HugePages to reduce TLB misses
CPU Pinning to avoid context switching
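The difference between the two paths can be illustrated at a small scale in user space. The sketch below is not a VFIO/UIO driver; it uses an anonymous mmap as a stand-in for a DMA target buffer and contrasts a memoryview (no copy) with bytes() (one full copy). All names are illustrative.

```python
import mmap

FRAME_BYTES = 4096 * 3072  # one Mono8 frame from the Section 2.1 example

# Anonymous mmap returns a page-aligned buffer; it stands in for a DMA target.
dma_buffer = mmap.mmap(-1, FRAME_BYTES)

zero_copy_view = memoryview(dma_buffer)  # shares memory with the buffer
copied_frame = bytes(zero_copy_view)     # the "traditional" extra copy

# Simulate the device writing new pixel data after both handles exist.
dma_buffer[0:4] = b"\x01\x02\x03\x04"

view_sees_write = zero_copy_view[0] == 1  # True: same memory
copy_sees_write = copied_frame[0] == 1    # False: stale private copy
print("view sees new data:", view_sees_write)
print("copy sees new data:", copy_sees_write)

# A memoryview must be released before the mapping can be closed.
zero_copy_view.release()
dma_buffer.close()
```

The view costs nothing per frame, while the copy costs a full pass over the frame data on every frame, which is the cost that user-space drivers and direct DMA are meant to remove.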

DMA Optimization:

Scatter-Gather DMA: One-time descriptor, reduce interrupts
Prefetch: Overlap DMA transmission with CPU prefetching
Aligned allocation: 4K (page) alignment of frame buffers so that DMA transfers start on page boundaries and never split a cache line

4.4 Architecture-Level Optimization

Pipeline Parallelism:

Frame N:    [Exposure] → [Transmission] → [Processing] → [Decision]
Frame N+1:           [Exposure] → [Transmission] → [Processing] → [Decision]
Frame N+2:                    [Exposure] → [Transmission] → [Processing] → [Decision]
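Effective Frame Interval = max(T_exposure, T_transmit, T_process, T_decide)
            Not the sum. Per-frame latency is still the sum of all stages; pipelining raises throughput by overlapping frames.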
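When reading the pipeline diagram it helps to keep two quantities apart: the latency of a single frame, which is the sum of its stages, and the interval between completed frames, which is set by the slowest stage. A minimal sketch for an idealized pipeline, with stage times that are illustrative assumptions:

```python
# Illustrative stage times in milliseconds (assumed values).
stage_ms = {"exposure": 0.05, "transmit": 2.4, "process": 3.0, "decide": 0.1}

per_frame_latency_ms = sum(stage_ms.values())  # what one frame experiences
frame_interval_ms = max(stage_ms.values())     # spacing between results


def completion_times_ms(n_frames: int) -> list[float]:
    """Completion time of each frame when frames are issued at the bottleneck rate."""
    return [per_frame_latency_ms + i * frame_interval_ms for i in range(n_frames)]


print(f"per-frame latency: {per_frame_latency_ms:.2f} ms")
print(f"frame interval:    {frame_interval_ms:.2f} ms")
print("completion times: ", [f"{t:.2f}" for t in completion_times_ms(4)])
```

In this example, results arrive every 3.0 ms even though each frame takes 5.55 ms from exposure to decision, so pipelining improves the cycle rate without shortening the reaction time to any individual part.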

ROI (Region of Interest) Transmission:

Transmit only regions containing targets, reduce data volume
Requires camera-side support (triggered ROI or line-by-line ROI)
Data volume can be reduced by 50-90%
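The effect of ROI transmission on data volume and transfer time follows directly from the pixel count. In the sketch below the ROI size is a hypothetical example, and the 5.0 GB/s bandwidth is the CXP-12 4-link effective figure from Section 2.1:

```python
def frame_bytes(width: int, height: int, bytes_per_pixel: int = 1) -> int:
    """Raw size of an image region in bytes."""
    return width * height * bytes_per_pixel


def transfer_ms(n_bytes: int, bandwidth_gb_s: float) -> float:
    """Serialization time at the given effective bandwidth."""
    return n_bytes / (bandwidth_gb_s * 1e9) * 1e3


full = frame_bytes(4096, 3072)  # full Mono8 frame
roi = frame_bytes(2048, 1024)   # hypothetical region of interest
reduction = 1 - roi / full

print(f"data reduction: {reduction:.1%}")
print(f"full frame: {transfer_ms(full, 5.0):.2f} ms, ROI: {transfer_ms(roi, 5.0):.2f} ms")
```

A 2048×1024 window out of the full frame removes about 83% of the data, which falls inside the 50-90% range quoted above.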

Multi-Channel Parallel Acquisition:

Multi-link parallel transmission, total bandwidth = single link × N
Requires Host FPGA to support multi-channel DMA and frame reassembly
This is the core advantage of CXP-12 4-link architecture
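Frame reassembly on the host side can be pictured as the inverse of striping data across links. The sketch below uses a simple round-robin scheme with fixed-size chunks; this is an illustrative assumption and not the link-aggregation scheme defined by the CXP standard.

```python
def stripe(frame: bytes, n_links: int, chunk: int) -> list[list[bytes]]:
    """Distribute consecutive chunks of a frame across links, round-robin."""
    links: list[list[bytes]] = [[] for _ in range(n_links)]
    for i, offset in enumerate(range(0, len(frame), chunk)):
        links[i % n_links].append(frame[offset:offset + chunk])
    return links


def reassemble(links: list[list[bytes]]) -> bytes:
    """Rebuild the frame by reading one chunk from each link in turn."""
    out = bytearray()
    depth = max(len(queue) for queue in links)
    for row in range(depth):
        for queue in links:
            if row < len(queue):
                out += queue[row]
    return bytes(out)


frame = bytes(range(256)) * 10  # 2560-byte stand-in for image data
links = stripe(frame, n_links=4, chunk=64)
print("chunks per link:", [len(queue) for queue in links])
print("reassembled correctly:", reassemble(links) == frame)
```

Each link carries roughly a quarter of the data, so the transfer time drops by about the number of links, at the cost of the reordering logic and per-link DMA channels mentioned above.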

5. Decision Framework for Camera Interface Selection

5.1 Four-Dimensional Evaluation Matrix

	Bandwidth	Latency Determinism	Distance	Cost
Camera Link CL Full	★★★	★★★★★	★★	★★★
Camera Link HS SFP+	★★★★	★★★★★	★★★	★★
USB3 Vision	★★★	★★	★★	★★★★★
GigE Vision	★★	★	★★★★	★★★★
10GigE	★★★	★★	★★★★	★★★
CXP-12 4-link	★★★★★	★★★★★	★★★	★★★
CoF 25G/100G	★★★★★	★★★★★	★★★★★	★★
CLHS 4-cable	★★★★	★★★★★	★★★★	★★

5.2 Scenario-Based Selection Recommendations

Scenario A: Semiconductor Wafer Inspection (Speed > 5 m/s, Precision < 1 μm)

Core Requirements: Extremely low jitter, high bandwidth
First Choice: CXP-12 4-link or CoF
Reason: Hardware deterministic latency (σ < 50 ns), 6.25 GB/s bandwidth meets 4K-16K line scan
Pitfall: Avoid GigE/USB due to unpredictable jitter

Scenario B: 3C Electronics Assembly Inspection (8-16 Cameras Synchronous)

Core Requirements: Multi-camera synchronization, cost controllable
First Choice: CXP-12 (4 cameras × 4-link) or CLHS
Reason: Hardware trigger latency < 1 μs, unified management by FPGA Host
Pitfall: USB bus bandwidth sharing leads to synchronization failure

Scenario C: Logistics Sorting (Speed 3-5 m/s, Distance 50-100 m)

Core Requirements: Long distance, anti-interference
First Choice: CoF (CoaXPress over Fiber) or CLHS Fiber
Reason: Fiber transmission up to 10 km+, no EMI issues
Pitfall: Severe signal attenuation of copper cables over long distances

Scenario D: Intelligent Transportation/License Plate Recognition (Speed < 200 km/h, Distance < 50 m)

Core Requirements: Cost-effectiveness, easy deployment
First Choice: 10GigE or CXP-12 1-link
Reason: PoE power supply, standard network infrastructure
Note: GigE Vision is acceptable when frame rate requirements are not high

Scenario E: Consumer Electronics Appearance Inspection (Speed 1-3 m/s, Cost-Sensitive)

Core Requirements: Cost priority
First Choice: USB3 Vision or GigE Vision
Reason: Standard PC is sufficient, no dedicated capture card required
Pitfall: Pay attention to USB bandwidth contention and GigE congestion issues

5.3 Selection Decision Tree

Data Rate (Frame Rate × Resolution × Bytes per Pixel) > 2 GB/s ?
  ├─ Yes → CXP-12 / CoF / CLHS
  │         ├─ Distance > 40 m ? → CoF / CLHS Fiber
  │         └─ Distance ≤ 40 m ? → CXP-12
  └─ No → Data Rate (Frame Rate × Resolution × Bytes per Pixel) > 500 MB/s ?
           ├─ Yes → 10GigE / CXP-12 1-link / CLHS Single Cable
           │         ├─ Jitter Requirement < 1 ms ? → CXP-12 / CLHS
           │         └─ No Strict Jitter Requirement ? → 10GigE
           └─ No → Cost-Sensitive ?
                    ├─ Yes → USB3 / GigE
                    └─ No  → Select based on distance and ecosystem
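The decision tree translates directly into a small function. The sketch below follows the thresholds in the tree; the function names, the parameter names, and the treatment of exactly 40 m as the short-distance case are assumptions made for illustration.

```python
from typing import Optional


def data_rate_mb_s(fps: float, width: int, height: int, bytes_per_pixel: int = 1) -> float:
    """Sustained data rate in MB/s (1 MB = 10^6 bytes)."""
    return fps * width * height * bytes_per_pixel / 1e6


def select_interface(rate_mb_s: float, distance_m: float,
                     jitter_req_ms: Optional[float] = None,
                     cost_sensitive: bool = False) -> str:
    """Walk the selection tree and return the recommended interface family."""
    if rate_mb_s > 2000:
        return "CoF / CLHS Fiber" if distance_m > 40 else "CXP-12"
    if rate_mb_s > 500:
        if jitter_req_ms is not None and jitter_req_ms < 1:
            return "CXP-12 / CLHS"
        return "10GigE"
    if cost_sensitive:
        return "USB3 / GigE"
    return "Select based on distance and ecosystem"


rate = data_rate_mb_s(fps=100, width=4096, height=3072)
print(f"{rate:.0f} MB/s ->", select_interface(rate, distance_m=10, jitter_req_ms=0.5))
```

A 4096×3072 Mono8 camera at 100 fps produces about 1.26 GB/s, which lands in the middle branch, where the jitter requirement decides between 10GigE and CXP-12 / CLHS.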

6. Future Trends and Technical Outlook

6.1 Continuous Rise of Interface Bandwidth

2010: Camera Link Full     850 MB/s
2015: CXP-6 (1-link)       750 MB/s
2018: CXP-12 (4-link)      6.25 GB/s
2022: CLHS SFP+ (8-cable)  9.6 GB/s
2025: CoF 100G              12.5 GB/s
2027+: CXP v3.0 (25G Coaxial)  ~10 GB/s (4-link)

6.2 Ultimate Directions for Latency Optimization

Photonic Computing: Image preprocessing in optical domain, eliminate electro-optical-electrical conversion latency
Smart Camera: AI inference embedded in sensor, transmit inference results instead of raw images (1000x data volume reduction)
CXL Memory Expansion: Camera data directly written to Host memory pool, eliminate PCIe DMA latency
Deterministic Ethernet (TSN/802.1Qbv): Reduce GigE Vision latency jitter from 50 ms to < 100 μs

6.3 Software-Defined Latency Optimization

Traditional:  Camera → [Fixed Protocol] → Capture Card → [Fixed Driver] → Application
Future:  Camera → [Programmable FPGA] → [User-Defined Pipeline] → Zero-Copy Memory
           ↑ Header Truncation          ↑ Inline Preprocessing         ↑ GPUDirect
           ↑ ROI Extraction             ↑ Format Conversion           ↑ Zero-Copy

7. Conclusion

The essence of frame latency optimization is a full-stack engineering problem:

Physical Layer determines the lower limit of latency (speed of light cannot be broken)
Protocol Layer determines the determinism of latency (hardware offload vs software stack)
Driver Layer determines the efficiency of latency (zero-copy vs multiple copies)
Architecture Layer determines the sustained frame interval (pipeline parallelism vs serial)

In high-speed machine vision scenarios, the priority for interface selection should be:

Latency Determinism > Peak Bandwidth > Transmission Distance > Cost

The reason is that a system with uncontrollable jitter, even one with low average latency, must budget for the worst-case latency, which ultimately drags down the cycle rate of the entire production line. CXP-12 and CoF have become the first choice for high-end vision precisely because they are currently the strongest options in terms of latency determinism.
