
Retail Edge Intelligence: Computer Vision at Scale

Preface: The modern retail store is a data factory. Hundreds of shoppers interact with thousands of products every hour. Traditional store analytics (footfall counters) capture only a fraction of that activity. We need comprehensive understanding: gaze detection, queue analysis, and real-time shrinkage (theft) alerts. This requires running Computer Vision (CV) on-premises.

1. Hardware Selection (Jetson vs. x86)

For a typical store with 16-32 cameras, we need high throughput (FPS) per dollar.

Device                     AI Performance   Power   Recommended Use
NVIDIA Jetson Orin Nano    40 TOPS          15 W    Small stores (1-4 cameras)
NVIDIA Jetson Orin NX      100 TOPS         25 W    Medium retail (8-16 cameras)
x86 Server + NVIDIA A2     45 TOPS          200 W   Flagship stores (32+ cameras)
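Given the table above, a quick sanity check on sizing can be sketched in a few lines. The 600 FPS figure is the Section 4 TensorRT benchmark; the headroom factor is our assumption (budget reserved for decode, tracking, and overlays), not a vendor figure:

```python
# Back-of-the-envelope capacity check: given a detector's total throughput
# on a device, what per-stream inference rate do N cameras get?

def per_camera_fps(total_fps: float, n_cameras: int, headroom: float = 0.7) -> float:
    """Inference FPS available per camera after reserving (1 - headroom)
    of throughput for decode, tracking, and overlay work (assumed split)."""
    return total_fps * headroom / n_cameras

print(per_camera_fps(600, 16))  # ~26 FPS per stream for 16 cameras
```

If the result drops below your cameras' native frame rate, either skip frames at inference time (see `interval` in the DeepStream config below) or step up a hardware tier.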

2. The DeepStream Pipeline

Decoding 16 streams of 1080p H.264 video will crush any CPU. We use NVIDIA DeepStream to keep the entire pipeline (Decode -> Pre-process -> Inference -> Tracker) on the GPU.

# deepstream_config.txt

[source0]
enable=1
# Type 4 = RTSP
type=4
uri=rtsp://camera-01.local:554/stream

[streammux]
gpu-id=0
batch-size=16
# Flush a (possibly partial) batch after 40 ms, i.e. one frame interval at 25 fps
batched-push-timeout=40000
width=1920
height=1080

[primary-gie]
enable=1
model-engine-file=yolov8_int8.engine
labelfile-path=labels.txt
interval=0 # 0 = infer on every frame (expensive); interval=2 skips 2 frames between inferences
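One non-obvious value above is batched-push-timeout, which is specified in microseconds. A common rule of thumb (our sketch, not an NVIDIA-mandated formula) is to wait at most one frame interval of the sources before pushing a partial batch:

```python
# batched-push-timeout rule of thumb: wait at most one frame interval for
# all sources to contribute a frame before pushing a partial batch downstream.

def batched_push_timeout_us(source_fps: int) -> int:
    """Microseconds per frame at the given source frame rate."""
    return 1_000_000 // source_fps

print(batched_push_timeout_us(25))  # 40000, matching the config above
```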

3. Training Custom Models (YOLOv8)

Off-the-shelf models detect "Person" and "Car". Retail needs "Holding Product", "Putting in Pocket", and "Staff Uniform". We fine-tune YOLOv8 on specific store datasets.

from ultralytics import YOLO

# Load a model
model = YOLO('yolov8n.pt')  # load a pretrained model (nano)

# Train the model
results = model.train(
    data='retail_theft_dataset.yaml', 
    epochs=100, 
    imgsz=640,
    device=0
)
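The retail_theft_dataset.yaml referenced above follows the standard Ultralytics dataset format. A sketch, with assumed paths and an illustrative custom class list:

```yaml
# retail_theft_dataset.yaml -- illustrative sketch; paths and classes are assumptions
path: /data/retail_theft        # dataset root
train: images/train             # relative to path
val: images/val

names:
  0: person
  1: staff_uniform
  2: holding_product
  3: putting_in_pocket
```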

4. TensorRT Optimization

Native PyTorch inference is too slow for the edge: Python overhead and unfused kernels leave throughput on the table. We convert models to TensorRT engines, which perform graph fusion and kernel auto-tuning, and apply INT8 calibration to reduce precision while maintaining accuracy.

Result: YOLOv8n jumps from 45 FPS (PyTorch) to 600 FPS (TensorRT INT8) on an Orin Nano.

5. The Event Bus (MQTT)

The edge node does not send video to the cloud; it sends metadata. DeepStream's message converter and broker plugins (nvmsgconv and nvmsgbroker) serialize detections to JSON and publish them over MQTT.

# MQTT Payload Example
{
  "sensorId": "cam-04",
  "timestamp": "2024-12-27T14:30:00Z",
  "objects": [
    {
      "class": "person", 
      "confidence": 0.92,
      "bbox": [100, 200, 50, 150],
      "attributes": {
         "action": "dwelling",
         "duration": 45s
      }
    }
  ]
}
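Components outside DeepStream (e.g. a POS integration emitting the same schema) can build and publish this payload directly. A minimal sketch assuming the paho-mqtt 2.x client; the broker host and topic names are hypothetical:

```python
import json
import time

def build_event(sensor_id, cls, confidence, bbox, action, duration_s):
    """Assemble a metadata payload matching the schema above."""
    return json.dumps({
        "sensorId": sensor_id,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "objects": [{
            "class": cls,
            "confidence": confidence,
            "bbox": bbox,                 # [x, y, width, height] in pixels
            "attributes": {"action": action, "duration_s": duration_s},
        }],
    })

def publish_event(payload, host="mqtt.store.local", topic="store/events"):
    """Publish one event; host/topic are illustrative assumptions."""
    import paho.mqtt.client as mqtt       # requires paho-mqtt >= 2.0
    client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
    client.connect(host)
    client.publish(topic, payload, qos=1)
    client.disconnect()

payload = build_event("cam-04", "person", 0.92, [100, 200, 50, 150], "dwelling", 45)
```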

This JSON stream is consumed by a local dashboard (Grafana) for the Store Manager and synced to the cloud for Head Office analytics.


Conclusion: By moving the eyes of the AI into the store, we turn "Loss Prevention" from a reactive investigation of yesterday's tapes into a proactive intervention in real-time.

See More, Sell More

We deploy turnkey computer vision agents for modern retail.

Retail@networkprogrammable.com