Preface: The modern retail store is a data factory. Hundreds of shoppers interact with thousands of products every hour. Traditional "Store Analytics" (footfall counters) are insufficient. We need comprehensive understanding: gaze detection, queue analysis, and real-time shrinkage (theft) alerts. This requires running Computer Vision (CV) on-premise.
1. Hardware Selection (Jetson vs. x86)
For deployments ranging from a handful of cameras up to 32+ streams, the key metric is inference throughput (FPS) per dollar, within the site's power budget.
| Device | AI Performance (INT8) | Power | Recommended Use |
|---|---|---|---|
| NVIDIA Jetson Orin Nano | 40 TOPS | 15W | Small stores (1-4 cameras) |
| NVIDIA Jetson Orin NX | 100 TOPS | 25W | Medium retail (8-16 cameras) |
| x86 server + NVIDIA A2 GPU | 36 TOPS | ~200W (full system) | Flagship stores (32+ cameras) |
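A quick back-of-envelope check makes the sizing concrete. All numbers below are illustrative assumptions, not benchmarks:

```python
# Back-of-envelope throughput budget (illustrative numbers only).
cameras = 16          # assumed number of RTSP streams
camera_fps = 30       # assumed per-camera frame rate
interval = 2          # batches skipped between inferences (see [primary-gie] below)

frames_per_second = cameras * camera_fps                     # 480 decoded frames/s
inferences_per_second = frames_per_second / (interval + 1)   # 160 inferences/s

print(f"Decode load:    {frames_per_second} frames/s")
print(f"Inference load: {inferences_per_second:.0f} inferences/s")
```

If a device's measured INT8 detector throughput comfortably exceeds the inference load, it is big enough; otherwise raise the frame-skip `interval` (shown in the DeepStream config below) or move up a hardware tier.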
2. The DeepStream Pipeline
Software-decoding 16 streams of 1080p H.264 video will saturate most CPUs before inference even starts. We use NVIDIA DeepStream to keep the entire pipeline (Decode -> Pre-process -> Inference -> Tracker) on the GPU, where the dedicated NVDEC hardware handles the decoding.
```ini
# deepstream_config.txt
[source0]
enable=1
# Type 4 = RTSP
type=4
uri=rtsp://camera-01.local:554/stream

[streammux]
gpu-id=0
# Batch size should match the number of sources
batch-size=16
# Wait up to 40 ms (one frame at 25 FPS) to assemble a full batch
batched-push-timeout=40000
width=1920
height=1080

[primary-gie]
enable=1
model-engine-file=yolov8_int8.engine
labelfile-path=labels.txt
# 0 = run inference on every frame (expensive);
# 2 = skip two frames between inferences
interval=0
```
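With sources, muxer, and detector defined, the stock reference application drives the whole pipeline: `deepstream-app -c deepstream_config.txt`.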
3. Training Custom Models (YOLOv8)
Off-the-shelf models detect "Person" and "Car". Retail needs "Holding Product", "Putting in Pocket", and "Staff Uniform". We fine-tune YOLOv8 on store-specific datasets.
```python
from ultralytics import YOLO

# Load a pretrained nano model as the starting point
model = YOLO('yolov8n.pt')

# Fine-tune on the retail dataset (GPU 0)
results = model.train(
    data='retail_theft_dataset.yaml',
    epochs=100,
    imgsz=640,
    device=0,
)
```
4. TensorRT Optimization
A stock PyTorch model leaves performance on the table. We convert it to a TensorRT engine, which performs layer fusion and kernel auto-tuning, and apply INT8 calibration to reduce precision while maintaining accuracy.
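Recent Ultralytics releases can drive this conversion directly. A minimal sketch, assuming a TensorRT-capable environment (e.g., JetPack on an Orin) and reusing the training dataset YAML to supply calibration images:

```python
from ultralytics import YOLO

# Load the fine-tuned weights (default Ultralytics output path).
model = YOLO('runs/detect/train/weights/best.pt')

# Build a TensorRT engine with INT8 calibration.
model.export(
    format='engine',                   # TensorRT engine output
    int8=True,                         # quantize to INT8
    data='retail_theft_dataset.yaml',  # calibration images
    imgsz=640,
    device=0,
)
```

The resulting `.engine` file is what `model-engine-file` in the `[primary-gie]` section points at; note that parsing YOLOv8's output tensors inside DeepStream's `nvinfer` typically requires a custom bounding-box parser, which is beyond the scope of the config snippet above.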
Result: YOLOv8n jumps from 45 FPS (PyTorch) to 600 FPS (TensorRT INT8) on an Orin Nano.
5. The Event Bus (MQTT)
The edge node does not send video to the cloud; it sends metadata. A pad probe in the DeepStream pipeline extracts detection metadata and publishes it as JSON events over MQTT.
An example payload:

```json
{
  "sensorId": "cam-04",
  "timestamp": "2024-12-27T14:30:00Z",
  "objects": [
    {
      "class": "person",
      "confidence": 0.92,
      "bbox": [100, 200, 50, 150],
      "attributes": {
        "action": "dwelling",
        "duration_seconds": 45
      }
    }
  ]
}
```
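A minimal sketch of the probe side, assuming the DeepStream Python bindings (`pyds`) and `paho-mqtt` >= 2.0; the broker address and topic layout are assumptions:

```python
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

import json
import pyds
import paho.mqtt.client as mqtt

# Hypothetical broker; in production this is the store's local MQTT bus.
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.connect("localhost", 1883)

def tracker_src_pad_probe(pad, info, u_data):
    """Walk DeepStream batch metadata and publish one JSON event per frame."""
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        objects = []
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            obj = pyds.NvDsObjectMeta.cast(l_obj.data)
            rect = obj.rect_params
            objects.append({
                "class": obj.obj_label,
                "confidence": round(obj.confidence, 2),
                "bbox": [int(rect.left), int(rect.top),
                         int(rect.width), int(rect.height)],
            })
            l_obj = l_obj.next
        sensor = f"cam-{frame_meta.source_id:02d}"
        client.publish(f"store/{sensor}/detections",
                       json.dumps({"sensorId": sensor, "objects": objects}))
        l_frame = l_frame.next
    return Gst.PadProbeReturn.OK
```

In a full application this function is attached to the tracker's source pad with `pad.add_probe(Gst.PadProbeType.BUFFER, tracker_src_pad_probe, 0)`; the `action`/`duration_seconds` attributes in the payload above come from downstream dwell logic, which is omitted here.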
This JSON stream is consumed by a local dashboard (Grafana) for the Store Manager and synced to the cloud for Head Office analytics.
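On the consuming side, anything that speaks MQTT can subscribe. A sketch of a dwell-alert consumer, under the same assumptions (`paho-mqtt` >= 2.0, the hypothetical `store/+/detections` topic layout):

```python
import json
import paho.mqtt.client as mqtt

DWELL_ALERT_SECONDS = 60  # assumed alerting threshold

def on_message(client, userdata, msg):
    """Flag any tracked person dwelling past the threshold."""
    event = json.loads(msg.payload)
    for obj in event.get("objects", []):
        attrs = obj.get("attributes", {})
        if (attrs.get("action") == "dwelling"
                and attrs.get("duration_seconds", 0) > DWELL_ALERT_SECONDS):
            print(f"ALERT {event['sensorId']}: person dwelling "
                  f"{attrs['duration_seconds']}s")

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.on_message = on_message
client.connect("localhost", 1883)
client.subscribe("store/+/detections")
client.loop_forever()
```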
Conclusion: By moving the eyes of the AI into the store, we turn "Loss Prevention" from a reactive investigation of yesterday's tapes into a proactive intervention in real-time.