Pedestrian and cyclist detection for autonomous systems
Vulnerable road users (pedestrians, cyclists, and scooter riders) are the most common fatalities in accidents involving autonomous systems. A detection failure here is not just a metric in a report: it is a human life. The requirements are therefore an order of magnitude stricter than for a standard CV task: recall > 98% under all conditions, including nighttime, rain, and partial occlusion.
Problems specific to VRU (Vulnerable Road Users)
A cyclist with a bike is an elongated object of irregular shape. A scooter rider 30 meters away occupies 15x40 pixels. A pedestrian behind a parked car is half visible. A 100 cm tall child 20 meters away is a 20x30 pixel bbox.
```python
import numpy as np
from ultralytics import YOLO


class VRUDetector:
    def __init__(self, model_path: str, camera_params: dict):
        # YOLOv8l or RT-DETR-L for VRU: high sensitivity is required
        self.model = YOLO(model_path)
        self.focal_length = camera_params['focal_length']
        self.sensor_height = camera_params['sensor_height']
        self.image_height_px = camera_params['image_height']
        # Strict thresholds for VRU
        self.conf_threshold = 0.3  # lower than usual: an extra FP is the lesser evil
        self.min_height_px = 20    # minimum object size for detection
        # VRU classes (COCO ids)
        self.vru_classes = {0: 'person', 1: 'bicycle', 3: 'motorcycle'}

    def detect(self, frame: np.ndarray,
               min_distance_m: float = 1.0,
               max_distance_m: float = 80.0) -> list[dict]:
        results = self.model(frame, conf=self.conf_threshold,
                             classes=list(self.vru_classes.keys()))
        vru_detections = []
        for box in results[0].boxes:
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            h_px = y2 - y1
            cls_id = int(box.cls)
            if h_px < self.min_height_px:
                continue  # object too small
            # Distance estimate from bbox height
            distance = self._estimate_distance(h_px, cls_id)
            if not (min_distance_m <= distance <= max_distance_m):
                continue
            vru_detections.append({
                'class': self.vru_classes[cls_id],
                'confidence': float(box.conf),
                'bbox': [x1, y1, x2, y2],
                'distance_m': distance,
                'height_px': h_px,
                'priority': 'HIGH' if cls_id == 0 else 'MEDIUM'
            })
        return sorted(vru_detections, key=lambda x: x['distance_m'])

    def _estimate_distance(self, height_px: int, cls_id: int) -> float:
        """Simple monocular estimate using the pinhole model."""
        real_heights = {0: 1.75, 1: 1.05, 3: 1.10}  # meters
        real_h = real_heights.get(cls_id, 1.5)
        return (real_h * self.focal_length) / (
            height_px * self.sensor_height / self.image_height_px)
```
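The monocular estimate in `_estimate_distance` is easy to sanity-check in isolation. A self-contained sketch with hypothetical camera parameters (4 mm lens, 2.7 mm sensor height, 1080 px frame height; none of these numbers come from the text):

```python
def estimate_distance_m(height_px: int, real_height_m: float,
                        focal_mm: float, sensor_height_mm: float,
                        image_height_px: int) -> float:
    """Pinhole model: distance = real_height * focal / (object height on sensor)."""
    height_on_sensor_mm = height_px * sensor_height_mm / image_height_px
    return (real_height_m * focal_mm) / height_on_sensor_mm

# Hypothetical camera: 4 mm lens, 2.7 mm sensor, 1080 px frame
d = estimate_distance_m(height_px=100, real_height_m=1.75,
                        focal_mm=4.0, sensor_height_mm=2.7,
                        image_height_px=1080)
print(round(d, 2))  # a 1.75 m person spanning 100 px -> 28.0 m
```

Note the inverse proportionality: doubling the distance halves the bbox height, which is exactly why small, distant VRUs are hard to detect.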
Night detection: a critical scenario
According to statistics, 76% of fatal pedestrian accidents occur at night. Standard RGB models lose 30–40% of their recall at illumination levels below 3 lux.
Solutions:
1. Thermal camera (FLIR Lepton, Bosch BTC): a human body at ~37°C stands out sharply against the asphalt. Recall in complete darkness: 88–93%. The downside is the lack of texture, which makes it harder to tell a bicycle from a scooter.
2. Near-IR camera (850 nm): car headlights with an IR component illuminate a range of 60–80 m. YOLOv8 fine-tuned on IR data (the FLIR ADAS dataset includes an IR channel) maintains 85–90% recall at night.
3. RGB + thermal fusion: the best results, but more complex and expensive.
```python
class NightVRUFusion:
    """Late fusion: merge detections from the RGB and thermal cameras."""

    def fuse(self, rgb_dets: list, thermal_dets: list,
             iou_threshold: float = 0.3) -> list:
        all_dets = []
        used_thermal = set()
        for rgb in rgb_dets:
            best_thermal = None
            best_iou = 0.0
            for i, therm in enumerate(thermal_dets):
                iou = self._compute_iou(rgb['bbox'], therm['bbox'])
                if iou > best_iou and iou > iou_threshold:
                    best_iou = iou
                    best_thermal = i
            if best_thermal is not None:
                # Combine confidences from both modalities
                fused = rgb.copy()
                fused['confidence'] = min(
                    1.0, rgb['confidence'] * 0.6 +
                    thermal_dets[best_thermal]['confidence'] * 0.7
                )
                fused['source'] = 'fusion'
                used_thermal.add(best_thermal)
                all_dets.append(fused)
            else:
                all_dets.append(rgb)
        # Thermal-only detections (objects with no RGB counterpart)
        for i, therm in enumerate(thermal_dets):
            if i not in used_thermal and therm['confidence'] > 0.5:
                all_dets.append(therm)
        return all_dets

    @staticmethod
    def _compute_iou(a: list, b: list) -> float:
        """IoU of two [x1, y1, x2, y2] boxes."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0
```
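The matching step hinges on the IoU between an RGB box and a thermal box. A self-contained sketch of that rule with hypothetical boxes and confidences (the 0.6/0.7 weights mirror the class above):

```python
def compute_iou(a: list, b: list) -> float:
    """IoU of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

# Hypothetical case: the same pedestrian seen by both cameras,
# boxes nearly aligned after cross-camera calibration
rgb_box = [100, 50, 160, 200]
thermal_box = [105, 55, 165, 205]
print(compute_iou(rgb_box, thermal_box) > 0.3)  # True -> detections are fused

# Fused confidence, clamped to 1.0 (RGB conf 0.55, thermal conf 0.80)
fused_conf = min(1.0, 0.55 * 0.6 + 0.80 * 0.7)
print(round(fused_conf, 2))  # 0.89
```

The weighted sum deliberately overweights agreement: two moderate detections that overlap well produce a near-certain fused detection.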
VRU Detector Quality Metrics
| Condition | Recall target | Precision target |
|---|---|---|
| Day, good visibility | > 98% | > 90% |
| Twilight | > 95% | > 85% |
| Night (IR headlights) | > 88% | > 78% |
| Moderate rain | > 92% | > 82% |
| Partial occlusion (< 40%) | > 94% | > 83% |
Evaluation on standard benchmarks: KITTI Pedestrian, CityPersons, EuroCity Persons (specialized for complex conditions).
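Targets like those in the table are straightforward to enforce in an evaluation script. A minimal sketch; the condition keys and confusion counts below are hypothetical, the thresholds follow the table:

```python
def recall_precision(tp: int, fn: int, fp: int) -> tuple[float, float]:
    """Recall = TP / (TP + FN); precision = TP / (TP + FP)."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

# (recall target, precision target) per condition, as in the table
targets = {'day': (0.98, 0.90), 'night_ir': (0.88, 0.78)}

# Hypothetical confusion counts from an evaluation run
counts = {'day': dict(tp=985, fn=15, fp=90),
          'night_ir': dict(tp=900, fn=100, fp=200)}

for cond, (r_min, p_min) in targets.items():
    r, p = recall_precision(**counts[cond])
    status = 'PASS' if r > r_min and p > p_min else 'FAIL'
    print(f'{cond}: recall={r:.3f} precision={p:.3f} {status}')
```

Splitting the test set by condition this way prevents a strong daytime recall from masking a nighttime regression.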
Case Study: Industrial Forklift
An autonomous forklift in a 15,000 m² warehouse. The task: stop whenever a person appears within a 3-meter radius. We used YOLOv8n + TensorRT INT8 on a Jetson Orin NX: 18 ms latency. Recall on a test set of 400 scenarios was 99.1%, with no missed people. False alarm rate: 2–3 false positives per shift (work tools of similar shape).
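The forklift's stop rule reduces to a distance check over the detections. A sketch assuming the detection dict format produced by `VRUDetector` above (the 3 m radius comes from the case; the sample detections are illustrative):

```python
STOP_RADIUS_M = 3.0  # from the case study: stop when a person is within 3 m

def should_stop(detections: list[dict]) -> bool:
    """Emergency stop if any detected person is inside the stop radius."""
    return any(d['class'] == 'person' and d['distance_m'] <= STOP_RADIUS_M
               for d in detections)

# Hypothetical frame: a person at 2.4 m triggers a stop;
# non-person classes are ignored by this rule
dets = [{'class': 'person', 'distance_m': 2.4},
        {'class': 'bicycle', 'distance_m': 1.8}]
print(should_stop(dets))  # True
```

In practice such a check runs on every frame after detection, so the 18 ms inference latency bounds the reaction time of the stop decision.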
| System type | Lead time |
|---|---|
| Detector for a specific scenario | 4–7 weeks |
| Complete VRU system with night detection | 8–14 weeks |
| Certified RGB + thermal fusion | 4–8 months |