AI Pedestrian and Cyclist Detection for Autonomous Transport

We design and deploy artificial intelligence systems, from prototype to production-ready solutions. Our team combines expertise in machine learning, data engineering, and MLOps to make AI work not just in the lab, but in real business.

Pedestrian and cyclist detection for autonomous systems

Vulnerable road users (pedestrians, cyclists, and scooter riders) account for the leading share of deaths in accidents involving autonomous systems. A detection failure here isn't just a number in a report: it's a human life. The requirements are therefore an order of magnitude stricter than for a standard CV task: recall > 98% under all conditions, including night, rain, and partial occlusion.

Problems specific to VRU (Vulnerable Road Users)

A cyclist together with a bike is an elongated object of irregular shape. A scooter rider 30 meters away occupies roughly 15x40 pixels. A pedestrian behind a parked car is only half visible. A 100 cm tall child 20 meters away is a bbox of roughly 20x30 pixels.
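Pixel sizes like these follow from the pinhole projection; a quick sanity check (the 4 mm lens, 4.8 mm sensor, and 720-line frame below are illustrative values, not a recommendation):

```python
def projected_height_px(real_h_m: float, distance_m: float,
                        focal_mm: float = 4.0, sensor_h_mm: float = 4.8,
                        image_h_px: int = 720) -> float:
    """Pinhole projection: object height in pixels at a given distance."""
    return real_h_m * focal_mm * image_h_px / (distance_m * sensor_h_mm)

# A 1.0 m tall child seen from 20 m with these optics:
print(projected_height_px(1.0, 20.0))  # 30.0 px
```

With a detection floor of 20 px, such a child sits close to the limit of what the detector can be asked to find.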

from ultralytics import YOLO
import numpy as np

class VRUDetector:
    def __init__(self, model_path: str, camera_params: dict):
        # YOLOv8l or RT-DETR-L for VRU: high sensitivity is essential
        self.model = YOLO(model_path)
        self.focal_length = camera_params['focal_length']      # mm
        self.sensor_height = camera_params['sensor_height']    # mm
        self.image_height_px = camera_params['image_height']   # px

        # Strict thresholds for VRU
        self.conf_threshold = 0.3    # lower than usual: an extra FP beats a missed person
        self.min_height_px = 20      # minimum bbox height for detection

        # VRU classes (COCO ids)
        self.vru_classes = {0: 'person', 1: 'bicycle', 3: 'motorcycle'}

    def detect(self, frame: np.ndarray,
                min_distance_m: float = 1.0,
                max_distance_m: float = 80.0) -> list[dict]:
        results = self.model(frame, conf=self.conf_threshold,
                              classes=list(self.vru_classes.keys()))
        vru_detections = []

        for box in results[0].boxes:
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            h_px = y2 - y1
            cls_id = int(box.cls)

            if h_px < self.min_height_px:
                continue  # object too small

            # Distance estimate from bbox height
            distance = self._estimate_distance(h_px, cls_id)

            if not (min_distance_m <= distance <= max_distance_m):
                continue

            vru_detections.append({
                'class': self.vru_classes[cls_id],
                'confidence': float(box.conf),
                'bbox': [x1, y1, x2, y2],
                'distance_m': distance,
                'height_px': h_px,
                'priority': 'HIGH' if cls_id == 0 else 'MEDIUM'
            })

        return sorted(vru_detections, key=lambda x: x['distance_m'])

    def _estimate_distance(self, height_px: int, cls_id: int) -> float:
        """Simple monocular estimate using the pinhole model."""
        real_heights = {0: 1.75, 1: 1.05, 3: 1.10}  # assumed real heights, meters
        real_h = real_heights.get(cls_id, 1.5)
        return (real_h * self.focal_length) / (height_px * self.sensor_height
                                                / self.image_height_px)
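The `_estimate_distance` formula can be sanity-checked outside the class; the optics here (6 mm lens, 4.8 mm sensor, 1080-line frame) are illustrative values:

```python
def estimate_distance_m(height_px: int, real_h_m: float, focal_mm: float,
                        sensor_h_mm: float, image_h_px: int) -> float:
    # Inverse pinhole model: bbox height in pixels -> distance in meters
    return (real_h_m * focal_mm) / (height_px * sensor_h_mm / image_h_px)

# A 1.75 m pedestrian whose bbox is 90 px tall:
print(estimate_distance_m(90, 1.75, 6.0, 4.8, 1080))  # ~26.25 m
```

The estimate is only as good as the assumed real-world height, which is why the class keys it on the detected class id.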

Night detection: a critical scenario

According to statistics, 76% of fatal pedestrian accidents occur at night. Standard RGB models lose 30–40% of their recall at illumination levels below 3 lux.

Solutions:

1. Thermal camera (FLIR Lepton, Bosch BTC): the human body at 37°C stands out clearly against the asphalt. Recall in complete darkness: 88–93%. The downside is the lack of texture, making it harder to distinguish between a bicycle and a scooter.

2. Near-IR camera (850 nm): Car headlights with an IR component illuminate a range of 60–80 m. YOLOv8, retrained on IR data (the FLIR ADAS dataset contains an IR channel), maintains a recall of 85–90% at night.

3. RGB + thermal fusion: the best results, but more complex and more expensive.

class NightVRUFusion:
    """Late fusion: merge detections from the RGB and thermal cameras."""

    def fuse(self, rgb_dets: list, thermal_dets: list,
              iou_threshold: float = 0.3) -> list:
        all_dets = []
        used_thermal = set()

        for rgb in rgb_dets:
            best_thermal = None
            best_iou = 0.0

            for i, therm in enumerate(thermal_dets):
                iou = self._compute_iou(rgb['bbox'], therm['bbox'])
                if iou > best_iou and iou > iou_threshold:
                    best_iou = iou
                    best_thermal = i

            if best_thermal is not None:
                # Combine confidences (weights sum to > 1 so that
                # agreement between sensors boosts the score; capped at 1.0)
                fused = rgb.copy()
                fused['confidence'] = min(
                    1.0, rgb['confidence'] * 0.6 +
                    thermal_dets[best_thermal]['confidence'] * 0.7
                )
                fused['source'] = 'fusion'
                used_thermal.add(best_thermal)
                all_dets.append(fused)
            else:
                all_dets.append(rgb)

        # Thermal-only detections (objects with no RGB match)
        for i, therm in enumerate(thermal_dets):
            if i not in used_thermal and therm['confidence'] > 0.5:
                all_dets.append(therm)

        return all_dets

    @staticmethod
    def _compute_iou(box_a: list, box_b: list) -> float:
        """IoU of two [x1, y1, x2, y2] boxes."""
        ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        if inter == 0:
            return 0.0
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter)
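The matching step hinges on the `_compute_iou` gate; a minimal standalone version of that check, on toy boxes in `[x1, y1, x2, y2]` form:

```python
def compute_iou(a: list, b: list) -> float:
    # Intersection-over-union of two [x1, y1, x2, y2] boxes
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    if inter == 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

rgb_box = [100, 100, 200, 300]      # pedestrian seen by the RGB camera
thermal_box = [110, 105, 205, 295]  # same pedestrian, thermal camera
print(compute_iou(rgb_box, thermal_box) > 0.3)  # True -> the two boxes get fused
```

The 0.3 threshold is deliberately loose: RGB and thermal boxes rarely align tightly because the sensors are physically offset.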

VRU Detector Quality Metrics

Condition                     Recall target   Precision target
Day, good visibility          > 98%           > 90%
Twilight                      > 95%           > 85%
Night (IR headlights)         > 88%           > 78%
Moderate rain                 > 92%           > 82%
Partial occlusion (< 40%)     > 94%           > 83%

Evaluation on standard benchmarks: KITTI Pedestrian, CityPersons, EuroCity Persons (specialized for complex conditions).
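Checking a detector run against per-condition targets reduces to counting true positives, false positives, and misses separately for each condition; the counts below are invented for illustration:

```python
def recall_precision(tp: int, fp: int, fn: int) -> tuple:
    """Recall and precision from raw detection counts."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

# Hypothetical per-condition counts: (tp, fp, fn)
targets = {'day': (0.98, 0.90), 'night_ir': (0.88, 0.78)}
counts = {'day': (990, 60, 10), 'night_ir': (450, 100, 50)}

for cond, (tp, fp, fn) in counts.items():
    r, p = recall_precision(tp, fp, fn)
    ok = r > targets[cond][0] and p > targets[cond][1]
    print(f"{cond}: recall={r:.3f} precision={p:.3f} pass={ok}")
```

Keeping the conditions separate matters: a detector can clear the aggregate targets while failing badly in the rare night-and-rain bucket.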

Case Study: Industrial Forklift

An autonomous forklift in a 15,000 sq. m warehouse. The task: stop whenever a person appears within a 3-meter radius. We used YOLOv8n + TensorRT INT8 on a Jetson Orin NX: 18 ms latency. Recall was 99.1% on a test set of 400 scenarios, with no missed people. False alarm rate: 2–3 false positives per shift (work tools of similar shape).
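The stop rule in this case study is a thin policy layer over the detector output; a sketch, assuming detection dicts shaped like the `VRUDetector.detect` output above:

```python
STOP_RADIUS_M = 3.0

def should_stop(detections: list) -> bool:
    # Emergency stop if any detected person is inside the stop radius
    return any(d['class'] == 'person' and d['distance_m'] <= STOP_RADIUS_M
               for d in detections)

detections = [
    {'class': 'person', 'distance_m': 2.4},
    {'class': 'bicycle', 'distance_m': 1.8},
]
print(should_stop(detections))  # True: a person is 2.4 m away
```

In the real system this check runs on every frame, so the 18 ms inference budget bounds the reaction time directly.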

System type                                  Timeline
Detector for a specific scenario             4–7 weeks
Complete VRU system with night detection     8–14 weeks
Certified RGB + thermal fusion system        4–8 months