AI perception and planning system for autonomous driving
Perception and Planning together convert sensor data into vehicle control commands: Perception answers "what is around me?", while Planning answers "where and how do I go?". The gap between a lab demo and road testing can be catastrophic: a system with mAP 0.92 on nuScenes can fail at its first unfamiliar intersection in the rain.
Sensor stack and fusion
Autonomous L3+ systems operate several types of sensors simultaneously. The pipeline below fuses a LiDAR point cloud with camera images:
```python
import numpy as np
from mmcv.runner import load_checkpoint
from mmdet3d.models import build_detector
from ultralytics import YOLO


class PerceptionPipeline:
    def __init__(self, config: dict):
        # 3D detector: BEVFusion or SECOND
        self.detector_3d = build_detector(config['detector_cfg'])
        load_checkpoint(self.detector_3d, config['checkpoint'])
        # 2D camera detector: YOLOv8 or DETR
        self.cam_detector = YOLO('yolov8l.pt')
        # LiDAR-to-camera projection matrices
        self.lidar2cam = np.array(config['lidar2cam_matrix'])
        self.camera_intrinsics = np.array(config['cam_intrinsics'])

    def fuse_lidar_camera(self, point_cloud: np.ndarray,
                          images: list[np.ndarray]) -> dict:
        """
        LiDAR provides precise 3D coordinates and range;
        the camera provides semantics (object class, traffic-light color).
        BEVFusion merges them into a single Bird's Eye View representation.
        """
        bev_features = self._to_bev(point_cloud)
        cam_features = [self.cam_detector(img) for img in images]
        # Project LiDAR points onto the camera image plane
        pts_3d_cam = self._project_lidar_to_cam(point_cloud)
        return {
            'bev_features': bev_features,
            'cam_detections': cam_features,
            'projected_points': pts_3d_cam
        }
```
3D object detection models
| Model | nuScenes mAP | Latency (A100) | LiDAR | Camera |
|---|---|---|---|---|
| SECOND | 62.1 | 40 ms | Yes | No |
| CenterPoint | 65.5 | 55 ms | Yes | No |
| BEVFusion (MIT) | 70.2 | 130 ms | Yes | Yes |
| BEVFormer v2 | 72.8 | 180 ms | No | Yes (multi-cam) |
| UniAD | 75.3 | 350 ms | Yes | Yes |
The choice of model depends on the latency budget: on an onboard NVIDIA Drive Orin (128 TOPS), holding BEVFusion to roughly 100 ms is realistic with TensorRT optimization.
Planning: From Perception to Trajectory
```python
import numpy as np


class MotionPlanner:
    def __init__(self, config: dict):
        self.dt = 0.1             # time step, 100 ms
        self.horizon = 5.0        # planning horizon, 5 s
        self.safety_margin = 0.8  # meters

    def plan_trajectory(self, ego_state: dict,
                        detected_objects: list[dict],
                        hd_map: dict) -> np.ndarray:
        """
        IDM (Intelligent Driver Model) + potential fields.
        For complex scenarios: RL or a transformer planner (PDM-Closed).
        """
        # Candidate trajectories from the generator
        candidates = self._generate_candidates(ego_state)
        # Score the safety of each trajectory
        scores = []
        for traj in candidates:
            collision_risk = self._collision_check(traj, detected_objects)
            lane_keep = self._lane_keep_cost(traj, hd_map)
            comfort = self._comfort_cost(traj)
            total_cost = (3.0 * collision_risk +
                          1.5 * lane_keep +
                          0.5 * comfort)
            scores.append(total_cost)
        best_idx = np.argmin(scores)
        return candidates[best_idx]

    def _collision_check(self, trajectory: np.ndarray,
                         objects: list[dict]) -> float:
        """Minimum TTC (Time-To-Collision) over all objects."""
        min_ttc = float('inf')
        for obj in objects:
            ttc = self._compute_ttc(trajectory, obj)
            min_ttc = min(min_ttc, ttc)
        # TTC < 2 s = high risk
        return 1.0 / max(min_ttc, 0.1)
```
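The TTC computation itself can be sketched under a constant-velocity assumption. This is a hypothetical stand-in for the planner's `_compute_ttc` helper (name and signature assumed): project the relative velocity onto the line of sight and divide the gap by the closing speed.

```python
import numpy as np

def compute_ttc(ego_pos: np.ndarray, ego_vel: np.ndarray,
                obj_pos: np.ndarray, obj_vel: np.ndarray) -> float:
    """Time-to-collision assuming both agents hold constant velocity.

    Returns inf when the gap is opening (no collision course).
    """
    rel_pos = obj_pos - ego_pos   # vector from ego to object
    rel_vel = obj_vel - ego_vel   # relative velocity
    dist = np.linalg.norm(rel_pos)
    # Closing speed: how fast the line-of-sight distance is shrinking
    closing_speed = -np.dot(rel_vel, rel_pos) / max(dist, 1e-6)
    if closing_speed <= 0.0:
        return float('inf')       # moving apart or tangentially
    return dist / closing_speed
```

For example, an ego at 10 m/s closing on a stationary object 20 m ahead yields a TTC of 2 s, exactly at the high-risk threshold used above.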
Testing: Simulation vs. Real Data
The gap between simulation (CARLA, SUMO) and reality is one of the main problems. A model trained solely on CARLA loses 15–25% of its mAP on real data due to domain gaps (lighting, textures, agent behavior).
Domain gap reduction strategy:
- Domain randomization: random textures, lighting, weather in the simulator
- Real2Sim: Transfer real-world scenes to CARLA via NeRF or Gaussian Splatting
- Curriculum learning: first simple scenes, then corner cases
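The first strategy amounts to sampling a fresh environment configuration per training episode. A minimal sketch, with purely illustrative parameter names and ranges (in a real setup they would map onto the simulator's weather/lighting API, e.g. CARLA's `WeatherParameters`):

```python
import random

# Illustrative randomization ranges -- not taken from any simulator API.
RANDOMIZATION_SPACE = {
    'sun_altitude_deg':  (-10.0, 90.0),   # below 0 = night
    'cloudiness_pct':    (0.0, 100.0),
    'precipitation_pct': (0.0, 100.0),
    'fog_density_pct':   (0.0, 60.0),
}

def sample_domain(rng: random.Random) -> dict:
    """Draw one random environment configuration for a training episode."""
    return {name: rng.uniform(lo, hi)
            for name, (lo, hi) in RANDOMIZATION_SPACE.items()}
```

Seeding the generator keeps episodes reproducible, which matters when triaging a regression back to a specific scene.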
Key components of the system
- Localization: HD map matching + LiDAR SLAM (LOAM, LIO-SAM) with 10–20 cm accuracy
- Prediction: forecast of agent trajectories (HiVT, MTR) for 5 seconds
- Map: HD Map (Lanelet2) with markings, signs, rules
- Safety monitor: independent watchdog, checks the physical feasibility of commands
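The safety monitor's feasibility check can be sketched as a pure-NumPy watchdog: differentiate the planned positions to get implied acceleration and jerk, and veto any trajectory that exceeds the vehicle's physical envelope. The limit values here are illustrative assumptions, not vehicle specifications.

```python
import numpy as np

# Illustrative limits -- real values come from the vehicle's dynamics model.
MAX_ACCEL = 3.0  # m/s^2
MAX_JERK = 2.5   # m/s^3

def is_feasible(trajectory: np.ndarray, dt: float = 0.1) -> bool:
    """Independent watchdog check on a planned trajectory.

    trajectory: (T, 2) array of planned XY positions, one row per time step.
    Returns False if the implied acceleration or jerk is physically
    implausible (e.g. teleport-like jumps from a buggy planner).
    """
    vel = np.diff(trajectory, axis=0) / dt   # (T-1, 2) velocities
    acc = np.diff(vel, axis=0) / dt          # (T-2, 2) accelerations
    jerk = np.diff(acc, axis=0) / dt         # (T-3, 2) jerks
    if np.linalg.norm(acc, axis=1).max(initial=0.0) > MAX_ACCEL:
        return False
    if np.linalg.norm(jerk, axis=1).max(initial=0.0) > MAX_JERK:
        return False
    return True
```

Because the monitor only needs positions and a clock, it can run in a separate process from the planner, which is what makes it an independent check rather than a second opinion from the same code path.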
| Autonomy level | Operating domain | Timeline |
|---|---|---|
| L2 assistant (ADAS) | Highways in good conditions | 4–8 months |
| L3 pilot | Structured environments | 10–18 months |
| L4 robo-taxi (geofenced) | A specific mapped area | 24+ months |