Implementing Object Recognition and Tracking (Object Detection) in AR Applications
Object tracking is AR without markers. The system recognizes a real physical object by its 3D shape and keeps AR content attached to it as the camera and object move. This is more complex than image tracking, more demanding on hardware, and significantly more limited in catalog scope — but it's exactly what you need when marking an object with a sticker is impossible or undesirable.
ARKit Object Detection: Scanning and Recognition
The ARKit pipeline has two phases.
Phase 1: Scanning. ARObjectScanningConfiguration is a special session configuration intended only for a scanning utility (Reality Composer or Apple's "Scanning and Detecting 3D Objects" sample app). The user walks around the object from all sides while ARKit accumulates a feature point cloud. The result is an .arobject file (~1–50 MB depending on detail).
Phase 2: Detection. ARWorldTrackingConfiguration with detectionObjects set to the loaded reference objects. On detection, the delegate receives renderer(_:didAdd:for:) with an ARObjectAnchor; the anchor's transform gives the object's pose in world space.
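The detection phase can be sketched as follows. This is a minimal example, assuming the .arobject files have been added to an asset catalog resource group named "ScannedObjects" (the group name is an assumption, not an ARKit requirement):

```swift
import ARKit
import SceneKit

final class ObjectDetectionViewController: UIViewController, ARSCNViewDelegate {
    @IBOutlet var sceneView: ARSCNView!

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        let configuration = ARWorldTrackingConfiguration()
        // Load every .arobject from the asset catalog group
        // ("ScannedObjects" is an assumed group name).
        if let objects = ARReferenceObject.referenceObjects(
            inGroupNamed: "ScannedObjects", bundle: nil) {
            configuration.detectionObjects = objects
        }
        sceneView.delegate = self
        sceneView.session.run(configuration)
    }

    // Called when ARKit recognizes one of the scanned objects.
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        guard let objectAnchor = anchor as? ARObjectAnchor else { return }
        // objectAnchor.transform is the object's pose in world space;
        // `node` is already placed there, so children attach in object space.
        let marker = SCNNode(geometry: SCNSphere(radius: 0.01))
        node.addChildNode(marker)
        print("Detected:", objectAnchor.referenceObject.name ?? "unnamed")
    }
}
```

Because ARKit places the anchor's node at the object's origin, any AR content added as a child of `node` stays attached as the camera or object moves.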
Critical limitation: ARKit Object Detection works only on A12+ devices and requires sufficient surface texture on the object. Smooth monochrome objects (a white plastic case, a glass bottle) don't scan reliably. Feature points need textural detail: logos, markings, surface patterns.
What Tracks Well vs What Doesn't
| Good Candidates | Poor Candidates |
|---|---|
| Toys with drawings/details | Monochromatic plastic bodies |
| Household appliances with panels | Glass/transparent objects |
| Industrial equipment with markings | Polished metal surfaces |
| Packaged boxes | Soft deformable objects |
| Automotive parts | Objects without fixed shape |
Glass and mirror surfaces are fundamentally unsuitable for visual feature tracking. Only markers or LiDAR mesh matching work for them.
Vuforia Model Targets: Alternative Approach
ARKit Object Detection requires scanning the physical object. Vuforia Model Targets recognize an object from its CAD model (STEP, OBJ, FBX) without any physical scanning. This is a fundamental difference for industrial applications where CAD data already exists.
Vuforia Model Target Generator (a desktop application) compiles a database from the CAD model. The SDK on iOS/Android then detects the object by its silhouette and a feature map generated from the CAD geometry. Positioning accuracy is up to 5–10 mm in good lighting.
Vuforia Engine license from $840/year. Model Targets available in Engine+ tier.
Object Tracking (Moving Object)
ARKit Object Detection localizes a static object. Tracking a moving object is a fundamentally different task.
ARKit (iOS 13+) supports ARTrackedRaycast for continuously updated raycasts, but for an independently moving object (a conveyor part, a mobile robot) you need a custom approach:
- MediaPipe Object Detection (COCO SSD, EfficientDet) provides a 2D bounding box of the object
- Depth estimation (LiDAR) → project the 2D bounding box into 3D to get a world position
- AR content follows the object with interpolation (lerp) for smoothness
This is significantly more complex and less accurate than static Object Detection. Tracking accuracy degrades at movement speeds above ~0.5 m/s.
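The custom pipeline above can be sketched roughly as below. This is a sketch, not a tested implementation: `detectorCenter` is assumed to come from an external 2D detector (e.g. MediaPipe), and the unprojection's sign conventions may need adjustment for device orientation:

```swift
import ARKit
import simd

/// Projects a 2D detection (bounding-box center in capture-image pixels)
/// into a 3D world position using the LiDAR depth map, then smooths the
/// result with linear interpolation to damp detector jitter.
final class MovingObjectTracker {
    private var smoothedPosition: simd_float3?
    private let lerpFactor: Float = 0.2   // lower = smoother but laggier

    func update(frame: ARFrame, detectorCenter: CGPoint) -> simd_float3? {
        guard let depthMap = frame.sceneDepth?.depthMap else { return nil }
        CVPixelBufferLockBaseAddress(depthMap, .readOnly)
        defer { CVPixelBufferUnlockBaseAddress(depthMap, .readOnly) }

        // The depth map is lower resolution than the camera image,
        // so scale the detection coordinates before sampling.
        let w = CVPixelBufferGetWidth(depthMap)
        let h = CVPixelBufferGetHeight(depthMap)
        let imageSize = frame.camera.imageResolution
        let x = Int(detectorCenter.x / imageSize.width * CGFloat(w))
        let y = Int(detectorCenter.y / imageSize.height * CGFloat(h))
        guard x >= 0, x < w, y >= 0, y < h,
              let base = CVPixelBufferGetBaseAddress(depthMap) else { return nil }
        let rowBytes = CVPixelBufferGetBytesPerRow(depthMap)
        let depth = base.advanced(by: y * rowBytes + x * MemoryLayout<Float32>.size)
            .assumingMemoryBound(to: Float32.self).pointee   // meters

        // Unproject pixel + depth into camera space via the intrinsics
        // (K columns: [fx,0,0], [0,fy,0], [cx,cy,1]), then into world space.
        let K = frame.camera.intrinsics
        let camPoint = simd_float3(
            (Float(detectorCenter.x) - K[2][0]) * depth / K[0][0],
            -(Float(detectorCenter.y) - K[2][1]) * depth / K[1][1], // image y points down
            -depth)                                                 // camera looks down -Z
        let world4 = frame.camera.transform * simd_float4(camPoint, 1)
        let worldPos = simd_float3(world4.x, world4.y, world4.z)

        // Lerp toward the new measurement for smooth content motion.
        let smoothed = smoothedPosition.map {
            simd_mix($0, worldPos, simd_float3(repeating: lerpFactor))
        } ?? worldPos
        smoothedPosition = smoothed
        return smoothed
    }
}
```

Call `update(frame:detectorCenter:)` once per ARSession frame and move the content node toward the returned position; the lerp factor trades latency against jitter.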
Practical Case
A service center application: the technician points the camera at a car engine, ARKit recognizes the specific engine model via its .arobject, and an overlay shows a schematic with labeled components. Annotations are tied to specific points in the object's coordinate space.
The complication: engines in a real service bay are dirty, partially occluded by hoses, and covered with soot. A clean reference .arobject doesn't recognize a dirty engine. The solution: scan several variants (clean / moderately dirty) and add all of them to detectionObjects. ARKit picks the best match.
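Loading several scanned variants can look like this. A sketch, assuming the variant files ("engine_clean", "engine_dirty" are hypothetical names) ship in the app bundle as standalone .arobject archives rather than an asset catalog group:

```swift
import ARKit

// Hypothetical file names for the scanned variants of the same engine.
let variantFiles = ["engine_clean", "engine_dirty"]

func makeDetectionConfiguration() throws -> ARWorldTrackingConfiguration {
    var references = Set<ARReferenceObject>()
    for name in variantFiles {
        guard let url = Bundle.main.url(forResource: name,
                                        withExtension: "arobject") else { continue }
        // ARReferenceObject(archiveURL:) loads a scanned .arobject from disk.
        references.insert(try ARReferenceObject(archiveURL: url))
    }
    let configuration = ARWorldTrackingConfiguration()
    configuration.detectionObjects = references
    return configuration
}
```

Whichever variant matches best produces the ARObjectAnchor; check `anchor.referenceObject.name` in the delegate to know which variant fired, since the annotation content should be the same for all of them.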
Timeline
Basic object detection with one object plus static annotations: 1–2 weeks (including scanning). Multiple objects, animated annotations, and database integration: 3–5 weeks. Vuforia Model Targets instead of ARKit scanning: a similar timeline plus the license. Cost is calculated individually.