Visual Positioning System (VPS) Implementation in Mobile AR Applications
GPS provides 3–5 meter accuracy. For AR experiences in urban environments this is nowhere near enough: with a 5-meter error you cannot overlay a navigation arrow on a specific building entrance or pin an AR annotation above the right sculpture in a museum courtyard. VPS solves this: the user points the camera at the surroundings, the algorithm matches the frame against a pre-recorded visual map, and returns a position with 10–30 centimeter accuracy.
How VPS Works
Two stages: mapping (offline) and localization (online, real-time).
Mapping. The space or outdoor zone is scanned: an operator with a phone or a specialized rig walks the entire area recording video. Key frames are extracted from the video, and a Structure from Motion (SfM) algorithm builds a sparse point cloud and a set of 6DOF camera poses. Feature-point descriptors (ORB, or SuperPoint + SuperGlue for better precision) are indexed in a database for fast lookup.
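Key-frame extraction can be approximated with a simple motion gate: keep a frame only when the camera has translated far enough since the last kept key frame. This is an illustrative sketch, not Immersal's actual pipeline — real SfM tooling also checks feature overlap and rotation; all names and thresholds here are hypothetical.

```python
import math

def select_keyframes(poses, min_translation=0.5):
    """Keep a camera pose as a keyframe when it has moved at least
    `min_translation` meters since the previously kept keyframe.
    `poses` is a list of (x, y, z) camera positions in meters."""
    if not poses:
        return []
    keyframes = [0]  # always keep the first frame
    last = poses[0]
    for i, p in enumerate(poses[1:], start=1):
        if math.dist(p, last) >= min_translation:
            keyframes.append(i)
            last = p
    return keyframes

# Camera moving 0.2 m per frame along x: every third frame clears the 0.5 m gate.
positions = [(0.2 * i, 0.0, 0.0) for i in range(10)]
print(select_keyframes(positions))  # → [0, 3, 6, 9]
```

In practice the translation threshold is tuned against the 60%+ overlap requirement mentioned below: too sparse and SfM fails to match adjacent key frames.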
Localization. The phone captures a frame and sends it to a server (or processes it locally on powerful devices). An image-retrieval algorithm finds the nearest key frames in the database → PnP (Perspective-n-Point) computes the camera pose from 2D–3D correspondences → a 6DOF transform is returned. All within 200–500 ms with server-side processing.
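The retrieval step above can be sketched as nearest-neighbor search over global image descriptors; the PnP solve itself is delegated to a library (e.g. OpenCV's solvePnP) and is not implemented here. Descriptor values and key-frame ids below are toy illustrations.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve_nearest(query, database, top_k=3):
    """Rank database keyframes by descriptor similarity to the query frame.
    `database` maps keyframe id -> global descriptor vector."""
    scored = sorted(database.items(),
                    key=lambda kv: cosine_similarity(query, kv[1]),
                    reverse=True)
    return [kf_id for kf_id, _ in scored[:top_k]]

# Toy 3-D descriptors; a real system uses high-dimensional vectors (e.g. NetVLAD).
db = {"kf_01": [1.0, 0.0, 0.0],
      "kf_02": [0.0, 1.0, 0.0],
      "kf_03": [0.9, 0.1, 0.0]}
print(retrieve_nearest([1.0, 0.05, 0.0], db, top_k=2))  # → ['kf_01', 'kf_03']
# The top candidates' 2D-3D matches then feed PnP (e.g. cv2.solvePnP) for the pose.
```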
Providers and SDKs
| Provider | Coverage | Offline | Accuracy |
|---|---|---|---|
| Google ARCore Geospatial API | Cities with Street View | No | ~10–30 cm |
| Immersal SDK | Custom maps | Yes (device) | ~2–5 cm |
| Niantic Lightship VPS | Lightship wayspots | No | ~10–20 cm |
| Apple ARKit + GPS | Outdoor, iOS only | Partial | ~1–3 m |
| Microsoft Azure Spatial Anchors | Custom | No | ~1–5 cm |
For custom closed spaces (museum, office, warehouse)—Immersal. For urban AR experiences—ARCore Geospatial API (Android and iOS) or Niantic VPS. Azure Spatial Anchors is a good fit when you already run on Azure infrastructure.
Integrating Immersal into Native App
Immersal provides a REST API for cloud localization and a Unity SDK—the latter is unneeded here; we work natively.
iOS: send an HTTP request to https://api.immersal.com/localize with a JPEG frame and the camera intrinsics → receive a JSON response with the pose in the map coordinate system → convert it to ARKit world space via a matrix transformation.
```swift
struct LocalizeRequest: Encodable {
    let token: String
    let fx, fy, ox, oy: Double  // camera intrinsics from ARCamera.intrinsics
    let image: String           // base64-encoded JPEG
}
// Get the mapToWorld matrix from the response, apply it to ARSession.currentFrame
```
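The coordinate conversion itself is a single 4×4 matrix multiply: the pose returned in map coordinates is pre-multiplied by mapToWorld to land in the AR session's world space. A language-agnostic sketch with toy pure-translation matrices (on iOS this would be simd_float4x4 arithmetic):

```python
def mat4_multiply(a, b):
    """Multiply two 4x4 matrices given as nested lists (row-major)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(x, y, z):
    """Homogeneous 4x4 translation matrix."""
    return [[1, 0, 0, x],
            [0, 1, 0, y],
            [0, 0, 1, z],
            [0, 0, 0, 1]]

# Pose returned by the VPS in map coordinates: 2 m along map x.
pose_in_map = translation(2.0, 0.0, 0.0)
# mapToWorld: the map origin sits 5 m along world z.
map_to_world = translation(0.0, 0.0, 5.0)

pose_in_world = mat4_multiply(map_to_world, pose_in_map)
print([row[3] for row in pose_in_world])  # translation column → [2.0, 0.0, 5.0, 1.0]
```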
Android: the same via retrofit2 + moshi, with camera intrinsics taken from CameraCharacteristics.
Don't run localization on every frame (each request costs 200–500 ms)—trigger it on a position change of 1+ meter or on ARCore tracking loss.
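That throttling rule fits in a small predicate; names and the 1-meter threshold below are illustrative, and the tracking flag stands in for the native tracking state (ARCore TrackingState / ARKit ARCamera.trackingState):

```python
import math

def should_localize(current_pos, last_localized_pos, tracking_ok,
                    min_move_m=1.0):
    """Re-run VPS localization only when the device has moved at least
    `min_move_m` meters since the last successful localization, or when
    native tracking has been lost."""
    if not tracking_ok:
        return True            # tracking lost: relocalize immediately
    if last_localized_pos is None:
        return True            # never localized yet
    return math.dist(current_pos, last_localized_pos) >= min_move_m

print(should_localize((0.3, 0.0, 0.0), (0.0, 0.0, 0.0), tracking_ok=True))   # False
print(should_localize((1.2, 0.0, 0.0), (0.0, 0.0, 0.0), tracking_ok=True))   # True
print(should_localize((0.1, 0.0, 0.0), (0.0, 0.0, 0.0), tracking_ok=False))  # True
```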
Building Visual Map
Immersal scanning: the dedicated Immersal Mapper iOS app, or a custom script via the REST API. Filming requirements: 60%+ frame overlap, uniform lighting, full coverage of user routes. A large space (a 3-floor mall) takes several hours of operator work.
After uploading the video to Immersal, SfM processing takes 30–120 minutes and yields a mapId. When the interior changes, rescan the affected zones and merge them with the existing map via the Immersal Console.
Timeline
ARCore Geospatial integration for an urban AR experience: 2–4 weeks. A custom VPS with Immersal for a closed space, including scanning: 4–8 weeks. An in-house VPS server based on HLoc (hloc + SuperPoint + SuperGlue + COLMAP) with no external dependencies: 3–5 months. Cost is quoted individually.