Developing AR Try-On for E-commerce Products
AR try-on (augmented reality try-on) lets users virtually "put on" or "place" a product in their surroundings through the device's camera. It reduces return rates and increases conversion in categories where appearance is critical: glasses, cosmetics, furniture, shoes. Technically, browser-based AR is non-trivial: computer vision, 3D rendering, and Web API limitations all intersect.
Types of AR Try-On and Technology Stack
Face AR: glasses, sunglasses, masks, makeup, hats. Foundation — face landmark detection (468 points from MediaPipe Face Mesh). The most mature technology for browser-based AR.
Body AR: clothing and shoe try-on for the full body. Requires pose estimation (MediaPipe Pose / BlazePose). More complex due to fabric deformation and pose variability.
Room AR (spatial placement): furniture, decor, appliances in interiors. Based on plane detection — finding horizontal planes (floor, table) via SLAM or depth sensors.
Hand AR: rings, watches, bracelets. MediaPipe Hands (21 keypoints per hand).
Face AR — Technical Implementation
Most common use case — glasses try-on. Stack:
Camera → getUserMedia()
→ MediaPipe FaceMesh → 468 landmark points
→ Three.js / Babylon.js → 3D model of glasses (GLTF)
→ Align model to face points (nose, temples, ears)
→ Render over video stream
→ Display in <canvas>
MediaPipe FaceMesh works in browsers via WASM + WebGL. Performance: ~30 FPS on modern smartphones, ~60 FPS on desktop with GPU.
import { FaceMesh } from '@mediapipe/face_mesh';
import { Camera } from '@mediapipe/camera_utils';
const faceMesh = new FaceMesh({
  locateFile: file => `https://cdn.jsdelivr.net/npm/@mediapipe/face_mesh/${file}`
});

faceMesh.setOptions({
  maxNumFaces: 1,
  refineLandmarks: true,
  minDetectionConfidence: 0.7,
  minTrackingConfidence: 0.7,
});
faceMesh.onResults(results => {
  if (!results.multiFaceLandmarks?.length) return;
  const landmarks = results.multiFaceLandmarks[0];
  // Key points for glasses:
  const noseBridge = landmarks[6];    // bridge of nose
  const leftTemple = landmarks[234];  // left temple
  const rightTemple = landmarks[454]; // right temple
  const leftEar = landmarks[93];      // left ear
  const rightEar = landmarks[323];    // right ear
  updateGlassesModel({ noseBridge, leftTemple, rightTemple, leftEar, rightEar });
});

// Feed camera frames into FaceMesh (this is what the imported Camera helper is for)
const video = document.querySelector('video');
const camera = new Camera(video, {
  onFrame: async () => { await faceMesh.send({ image: video }); },
  width: 640,
  height: 480,
});
camera.start();
Aligning the 3D model to face points:
- Calculate center, rotation angle, and scale from the temple and bridge coordinates
- Apply the transformations to the 3D object: position, rotation, scale
- Render via Three.js over the video stream (canvas overlay with mix-blend-mode: multiply or a transparent background)
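A minimal sketch of the alignment math, assuming MediaPipe's normalized coordinates (x, y in [0, 1], origin top-left); glassesTransform is an illustrative helper, not part of MediaPipe's API, and it covers only in-plane rotation (roll):

```javascript
// Derive position, roll, and scale for the glasses from three landmarks.
// Landmarks are MediaPipe-style objects { x, y, z } in normalized coordinates.
function glassesTransform(leftTemple, rightTemple, noseBridge) {
  const dx = rightTemple.x - leftTemple.x;
  const dy = rightTemple.y - leftTemple.y;
  return {
    // Anchor the model at the bridge of the nose
    position: { x: noseBridge.x, y: noseBridge.y, z: noseBridge.z },
    // Head roll: angle of the temple-to-temple line
    roll: Math.atan2(dy, dx),
    // Scale proportional to face width (temple-to-temple distance)
    scale: Math.hypot(dx, dy),
  };
}

// Applying it to a Three.js object would look roughly like:
// const t = glassesTransform(leftTemple, rightTemple, noseBridge);
// glasses.position.set(t.position.x, t.position.y, t.position.z);
// glasses.rotation.z = t.roll;
// glasses.scale.setScalar(t.scale * BASE_SCALE);
```

Full head pose (yaw and pitch) needs more landmarks or the facial transformation matrix from FaceMesh; this sketch handles only the 2D case.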
3D Models for AR
Requirements for GLTF models for AR try-on:
- Polycount: up to 10,000 triangles — otherwise real-time rendering lags
- Textures: 512×512 or 1024×1024, PBR materials (metalness/roughness)
- Scale in meters: physically accurate so scaling by face points gives realistic results
- LOD: two versions — detailed for desktop, simplified for mobile
For each product (each pair of glasses) — a separate GLTF file. For a catalog of 500 items — 500 models. This is the main operational complexity: content modeling, not widget development.
Alternative to full 3D: 2D layer overlay — a cut-out image overlaid on video with deformation by face points. Less realistic, but models are prepared in Photoshop rather than a 3D editor.
Cosmetics (Virtual Makeup)
For lipstick, blush, eyeshadow — a different approach: not a 3D model, but coloring of face regions by mask.
// Get lips mask from landmark points
const lipPoints = [61, 185, 40, 39, 37, 0, 267, 269, 270, 409, 291, ...];
const lipPath = lipPoints.map(i => landmarks[i]);

// Draw on canvas over video
ctx.beginPath();
lipPath.forEach((point, i) => {
  const x = point.x * canvas.width;
  const y = point.y * canvas.height;
  i === 0 ? ctx.moveTo(x, y) : ctx.lineTo(x, y);
});
ctx.closePath();
ctx.globalAlpha = 0.6;
ctx.fillStyle = selectedColor; // selected lipstick color
ctx.fill();
Realism is enhanced through blending modes and lighting consideration (ambient light estimation from MediaPipe).
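One concrete way to get that blending is the canvas 'multiply' composite mode, which tints the lips with the chosen color while keeping skin texture visible, instead of painting a flat opaque layer. A sketch; applyLipColor is an illustrative helper, not a MediaPipe API:

```javascript
// Fill a lip path using 'multiply' compositing so the underlying video
// texture shows through. ctx is a CanvasRenderingContext2D; path is an
// array of {x, y} points in pixel coordinates.
function applyLipColor(ctx, path, color, alpha = 0.6) {
  ctx.save();
  ctx.globalCompositeOperation = 'multiply'; // blend with underlying pixels
  ctx.globalAlpha = alpha;
  ctx.fillStyle = color;
  ctx.beginPath();
  path.forEach((p, i) => (i === 0 ? ctx.moveTo(p.x, p.y) : ctx.lineTo(p.x, p.y)));
  ctx.closePath();
  ctx.fill();
  ctx.restore(); // don't leak the composite mode into later draws
}
```

The save/restore pair matters: leaving 'multiply' active would darken every subsequent draw call on the same canvas.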
Room AR — Furniture and Interiors
To place furniture in interiors, you need plane detection — finding horizontal planes. In browsers this is possible via WebXR API (Chrome on Android with ARCore) and partially via heuristics (surface texture analysis through CV).
if (navigator.xr) {
  const session = await navigator.xr.requestSession('immersive-ar', {
    requiredFeatures: ['hit-test', 'local']
  });
  // ...
}
WebXR immersive-ar support: Chrome on Android (ARCore). Safari on iOS does not support WebXR — AR there goes through the native AR Quick Look viewer (ARKit), e.g. via <model-viewer>. On desktop — not supported. This is a platform limitation, not a development one.
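Inside the session, plane placement works through a hit-test source queried on every frame. A sketch of the per-frame step; updateReticle is an illustrative helper, and the session/renderer wiring is omitted:

```javascript
// Per-frame hit test: intersect a ray from the viewer with detected planes
// and place a reticle (or the product model) at the first hit.
function updateReticle(frame, refSpace, hitTestSource, reticle) {
  const hits = frame.getHitTestResults(hitTestSource);
  if (!hits.length) {
    reticle.visible = false; // no plane under the ray yet
    return null;
  }
  const pose = hits[0].getPose(refSpace);
  reticle.visible = true;
  // With Three.js: set reticle.matrixAutoUpdate = false and copy the pose matrix
  reticle.matrix = pose.transform.matrix;
  return pose;
}

// The hit-test source itself comes from the session, roughly:
// const viewerSpace = await session.requestReferenceSpace('viewer');
// const hitTestSource = await session.requestHitTestSource({ space: viewerSpace });
```

On a tap, the furniture model is placed at the reticle's current pose and stays anchored there while the user walks around it.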
Alternative for iOS: AR Quick Look — native viewer. Just provide a USDZ file (Apple's AR format):
<a href="product.usdz" rel="ar">
<img src="ar-badge.png" alt="View in AR">
</a>
iOS Safari automatically opens USDZ in native AR mode via ARKit. This is the simplest way to give iOS users AR experience without WebXR.
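Whether the browser actually honors rel="ar" can be feature-detected before showing the badge; supportsQuickLook is an illustrative wrapper around the standard relList.supports check:

```javascript
// AR Quick Look is available when the browser recognizes rel="ar" on anchors
// (iOS Safari and iOS WebViews). Feature-detect instead of UA sniffing.
function supportsQuickLook(anchor) {
  return Boolean(anchor.relList && anchor.relList.supports && anchor.relList.supports('ar'));
}

// Usage in the page:
// const link = document.querySelector('a[rel="ar"]');
// if (!supportsQuickLook(link)) link.hidden = true;
```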
<model-viewer> — Universal Component
Google's <model-viewer> — Web Component that combines 3D viewing and AR:
<script type="module" src="https://ajax.googleapis.com/ajax/libs/model-viewer/3.4.0/model-viewer.min.js"></script>
<model-viewer
  src="chair.glb"
  ios-src="chair.usdz"
  ar
  ar-modes="webxr scene-viewer quick-look"
  camera-controls
  auto-rotate
  alt="Office chair"
  style="width: 400px; height: 400px;"
>
  <button slot="ar-button">View in your space</button>
</model-viewer>
ar-modes="webxr scene-viewer quick-look" — tries the modes in order and uses the first one available: WebXR, then Android's native Scene Viewer as fallback, then Quick Look on iOS.
Ready-Made SaaS Solutions
Developing AR from scratch is expensive and time-consuming. For a quick start, consider:
- Banuba — Face AR SDK, has Web SDK
- Perfect Corp (YouCam) — virtual makeup, glasses try-on, ready widget
- Vertebrae / Zakeke — room AR for furniture and decor
- Shopify AR — built-in GLTF/USDZ support via <model-viewer>
SaaS reduces launch time from 3–4 months to 2–4 weeks, but requires ongoing subscription and vendor dependency.
Performance Metrics
- AR engagement rate: % of users who launched AR from product card
- Conversion of AR users vs. non-AR — key ROI metric
- Return rate in AR categories after implementation
- Session time on product page with AR
Typical case data: AR user conversion 20–40% higher, return rate 20–30% lower for glasses and cosmetics.
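Once the events are tracked, the metrics above reduce to simple ratios. A sketch with illustrative field names (arLaunches, productViews, etc. are assumptions about your analytics schema, not a standard API):

```javascript
// Compute AR funnel metrics from raw event counts.
function arMetrics({ productViews, arLaunches, arUsers, arOrders, nonArUsers, nonArOrders }) {
  const arConversion = arOrders / arUsers;
  const baseConversion = nonArOrders / nonArUsers;
  return {
    engagementRate: arLaunches / productViews,         // share of product views that opened AR
    arConversion,                                      // conversion among AR users
    conversionLift: arConversion / baseConversion - 1, // key ROI metric vs. non-AR users
  };
}
```

Note that AR users self-select (more engaged shoppers launch AR), so the raw lift overstates the causal effect; an A/B test with the AR button hidden for a control group gives a cleaner number.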
Timeline
- Room AR via <model-viewer> (GLTF + USDZ, AR Quick Look on iOS, WebXR on Android): 1–2 weeks of development; most of the time goes to preparing 3D models
- Face AR for glasses (MediaPipe + Three.js, custom implementation): 6–10 weeks
- Cosmetics / makeup (pixel-level blending): 4–8 weeks
- Banuba/YouCam SDK integration: 2–4 weeks + SDK license cost
Preparing 3D content (modeling + conversion) is a separate budget item, often larger than the development itself.