Hand tracking implementation in AR app

NOVASOLUTIONS.TECHNOLOGY develops, supports, and maintains iOS, Android, and PWA mobile applications. We have extensive experience and expertise in publishing mobile applications on popular marketplaces such as Google Play, the App Store, Amazon, AppGallery, and others.
Development and support of all types of mobile applications:
Information and entertainment mobile applications
News apps, games, reference guides, online catalogs, weather apps, fitness and health apps, travel apps, educational apps, social networks and messengers, quizzes, blogs and podcasts, forums, aggregators
E-commerce mobile applications
Online stores, B2B apps, marketplaces, online exchanges, cashback services, dropshipping platforms, loyalty programs, food and goods delivery, payment systems.
Business process management mobile applications
CRM systems, ERP systems, project management, sales team tools, financial management, production management, logistics and delivery management, HR management, data monitoring systems
Electronic services mobile applications
Classified ads platforms, online schools, online cinemas, electronic service platforms, cashback platforms, video hosting, thematic portals, online booking and scheduling platforms, online trading platforms

These are just some of the types of mobile applications we work with, and each of them may have its own specific features and functionality, tailored to the specific needs and goals of the client.

Complexity: Complex
Estimated time: ~5 business days
Latest works:
  • Development of a mobile application for FEEDME
  • Development of a mobile application for XOOMER
  • Development of a mobile application for RHL
  • Development of a mobile application for ZIPPY
  • Development of a mobile application for Affhome
  • Development of a mobile application for the FLAVORS company

Implementing Hand Tracking in AR Applications

Hand tracking is markerless tracking of the hands and fingers via the device camera. It enables controller-free AR interface control, virtual musical instruments, educational apps for surgery or mechanics, and AR games where the hands themselves are the controller. Technically it is a hard problem: 21 joints per hand, fast movements, fingers occluding each other, and tracking loss in poor lighting.

Platform Situation

iOS: before iOS 18, ARKit did not provide a public hand tracking API. Starting with visionOS 1.0 and iOS 18 / ARKit 6, HandAnchor with HandSkeleton is available in RealityKit. On iPhone it works via the rear camera, with 26 joints per hand.

Before iOS 18, iPhone apps had to rely on third-party ML solutions; from iOS 18 on, native ARKit hand tracking is available.

Android: ARCore has no hand tracking API. MediaPipe Hands is the de facto standard on Android (and on iOS when a cross-platform solution is needed).

ARKit Hand Tracking (iOS 18+)

// iOS 18+ / visionOS, RealityKit
import ARKit
import RealityKit

let session = ARKitSession()
let handTrackingProvider = HandTrackingProvider()

Task {
    try await session.run([handTrackingProvider])

    for await update in handTrackingProvider.anchorUpdates {
        let handAnchor = update.anchor
        // handSkeleton is optional: it is nil while the hand is not detected
        guard handAnchor.isTracked, let skeleton = handAnchor.handSkeleton else { continue }

        // Position of the index finger tip in world space
        let indexTip = skeleton.joint(.indexFingerTip)
        if indexTip.isTracked {
            let worldTransform = handAnchor.originFromAnchorTransform * indexTip.anchorFromJointTransform
            // Attach an object to the finger tip here
        }
    }
}

On visionOS the API is the same, but both hands are tracked simultaneously and the user does not need to hold a device.

MediaPipe Hands: Cross-Platform Solution

MediaPipe Hand Landmarker task: 21 joints per hand, up to two hands simultaneously, iOS + Android, free.

// Android (MediaPipe Tasks Vision)
import com.google.mediapipe.tasks.core.BaseOptions
import com.google.mediapipe.tasks.vision.handlandmarker.HandLandmarker
import com.google.mediapipe.tasks.vision.handlandmarker.HandLandmarker.HandLandmarkerOptions

val handLandmarker = HandLandmarker.createFromOptions(
    context,
    HandLandmarkerOptions.builder()
        .setBaseOptions(BaseOptions.builder().setModelAssetPath("hand_landmarker.task").build())
        .setNumHands(2)
        .setMinHandDetectionConfidence(0.5f)
        .setMinTrackingConfidence(0.5f)
        .build()
)

val result = handLandmarker.detect(mpImage)
// result.landmarks() — List<List<NormalizedLandmark>>
// 21 points per hand in normalized coordinates [0..1]

The 21 MediaPipe joints: WRIST, THUMB_CMC through THUMB_TIP, INDEX_FINGER_MCP through INDEX_FINGER_TIP, and similarly for the other three fingers.

For attaching AR content in 3D, the normalized 2D coordinates must be unprojected using the camera intrinsics plus a depth value (LiDAR or monocular depth estimation).
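A minimal sketch of that unprojection, assuming a standard pinhole camera model; the function and parameter names here are illustrative (not a library API), and `depthMeters` stands for a value sampled from whatever depth source is available:

```kotlin
// Sketch: back-projecting a normalized MediaPipe landmark into 3D camera space.
// Assumes a pinhole camera model; fx, fy, cx, cy are the camera intrinsics in
// pixels, and depthMeters is the depth sampled at this pixel (LiDAR depth map
// or monocular depth estimation). All names are illustrative.

data class Point3(val x: Float, val y: Float, val z: Float)

fun unprojectLandmark(
    normX: Float, normY: Float,      // MediaPipe normalized coordinates [0..1]
    imageWidth: Int, imageHeight: Int,
    fx: Float, fy: Float,            // focal lengths, pixels
    cx: Float, cy: Float,            // principal point, pixels
    depthMeters: Float               // depth at this pixel, meters
): Point3 {
    val px = normX * imageWidth      // normalized -> pixel coordinates
    val py = normY * imageHeight
    // Pinhole inverse projection: X = (px - cx) * Z / fx, Y = (py - cy) * Z / fy
    val x = (px - cx) * depthMeters / fx
    val y = (py - cy) * depthMeters / fy
    return Point3(x, y, depthMeters)
}
```

A landmark at the principal point maps onto the optical axis, which is a quick sanity check for the intrinsics being wired up correctly.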

Gesture Recognition

Basic gestures without ML — via joint geometry:

Pinch: distance between THUMB_TIP and INDEX_FINGER_TIP below a threshold (usually 2–3 cm in real-world coordinates).

Open palm: all _TIP joints above their corresponding _MCP joints along the Y axis (in image coordinates y grows downward, so "above" means a smaller y value).

Fist: all _TIP joints below their _MCP joints along the Y axis.

Victory (V-gesture): index and middle _TIP above their _MCP, the rest below.
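The geometric checks above can be sketched over MediaPipe's 21-landmark list. The landmark indices follow the MediaPipe hand model (0 = WRIST, 4 = THUMB_TIP, 8 = INDEX_FINGER_TIP, ...); the thresholds and helper names are illustrative:

```kotlin
// Sketch of geometric gesture checks over MediaPipe's 21 hand landmarks.
// In image coordinates y grows downward, so "tip above MCP" means tip.y < mcp.y.

data class Landmark(val x: Float, val y: Float, val z: Float)

const val THUMB_TIP = 4
const val INDEX_MCP = 5;  const val INDEX_TIP = 8
const val MIDDLE_MCP = 9; const val MIDDLE_TIP = 12
const val RING_MCP = 13;  const val RING_TIP = 16
const val PINKY_MCP = 17; const val PINKY_TIP = 20

fun distance(a: Landmark, b: Landmark): Float {
    val dx = a.x - b.x; val dy = a.y - b.y; val dz = a.z - b.z
    return kotlin.math.sqrt(dx * dx + dy * dy + dz * dz)
}

// Pinch: thumb tip close to index tip. The threshold here is in normalized
// units; with real-world coordinates use ~2-3 cm instead.
fun isPinch(lm: List<Landmark>, threshold: Float = 0.05f) =
    distance(lm[THUMB_TIP], lm[INDEX_TIP]) < threshold

private val FINGERS = listOf(
    INDEX_TIP to INDEX_MCP, MIDDLE_TIP to MIDDLE_MCP,
    RING_TIP to RING_MCP, PINKY_TIP to PINKY_MCP
)

// Open palm: every fingertip above its MCP (smaller y in image space).
fun isOpenPalm(lm: List<Landmark>) = FINGERS.all { (tip, mcp) -> lm[tip].y < lm[mcp].y }

// Fist: every fingertip below its MCP.
fun isFist(lm: List<Landmark>) = FINGERS.all { (tip, mcp) -> lm[tip].y > lm[mcp].y }

// Victory: index and middle up, ring and pinky down.
fun isVictory(lm: List<Landmark>) =
    lm[INDEX_TIP].y < lm[INDEX_MCP].y && lm[MIDDLE_TIP].y < lm[MIDDLE_MCP].y &&
    lm[RING_TIP].y > lm[RING_MCP].y && lm[PINKY_TIP].y > lm[PINKY_MCP].y
```

In production these checks are usually combined with a per-frame confidence filter and a small hysteresis window so a single noisy frame does not toggle the gesture.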

Complex gestures (the ASL alphabet, custom combinations) require an ML classifier: Create ML's hand pose classifier or a custom TensorFlow Lite model, trained on 500–1000 samples per gesture.

Hand Interaction with AR Objects

Picking up objects: cast a ray from the palm/finger and intersect it with AR objects. A pinch gesture means "grab", releasing it means "drop".

Deforming an AR object with both hands: scaling from the change in distance between the palms, rotation from the change in orientation of the vector between them.
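The two-hand manipulation above reduces to plain geometry over per-frame palm positions. A sketch, where `Vec3`, `scaleFactor`, and `yawDelta` are illustrative names rather than a library API:

```kotlin
// Sketch: deriving scale and rotation deltas from two palm positions per frame.
// Palm positions are assumed to be 3D points (e.g. averaged hand landmarks
// unprojected into camera or world space). All names are illustrative.
import kotlin.math.atan2
import kotlin.math.sqrt

data class Vec3(val x: Float, val y: Float, val z: Float)

fun dist(a: Vec3, b: Vec3): Float {
    val dx = a.x - b.x; val dy = a.y - b.y; val dz = a.z - b.z
    return sqrt(dx * dx + dy * dy + dz * dz)
}

// Scale factor: ratio of the current inter-palm distance to the distance
// captured when the two-hand gesture started.
fun scaleFactor(startL: Vec3, startR: Vec3, curL: Vec3, curR: Vec3): Float =
    dist(curL, curR) / dist(startL, startR)

// Rotation about the vertical axis: change in heading of the left-to-right
// palm vector, projected onto the XZ plane. Result in radians.
fun yawDelta(startL: Vec3, startR: Vec3, curL: Vec3, curR: Vec3): Float {
    val start = atan2(startR.z - startL.z, startR.x - startL.x)
    val cur = atan2(curR.z - curL.z, curR.x - curL.x)
    return cur - start
}
```

Applying the deltas each frame relative to the gesture's start pose (rather than accumulating frame-to-frame differences) keeps the manipulation stable under tracking jitter.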

Surgical simulation: finger tips interact with virtual organs via collision detection between joint positions and the AR mesh. CollisionComponent + PhysicsBodyComponent in RealityKit provide physically correct interaction.

Limitations in Real Conditions

Finger tracking degrades under occlusion (one finger behind another is a common situation). Both MediaPipe and ARKit use a 2.5D approach, so the self-occlusion problem is not fully solved.

A dark background combined with darker skin tones reduces contrast, and detection confidence falls. The practical minimum lighting for stable tracking is around 200 lux; show a UI indicator when confidence drops below 0.5.

Latency: MediaPipe on a mid-range Android device (Snapdragon 720G) takes 35–45 ms per frame; ARKit hand tracking on iPhone 15 takes 15–20 ms. For virtual musical instruments the difference is noticeable.

Timeline

Basic hand tracking with pinch/open-palm gesture recognition on iOS 18+ (ARKit): 1–2 weeks. A cross-platform MediaPipe solution: 2–3 weeks. A custom gesture classifier with training: plus 2–3 weeks. Interactive hand manipulation of AR objects (picking, deformation): plus 2–4 weeks. Cost is calculated individually.