Pose Estimation Implementation in Mobile Applications
Pose estimation detects 17–33 skeletal keypoints on a person in real time. It is used in fitness apps (rep counting, form assessment), medical apps (rehabilitation, gait analysis), and AR products. It is harder than object detection: you need not only accurate point localization but also stability between frames.
Model Selection: MoveNet vs BlazePose vs ML Kit Pose Detection
MoveNet Lightning/Thunder (Google)—best accuracy/speed balance for mobile. Lightning: 17 points, 30+ FPS on iPhone 12, TFLite-optimized. Thunder: more accurate, ~15 FPS. Distributed as ready-made TFLite models via TensorFlow Hub (PoseLandmarker, by contrast, is MediaPipe's API).
MediaPipe BlazePose / Pose Landmarker—33 points (including face, leg, hand points). Needed when detail matters: wrist angle, finger position. Latency: ~25 ms on Pixel 7 GPU, ~55 ms on CPU.
ML Kit Pose Detection—33 points, simple Firebase integration. In-process, no network. Slightly slower than MoveNet, easier for Android/iOS cross-platform.
For fitness rep tracking, MoveNet Lightning suffices. For medical gait analysis, BlazePose with 33 points and z-coordinates.
Rep Counting: More Complex Than It Seems
Typical task: "count squats." Naïve approach: track hip Y-coordinate, count threshold crossings. Works in perfect conditions, breaks in reality.
Correct approach: compute the knee joint angle via the dot product of vectors [KNEE → HIP] and [KNEE → ANKLE]. Squat = angle drops below 120°. Standing = angle returns above 160°. State machine: STANDING → DOWN → STANDING = 1 rep.
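The angle-plus-state-machine approach can be sketched in platform-neutral Python. The 120°/160° thresholds and the STANDING → DOWN → STANDING transition come from the text; class and function names are illustrative, and the landmark tuples stand in for whatever structure your model returns:

```python
import math

def joint_angle(a, b, c):
    """Angle at joint b, in degrees, between vectors b->a and b->c.
    Works for 2D or 3D points given as tuples (x, y[, z])."""
    v1 = [ai - bi for ai, bi in zip(a, b)]
    v2 = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    # clamp against floating-point drift before acos
    cos = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos))

class SquatCounter:
    """STANDING -> DOWN -> STANDING = 1 rep, hysteresis via two thresholds."""
    DOWN_BELOW, UP_ABOVE = 120.0, 160.0

    def __init__(self):
        self.state = "STANDING"
        self.reps = 0

    def update(self, hip, knee, ankle):
        angle = joint_angle(hip, knee, ankle)
        if self.state == "STANDING" and angle < self.DOWN_BELOW:
            self.state = "DOWN"
        elif self.state == "DOWN" and angle > self.UP_ABOVE:
            self.state = "STANDING"
            self.reps += 1
        return self.reps
```

The two-threshold gap (120°/160°) is deliberate hysteresis: a single threshold would double-count when the angle oscillates around it due to landmark jitter.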
3D angle via z-coordinates is more stable than 2D if camera isn't strictly side-view. BlazePose/MoveNet return z in normalized units—this is depth estimation, not absolute meters. Sufficient for angles, insufficient for absolute distances.
Smoothing landmarks is mandatory. Raw coordinates jitter by 3–5 pixels between frames. The simplest filter: a per-coordinate exponential moving average, EMA(α=0.6). MediaPipe also ships a velocity-based landmark filter—better for non-linear motion.
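A minimal sketch of the per-coordinate EMA described above; the α=0.6 default follows the text, the class name and the `(x, y)` tuple format are assumptions:

```python
class EmaSmoother:
    """Per-coordinate exponential moving average over landmark frames.
    Higher alpha = less lag but more residual jitter; lower alpha = smoother
    but laggier skeleton."""

    def __init__(self, alpha=0.6):
        self.alpha = alpha
        self.prev = None  # last smoothed frame

    def smooth(self, landmarks):
        """landmarks: list of (x, y) tuples for one frame."""
        if self.prev is None:
            self.prev = landmarks
            return landmarks
        out = [
            tuple(self.alpha * n + (1 - self.alpha) * p
                  for n, p in zip(new, old))
            for new, old in zip(landmarks, self.prev)
        ]
        self.prev = out
        return out
```

Apply the filter before computing joint angles, not after: smoothing the angle directly hides genuine fast motion at the rep turnaround points.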
Integration: iOS and Android
iOS: MediaPipe Tasks Vision via Swift Package Manager. PoseLandmarker with runningMode = .liveStream. Callback returns PoseLandmarkerResult with landmarks (normalized) and worldLandmarks (meters, for physical calculations). Draw skeleton on AVCaptureVideoPreviewLayer via CAShapeLayer—not UIKit UIView (heavier).
Android: com.google.mediapipe:tasks-vision. PoseLandmarker.create() with BaseOptions.GPU_DELEGATE on supported devices. Draw via Canvas.drawLine() on SurfaceView.
Case: knee injury rehabilitation app. Need knee bend angle assessment during exercises. Used MoveNet Thunder (more accurate than Lightning for small movements). Angle via dot product of 2D vectors (z unnecessary, camera strictly side-view—task constraint). Measurement error vs physician goniometer: ±4.2°. Sufficient for progress tracking, not medical diagnosis—important to communicate in UI.
Draw the skeleton via CAShapeLayer, animating the path with CABasicAnimation—smoother than redrawing in draw(_:) every frame.
Common Mistakes
Drawing the skeleton in model coordinates without transforming to preview coordinates. MoveNet points are normalized (0..1), while preview coordinates depend on the videoGravity of AVCaptureVideoPreviewLayer. An explicit transform accounting for aspect ratio and crop is required.
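The transform is plain arithmetic, sketched here in Python for the aspect-fill case (the behavior of resizeAspectFill, where the image is scaled to cover the view and the overflow is cropped symmetrically). Function and parameter names are illustrative; on iOS, AVCaptureVideoPreviewLayer's own point-conversion methods can do this for you:

```python
def normalized_to_preview(x, y, image_w, image_h, view_w, view_h):
    """Map a normalized (0..1) landmark to view coordinates under
    aspect-fill scaling: scale to cover the view, crop the overflow
    symmetrically on the longer axis."""
    scale = max(view_w / image_w, view_h / image_h)
    scaled_w, scaled_h = image_w * scale, image_h * scale
    # symmetric crop: a negative offset shifts the overflow off-screen
    offset_x = (view_w - scaled_w) / 2
    offset_y = (view_h - scaled_h) / 2
    return (x * scaled_w + offset_x, y * scaled_h + offset_y)
```

For aspect-fit gravity, swap `max` for `min`; the offsets then become positive letterbox margins instead of crops. Mirror the x-axis as well if you render the front camera unflipped.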
Running inference on the main thread—a guaranteed frozen UI. Run inference on DispatchQueue.global(qos: .userInteractive) (iOS) or an Executors.newSingleThreadExecutor() (Android).
Timeline
Basic skeleton on video stream + counting one exercise: 1–2 weeks. Full fitness module with multiple exercises, voice feedback, and history: 3–4 weeks. Cost calculated individually.