Implementing Body Tracking in AR Applications
Body tracking is real-time estimation of a person's pose with AR content attached to body joints. It is used in fitness apps (exercise-technique analysis), AR clothing try-on, and interactive AR games where a character mimics the user's movements. Technically it is harder than face tracking: the body is farther from the camera, lighting is worse, and joints are occluded by clothing and the background.
ARKit Body Tracking: Architecture
ARBodyTrackingConfiguration requires iOS 13+ and an A12 chip or newer. It tracks one person in the frame (multi-person support is experimental and unstable).
Tracking returns an ARBodyAnchor with a skeleton of type ARSkeleton3D. The skeleton contains 91 joints in a hierarchy. Key ones: root, hips_joint, spine_1_joint through spine_7_joint, left_shoulder_1_joint, left_arm_joint, left_forearm_joint, left_hand_joint, their mirrored counterparts on the right side, plus the legs and head.
func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
    // The array may contain other anchor types; pick out the body anchor
    guard let bodyAnchor = anchors.compactMap({ $0 as? ARBodyAnchor }).first else { return }
    let skeleton = bodyAnchor.skeleton
    // Right wrist transform relative to the skeleton root (hips)
    if let wristTransform = skeleton.modelTransform(for: .rightHand) {
        // Compose with the anchor transform to get world coordinates
        let worldTransform = bodyAnchor.transform * wristTransform
        // Attach an object to the wrist here
    }
}
modelTransform(for:) returns the joint transform relative to the skeleton root (the hips). For world coordinates, multiply it by bodyAnchor.transform.
Limitations and Practice
Distance: tracking is confident from 1.5 to 5 meters. Closer than 1 meter, the body no longer fits in the frame; beyond 5 meters, joint positions become unstable, especially the hands and feet.
Partial visibility: when part of the body is outside the frame, ARKit extrapolates the positions of the "invisible" joints from a kinematic model. Accuracy is acceptable for the torso and legs, worse for the arms.
Movement speed: during fast movements (jumps, boxing punches) hand tracking lags by 2–4 frames. For fitness coaching this is a problem: measured punch speed will be understated.
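Raw joint positions also jitter frame to frame. A common general-purpose mitigation (not an ARKit API) is exponential smoothing of each joint's position; the `JointSmoother` type below is a hypothetical helper, and the alpha value is a tuning assumption. Note the trade-off: smoothing adds latency on top of the lag described above, so for speed measurement keep alpha high or skip it.

```swift
// Hypothetical helper: exponential smoothing of a joint position stream.
// Lower alpha = smoother but laggier; alpha = 1 disables smoothing.
struct JointSmoother {
    private var state: SIMD3<Float>?
    let alpha: Float  // 0...1, weight of the newest sample (assumed default)

    init(alpha: Float = 0.5) { self.alpha = alpha }

    // Feed one position sample per frame; returns the smoothed position.
    mutating func smooth(_ sample: SIMD3<Float>) -> SIMD3<Float> {
        guard let prev = state else {
            state = sample       // first sample passes through unchanged
            return sample
        }
        let next = prev + alpha * (sample - prev)
        state = next
        return next
    }
}
```

Keep one smoother per joint; resetting it when tracking is lost avoids dragging stale positions into a new detection.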
One person: if two people are in the frame, ARKit tracks one of them (usually the first detected) and produces a single ARBodyAnchor. Multi-person tracking requires a third-party ML model.
Attaching 3D Character to Skeleton
You need a USDZ asset with a rigged character. The rig must match the ARKit joint hierarchy: 91 joints with the same names (left_arm_joint, right_hand_joint, etc.). In Blender, create an armature with the ARKit joint names and export to USDZ.
RealityKit's BodyTrackedEntity is a special class that automatically applies the ARKit skeleton transforms to a compatible USDZ model:
var character: BodyTrackedEntity?

func loadCharacter() {
    // loadBodyTracked(named:) returns a BodyTrackedEntity directly;
    // casting the result of Entity.load(named:) would fail
    guard let character = try? Entity.loadBodyTracked(named: "robot.usdz") else { return }
    self.character = character
    // Anchor that follows the tracked body
    let bodyAnchor = AnchorEntity(.body)
    bodyAnchor.addChild(character)
    arView.scene.addAnchor(bodyAnchor)
}
If the joint names in the USDZ don't match ARKit's, the character "falls apart" on the body. Verify the rig in Reality Composer Pro's joint-mapping inspector.
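A cheap pre-flight check can catch naming mismatches before runtime. The sketch below is a hypothetical helper: the asset joint list would come from your exporter, and on iOS the reference list is ARSkeletonDefinition.defaultBody3D.jointNames (hard-coded sample names are used here so the check is self-contained).

```swift
// Hypothetical validation: report which required ARKit joint names are
// missing from the asset's rig. Feed it the joint names exported from
// Blender and the names ARKit expects.
func missingJoints(assetJoints: [String], requiredJoints: [String]) -> [String] {
    let present = Set(assetJoints)
    return requiredJoints.filter { !present.contains($0) }
}
```

An empty result means the names line up; anything returned is a joint that will not be driven by the tracked skeleton.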
MediaPipe Pose for Android and Cross-Platform
ARCore doesn't provide body tracking. On Android, use MediaPipe Pose Landmarker:
- 33 key points (BlazePose GHUM)
- Works on RGB camera, no depth sensor
- PoseLandmarker.detect() → PoseLandmarkerResult with 3D coordinates in normalized space
To attach 3D content in ARCore, convert the 2D landmark coordinates plus depth (if LiDAR/ToF is available) to AR world coordinates via Frame.hitTest() or the Depth API.
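When a depth sample is available, the conversion from a normalized landmark to a camera-space point is standard pinhole unprojection. A minimal sketch, with Swift standing in for the Android code and the `Intrinsics` values as placeholders — on device they come from the camera's reported intrinsics, not from constants:

```swift
// Placeholder pinhole camera parameters; real values must be read from
// the device camera (ARCore exposes them per frame).
struct Intrinsics {
    let fx: Float, fy: Float   // focal lengths, pixels
    let cx: Float, cy: Float   // principal point, pixels
    let width: Float, height: Float
}

// Unproject a MediaPipe landmark (normalized [0,1] image coords) plus a
// depth sample (meters) into a camera-space 3D point.
func unproject(normX: Float, normY: Float, depthMeters: Float,
               k: Intrinsics) -> SIMD3<Float> {
    // Normalized landmark -> pixel coordinates
    let px = normX * k.width
    let py = normY * k.height
    // Pinhole inverse projection: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy
    let x = (px - k.cx) * depthMeters / k.fx
    let y = (py - k.cy) * depthMeters / k.fy
    return SIMD3<Float>(x, y, depthMeters)
}
```

The resulting point is in camera space; composing it with the camera pose for that frame yields the AR world coordinates.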
MediaPipe accuracy is worse than ARKit's: roughly 5–10 cm of joint position error versus 1–3 cm for ARKit with the Neural Engine. That is sufficient for fitness analytics; for precise skeletal rigging of a 3D character the difference is noticeable.
Motion Analysis for Fitness
The joint angle is computed from the dot product of the two vectors leaving the joint:
func jointAngle(joint1: simd_float3, vertex: simd_float3, joint2: simd_float3) -> Float {
    let v1 = normalize(joint1 - vertex)
    let v2 = normalize(joint2 - vertex)
    // Clamp to [-1, 1]: floating-point error can push the dot product
    // slightly outside acos's domain and produce NaN
    let d = max(-1, min(1, dot(v1, v2)))
    return acos(d) * (180 / .pi)
}
Knee angle during a squat: jointAngle(joint1: hip, vertex: knee, joint2: ankle). The correct range is 80–110°; below 60° the user is squatting too deep. This is the foundation for real-time form coaching.
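The rep counter mentioned in the timeline below can be driven directly by this angle. A minimal sketch: the thresholds (100° down, 160° up) are illustrative assumptions to be tuned per exercise, and the two-threshold hysteresis prevents jitter around a single cutoff from double-counting reps.

```swift
// Sketch of a squat rep counter fed one knee-angle sample per frame.
struct RepCounter {
    private(set) var reps = 0
    private var isDown = false
    let downAngle: Float   // below this -> bottom of the squat reached
    let upAngle: Float     // above this -> standing again, rep complete

    init(downAngle: Float = 100, upAngle: Float = 160) {
        self.downAngle = downAngle
        self.upAngle = upAngle
    }

    mutating func update(kneeAngle: Float) {
        if !isDown && kneeAngle < downAngle {
            isDown = true            // descent detected
        } else if isDown && kneeAngle > upAngle {
            isDown = false           // full extension: count the rep
            reps += 1
        }
    }
}
```

Depth warnings (the < 60° case above) slot naturally into the same update: check the minimum angle reached while isDown is true.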
Timeline
Basic body tracking with a 3D object tied to a joint: 1–2 weeks. Rigging a 3D character to the ARKit skeleton plus integration: 3–4 weeks. Fitness analytics with joint-angle analysis and a rep counter: 4–6 weeks. An Android MediaPipe version adds 2–3 weeks. Cost is estimated individually.