AI-Powered Fitness Trainer with Real-Time Form Correction
The difference between counting reps and correcting form is fundamental. Reps answer "how many"; form answers "how": the knee bend angle in a squat (roughly 90° or more, knees not drifting past the toes), a neutral lower back, scapular control. All of this requires precise geometric pose analysis on every frame, plus clear real-time feedback.
Pose Estimation: Accuracy Matters
For form correction, MediaPipe BlazePose Full (33 landmarks, including hands and feet) is more accurate than Vision's VNDetectHumanBodyPoseRequest (19 points). The critical difference: MediaPipe delivers 3D coordinates (x, y, z) in normalized space, which lets you compute joint angles in real 3D rather than only in the camera plane.
// MediaPipe Tasks iOS SDK
import MediaPipeTasksVision

class FormAnalyzer: NSObject, PoseLandmarkerLiveStreamDelegate {
    private var poseLandmarker: PoseLandmarker?

    func setup() throws {
        let options = PoseLandmarkerOptions()
        options.baseOptions.modelAssetPath = Bundle.main.path(
            forResource: "pose_landmarker_full",
            ofType: "task"
        )!
        options.runningMode = .liveStream
        options.numPoses = 1
        options.minPoseDetectionConfidence = 0.7
        options.minPosePresenceConfidence = 0.7
        options.minTrackingConfidence = 0.7
        options.poseLandmarkerLiveStreamDelegate = self
        poseLandmarker = try PoseLandmarker(options: options)
    }

    // Called on every processed frame in live-stream mode
    func poseLandmarker(_ landmarker: PoseLandmarker,
                        didFinishDetection result: PoseLandmarkerResult?,
                        timestampInMilliseconds: Int,
                        error: Error?) {
        guard let landmarks = result?.landmarks.first else { return }
        analyzeSquatForm(landmarks: landmarks)
    }
}
Geometric Analysis of Form: Squat Example
The squat is the most analytically dissected exercise. Key metrics:
Knee Bend Angle
import simd

func kneeFlexionAngle(landmarks: [NormalizedLandmark]) -> Double {
    // Landmark indices: hip (23/24), knee (25/26), ankle (27/28)
    let hip = landmarks[23]   // leftHip
    let knee = landmarks[25]  // leftKnee
    let ankle = landmarks[27] // leftAnkle

    // Vectors from the knee to the hip and from the knee to the ankle
    let vecToHip = SIMD2<Double>(
        Double(hip.x - knee.x),
        Double(hip.y - knee.y)
    )
    let vecToAnkle = SIMD2<Double>(
        Double(ankle.x - knee.x),
        Double(ankle.y - knee.y)
    )

    // Included angle at the knee: 180° = fully straight leg
    let cosAngle = dot(vecToHip, vecToAnkle) /
                   (length(vecToHip) * length(vecToAnkle))
    return acos(max(-1, min(1, cosAngle))) * 180 / .pi
}
At the squat bottom, the target is 80–100°. Less than 80° is too deep for beginners; over 100° means an incomplete range of motion. Feedback: "Go deeper, bend your knees another 10–15°."
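As a quick sketch, the measured angle can be turned into a concrete cue. The function name is illustrative; the thresholds are the 80–100° band described above:

```swift
/// Maps the knee angle at the squat bottom to a spoken cue.
/// Returns nil when depth is within the 80–100° target band.
func depthCue(kneeAngle: Double) -> String? {
    if kneeAngle > 100 {
        return "Go deeper, bend your knees another \(Int(kneeAngle) - 90) degrees"
    }
    if kneeAngle < 80 {
        return "That's deep enough, come up a little"
    }
    return nil
}
```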
Knee Over Toe
func kneeOverToe(landmarks: [NormalizedLandmark]) -> Bool {
    let knee = landmarks[25]
    let toe = landmarks[31] // leftFootIndex

    // Front view: a knee drifting past the toes moves toward the camera,
    // i.e. along z. Smaller z = closer to the camera, so compare toe - knee.
    // The 0.05 threshold needs empirical calibration per camera setup.
    return Double(toe.z - knee.z) > 0.05
}
```
MediaPipe's z coordinate is depth relative to the camera (with the hip midpoint as origin; smaller values are closer). From a front view, a knee drifting past the toes moves toward the camera, so the z difference captures it. From a side view, the same drift shows up in the x coordinate instead; a robust check either knows the camera angle or reconstructs the movement in full 3D (harder).
Spine: Neutral Position
func spineAngle(landmarks: [NormalizedLandmark]) -> Double {
    let shoulder = landmarks[11] // leftShoulder
    let hip = landmarks[23]      // leftHip

    // Angle of the shoulder-hip line relative to vertical.
    // Normalized image Y grows downward, hence the -dy.
    let dx = Double(shoulder.x - hip.x)
    let dy = Double(shoulder.y - hip.y)
    return atan2(dx, -dy) * 180 / .pi
}
More than about 30° from vertical at the bottom means the torso is slouching, "spilling" forward. Correction: "Lift your chest, squeeze your shoulder blades."
Real-Time Voice Feedback
On-screen text hints are inconvenient: the user is watching their exercise, not the phone. Voice cues work better.
import AVFoundation

enum Urgency { case normal, critical }

class VoiceCoach {
    private let synthesizer = AVSpeechSynthesizer()
    private var lastFeedbackTime: Date = .distantPast
    private let feedbackCooldown: TimeInterval = 3.0

    func provideFeedback(_ message: String, urgency: Urgency) {
        let now = Date()
        guard now.timeIntervalSince(lastFeedbackTime) > feedbackCooldown else { return }

        let utterance = AVSpeechUtterance(string: message)
        utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
        utterance.rate = urgency == .critical ? 0.55 : 0.48
        utterance.pitchMultiplier = urgency == .critical ? 1.1 : 1.0
        utterance.volume = 0.9

        synthesizer.speak(utterance)
        lastFeedbackTime = now
    }
}
The feedbackCooldown of 3.0 seconds is critical. Without it, the system repeats the same message 30 times per second, and the user turns the sound off.
Prioritization: when several errors occur simultaneously, pick the most critical one. The hierarchy: safety (knee caving inward, injury risk) > form (incomplete range) > tip (breathe evenly).
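The hierarchy above can be encoded with an ordered severity; a minimal sketch (the type and function names are illustrative):

```swift
// Ordered severity: higher rawValue = more critical.
enum Severity: Int, Comparable {
    case tip = 0, form = 1, safety = 2
    static func < (lhs: Severity, rhs: Severity) -> Bool {
        lhs.rawValue < rhs.rawValue
    }
}

/// Of all issues detected in a frame, keep only the most critical
/// one so the voice coach never stacks messages.
func mostCritical(_ issues: [(severity: Severity, message: String)])
        -> (severity: Severity, message: String)? {
    issues.max { $0.severity < $1.severity }
}
```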
Exercise Phase Analysis
Each correction is only relevant in the right phase:
- Eccentric phase (lowering into squat): check back and knee position
- Bottom point: check bend angle, knee-over-toe position
- Concentric phase (standing): check user doesn't "fold"
The phase is detected from the derivative of a tracking point (the hip). In normalized image coordinates Y grows downward, so: hip Y increasing (moving down) = eccentric, Y at its maximum = bottom, Y decreasing = concentric.
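A minimal phase detector over the hip's recent Y samples might look like this; the 0.003 velocity threshold is an assumption to be tuned per device and frame rate:

```swift
enum Phase { case eccentric, bottom, concentric }

/// Classifies the squat phase from recent normalized hip-Y samples.
/// Image Y grows downward, so a rising Y means the hip is lowering.
func detectPhase(hipY: [Double], threshold: Double = 0.003) -> Phase {
    guard hipY.count >= 2 else { return .bottom }
    let velocity = hipY[hipY.count - 1] - hipY[hipY.count - 2]
    if velocity > threshold { return .eccentric }   // moving down
    if velocity < -threshold { return .concentric } // moving up
    // Near-zero velocity: a turnaround (a full version would also
    // distinguish the top of the movement from the bottom).
    return .bottom
}
```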
Supported Exercises and Scaling
Each exercise needs its own metrics and rule set. This is implemented via a protocol:
protocol ExerciseFormAnalyzer {
    var exerciseType: ExerciseType { get }

    func analyze(landmarks: [NormalizedLandmark],
                 phase: ExercisePhase) -> [FormFeedback]
    func detectPhase(landmarks: [NormalizedLandmark],
                     history: [[NormalizedLandmark]]) -> ExercisePhase
}
Each exercise is a separate conforming type, so new exercises can be added without touching the core.
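A squat conformance might look like this sketch. It assumes an ExercisePhase enum with eccentric/bottom/concentric cases; the FormFeedback cases and the .squat exercise type are hypothetical names, and the geometric helpers are the functions defined earlier:

```swift
struct SquatAnalyzer: ExerciseFormAnalyzer {
    let exerciseType: ExerciseType = .squat // hypothetical case name

    func analyze(landmarks: [NormalizedLandmark],
                 phase: ExercisePhase) -> [FormFeedback] {
        var issues: [FormFeedback] = []
        switch phase {
        case .bottom:
            // Depth and knee-over-toe checks only make sense at the bottom
            if kneeFlexionAngle(landmarks: landmarks) > 100 {
                issues.append(.incompleteDepth) // hypothetical case
            }
            if kneeOverToe(landmarks: landmarks) {
                issues.append(.kneePastToes)    // hypothetical case
            }
        case .eccentric, .concentric:
            // Back position matters while moving under load
            if abs(spineAngle(landmarks: landmarks)) > 30 {
                issues.append(.roundedBack)     // hypothetical case
            }
        }
        return issues
    }

    func detectPhase(landmarks: [NormalizedLandmark],
                     history: [[NormalizedLandmark]]) -> ExercisePhase {
        // Hip-Y velocity as described above; stubbed here
        .bottom
    }
}
```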
Starting set: squat, lunge, push-up, deadlift, plank, burpee. This covers most home workouts without equipment.
Development Process
- Choose and integrate the pose estimator (MediaPipe / Vision).
- Develop geometric metrics per exercise (together with a trainer/coach).
- Build the feedback prioritization and cooldown system.
- Voice cues via AVSpeechSynthesizer.
- UI: skeleton overlay, real-time metrics, post-session analysis.
- Test on people of different builds and heights.
Timeframe Estimates
An AI trainer for 3–5 exercises with voice cues: 2–4 weeks. An extended system with automatic exercise detection, post-session reports, and progress tracking over time: 5–8 weeks.