AI System for Digital Actor Double Generation
Digital doubles address several distinct film-production needs: performing dangerous scenes without risk to the actor, continuing production when talent is unavailable, de-aging, and portraying characters of deceased actors (with a legal basis). This is a complex multimodal task requiring the integration of multiple technologies.
System Components
3D Digital Human Core:
- MetaHuman Creator (Unreal Engine) as the base rig — an industry standard with a high level of detail
- Gaussian Splatting / NeRF for scanning the real actor (photogrammetry + ML reconstruction)
- FLAME / SMPL-X parametric models for body and face
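FLAME and SMPL-X are linear blendshape models: a mean mesh plus a weighted sum of PCA shape offsets. A minimal NumPy sketch of that idea, assuming toy sizes and random placeholder data rather than real model files:

```python
import numpy as np

rng = np.random.default_rng(0)

N_VERTS, N_BETAS = 100, 10  # toy sizes; real models use thousands of vertices
template = rng.normal(size=(N_VERTS, 3))                      # mean mesh (placeholder)
shape_basis = rng.normal(size=(N_BETAS, N_VERTS, 3)) * 0.01   # PCA shape directions (placeholder)

def shaped_mesh(betas: np.ndarray) -> np.ndarray:
    """Linear blendshape model: mean mesh plus beta-weighted shape offsets."""
    return template + np.tensordot(betas, shape_basis, axes=1)
```

Setting all coefficients to zero recovers the mean mesh, which is a useful sanity check when validating a fitted model against the scanned actor.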
Motion Transfer:
- DensePose + SMPLify-X for transferring movement from reference video
- Face Reenactment: FOMM (First Order Motion Model), Face-Vid2Vid for 2D work
- Body Pose Transfer: Vid2Vid Synthesis, Neural Body
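Body pose transfer ultimately reduces to retargeting: keeping the reference performance's bone directions while respecting the double's own bone lengths. A toy sketch of that step, assuming a simple 4-joint chain (the skeleton layout is illustrative, not taken from any of the libraries above):

```python
import numpy as np

# Toy skeleton: each joint's parent index (-1 = root).
PARENTS = [-1, 0, 1, 2]

def retarget(src_joints: np.ndarray, tgt_bone_lengths: np.ndarray) -> np.ndarray:
    """Copy bone *directions* from the source pose, but rebuild the chain
    with the target actor's bone lengths (a common pose-transfer step)."""
    out = np.zeros_like(src_joints)
    out[0] = src_joints[0]  # keep the root where the source placed it
    for j, p in enumerate(PARENTS):
        if p < 0:
            continue
        bone = src_joints[j] - src_joints[p]
        direction = bone / np.linalg.norm(bone)
        out[j] = out[p] + direction * tgt_bone_lengths[j]
    return out
```

Real pipelines work in joint-rotation space (SMPL-X pose parameters) rather than raw positions, but the length-vs-direction split is the same.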
Appearance Transfer / Face Swap:
- ROOP, SimSwap, FaceSwap for complete face replacement
- DiffFace, IP-Adapter FaceID for high-quality diffusion results
- Preservation systems for moles, scars, identifying features
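Identity preservation is typically validated by comparing face-recognition embeddings of the reference actor and the generated frames; cosine similarity is the standard score. A sketch, assuming embeddings from an ArcFace-style encoder (any acceptance threshold would be calibrated per project):

```python
import numpy as np

def identity_score(emb_ref: np.ndarray, emb_gen: np.ndarray) -> float:
    """Cosine similarity between two face-identity embeddings.
    Values near 1.0 indicate the generated face matches the reference."""
    a = emb_ref / np.linalg.norm(emb_ref)
    b = emb_gen / np.linalg.norm(emb_gen)
    return float(a @ b)
```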
Rendering Pipeline:
- Real-time: Unreal Engine 5 MetaHuman + neural network super-resolution
- Offline: Nuke/Flame compositing + ML-based color/light matching
- Neural Rendering: NeRF-based for photorealistic static and limited motion
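NeRF-style neural rendering reduces to a discrete volume-rendering sum: alpha-compositing sampled colors along each camera ray. A minimal sketch of that compositing step (in practice the densities, colors, and step sizes come from the trained network):

```python
import numpy as np

def composite(densities: np.ndarray, colors: np.ndarray, deltas: np.ndarray):
    """Discrete volume rendering as used by NeRF: convert per-sample
    densities to alphas, accumulate transmittance, and blend colors."""
    alphas = 1.0 - np.exp(-densities * deltas)            # opacity of each sample
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))  # light surviving to each sample
    weights = trans * alphas
    return weights @ colors, weights
```

A single fully opaque sample returns exactly that sample's color, which makes the function easy to unit-test before wiring it to a network.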
Legal and Ethical Framework
Mandatory requirement: written consent from the actor and a clear, shared understanding of the scope of use. The system includes watermarking to track content provenance. We do not undertake projects without the necessary rights.
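To illustrate the watermarking idea, here is a toy least-significant-bit embedder; a production system would use a robust scheme (e.g. spread-spectrum watermarks or C2PA provenance manifests) rather than fragile LSB bits:

```python
import numpy as np

def embed_lsb(frame: np.ndarray, bits: list) -> np.ndarray:
    """Hide provenance bits in the least-significant bit of the first
    pixels of a uint8 frame (toy stand-in for real watermarking)."""
    flat = frame.flatten()  # flatten() copies, so the input frame is untouched
    flat[:len(bits)] = (flat[:len(bits)] & ~np.uint8(1)) | np.asarray(bits, dtype=np.uint8)
    return flat.reshape(frame.shape)

def extract_lsb(frame: np.ndarray, n: int) -> list:
    """Read the first n embedded bits back out."""
    return (frame.flatten()[:n] & 1).tolist()
```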
Development Pipeline
Weeks 1–4: Capture session with the actor: photogrammetric scan (300+ photos), video recording of facial performance (neutral expressions, phonemes, emotions), and body motion capture.
Weeks 5–10: 3D model building, rigging, training face reenactment model. Actor similarity validation.
Weeks 11–15: Production pipeline integration. Test scene. Director feedback corrections.
Weeks 16–18: Production pace optimization. VFX team training.
Technical Specifications
| Parameter | Value |
|---|---|
| Similarity to Original (FID, lower is better) | <50 |
| Temporal Coherence | >0.95 |
| Processing: offline (4K) | 1–5 min/frame on A100 |
| Processing: real-time preview | 24 fps at 1080p (RTX 4090) |
| Facial Expression Support | 52 FACS blend shapes |
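The temporal-coherence target above can be approximated with a simple frame-to-frame correlation proxy. This is a crude stand-in for the real metric, which would typically compare flow-warped frames; it is shown only to make the >0.95 target concrete:

```python
import numpy as np

def temporal_coherence(frames: list) -> float:
    """Mean normalized correlation between consecutive frames.
    Identical consecutive frames score ~1.0; flicker lowers the score."""
    scores = []
    for a, b in zip(frames[:-1], frames[1:]):
        a = (a - a.mean()) / (a.std() + 1e-8)
        b = (b - b.mean()) / (b.std() + 1e-8)
        scores.append(float((a * b).mean()))
    return sum(scores) / len(scores)
```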
Limitations
"Uncanny valley" — persistent risk in high-fidelity work. We conduct mandatory blind review with unfamiliar viewers before final render. Extreme realism requires more iterations than stylization. Hand and finger movement — still the most difficult part.