Implementing AI Avatar Generation from Photo in a Mobile App
The task sounds simple: the user uploads a selfie and the app returns a set of stylized avatars. In reality it is several related problems: model choice (Stable Diffusion + LoRA vs. specialized APIs), generation queue management, acceptable wait time, and handling different photo resolutions.
Architecture: why not generate on-device
Stable Diffusion 1.5 in float16 weighs about 2.5 GB. Apple's ml-stable-diffusion Swift package can run it on an iPhone 14 Pro (Neural Engine + 6 GB RAM): 20 DDIM steps at 512×512 take about 8 seconds. That holds for flagships only; on an iPhone 12 or any mid-range Android device it is not practical.
For avatar generation in a production app, the sensible choice is server-side generation via a specialized service. Options:
| Service | Approach | Time | Quality |
|---|---|---|---|
| Replicate (SDXL + IP-Adapter) | REST API | 15–40 sec | High |
| Fal.ai | REST + WebSocket | 5–15 sec | High |
| Leonardo.ai | REST API | 10–30 sec | Very high |
| Astria.ai | Fine-tune + generation | 10–30 min (fine-tune) + 15 sec | Maximum |
For "user-like" avatars, the best results come from IP-Adapter or InstantID, which preserve facial features without a full LoRA fine-tune. If maximum accuracy is needed (as in Lensa), use a DreamBooth LoRA trained on 10–20 user photos, but expect 10–20 minutes of processing.
Mobile client: upload and wait flow
Generation takes time, so the user needs clear feedback. The async flow:
```swift
import UIKit

// iOS: start generation and poll status.
// APIClient, AvatarStyle, AvatarError and downloadResults(urls:) are app-defined and not shown.
class AvatarGenerationService {
    private let apiClient: APIClient

    init(apiClient: APIClient) {
        self.apiClient = apiClient
    }

    func generateAvatar(photo: UIImage, style: AvatarStyle) async throws -> [UIImage] {
        // 1. Compress + upload photo (jpegData returns nil if the image cannot be encoded)
        guard let photoData = photo.jpegData(compressionQuality: 0.85) else {
            throw AvatarError.encodingFailed // assumed additional AvatarError case
        }
        let uploadURL = try await apiClient.uploadPhoto(data: photoData)

        // 2. Start generation job
        let jobId = try await apiClient.startGeneration(
            photoURL: uploadURL,
            style: style.rawValue,
            count: 6
        )

        // 3. Poll with exponential backoff
        return try await pollJobResult(jobId: jobId)
    }

    private func pollJobResult(jobId: String) async throws -> [UIImage] {
        var delay: TimeInterval = 2.0
        for _ in 0..<30 {
            // Task.sleep throws on cancellation, so cancelling the task stops polling
            try await Task.sleep(nanoseconds: UInt64(delay * 1_000_000_000))
            let status = try await apiClient.checkJob(id: jobId)
            switch status.state {
            case .completed: return try await downloadResults(urls: status.resultURLs)
            case .failed: throw AvatarError.generationFailed(status.error)
            case .pending, .processing: delay = min(delay * 1.5, 8.0) // grow 1.5x, cap at 8 s
            }
        }
        throw AvatarError.timeout
    }
}
```
On Android the flow is the same, using Kotlin coroutines and kotlinx.coroutines.delay.
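To see what the polling loop above commits the client to, the delay schedule can be computed on its own. This is a minimal sketch that assumes the same parameters as the iOS snippet (2 s initial delay, growth factor 1.5, 8 s cap, 30 attempts); the function name backoffDelays is hypothetical.

```swift
import Foundation

/// Delays (in seconds) slept before each status check, mirroring the polling loop:
/// start at `initial`, multiply by `factor` after each check, cap at `maxDelay`.
func backoffDelays(initial: Double = 2.0,
                   factor: Double = 1.5,
                   maxDelay: Double = 8.0,
                   attempts: Int = 30) -> [Double] {
    var delays: [Double] = []
    var delay = initial
    for _ in 0..<attempts {
        delays.append(delay)
        delay = min(delay * factor, maxDelay)
    }
    return delays
}

let schedule = backoffDelays()
let worstCaseWait = schedule.reduce(0, +)
print(Array(schedule.prefix(5)))  // [2.0, 3.0, 4.5, 6.75, 8.0]
print(worstCaseWait)              // 224.25
```

With these values the client gives up after about 224 seconds of sleeping (plus request time). That comfortably covers the 15–40 second generation times in the table, but not a 10–30 minute fine-tune, which would have to rely on a push notification instead of polling.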
Photo preparation on client
Avatar quality depends directly on the input photo. Validate before sending:
- Face detected (iOS: VNDetectFaceRectanglesRequest, Android: ML Kit FaceDetector)
- Lighting acceptable: check average brightness via CIAreaAverage
- Minimum resolution 512×512
- One face in frame (if multiple, show a warning)
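The checks in this list can be combined into one decision step once the measurements exist. The sketch below deliberately leaves out how they are obtained (Vision, ML Kit, CIAreaAverage) and only shows the rule logic. The types PhotoMetrics and PhotoIssue, the function validate, and the brightness thresholds are all assumptions for illustration, not values from a specific SDK.

```swift
import Foundation

/// Measurements gathered on-device before upload (face count from a face detector,
/// brightness from an average-color filter). Gathering them is out of scope here.
struct PhotoMetrics {
    let width: Int
    let height: Int
    let faceCount: Int
    let averageBrightness: Double  // 0.0 (black) ... 1.0 (white)
}

enum PhotoIssue: Equatable {
    case noFace
    case multipleFaces(Int)
    case tooDark
    case tooBright
    case resolutionTooLow
}

/// Returns every problem found, so the UI can report all of them at once.
/// The default brightness range is an assumed placeholder to be tuned on real photos.
func validate(_ m: PhotoMetrics,
              minSide: Int = 512,
              brightnessRange: ClosedRange<Double> = 0.2...0.9) -> [PhotoIssue] {
    var issues: [PhotoIssue] = []
    if m.faceCount == 0 { issues.append(.noFace) }
    if m.faceCount > 1 { issues.append(.multipleFaces(m.faceCount)) }
    if m.averageBrightness < brightnessRange.lowerBound { issues.append(.tooDark) }
    if m.averageBrightness > brightnessRange.upperBound { issues.append(.tooBright) }
    if min(m.width, m.height) < minSide { issues.append(.resolutionTooLow) }
    return issues
}

let selfie = PhotoMetrics(width: 3024, height: 4032, faceCount: 1, averageBrightness: 0.55)
print(validate(selfie).isEmpty)  // true
```

Returning a list rather than failing on the first problem lets the app treat some issues (multiple faces) as a warning and others (no face) as a hard stop.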
Downscale the photo to at most 1024×1024 and encode it as JPEG at 85% quality before sending. Extra resolution doesn't improve the result; it only increases upload time and cost.
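Selfies are rarely square, so the 1024×1024 limit is read here as a bounding box: the longer side is scaled to 1024 and the aspect ratio is kept. That reading is an assumption; a pipeline that crops to a square around the face would compute the size differently. The helper name targetSize is hypothetical.

```swift
import Foundation

/// Scales (width, height) down so the longer side is at most `maxSide`,
/// preserving aspect ratio. Images that are already small enough are unchanged.
func targetSize(width: Int, height: Int, maxSide: Int = 1024) -> (width: Int, height: Int) {
    let longest = max(width, height)
    guard longest > maxSide else { return (width, height) }
    let scale = Double(maxSide) / Double(longest)
    return (Int((Double(width) * scale).rounded()),
            Int((Double(height) * scale).rounded()))
}

let size = targetSize(width: 3024, height: 4032)
print("\(size.width)x\(size.height)")  // 768x1024
```

The resulting size would then be passed to whatever resizing API the app uses before JPEG encoding.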
Caching and result gallery
Generated avatars must be stored; regenerating on every open is expensive. iOS: save the files via FileManager with metadata in Core Data (style, date, jobId for traceability). Android: Room + FileProvider.
Important: if the app is backgrounded during generation, polling stops. Solution: save the jobId in UserDefaults / SharedPreferences and check the status of incomplete jobs on the next app open.
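One possible shape for the jobId persistence on iOS is a small wrapper around UserDefaults. This is a sketch under assumptions: the type name PendingJobStore and the key pendingAvatarJobIds are made up for illustration, and several jobs may be pending at once.

```swift
import Foundation

/// Persists IDs of generation jobs that have not finished yet, so they can be
/// re-checked after the app was suspended or killed.
struct PendingJobStore {
    private let defaults: UserDefaults
    private let key = "pendingAvatarJobIds"  // arbitrary key name

    init(defaults: UserDefaults = .standard) {
        self.defaults = defaults
    }

    var jobIds: [String] {
        defaults.stringArray(forKey: key) ?? []
    }

    func add(_ jobId: String) {
        var ids = jobIds
        if !ids.contains(jobId) { ids.append(jobId) }
        defaults.set(ids, forKey: key)
    }

    func remove(_ jobId: String) {
        defaults.set(jobIds.filter { $0 != jobId }, forKey: key)
    }
}

let store = PendingJobStore()
store.add("job-123")
print(store.jobIds.contains("job-123"))  // true
store.remove("job-123")
```

In this scheme the app would call add right after the job is started, remove when the job completes or fails, and on launch resume polling for every ID still in the list.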
Push notification on readiness
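Waiting 20–40 seconds with the app open is fine. But if the user has minimized the app, a push is needed: the server sends an FCM/APNs notification on completion, and the client handles the tap (on iOS in UNUserNotificationCenterDelegate, optionally with a UNNotificationAction) by opening a deep link to the avatar gallery.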
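The deep-link part can be kept as a pure function that turns the link from the push payload into an app route. This is only a sketch: the scheme myapp, the host avatars, the jobId query item, and the AppRoute type are hypothetical. On iOS the string would typically be read from the notification's userInfo inside userNotificationCenter(_:didReceive:withCompletionHandler:).

```swift
import Foundation

enum AppRoute: Equatable {
    case avatarGallery(jobId: String?)
}

/// Parses a deep link such as "myapp://avatars?jobId=job-123".
/// Returns nil for anything that is not a recognised avatar link.
func route(from deepLink: String) -> AppRoute? {
    guard let components = URLComponents(string: deepLink),
          components.scheme == "myapp",
          components.host == "avatars" else { return nil }
    let jobId = components.queryItems?.first(where: { $0.name == "jobId" })?.value
    return .avatarGallery(jobId: jobId)
}

print(route(from: "myapp://avatars?jobId=job-123") == .avatarGallery(jobId: "job-123"))  // true
```

Carrying the jobId in the link lets the gallery scroll straight to the freshly generated set instead of just opening the screen.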
Rights and privacy
Privacy Nutrition Labels on the App Store require declaring photo collection. If the photo is sent to a server, it falls under the Photos or Videos data type with the App Functionality purpose. Required: explicit user consent, a retention policy (delete the original photo after generation), and no sharing with third parties for training without consent.
On Android with targetSdk 33+, request READ_MEDIA_IMAGES instead of the deprecated READ_EXTERNAL_STORAGE.
Timeline
Basic flow (photo upload, API call, polling, showing results): 3–5 days. With face validation, gallery, push notifications, and multiple styles: 2–3 weeks. Cost depends on the platform (iOS, Android, or both) and the chosen AI provider.







