AI Image Generation Implementation in Mobile Applications
Image generation via Stable Diffusion, DALL·E 3, or the Midjourney API is a task where the bottleneck isn't the algorithm but UX expectations and resource management. A cloud model request takes 5–30 seconds; on-device generation on mobile takes 10–60 seconds depending on the model and device. The user must understand what's happening the entire time.
Cloud Generation: DALL·E 3 and Stable Diffusion API
The OpenAI Images API (POST /v1/images/generations) is the simplest path. The request returns an image URL or a base64 payload. Response time is 8–20 seconds for a 1024×1024 image.
struct ImageGenerationRequest: Encodable {
    let model: String          // "dall-e-3"
    let prompt: String
    let n: Int                 // 1 (dall-e-3 doesn't support n > 1)
    let size: String           // "1024x1024"
    let quality: String        // "standard" or "hd"
    let responseFormat: String // "url" or "b64_json"

    enum CodingKeys: String, CodingKey {
        case model, prompt, n, size, quality
        case responseFormat = "response_format"
    }
}
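The request struct above can be wired to the API with plain URLSession. A minimal sketch, assuming the "url" response format and an `apiKey` you supply; the response shape follows the public API docs:

```swift
import Foundation

// Hypothetical response model for responseFormat = "url".
struct ImageGenerationResponse: Decodable {
    struct ImageData: Decodable { let url: String? }
    let data: [ImageData]
}

func generateImage(prompt: String, apiKey: String) async throws -> URL {
    var request = URLRequest(url: URL(string: "https://api.openai.com/v1/images/generations")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.timeoutInterval = 60 // generation takes 8–20 s; leave headroom
    request.httpBody = try JSONEncoder().encode(ImageGenerationRequest(
        model: "dall-e-3", prompt: prompt, n: 1,
        size: "1024x1024", quality: "standard", responseFormat: "url"
    ))
    let (data, _) = try await URLSession.shared.data(for: request)
    let decoded = try JSONDecoder().decode(ImageGenerationResponse.self, from: data)
    guard let urlString = decoded.data.first?.url, let url = URL(string: urlString) else {
        throw URLError(.badServerResponse)
    }
    return url
}
```

Note the returned URL expires quickly — download the image immediately rather than storing the link.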
The Replicate API gives access to Stable Diffusion XL, FLUX, and other open-source models. Its distinguishing feature is the asynchronous model: the first request returns a prediction ID, which you then resolve via polling or a webhook. On mobile, poll every 2 seconds with exponential backoff on errors:
import kotlinx.coroutines.delay

suspend fun pollPrediction(predictionId: String): String {
    var backoff = 2000L // renamed from `delay` to avoid shadowing kotlinx.coroutines.delay
    repeat(15) {
        delay(backoff)
        val result = api.getPrediction(predictionId)
        if (result.status == "succeeded") return result.output.first()
        if (result.status == "failed") throw GenerationException(result.error)
        backoff = (backoff * 1.5).toLong().coerceAtMost(8000L) // exponential backoff, capped at 8 s
    }
    throw TimeoutException("Generation timed out")
}
On-Device Generation via Core ML
Apple ML Research released Stable Diffusion for Apple Silicon. On an iPhone 15 Pro or M-series iPad it takes ~20 seconds for 512×512 at 20 steps; on an iPhone 12, 60–90 seconds. The model weighs 2–6 GB depending on quantization.
import StableDiffusion

let pipeline = try StableDiffusionPipeline(
    resourcesAt: modelDirectory,
    controlNet: [],
    configuration: .init()
)
try pipeline.loadResources() // loadResources() throws — the `try` was missing

var config = StableDiffusionPipeline.Configuration(prompt: userPrompt)
config.stepCount = 20
config.guidanceScale = 7.5
config.seed = UInt32.random(in: 0...UInt32.max)

let images = try pipeline.generateImages(configuration: config) { progress in
    DispatchQueue.main.async {
        self.generationProgress = Double(progress.step) / Double(progress.stepCount)
    }
    return true // continue generation
}
Thermal throttling is a real problem: after 3–4 sequential generations an iPhone drops performance. The solution is to pause between generations, monitor ProcessInfo.thermalState, and warn the user.
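The thermal check can be sketched like this; the thresholds (throttle at `.serious` and above) are an assumption to tune against real devices:

```swift
import Foundation

// Gate repeated generations on the device's thermal state.
func shouldThrottleGeneration() -> Bool {
    switch ProcessInfo.processInfo.thermalState {
    case .nominal, .fair:
        return false
    case .serious, .critical:
        return true // pause, or route this request to the cloud backend
    @unknown default:
        return true
    }
}

// React to changes, e.g. to surface a "device is cooling down" notice:
let observer = NotificationCenter.default.addObserver(
    forName: ProcessInfo.thermalStateDidChangeNotification,
    object: nil, queue: .main
) { _ in
    if shouldThrottleGeneration() { /* warn the user, delay the next run */ }
}
```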
On Android, on-device Stable Diffusion runs via MediaPipe's Image Generator task or directly via ONNX Runtime with a GPU execution provider. Support is significantly weaker than on Apple Silicon, so a cloud-first approach is recommended for Android.
Generation UX
A progress bar with a real value (not a spinner) is critical for long operations. Stable Diffusion reports progress.step — use it. Show an intermediate preview (decoded latents) from around step 5 — it holds attention.
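A SwiftUI sketch of this UI, assuming a hypothetical `GenerationModel` that the pipeline's progress callback updates (whether intermediate images are available depends on the pipeline build):

```swift
import SwiftUI

// Hypothetical observable model fed from the progress callback shown above.
final class GenerationModel: ObservableObject {
    @Published var generationProgress: Double = 0
    @Published var previewImage: CGImage?
}

struct GenerationProgressView: View {
    @ObservedObject var model: GenerationModel

    var body: some View {
        VStack(spacing: 16) {
            if let preview = model.previewImage {
                Image(decorative: preview, scale: 1) // latent preview from ~step 5
                    .resizable()
                    .scaledToFit()
                    .blur(radius: 4) // soften the noisy intermediate
            }
            ProgressView(value: model.generationProgress) // real value, not a spinner
            Text("\(Int(model.generationProgress * 100)) %")
                .font(.caption.monospacedDigit())
        }
        .padding()
    }
}
```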
Cancel generation: a cloud request can be cancelled via URLSessionTask.cancel() or the Replicate API's POST /predictions/{id}/cancel; on-device, return false from the progress callback.
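On-device cancellation is cooperative: the progress handler's return value decides whether the remaining steps run. A minimal thread-safe flag (the `GenerationController` name is illustrative):

```swift
import Foundation

// Flip `cancel()` from the UI; the progress handler polls `isCancelled`.
final class GenerationController {
    private let lock = NSLock()
    private var cancelled = false

    func cancel() {
        lock.lock(); cancelled = true; lock.unlock()
    }

    var isCancelled: Bool {
        lock.lock(); defer { lock.unlock() }
        return cancelled
    }
}

// In the generateImages progress handler:
// return !controller.isCancelled  // returning false aborts the remaining steps
```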
Save to gallery: PHPhotoLibrary.requestAuthorization(for: .addOnly) on iOS; on Android, the WRITE_EXTERNAL_STORAGE permission (Android 9 and below) or the MediaStore.Images API. Request the permission only at first save, not when the generation screen opens.
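The iOS save path, as a sketch — add-only access requested lazily at the first save:

```swift
import Photos
import UIKit

// Request add-only library access at first save, then write the image.
func saveToGallery(_ image: UIImage) {
    PHPhotoLibrary.requestAuthorization(for: .addOnly) { status in
        guard status == .authorized else { return } // surface a settings prompt here
        PHPhotoLibrary.shared().performChanges({
            PHAssetChangeRequest.creationRequestForAsset(from: image)
        }) { success, error in
            if let error { print("Save failed: \(error)") }
        }
    }
}
```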
Common Mistakes
Content policy violations: DALL·E 3 rejects prompts involving violence, NSFW material, and celebrities. Validate prompts before sending (e.g. with the OpenAI Moderation API) and show a clear error message. Don't display "Your request was rejected" — explain what's forbidden.
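A pre-flight moderation check saves a wasted generation call. A sketch against the Moderation API; the response shape follows the public docs, and mapping flagged categories to user-readable reasons is left as an assumption:

```swift
import Foundation

// Minimal slice of the moderation response: only the `flagged` verdict.
struct ModerationResponse: Decodable {
    struct Result: Decodable { let flagged: Bool }
    let results: [Result]
}

func promptIsAllowed(_ prompt: String, apiKey: String) async throws -> Bool {
    var request = URLRequest(url: URL(string: "https://api.openai.com/v1/moderations")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(["input": prompt])
    let (data, _) = try await URLSession.shared.data(for: request)
    let decoded = try JSONDecoder().decode(ModerationResponse.self, from: data)
    return !(decoded.results.first?.flagged ?? true) // treat a missing result as rejected
}
```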
Device memory: on-device Stable Diffusion needs 4–6 GB of RAM at peak. os_proc_available_memory() on iOS reports the remaining headroom — if less than 1 GB is free, fall back to the cloud.
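The routing decision can be a one-liner; the 1 GB threshold mirrors the rule of thumb above (os_proc_available_memory() is declared in os/proc.h and available on iOS 13+):

```swift
import os

// Route to on-device generation only when there is real memory headroom.
func canRunOnDevice() -> Bool {
    let available = os_proc_available_memory() // bytes remaining before the process limit
    return available > 1_024 * 1_024 * 1_024   // < 1 GB free → fall back to cloud
}
```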
Implementation Process
1. Choose the architecture (cloud / on-device / hybrid).
2. Integrate the chosen API, handling its async pattern (polling / webhook).
3. Build the generation UX: progress, preview, cancel.
4. Storage and export.
5. Content policy and network error handling.
6. Test across the device range — budget to flagship.
Timeline Guidelines
Cloud generation with a basic UI: 4–6 days. On-device Stable Diffusion with latent preview and thermal management: 2–3 weeks.







