Implementing AI Image Generation (DALL-E) in a Mobile App
DALL-E 3 via the OpenAI API is the simplest image generation provider to integrate: one POST request, a URL in the response, and an image download. The complexity lies not in the API itself but in the UX: generation takes 5–15 seconds, and that time must be used wisely. On top of that, prompt engineering directly impacts result quality.
API: basic integration
struct DALLERequest: Codable {
    let model: String
    let prompt: String
    let n: Int
    let size: String
    let quality: String
    let style: String
    let responseFormat: String

    enum CodingKeys: String, CodingKey {
        case model, prompt, n, size, quality, style
        case responseFormat = "response_format"
    }
}
struct ImageGenerationResponse: Codable {
    struct Item: Codable {
        let url: String
        let revisedPrompt: String?

        enum CodingKeys: String, CodingKey {
            case url
            case revisedPrompt = "revised_prompt"
        }
    }
    let data: [Item]
}

func generate(prompt: String) async throws -> URL {
    let request = DALLERequest(
        model: "dall-e-3",
        prompt: prompt,
        n: 1,
        size: "1024x1024",
        quality: "standard",
        style: "vivid",
        responseFormat: "url"
    )

    var urlRequest = URLRequest(url: URL(string: "https://api.openai.com/v1/images/generations")!)
    urlRequest.httpMethod = "POST"
    urlRequest.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    urlRequest.setValue("application/json", forHTTPHeaderField: "Content-Type")
    urlRequest.httpBody = try JSONEncoder().encode(request)

    let (data, urlResponse) = try await URLSession.shared.data(for: urlRequest)
    // Check the status code: error responses carry a different JSON shape
    guard let http = urlResponse as? HTTPURLResponse, http.statusCode == 200 else {
        throw URLError(.badServerResponse)
    }

    let response = try JSONDecoder().decode(ImageGenerationResponse.self, from: data)
    guard let urlString = response.data.first?.url, let url = URL(string: urlString) else {
        throw URLError(.cannotParseResponse)
    }
    return url
}
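If you request response_format: "b64_json" instead of a URL, the image bytes arrive inline and can be decoded without a second network round trip. A minimal sketch: the field name b64_json is from the API, while the type and helper names here are our own illustration.

```swift
import Foundation

// Illustrative response type for response_format: "b64_json".
struct ImageGenerationB64Response: Codable {
    struct Item: Codable {
        let b64Json: String

        enum CodingKeys: String, CodingKey {
            case b64Json = "b64_json"
        }
    }
    let data: [Item]
}

// Decode the inline base64 payload into raw image bytes.
func imageData(from response: ImageGenerationB64Response) -> Data? {
    guard let first = response.data.first else { return nil }
    return Data(base64Encoded: first.b64Json)
}
```

The trade-off: b64_json inflates the response body by ~33% compared to downloading the binary, but removes the dependency on the short-lived URL.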
DALL-E 3 parameters:
- size: 1024x1024, 1792x1024 (landscape), 1024x1792 (portrait)
- quality: standard (faster, cheaper) or hd (more detailed)
- style: vivid (saturated, contrasty) or natural (photorealistic)
- response_format: url (link valid for 60 minutes) or b64_json (base64 inline)
DALL-E 3 does not support n > 1 — only one image per request.
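In a mobile app, the size parameter usually follows the target layout. A hedged sketch of a picker mapping orientation to a valid DALL-E 3 size string (the enum and function are illustrative, not part of any SDK):

```swift
// Hypothetical helper: choose a valid DALL-E 3 size for the target layout.
enum ImageOrientation {
    case square, landscape, portrait
}

func dalleSize(for orientation: ImageOrientation) -> String {
    switch orientation {
    case .square:    return "1024x1024"
    case .landscape: return "1792x1024"
    case .portrait:  return "1024x1792"
    }
}
```

Centralizing this also guards against typos: the API rejects any size string outside the three supported values.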
UX during generation
5–15 seconds of waiting without feedback is poor UX. Options:
Animated placeholder — skeleton or animated gradient in place of the future image. User understands something is happening.
Progress bar with stages — can't show real progress (API doesn't stream), but can show animated pseudo-progress: "Analyzing request → Generating → Finalizing". Visually effective.
Prompt preview — while generation runs, display the revised prompt from DALL-E. The API returns revised_prompt — the actual description the model used. This interests users and fills the wait time.
// revised_prompt comes in the response (map it via CodingKeys when decoding)
let revisedPrompt = response.data[0].revisedPrompt
// Show it in the UI while the image is downloading
promptLabel.text = revisedPrompt
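The pseudo-progress idea above can be reduced to a pure function that maps elapsed time to a stage label, which a timer then feeds into the UI. The stage names echo the example in the text; the thresholds are arbitrary assumptions:

```swift
// Pseudo-progress: the API doesn't stream real progress, so map elapsed
// seconds to a fixed sequence of stage labels. Thresholds are illustrative.
func generationStage(elapsedSeconds: Double) -> String {
    switch elapsedSeconds {
    case ..<2:  return "Analyzing request"
    case ..<10: return "Generating"
    default:    return "Finalizing"
    }
}
```

Drive it with a repeating Task or Timer that ticks every second and updates the label; keeping the mapping pure makes it trivial to unit-test.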
Caching and storage
Generated images must be saved. URLs from the response last 60 minutes — after that, 404. Download and cache immediately after generation.
// Android: download and save via Coil
suspend fun downloadAndCache(imageUrl: String, localKey: String): File {
    val request = ImageRequest.Builder(context)
        .data(imageUrl)
        .diskCacheKey(localKey)
        .build()
    imageLoader.execute(request) // Coil writes the downloaded image to its disk cache

    // Copy the entry out of Coil's cache into an app-owned file:
    // cache entries can be evicted, generated images must survive
    val target = File(context.cacheDir, "dalle_${localKey}.jpg")
    imageLoader.diskCache?.openSnapshot(localKey)?.use { snapshot ->
        snapshot.data.toFile().copyTo(target, overwrite = true)
    }
    return target
}
For long-term storage — save to MediaStore (Android) or Photos (iOS) on user request. Automatic saving without explicit action violates App Store guidelines.
Prompt engineering for mobile UI
DALL-E 3 quality depends heavily on the prompt. For user applications:
System prompt wrapper — before sending to API, enhance user text with standard instructions:
let enhancedPrompt = """
\(userPrompt)
Style: high quality, detailed, professional photography or illustration.
Avoid text, watermarks, blurry elements.
"""
DALL-E 3 automatically rewrites the prompt (revised_prompt), but setting style helps avoid random variations.
Content policy. DALL-E 3 rejects prompts containing violence, nudity, or named real people. The API responds with HTTP 400 and code: content_policy_violation. On the client, show users a clear message rather than the technical error code.
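One way to surface this cleanly: decode OpenAI's error envelope ({"error": {"code", "message"}}) and translate known codes into user-facing copy. content_policy_violation is from the text above; the other code and all the wording are illustrative assumptions:

```swift
import Foundation

// OpenAI error responses arrive as {"error": {"code": ..., "message": ...}}.
struct OpenAIErrorEnvelope: Codable {
    struct APIError: Codable {
        let code: String?
        let message: String
    }
    let error: APIError
}

// Map an API error code to copy the user can act on,
// instead of surfacing the raw technical code.
func userMessage(forErrorCode code: String?) -> String {
    switch code {
    case "content_policy_violation":
        return "This request can't be illustrated. Try rephrasing without violent or adult themes."
    default:
        return "Image generation failed. Please try again."
    }
}
```

Decode the envelope on any non-200 status before falling back to a generic failure message.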
Variations and editing
DALL-E 2 (not 3) supports:
- /v1/images/variations — variations of an existing image
- /v1/images/edits — editing with a mask (inpainting)
DALL-E 3 has neither endpoint. If editing is needed, use either DALL-E 2 or Stable Diffusion via Replicate/FAL.
Timeline
Basic UI generation (text field + button + result display) — 2–3 days. Full gallery with history, saving, sharing, variations — 8–12 days.