ChatGPT API Integration in Mobile Applications
Integrating the ChatGPT API into a mobile application is more than a URLSession.dataTask call with a JSON body. It means managing streaming output, conversation context, key security, and costs, and each of these has its own nuances on mobile.
API Key: Never in Client Code
First and foremost: the OpenAI API key must never make it into the app bundle, the source code, or even encrypted settings on the device. If the key reaches the client, consider it compromised.
The correct architecture: mobile app → your backend proxy → OpenAI API. The backend authorizes users, applies rate limiting, logs costs, and supplies the key. As a bonus, it can cache typical responses, reducing costs.
On the backend: if you don't want to write a proxy from scratch, put the openai-node or openai-python SDK behind nginx. Or go serverless with Cloudflare Workers: cold starts are around 5 ms, and at low traffic it is cheaper than EC2.
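On the client side, little changes compared to calling OpenAI directly: the request simply targets your proxy, authenticated with the user's session token. A minimal Swift sketch; the api.example.com/v1/chat endpoint, the request shape, and the token handling are illustrative assumptions, not any real API:

```swift
import Foundation

// Hypothetical proxy endpoint. The user's session token authenticates
// against YOUR backend; the OpenAI key never leaves the server.
func makeProxyRequest(prompt: String, userToken: String) -> URLRequest {
    var request = URLRequest(url: URL(string: "https://api.example.com/v1/chat")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(userToken)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    let body: [String: Any] = [
        "messages": [["role": "user", "content": prompt]],
        "stream": true
    ]
    request.httpBody = try? JSONSerialization.data(withJSONObject: body)
    return request
}
```

The backend then swaps the session token for the real OpenAI key and forwards the body.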
Streaming Output
Without streaming, users wait for the full response: 3–8 seconds for long texts. With streaming, the first token appears in 200–400 ms and the text grows as it is generated.
The OpenAI Chat Completions API with stream: true returns Server-Sent Events (SSE). On mobile you parse SSE manually, since URLSession doesn't support SSE out of the box.
On iOS — URLSessionDataDelegate with urlSession(_:dataTask:didReceive:):
func urlSession(_ session: URLSession, dataTask: URLSessionDataTask, didReceive data: Data) {
    // NB: a chunk can end mid-line; production code should buffer the
    // incomplete tail and prepend it to the next chunk.
    let lines = String(data: data, encoding: .utf8)?.components(separatedBy: "\n") ?? []
    for line in lines where line.hasPrefix("data: ") {
        let jsonString = String(line.dropFirst(6)).trimmingCharacters(in: .whitespaces)
        guard jsonString != "[DONE]" else { return }
        // parse delta.content from JSON
    }
}
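The "parse delta.content" step can be sketched with Codable, decoding only the fields a streaming client actually reads. The struct names here are my own shorthand, not types from any OpenAI SDK:

```swift
import Foundation

// Shape of one streaming chunk, reduced to the fields we read:
// {"choices":[{"delta":{"content":"..."}}], ...}
struct StreamChunk: Decodable {
    struct Choice: Decodable {
        struct Delta: Decodable { let content: String? }
        let delta: Delta
    }
    let choices: [Choice]
}

// Extract the token text from one "data: {...}" payload; nil if absent
// (e.g. the "[DONE]" sentinel or a role-only first chunk).
func deltaContent(from jsonString: String) -> String? {
    guard let data = jsonString.data(using: .utf8),
          let chunk = try? JSONDecoder().decode(StreamChunk.self, from: data)
    else { return nil }
    return chunk.choices.first?.delta.content
}
```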
On Android — OkHttp with EventSourceListener from okhttp-sse:
val eventSource = EventSources.createFactory(client)
    .newEventSource(request, object : EventSourceListener() {
        override fun onEvent(eventSource: EventSource, id: String?, type: String?, data: String) {
            if (data == "[DONE]") return
            // parse delta.content
        }
    })
Update the UI on each token via @Published var streamingText: String (iOS) or StateFlow<String> (Android). Don't trigger recomposition / setState too often: buffer tokens and flush to the UI every 50–100 ms.
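That buffering can be a small accumulator that flushes at most every ~80 ms. A sketch, assuming the flush callback is where you assign @Published var streamingText; the injectable `now` parameter exists only to make the class testable:

```swift
import Foundation

// Accumulates streamed tokens and flushes to the UI at most once per
// `interval`, instead of on every token.
final class TokenBuffer {
    private var pending = ""
    private var lastFlush = Date.distantPast
    private let interval: TimeInterval
    private let onFlush: (String) -> Void
    private(set) var text = ""

    init(interval: TimeInterval = 0.08, onFlush: @escaping (String) -> Void) {
        self.interval = interval
        self.onFlush = onFlush
    }

    func append(_ token: String, now: Date = Date()) {
        pending += token
        if now.timeIntervalSince(lastFlush) >= interval {
            flush(now: now)
        }
    }

    // Call once more when the stream ends so the tail isn't lost.
    func flush(now: Date = Date()) {
        guard !pending.isEmpty else { return }
        text += pending
        pending = ""
        lastFlush = now
        onFlush(text)
    }
}
```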
Conversation Context Management
The ChatGPT API is stateless; each request is independent. You build the conversation context yourself by passing a messages array with the history on every request.
Limitation: gpt-4o-mini has a 128k-token context window, but in practice a long context means a high cost per request. Strategies:
- Sliding window: keep the last N messages, discard the rest.
- Summarization: once history exceeds a threshold (e.g., 8000 tokens), compress the old part via a separate request with "Summarize this conversation in 3 sentences".
- Selective memory: save only high-importance messages (e.g., the user explicitly stated a personal fact).
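The sliding-window strategy is only a few lines. A Swift sketch with a hypothetical ChatMessage model, keeping the system prompt pinned and only the last N conversational messages:

```swift
// Minimal message model; roles mirror the API's "system" / "user" /
// "assistant". The type is illustrative, not from any SDK.
struct ChatMessage {
    let role: String
    let content: String
}

// Keep all system messages plus the tail of the conversation.
func slidingWindow(_ history: [ChatMessage], maxTurns: Int) -> [ChatMessage] {
    let system = history.filter { $0.role == "system" }
    let rest = history.filter { $0.role != "system" }
    return system + rest.suffix(maxTurns)
}
```

Counting turns is a rough proxy; a stricter version would trim by estimated token count instead.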
Cost Tracking
Each request costs money. On mobile it's important to:
- Not send a request on every keystroke (debounce 500 ms)
- Limit max_tokens to what the task needs: not 4096 where 256 suffices
- Log usage.total_tokens from each response to analytics (Firebase or your own backend)
- Set limits in the OpenAI Usage Limits dashboard (a hard monthly cap)
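Debouncing keystrokes is the cheapest win on that list. A minimal hand-rolled sketch using DispatchWorkItem; Combine's debounce operator or a Kotlin coroutine with delay would do the same job:

```swift
import Foundation

// Cancels the previous pending action on every call; only fires after
// `delay` seconds of silence.
final class Debouncer {
    private var pending: DispatchWorkItem?
    private let delay: TimeInterval
    private let queue: DispatchQueue

    init(delay: TimeInterval = 0.5, queue: DispatchQueue = .main) {
        self.delay = delay
        self.queue = queue
    }

    func call(_ action: @escaping () -> Void) {
        pending?.cancel()
        let item = DispatchWorkItem(block: action)
        pending = item
        queue.asyncAfter(deadline: .now() + delay, execute: item)
    }
}
```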
Case study: a language-learning app with an AI tutor. gpt-4o-mini, streaming. Context: the last 10 messages plus a system prompt with the lesson rules (~300 tokens). Average request: 450 input + 180 output tokens. At 500 DAU with 15 messages per session, roughly 3.4M tokens daily; at 2025 prices, acceptable. Caching the system prompt via OpenAI Prompt Caching cut input cost by 35%.
Error Handling
429 Too Many Requests: retry with exponential backoff (1 s, 2 s, 4 s, 8 s), then give up. 503 Service Unavailable: the same. 400 Bad Request usually means a malformed messages array (empty content, invalid role), so retrying won't help. All errors go to Crashlytics / Sentry with full request context (minus the auth token).
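The backoff schedule can be a pure function, which keeps the retry policy testable in isolation. A sketch matching the 1 s / 2 s / 4 s / 8 s sequence above:

```swift
import Foundation

// Delay before retry number `attempt` (0-based): 1 s, 2 s, 4 s, 8 s.
// Returns nil once the retry budget is exhausted, i.e. "give up".
func backoffDelay(attempt: Int, base: TimeInterval = 1.0, maxRetries: Int = 4) -> TimeInterval? {
    guard attempt < maxRetries else { return nil }
    return base * pow(2.0, Double(attempt))
}
```

Adding random jitter to each delay is a common refinement, so that many clients hitting a 429 don't all retry in lockstep.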
Timeline
Integration with streaming output, context management, and a backend proxy takes 3–5 working days. Cost is calculated individually.