Implementing AI Streaming Response in a Mobile Application
Without streaming, an AI assistant is unacceptable to users. Waiting 5–10 seconds at a blank screen before a response appears is not "slow," it is "broken." Streaming over Server-Sent Events or WebSocket delivers the first token in 300–600 ms, so the user sees the model "thinking." The transport itself is simple; the complexity is in handling the stream properly on mobile without rendering artifacts.
iOS: AsyncBytes and SSE Parsing
Most LLM APIs stream via SSE (Server-Sent Events), a text protocol on top of HTTP. Each event is a line of the form data: {json}, and a blank line separates events.
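On the wire, a typical stream looks like this (payload abbreviated; the field names here follow the OpenAI Chat Completions format, and other providers differ in the JSON shape):

```text
data: {"choices":[{"delta":{"content":"Hel"}}]}

data: {"choices":[{"delta":{"content":"lo"}}]}

data: [DONE]
```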
On iOS, the native approach is URLSession with AsyncBytes, available since iOS 15:
func streamCompletion(request: URLRequest) -> AsyncThrowingStream<String, Error> {
    AsyncThrowingStream { continuation in
        let task = Task {
            do {
                let (bytes, response) = try await URLSession.shared.bytes(for: request)
                guard (response as? HTTPURLResponse)?.statusCode == 200 else {
                    continuation.finish(throwing: APIError.badStatus)
                    return
                }
                for try await line in bytes.lines {
                    guard line.hasPrefix("data: ") else { continue }
                    let payload = String(line.dropFirst(6))
                    guard payload != "[DONE]" else {
                        continuation.finish()
                        return
                    }
                    if let data = payload.data(using: .utf8),
                       let chunk = try? JSONDecoder().decode(StreamChunk.self, from: data),
                       let delta = chunk.choices.first?.delta.content {
                        continuation.yield(delta)
                    }
                }
                continuation.finish() // stream ended without [DONE]
            } catch {
                continuation.finish(throwing: error) // network errors reach the consumer
            }
        }
        continuation.onTermination = { _ in task.cancel() } // stop network work if the consumer stops
    }
}
Usage in ViewModel:
func sendMessage(_ text: String) {
    // streamTask is a stored property, so a Stop button can call streamTask?.cancel()
    streamTask = Task { @MainActor in
        currentResponse = ""
        do {
            for try await token in streamCompletion(request: buildRequest(text)) {
                currentResponse += token
            }
        } catch {
            // show an error state; the partial text stays in currentResponse
        }
    }
}
@MainActor ensures UI updates land on the main thread without explicit DispatchQueue.main.async.
Android: OkHttp + EventSource
Android has no native SSE client. OkHttp is the standard choice: either its okhttp-sse module with EventSource, or a hand-rolled reader, which is small enough to write directly:
class SSEClient(private val client: OkHttpClient) {
    fun stream(request: Request): Flow<String> = callbackFlow {
        val call = client.newCall(request)
        call.enqueue(object : Callback {
            override fun onResponse(call: Call, response: Response) {
                response.use { // close the body even on early return
                    if (!response.isSuccessful) {
                        close(IOException("HTTP ${response.code}"))
                        return
                    }
                    val source = response.body?.source()
                    if (source == null) { close(); return }
                    while (!source.exhausted()) {
                        val line = source.readUtf8Line() ?: break
                        if (line.startsWith("data: ")) {
                            val payload = line.removePrefix("data: ")
                            if (payload == "[DONE]") {
                                close()
                                return
                            }
                            // parse JSON, extract delta
                            trySend(extractDelta(payload))
                        }
                    }
                }
                close()
            }

            override fun onFailure(call: Call, e: IOException) {
                close(e) // braces, not `= close(e)`: close() returns Boolean, onFailure returns Unit
            }
        })
        awaitClose { call.cancel() } // cancelling the collector cancels the HTTP call
    }
}
callbackFlow is the correct way to turn callback-based OkHttp into a Kotlin Flow. trySend instead of send, because onResponse is not a suspend function: trySend never blocks the OkHttp thread.
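On the consuming side, a ViewModel can collect this Flow and accumulate tokens into observable state. A minimal sketch, assuming androidx ViewModel, kotlinx.coroutines, and the SSEClient above (ChatViewModel and the field names are illustrative):

```kotlin
import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import kotlinx.coroutines.Job
import kotlinx.coroutines.flow.*
import kotlinx.coroutines.launch
import okhttp3.Request

class ChatViewModel(private val sse: SSEClient) : ViewModel() {
    private val _currentResponse = MutableStateFlow("")
    val currentResponse: StateFlow<String> = _currentResponse

    private var streamJob: Job? = null

    fun sendMessage(request: Request) {
        streamJob?.cancel() // one active stream at a time
        _currentResponse.value = ""
        streamJob = viewModelScope.launch {
            sse.stream(request)
                .catch { /* show an error state; partial text stays visible */ }
                .collect { token -> _currentResponse.value += token }
        }
    }
}
```

StateFlow plays the same role here as the @Published/observable property on iOS: Compose or a collecting Fragment re-renders on every appended token.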
For Flutter: http package doesn't support SSE. Use dio with ResponseType.stream or dart:io HttpClient directly.
Text Rendering During Streaming
Here is where most mistakes happen: if the response contains Markdown (bold, code, lists), it must be rendered carefully. The problem: the Markdown parser sees incomplete constructs, e.g. **bold without the closing **, and renders artifacts.
Two approaches:
- Render only complete blocks: a buffer accumulates text until the closing token arrives, then renders. Clean result, but adds latency.
- Render as plain text during streaming, then as Markdown after completion: simpler and more reliable for most assistants.
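The first approach can be sketched as a buffer that releases text only up to the last blank-line separator. This is a deliberate simplification: treating a blank line as the block boundary ignores fenced code blocks, which a real implementation must track separately.

```kotlin
// Emits only complete blocks (separated by a blank line); the tail stays
// buffered until the next separator, or until flush() at stream end.
class BlockBuffer {
    private val pending = StringBuilder()

    fun append(token: String): String {
        pending.append(token)
        val cut = pending.lastIndexOf("\n\n")
        if (cut < 0) return "" // no complete block yet
        val ready = pending.substring(0, cut + 2)
        pending.delete(0, cut + 2)
        return ready
    }

    fun flush(): String = pending.toString().also { pending.clear() }
}
```

Feed each streamed token through append() and render only what it returns; call flush() when the stream finishes so the final partial block is not lost.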
On iOS: AttributedString(markdown:) for the final render, Text(currentResponse) during streaming. On Android: the Markwon library for the final render in a TextView.
Request Cancellation
The user tapped "Stop," so the streaming request must be cancelled properly. On iOS, Task.cancel() automatically cancels URLSession.bytes, and the for await loop throws CancellationError. On Android, cancelling the coroutine that collects the Flow triggers awaitClose, which calls call.cancel() on OkHttp.
Don't forget: after cancellation, save the partial response that already arrived to the dialog history. The user saw that text, and it should remain.
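The Android stop path can then look like this (a sketch: streamJob, _currentResponse, and history are hypothetical ViewModel fields, Message a hypothetical model class):

```kotlin
fun stopStreaming() {
    streamJob?.cancel() // triggers awaitClose { call.cancel() } in the callbackFlow
    val partial = _currentResponse.value
    if (partial.isNotEmpty()) {
        // Keep what the user already saw in the dialog history
        history.add(Message(role = "assistant", content = partial))
    }
}
```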
Handling Connection Breaks
Mobile networks drop, and a streaming request can break mid-response. The correct reaction: show what was already received and offer "Continue." Saving a lastTokenIndex or the last stop_reason won't help, because the API doesn't support resuming from the middle. You have to generate anew, passing the already-received part as context.
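One way to implement "Continue" is to resend the dialog with the partial answer as a trailing assistant message and ask the model to pick up where it stopped. A sketch, with a hypothetical Message type and roles modeled on OpenAI-style chat APIs:

```kotlin
data class Message(val role: String, val content: String)

fun buildContinueMessages(history: List<Message>, partial: String): List<Message> =
    history +
        Message("assistant", partial) +
        Message("user", "Continue the previous answer exactly where it stopped, without repeating it.")
```

The prompt wording matters in practice: without the "without repeating it" instruction, many models restate the partial answer from the beginning.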
Timeline Estimates
A streaming client with proper rendering, cancellation, and error handling takes 4–6 days for one platform, 1–1.5 weeks for both.