AI Streaming Response Implementation for Mobile App



Implementing AI Streaming Response in a Mobile Application

Without streaming, an AI assistant feels unacceptable to users. Waiting 5–10 seconds at a blank screen before a response appears is not "slow", it is "broken". Streaming via Server-Sent Events or WebSocket delivers the first token in 300–600 ms, and the user sees the model "thinking". The technique itself is simple; the complexity is in handling the stream properly on mobile without rendering artifacts.

iOS: AsyncBytes and SSE Parsing

Most LLM APIs stream responses via SSE (Server-Sent Events), a text protocol over HTTP. Each event is a line of the form data: {json}, with a blank line as the separator.
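This framing can be sketched platform-neutrally. The function below (a hypothetical helper, shown in Kotlin, assuming the OpenAI-style [DONE] sentinel) mirrors what both platform clients in this article do with the raw lines:

```kotlin
// Minimal sketch of SSE framing: keep only "data: " lines, strip the prefix,
// and stop at the "[DONE]" sentinel. Assumes the whole stream is in memory;
// real clients process lines as they arrive.
fun parseSseLines(raw: String): List<String> =
    raw.lineSequence()
        .filter { it.startsWith("data: ") }
        .map { it.removePrefix("data: ") }
        .takeWhile { it != "[DONE]" }
        .toList()
```

Each returned string is one JSON chunk to decode and append to the visible response.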

On iOS, the native approach is URLSession with AsyncBytes, available since iOS 15:

func streamCompletion(request: URLRequest) -> AsyncThrowingStream<String, Error> {
    AsyncThrowingStream { continuation in
        let task = Task {
            do {
                let (bytes, response) = try await URLSession.shared.bytes(for: request)
                guard (response as? HTTPURLResponse)?.statusCode == 200 else {
                    continuation.finish(throwing: APIError.badStatus)
                    return
                }
                for try await line in bytes.lines {
                    guard line.hasPrefix("data: ") else { continue }
                    let payload = String(line.dropFirst(6))
                    guard payload != "[DONE]" else {
                        continuation.finish()
                        return
                    }
                    if let data = payload.data(using: .utf8),
                       let chunk = try? JSONDecoder().decode(StreamChunk.self, from: data),
                       let delta = chunk.choices.first?.delta.content {
                        continuation.yield(delta)
                    }
                }
                continuation.finish() // stream ended without [DONE]
            } catch {
                continuation.finish(throwing: error) // propagate network errors
            }
        }
        // cancel the network task when the consumer stops iterating
        continuation.onTermination = { _ in task.cancel() }
    }
}

Usage in ViewModel:

func sendMessage(_ text: String) {
    Task { @MainActor in
        currentResponse = ""
        do {
            for try await token in streamCompletion(request: buildRequest(text)) {
                currentResponse += token
            }
        } catch {
            // surface the failure to the UI, e.g. via a published property
            errorMessage = error.localizedDescription
        }
    }
}

@MainActor guarantees that UI updates happen on the main thread without an explicit DispatchQueue.main.async.

Android: OkHttp + EventSource

Android has no native SSE client; OkHttp is the standard choice (the okhttp-sse artifact also offers a ready-made EventSource, but a raw client is easy to write):

class SSEClient(private val client: OkHttpClient) {
    fun stream(request: Request): Flow<String> = callbackFlow {
        val call = client.newCall(request)

        call.enqueue(object : Callback {
            override fun onResponse(call: Call, response: Response) {
                response.use {
                    if (!response.isSuccessful) {
                        close(IOException("HTTP ${response.code}"))
                        return
                    }
                    val source = response.body?.source() ?: run { close(); return }
                    while (!source.exhausted()) {
                        val line = source.readUtf8Line() ?: break
                        if (line.startsWith("data: ")) {
                            val payload = line.removePrefix("data: ")
                            if (payload == "[DONE]") {
                                close()
                                return
                            }
                            // parse JSON, extract delta
                            trySend(extractDelta(payload))
                        }
                    }
                }
                close()
            }

            override fun onFailure(call: Call, e: IOException) = close(e)
        })

        awaitClose { call.cancel() }
    }
}

callbackFlow is the correct way to turn OkHttp's callback API into a Kotlin Flow. trySend is used instead of send because send suspends and cannot be called from a plain callback, while trySend never blocks the thread.
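The extractDelta helper above is left undefined in the listing. A deliberately naive sketch, assuming the OpenAI-style chunk shape {"choices":[{"delta":{"content":"..."}}]}, could look like this; production code should use a real JSON parser (Moshi, kotlinx.serialization, org.json):

```kotlin
// Naive delta extraction via substring scan; ignores escaped quotes and
// nesting, which a real JSON parser would handle. Sketch only.
fun extractDelta(payload: String): String {
    val key = "\"content\":\""
    val start = payload.indexOf(key)
    if (start == -1) return ""                 // chunk carries no content delta
    val from = start + key.length
    val end = payload.indexOf('"', from)       // closing quote of the value
    return if (end == -1) "" else payload.substring(from, end)
}
```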

For Flutter: the http package has no built-in SSE support. Use dio with ResponseType.stream, or dart:io's HttpClient directly.

Text Rendering During Streaming

This is where most mistakes happen: if the response contains Markdown (bold, code, lists), it must be rendered carefully. The problem is that the Markdown parser sees incomplete constructs, such as **bold without the closing **, and renders artifacts.

Two approaches:

  1. Render only complete blocks: the buffer accumulates text until the closing token arrives, then renders. Clean result, but adds latency.
  2. Render as plain text during streaming and apply Markdown after completion: simpler and more reliable for most assistants.
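Approach 1 can be sketched as a function that returns only the prefix of the buffer that is safe to render, holding back text after a dangling ** or ``` marker (a hypothetical helper covering just these two constructs; a full solution would track every Markdown token):

```kotlin
// Return the prefix of the streaming buffer with no unclosed ** or ```
// construct, so an incremental Markdown renderer never sees artifacts.
fun safeMarkdownPrefix(buffer: String): String {
    var cut = buffer.length
    // An odd number of ``` fences means the last one is still open
    if (Regex("```").findAll(buffer).count() % 2 == 1) {
        cut = minOf(cut, buffer.lastIndexOf("```"))
    }
    // Same check for ** bold markers, counted only up to the fence cut
    val head = buffer.substring(0, cut)
    if (Regex("\\*\\*").findAll(head).count() % 2 == 1) {
        cut = minOf(cut, head.lastIndexOf("**"))
    }
    return buffer.substring(0, cut)
}
```

The held-back tail is rendered on a later pass, once its closing token has streamed in.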

On iOS, use AttributedString(markdown:) for the final render and plain Text(currentResponse) during streaming. On Android, the Markwon library handles the final render in a TextView.

Request Cancellation

The user taps "Stop", and the streaming request must be cancelled properly. On iOS, Task.cancel() automatically cancels URLSession.bytes, and the for await loop throws CancellationError. On Android, cancelling the coroutine that collects the Flow triggers awaitClose, which calls call.cancel() on OkHttp.

Don't forget: after cancellation, save the already-received partial response to the dialog history. The user has seen the text, and it should remain.

Handling Connection Breaks

Mobile networks drop, and a streaming request can be interrupted mid-response. The correct reaction: show what was already received and offer a "Continue" action. Saving a lastTokenIndex or the last stop_reason won't help, because LLM APIs don't support resuming a generation from the middle. You have to generate anew, passing the already-received part as context.
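The "generate anew with context" step can be sketched as rebuilding the message list: the partial answer becomes an assistant turn, followed by an explicit instruction to continue (Message, buildContinueRequest, and the prompt wording are all illustrative, not a specific API):

```kotlin
// Illustrative chat message type; real APIs use their own request models.
data class Message(val role: String, val content: String)

// Rebuild the context after a dropped stream: append the partial answer as
// an assistant turn, then ask the model to pick up where it stopped.
fun buildContinueRequest(
    history: List<Message>,
    partialAnswer: String
): List<Message> =
    history +
        Message("assistant", partialAnswer) +
        Message("user", "Continue your previous answer exactly where it stopped.")
```

The model's continuation is then appended to the partial text already shown on screen.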

Timeline Estimates

A streaming client with proper rendering, cancellation, and error handling takes roughly 4–6 days for one platform and 1–1.5 weeks for both.