Implementing AI Streaming Response in a Mobile Application
Without streaming, an AI assistant is unacceptable to users. Waiting 5–10 seconds at a blank screen before a response appears is not "slow," it is "broken." Streaming over Server-Sent Events or WebSocket delivers the first token in 300–600 ms, so the user sees the model "thinking." The transport itself is simple; the complexity is in handling the stream properly on mobile without rendering artifacts.
iOS: AsyncBytes and SSE Parsing
Most LLM APIs stream via SSE (Server-Sent Events), a text protocol on top of HTTP. Each event is a line of the form data: {json}, and a blank line separates events.
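On the wire, a typical stream looks like this (payload abbreviated; the field names here follow the OpenAI Chat Completions format, and other providers differ in the JSON shape):

```text
data: {"choices":[{"delta":{"content":"Hel"}}]}

data: {"choices":[{"delta":{"content":"lo"}}]}

data: [DONE]
```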
On iOS, the native approach is URLSession with AsyncBytes, available since iOS 15:
func streamCompletion(request: URLRequest) -> AsyncThrowingStream<String, Error> {
    AsyncThrowingStream { continuation in
        let task = Task {
            do {
                let (bytes, response) = try await URLSession.shared.bytes(for: request)
                guard (response as? HTTPURLResponse)?.statusCode == 200 else {
                    continuation.finish(throwing: APIError.badStatus)
                    return
                }
                for try await line in bytes.lines {
                    guard line.hasPrefix("data: ") else { continue }
                    let payload = String(line.dropFirst(6))
                    guard payload != "[DONE]" else {
                        continuation.finish()
                        return
                    }
                    if let data = payload.data(using: .utf8),
                       let chunk = try? JSONDecoder().decode(StreamChunk.self, from: data),
                       let delta = chunk.choices.first?.delta.content {
                        continuation.yield(delta)
                    }
                }
                continuation.finish() // stream ended without [DONE]
            } catch {
                continuation.finish(throwing: error) // network errors reach the consumer
            }
        }
        continuation.onTermination = { _ in task.cancel() } // stop network work if the consumer stops
    }
}
Usage in ViewModel:
func sendMessage(_ text: String) {
    // streamTask is a stored property, so a Stop button can call streamTask?.cancel()
    streamTask = Task { @MainActor in
        currentResponse = ""
        do {
            for try await token in streamCompletion(request: buildRequest(text)) {
                currentResponse += token
            }
        } catch {
            // show an error state; the partial text stays in currentResponse
        }
    }
}
@MainActor ensures UI updates land on the main thread without explicit DispatchQueue.main.async.
Android: OkHttp + EventSource
Android has no native SSE client. OkHttp is the standard choice: either its okhttp-sse module with EventSource, or a hand-rolled reader, which is small enough to write directly:
class SSEClient(private val client: OkHttpClient) {
    fun stream(request: Request): Flow<String> = callbackFlow {
        val call = client.newCall(request)
        call.enqueue(object : Callback {
            override fun onResponse(call: Call, response: Response) {
                response.use { // close the body even on early return
                    if (!response.isSuccessful) {
                        close(IOException("HTTP ${response.code}"))
                        return
                    }
                    val source = response.body?.source()
                    if (source == null) { close(); return }
                    while (!source.exhausted()) {
                        val line = source.readUtf8Line() ?: break
                        if (line.startsWith("data: ")) {
                            val payload = line.removePrefix("data: ")
                            if (payload == "[DONE]") {
                                close()
                                return
                            }
                            // parse JSON, extract delta
                            trySend(extractDelta(payload))
                        }
                    }
                }
                close()
            }

            override fun onFailure(call: Call, e: IOException) {
                close(e) // braces, not `= close(e)`: close() returns Boolean, onFailure returns Unit
            }
        })
        awaitClose { call.cancel() } // cancelling the collector cancels the HTTP call
    }
}
callbackFlow is the correct way to turn callback-based OkHttp into a Kotlin Flow. trySend instead of send, because onResponse is not a suspend function: trySend never blocks the OkHttp thread.
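On the consuming side, a ViewModel can collect this Flow and accumulate tokens into observable state. A minimal sketch, assuming androidx ViewModel, kotlinx.coroutines, and the SSEClient above (ChatViewModel and the field names are illustrative):

```kotlin
import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import kotlinx.coroutines.Job
import kotlinx.coroutines.flow.*
import kotlinx.coroutines.launch
import okhttp3.Request

class ChatViewModel(private val sse: SSEClient) : ViewModel() {
    private val _currentResponse = MutableStateFlow("")
    val currentResponse: StateFlow<String> = _currentResponse

    private var streamJob: Job? = null

    fun sendMessage(request: Request) {
        streamJob?.cancel() // one active stream at a time
        _currentResponse.value = ""
        streamJob = viewModelScope.launch {
            sse.stream(request)
                .catch { /* show an error state; partial text stays visible */ }
                .collect { token -> _currentResponse.value += token }
        }
    }
}
```

StateFlow plays the same role here as the @Published/observable property on iOS: Compose or a collecting Fragment re-renders on every appended token.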
For Flutter: http package doesn't support SSE. Use dio with ResponseType.stream or dart:io HttpClient directly.
Text Rendering During Streaming
Here is where most mistakes happen: if the response contains Markdown (bold, code, lists), it must be rendered carefully. The problem: the Markdown parser sees incomplete constructs, e.g. **bold without the closing **, and renders artifacts.
Two approaches:
- Render only complete blocks: a buffer accumulates text until the closing token arrives, then renders. Clean result, but adds latency.
- Render as plain text during streaming, then as Markdown after completion: simpler and more reliable for most assistants.
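The first approach can be sketched as a buffer that releases text only up to the last blank-line separator. This is a deliberate simplification: treating a blank line as the block boundary ignores fenced code blocks, which a real implementation must track separately.

```kotlin
// Emits only complete blocks (separated by a blank line); the tail stays
// buffered until the next separator, or until flush() at stream end.
class BlockBuffer {
    private val pending = StringBuilder()

    fun append(token: String): String {
        pending.append(token)
        val cut = pending.lastIndexOf("\n\n")
        if (cut < 0) return "" // no complete block yet
        val ready = pending.substring(0, cut + 2)
        pending.delete(0, cut + 2)
        return ready
    }

    fun flush(): String = pending.toString().also { pending.clear() }
}
```

Feed each streamed token through append() and render only what it returns; call flush() when the stream finishes so the final partial block is not lost.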
On iOS: AttributedString(markdown:) for the final render, Text(currentResponse) during streaming. On Android: the Markwon library for the final render in a TextView.
Request Cancellation
The user tapped "Stop," so the streaming request must be cancelled properly. On iOS, Task.cancel() automatically cancels URLSession.bytes, and the for await loop throws CancellationError. On Android, cancelling the coroutine that collects the Flow triggers awaitClose, which calls call.cancel() on OkHttp.
Don't forget: after cancellation, save the partial response that already arrived to the dialog history. The user saw that text, and it should remain.
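The Android stop path can then look like this (a sketch: streamJob, _currentResponse, and history are hypothetical ViewModel fields, Message a hypothetical model class):

```kotlin
fun stopStreaming() {
    streamJob?.cancel() // triggers awaitClose { call.cancel() } in the callbackFlow
    val partial = _currentResponse.value
    if (partial.isNotEmpty()) {
        // Keep what the user already saw in the dialog history
        history.add(Message(role = "assistant", content = partial))
    }
}
```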
Handling Connection Breaks
Mobile networks drop, and a streaming request can break mid-response. The correct reaction: show what was already received and offer "Continue." Saving a lastTokenIndex or the last stop_reason won't help, because the API doesn't support resuming from the middle. You have to generate anew, passing the already-received part as context.
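One way to implement "Continue" is to resend the dialog with the partial answer as a trailing assistant message and ask the model to pick up where it stopped. A sketch, with a hypothetical Message type and roles modeled on OpenAI-style chat APIs:

```kotlin
data class Message(val role: String, val content: String)

fun buildContinueMessages(history: List<Message>, partial: String): List<Message> =
    history +
        Message("assistant", partial) +
        Message("user", "Continue the previous answer exactly where it stopped, without repeating it.")
```

The prompt wording matters in practice: without the "without repeating it" instruction, many models restate the partial answer from the beginning.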
Timeline Estimates
A streaming client with proper rendering, cancellation, and error handling takes 4–6 days for one platform, 1–1.5 weeks for both.