Real-Time AI Text Translation in Mobile Applications
Real-time translation means different things depending on the scenario. For chat: send the message and get the translation back in 200–400 ms, invisibly to the user. For a text field with live translation: react to input as the user types without spamming the API on every keystroke. For a document: batch translation with a progress bar. Each scenario needs a different architecture.
Three Stack Options and When to Choose Each
Google Translate API (Cloud Translation v3) is the production standard. It supports 130+ languages, neural machine translation (NMT), and formal/informal registers for some language pairs. Available over REST or gRPC; on mobile, call REST over HTTPS with the key hidden behind your own backend.
DeepL API gives the best quality for European languages, especially German, French, and Polish; for Russian it is comparable to Google. The free plan is limited to 500K characters/month (a minimal request sketch follows below).
On-device: ML Kit Translate, Google's ML Kit with local models, roughly 15 MB per language pair, downloaded once. Fully offline, zero network latency, quality below cloud. Supports 58 languages. A good fit for messengers with offline requirements.
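For reference, a minimal DeepL request in Kotlin (Ktor), assuming an httpClient with JSON ContentNegotiation and a deeplApiKey; in production the key should sit behind your own backend, exactly as with Google:
// Sketch: direct call to the DeepL free-tier endpoint, for prototyping only
@Serializable
data class DeepLResponse(val translations: List<DeepLTranslation>)
@Serializable
data class DeepLTranslation(val text: String)

suspend fun translateWithDeepL(text: String, targetLang: String = "RU"): String {
    val response = httpClient.submitForm(
        url = "https://api-free.deepl.com/v2/translate",
        formParameters = parameters {
            append("text", text)
            append("target_lang", targetLang)
        }
    ) {
        header("Authorization", "DeepL-Auth-Key $deeplApiKey")
    }
    return response.body<DeepLResponse>().translations[0].text
}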
Debounce and Request Management
The main mistake in live translation is sending a request on every keypress. At a typing rate of 200 characters per minute that is 3–4 requests per second, and roughly 90% of them are superseded before the response even arrives.
The right pattern on iOS (Combine):
// Inside an ObservableObject view model; translationService.translate returns a publisher
@Published var inputText: String = ""
@Published var translatedText: String = ""

$inputText
    .debounce(for: .milliseconds(500), scheduler: RunLoop.main)
    .removeDuplicates()
    .filter { $0.count >= 3 }
    .map { [weak self] text -> AnyPublisher<String, Never> in
        guard let self else { return Empty().eraseToAnyPublisher() }
        return self.translationService.translate(text)
            .replaceError(with: "")
            .eraseToAnyPublisher()
    }
    .switchToLatest()              // cancels the previous in-flight request on new input
    .receive(on: RunLoop.main)
    .assign(to: &$translatedText)
map + .switchToLatest() is the key moment. This is Combine's switchMap: on new input, the previous pending request is cancelled. (flatMap(maxPublishers: .max(1)) is not a substitute here: it limits the pipeline to one in-flight request but does not cancel it.) Without cancellation, old responses can arrive after newer ones and overwrite the current translation.
On Android with Kotlin Flow:
// inputText is a MutableStateFlow<String> in the ViewModel; debounce and flatMapLatest
// require @OptIn(FlowPreview::class, ExperimentalCoroutinesApi::class)
val translatedText: StateFlow<String> = inputText
    .debounce(500)
    .filter { it.length >= 3 }
    .distinctUntilChanged()
    .flatMapLatest { text ->
        flow { emit(translationRepo.translate(text)) }
            .catch { emit("") }
    }
    .stateIn(viewModelScope, SharingStarted.Lazily, "")
flatMapLatest is the equivalent of switchMap: it cancels the coroutine running the previous translation when new input arrives.
Google Cloud Translation v3 Integration
// Ktor client with JSON ContentNegotiation; projectId is the GCP project,
// accessToken is a short-lived OAuth token from your own backend (see below).
// TranslateTextResponse mirrors the v3 response: { "translations": [ { "translatedText": ... } ] }
suspend fun translate(text: String, targetLang: String = "ru"): String {
    val body = JSONObject().apply {
        put("contents", JSONArray().put(text))
        put("targetLanguageCode", targetLang)
        put("mimeType", "text/plain")
    }
    val response = httpClient.post(
        "https://translation.googleapis.com/v3/projects/$projectId:translateText"
    ) {
        header("Authorization", "Bearer $accessToken")
        contentType(ContentType.Application.Json)
        setBody(body.toString())
    }
    return response.body<TranslateTextResponse>().translations[0].translatedText
}
For the access token in production: a GCP service account with JWT signing done on the backend. The mobile client gets a short-lived token from its own /api/translate-token endpoint. Shipping a GCP API key inside the APK/IPA is not an option.
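A sketch of the client side of that token exchange; the endpoint URL, the response fields, and TranslateTokenProvider itself are hypothetical, the point is to cache the short-lived token and refresh it shortly before expiry:
// Hypothetical helper: fetches and caches a short-lived GCP access token from your own backend
@Serializable
data class TranslateTokenResponse(val accessToken: String, val expiresInSeconds: Long)

class TranslateTokenProvider(private val httpClient: HttpClient) {
    private var cached: TranslateTokenResponse? = null
    private var fetchedAtMillis: Long = 0

    suspend fun token(): String {
        val current = cached
        val ageSeconds = (System.currentTimeMillis() - fetchedAtMillis) / 1000
        if (current != null && ageSeconds < current.expiresInSeconds - 60) {
            return current.accessToken      // still valid, reuse without a network call
        }
        val fresh: TranslateTokenResponse =
            httpClient.get("https://api.example.com/api/translate-token").body()
        cached = fresh
        fetchedAtMillis = System.currentTimeMillis()
        return fresh.accessToken
    }
}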
ML Kit for Offline Scenarios
// iOS: Google ML Kit Translate
let options = TranslatorOptions(sourceLanguage: .english, targetLanguage: .russian)
let translator = Translator.translator(options: options)

translator.downloadModelIfNeeded { error in
    guard error == nil else { return }
    translator.translate("Hello world") { result, error in
        print(result ?? "")
    }
}
The model is downloaded once, preferably over Wi-Fi; after that everything runs offline. On-device latency is 20–50 ms per phrase. Perfect for messengers, offline video subtitles, and travel apps.
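The Android side is symmetric; a sketch with a Wi-Fi-only download condition (com.google.mlkit:translate):
// Android: ML Kit on-device translation
val options = TranslatorOptions.Builder()
    .setSourceLanguage(TranslateLanguage.ENGLISH)
    .setTargetLanguage(TranslateLanguage.RUSSIAN)
    .build()
val translator = Translation.getClient(options)

// Download the model over Wi-Fi only, then translate fully offline
val conditions = DownloadConditions.Builder().requireWifi().build()
translator.downloadModelIfNeeded(conditions)
    .addOnSuccessListener {
        translator.translate("Hello world")
            .addOnSuccessListener { translated -> Log.d("MLKitTranslate", translated) }
            .addOnFailureListener { e -> Log.e("MLKitTranslate", "translation failed", e) }
    }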
Translation Caching
Repeated requests for the same text are wasted money. Cache at the SQLite level (Room / Core Data) with the key sha256(source_text + target_lang): a 7-day TTL for regular content, no TTL for static UI strings.
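A minimal Room sketch of that cache; the entity, DAO, and helper names are illustrative:
// Cached translation keyed by sha256(source_text + target_lang)
@Entity(tableName = "translation_cache")
data class CachedTranslation(
    @PrimaryKey val cacheKey: String,
    val translatedText: String,
    val createdAt: Long                 // epoch millis, used for the 7-day TTL
)

@Dao
interface TranslationCacheDao {
    // Returns null when there is no entry or the entry is older than the TTL cutoff
    @Query("SELECT * FROM translation_cache WHERE cacheKey = :key AND createdAt > :newerThan")
    suspend fun get(key: String, newerThan: Long): CachedTranslation?

    @Insert(onConflict = OnConflictStrategy.REPLACE)
    suspend fun put(entry: CachedTranslation)
}

fun cacheKey(sourceText: String, targetLang: String): String =
    MessageDigest.getInstance("SHA-256")
        .digest((sourceText + targetLang).toByteArray())
        .joinToString("") { "%02x".format(it) }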
At the HTTP level, rely on Cache-Control for GET requests: the Google Translate v2 endpoint accepts GET with query parameters, which allows caching at the URLCache / OkHttp Cache level.
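On Android that is just a disk cache attached to the client; a minimal OkHttp sketch (it only helps when responses actually come back with cacheable headers):
// 10 MB disk cache; repeated GET requests within the Cache-Control window
// are served locally without hitting the network
val okHttpClient = OkHttpClient.Builder()
    .cache(Cache(File(context.cacheDir, "translate_http_cache"), 10L * 1024 * 1024))
    .build()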
Practical Example
In a medical tourism app (iOS + Android) we translated clinic descriptions and patient reviews (en→ru, ru→en, de→ru): cloud translation for content at load time, ML Kit offline for the consultation UI, and a 700 ms debounce for search. A translation cache in Room cut API requests by 73% in the first week.
Timeline
Basic integration of one provider with debounce takes 3–5 days. Adding ML Kit offline, caching, and automatic language detection takes another 4–6 days. Handling formatted text (HTML, Markdown) is a separate task.