Mobile App Text-to-Speech Implementation

NOVASOLUTIONS.TECHNOLOGY develops, supports, and maintains iOS, Android, and PWA mobile applications. We have extensive experience publishing mobile applications to popular stores such as Google Play, App Store, Amazon, AppGallery, and others.

Text-to-Speech Implementation in Mobile Applications

Text-to-Speech is one of the few mobile AI features where native APIs provide acceptable quality out of the box, without external dependencies. iOS AVSpeechSynthesizer and Android TextToSpeech work on-device, support Russian, and don't require internet once the voice data is installed. The main work is proper integration, queue management, and voice selection.

AVSpeechSynthesizer on iOS

The basic case is a few lines of code. Real production is more complex.

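import AVFoundation

// Keep a strong reference (for example, a property): a synthesizer that is deallocated stops speaking
let synthesizer = AVSpeechSynthesizer()
let utterance = AVSpeechUtterance(string: text)
utterance.voice = AVSpeechSynthesisVoice(language: "ru-RU")
utterance.rate = AVSpeechUtteranceDefaultSpeechRate // 0.5 on a 0.0–1.0 scale
synthesizer.speak(utterance)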

iOS voices come in "compact" (built-in, ~50 MB) and "enhanced" (higher quality, ~300 MB download) variants. Enhanced voices use neural synthesis. If the user hasn't downloaded the enhanced voice, AVSpeechSynthesisVoice(identifier: "com.apple.voice.enhanced.ru-RU.Milena") returns nil, so check the result and fall back to the compact voice.

let enhanced = AVSpeechSynthesisVoice(identifier: "com.apple.voice.enhanced.ru-RU.Milena")
utterance.voice = enhanced ?? AVSpeechSynthesisVoice(language: "ru-RU")

Managing AVAudioSession is mandatory. TTS must keep working even after the app has switched the session for microphone recording or music playback. Use the .playback category with the .mixWithOthers or .duckOthers option, depending on requirements.

Android TextToSpeech: Initialization and Queue Management

TextToSpeech initializes asynchronously. A common mistake is calling speak() before the onInit(status) callback has reported SUCCESS.

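import android.speech.tts.TextToSpeech
import java.util.Locale

// Declared before the constructor call so the callback can refer to it
lateinit var tts: TextToSpeech

tts = TextToSpeech(context) { status ->
    if (status == TextToSpeech.SUCCESS) {
        val result = tts.setLanguage(Locale("ru", "RU"))
        val ready = result != TextToSpeech.LANG_MISSING_DATA && result != TextToSpeech.LANG_NOT_SUPPORTED
        // only now, and only if ready is true, can you call speak()
    }
}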
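
One way to avoid speaking too early is to buffer requests until onInit reports SUCCESS. The sketch below is a minimal, platform-independent illustration of that idea. The class name InitGate and its deliver callback are hypothetical and not part of the Android SDK; in an app, deliver would wrap the real speak() call and markReady() would be called from onInit.

```kotlin
// Hypothetical helper: holds speech requests until the engine is ready.
class InitGate(private val deliver: (String) -> Unit) {
    private val pending = ArrayDeque<String>()
    private var ready = false

    // Speak immediately if the engine is ready, otherwise keep the text for later.
    @Synchronized
    fun submit(text: String) {
        if (ready) deliver(text) else pending.addLast(text)
    }

    // Call once initialization has succeeded; flushes buffered texts in order.
    @Synchronized
    fun markReady() {
        ready = true
        while (pending.isNotEmpty()) deliver(pending.removeFirst())
    }
}
```

Requests submitted before markReady() are delivered in their original order, and later requests pass straight through.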

QUEUE_FLUSH interrupts the current utterance and starts a new one. QUEUE_ADD appends to the queue. For sequential notifications (e.g., turn-by-turn navigation), use QUEUE_ADD. For assistant responses, use QUEUE_FLUSH to prevent queue buildup on rapid input.

UtteranceProgressListener tracks utterance start and end:

tts.setOnUtteranceProgressListener(object : UtteranceProgressListener() {
    // Callbacks may arrive on a background thread; post UI updates to the main thread
    override fun onStart(utteranceId: String) { /* show indicator */ }
    override fun onDone(utteranceId: String) { /* hide indicator */ }
    override fun onError(utteranceId: String) { /* handle error */ }
})

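Each speak() call should receive a unique, non-null utteranceId. Without an ID the listener callbacks are not delivered, and with duplicate IDs you cannot tell which utterance a callback belongs to.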
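
A simple way to produce such identifiers is a thread-safe counter. This is a minimal sketch: the class name UtteranceIds and the "utt" prefix are hypothetical, and any scheme that yields distinct strings (for example, UUIDs) would serve equally well.

```kotlin
import java.util.concurrent.atomic.AtomicLong

// Hypothetical helper: generates a distinct utteranceId for every speak() call.
class UtteranceIds(private val prefix: String = "utt") {
    private val counter = AtomicLong()

    // Returns "utt-1", "utt-2", ... and is safe to call from any thread.
    fun next(): String = "$prefix-${counter.incrementAndGet()}"
}

// Intended use on Android (not executed here):
//   tts.speak(text, TextToSpeech.QUEUE_ADD, null, ids.next())
```

Keeping the generated ID alongside the request lets onStart and onDone be matched to the text that triggered them.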

Managing Speed and Pauses

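SSML (Speech Synthesis Markup Language) is supported on iOS from version 16.0 through the failable AVSpeechUtterance(ssmlRepresentation:) initializer:

let ssml = "<speak><prosody rate='slow'>Attention</prosody>, <break time='500ms'/>next stop.</speak>"
if #available(iOS 16.0, *), let utterance = AVSpeechUtterance(ssmlRepresentation: ssml) { // nil if the SSML is invalid
    synthesizer.speak(utterance)
}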

On Android, SSML support depends on the engine (Google TTS supports it, Samsung TTS only partially). For critical cases, split the text into multiple speak() calls and insert pauses between them with playSilentUtterance().
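
The splitting step can be sketched as plain Kotlin that turns a text into an alternating list of speech and pause segments. The names Segment, Speak, Pause, and withPauses are hypothetical, the 500 ms default is only an illustration, and the sentence splitting is deliberately naive (it breaks after '.', '!' or '?' followed by whitespace).

```kotlin
// Hypothetical model of what will be queued on the engine.
sealed interface Segment
data class Speak(val text: String) : Segment
data class Pause(val millis: Long) : Segment

// Splits text into sentences and inserts a pause between consecutive sentences.
fun withPauses(text: String, pauseMs: Long = 500): List<Segment> {
    val sentences = text
        .split(Regex("(?<=[.!?])\\s+"))
        .map { it.trim() }
        .filter { it.isNotEmpty() }
    val result = mutableListOf<Segment>()
    sentences.forEachIndexed { index, sentence ->
        if (index > 0) result += Pause(pauseMs)
        result += Speak(sentence)
    }
    return result
}

// Intended use on Android (not executed here), with a fresh utteranceId per call:
//   Speak -> tts.speak(segment.text, TextToSpeech.QUEUE_ADD, null, id)
//   Pause -> tts.playSilentUtterance(segment.millis, TextToSpeech.QUEUE_ADD, id)
```

Because every segment is queued with QUEUE_ADD, the engine plays them in order and the pauses do not depend on SSML support.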

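Speed adjustment for accessibility: give users rate control in the app settings. On the AVSpeechUtterance scale, older users often prefer 0.35–0.4 to the default 0.5. Android's setSpeechRate() uses a different scale, where 1.0 is the normal rate.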
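
A settings slider can be mapped to a speech rate with a small clamped linear function. This is a sketch under stated assumptions: the function name sliderToRate, the 0–100 slider range, and the 0.5–1.5 rate bounds are illustrative choices for Android's setSpeechRate() scale, not values taken from either platform's documentation.

```kotlin
// Hypothetical mapping from a 0..100 settings slider to a speech rate.
// Out-of-range positions are clamped, so the result always stays within [minRate, maxRate].
fun sliderToRate(position: Int, minRate: Float = 0.5f, maxRate: Float = 1.5f): Float {
    val fraction = position.coerceIn(0, 100) / 100f
    return minRate + (maxRate - minRate) * fraction
}

// Intended use on Android (not executed here):
//   tts.setSpeechRate(sliderToRate(savedSliderPosition))
```

With these defaults the middle of the slider corresponds to the normal Android rate of 1.0; for the iOS scale the bounds would need to be chosen differently.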

Timeline

Basic TTS integration with queue management and voice handling takes 2–3 working days. Cost is calculated individually.