Voice Search in Mobile App

NOVASOLUTIONS.TECHNOLOGY develops, supports, and maintains iOS, Android, and PWA mobile applications. We have extensive experience publishing mobile applications on popular marketplaces such as Google Play, the App Store, Amazon, AppGallery, and others.
Development and support of all types of mobile applications:
Information and entertainment mobile applications
News apps, games, reference guides, online catalogs, weather apps, fitness and health apps, travel apps, educational apps, social networks and messengers, quizzes, blogs and podcasts, forums, aggregators
E-commerce mobile applications
Online stores, B2B apps, marketplaces, online exchanges, cashback services, dropshipping platforms, loyalty programs, food and goods delivery, payment systems.
Business process management mobile applications
CRM systems, ERP systems, project management, sales team tools, financial management, production management, logistics and delivery management, HR management, data monitoring systems
Electronic services mobile applications
Classified ads platforms, online schools, online cinemas, electronic service platforms, cashback platforms, video hosting, thematic portals, online booking and scheduling platforms, online trading platforms

These are just some of the types of mobile applications we work with; each may have its own specific features and functionality, tailored to the client's needs and goals.

Latest works: mobile applications for FEEDME, XOOMER, RHL, ZIPPY, Affhome, and FLAVORS.

Development of Voice Search in Mobile Application

Voice search is not just a microphone button. It is a chain: capture the audio, send it for recognition, receive the text, form the search query. If any link in that chain is slow or fails, the user closes the app. In practice, the main issues lie in correct Speech Recognition API integration, not in the UI.

Where implementation most often breaks

iOS: incorrect use of SFSpeechRecognizer

The most common error is launching an SFSpeechRecognitionTask without first checking authorizationStatus. The app stays silent, and the user assumes the button is broken. The second problem: developers use SFSpeechURLRecognitionRequest (the file-based variant) instead of SFSpeechAudioBufferRecognitionRequest for live input. The result: the user speaks, waits, and the transcription appears only after recording stops.

The right approach: AVAudioEngine plus SFSpeechAudioBufferRecognitionRequest with shouldReportPartialResults = true. This delivers partial results while the user is still speaking, which is exactly the behavior users know from system Siri.

// Do not start without permission; an unauthorized task fails silently.
guard SFSpeechRecognizer.authorizationStatus() == .authorized else { return }

let request = SFSpeechAudioBufferRecognitionRequest()
request.shouldReportPartialResults = true  // stream partial transcriptions live

recognitionTask = speechRecognizer.recognitionTask(with: request) { result, error in
    guard let result else { return }
    // Update the search field as the user speaks
    self.searchBar.text = result.bestTranscription.formattedString
    if result.isFinal {
        self.submitSearch(query: result.bestTranscription.formattedString)
    }
}

// Feed microphone buffers into the recognition request
let inputNode = audioEngine.inputNode
let format = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
    request.append(buffer)
}
audioEngine.prepare()
try audioEngine.start()

Android: choosing between SpeechRecognizer and RecognizerIntent

RecognizerIntent launches the system recognition dialog: it integrates quickly, but looks foreign in the app UI and does not support EXTRA_PARTIAL_RESULTS on all devices. SpeechRecognizer gives full control, but requires careful lifecycle handling: call destroy() in onDestroy(), otherwise the RecognitionListener leaks.

For inline integration without the system popup:

val recognizer = SpeechRecognizer.createSpeechRecognizer(context)
val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
    putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
    putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true)
    putExtra(RecognizerIntent.EXTRA_LANGUAGE, "ru-RU")
}

recognizer.setRecognitionListener(object : RecognitionListener {
    override fun onPartialResults(partialResults: Bundle) {
        // Show the interim transcription while the user is still speaking
        val partial = partialResults.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
        searchInput.setText(partial?.firstOrNull() ?: "")
    }
    override fun onResults(results: Bundle) {
        val text = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)?.firstOrNull()
        text?.let { submitSearch(it) }
    }
    // ... remaining RecognitionListener callbacks (onReadyForSpeech, onError, etc.)
})
recognizer.startListening(intent)
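
The lifecycle cleanup that SpeechRecognizer requires can be sketched as follows; the class and field names here are illustrative, not from an actual project:

```kotlin
import android.speech.SpeechRecognizer
import androidx.appcompat.app.AppCompatActivity

// Sketch of the required cleanup. Without destroy(), the recognizer's
// RecognitionListener keeps a reference to the Activity and leaks it.
class VoiceSearchActivity : AppCompatActivity() {
    private var recognizer: SpeechRecognizer? = null

    override fun onDestroy() {
        recognizer?.destroy()   // releases the service connection and the listener
        recognizer = null
        super.onDestroy()
    }
}
```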

Flutter: speech_to_text vs direct Platform Channels

The speech_to_text package covers 90% of use cases. The problem arises in multilingual apps: localeId must be passed explicitly, otherwise on Android recognition runs in the system language rather than the app language.

How we do it

For most apps the native Speech APIs are sufficient and require no external dependencies. When higher accuracy or domain-specific terminology is needed (medicine, legal terms, technical product names), we connect Google Cloud Speech-to-Text or OpenAI's Whisper API: both let you bias recognition toward a domain vocabulary, via SpeechContext in Google's case or the prompt parameter in Whisper's.
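
For the Google Cloud path, vocabulary biasing via SpeechContext looks roughly like this. This is a sketch assuming the google-cloud-speech client library; the phrases and audio parameters are illustrative, not from a real configuration:

```kotlin
import com.google.cloud.speech.v1.RecognitionConfig
import com.google.cloud.speech.v1.SpeechContext

// Illustrative: hint the recognizer toward domain terms it would
// otherwise mistranscribe.
val speechContext = SpeechContext.newBuilder()
    .addPhrases("angioplasty")
    .addPhrases("stent")
    .build()

val config = RecognitionConfig.newBuilder()
    .setEncoding(RecognitionConfig.AudioEncoding.LINEAR16)
    .setSampleRateHertz(16000)
    .setLanguageCode("en-US")
    .addSpeechContexts(speechContext)
    .build()
```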

An important detail that is often missed: animating the sound level during recording. It is not decorative; without visual feedback the user cannot tell whether the app hears them. On iOS the level comes from AVAudioRecorder.averagePower(forChannel:), on Android from MediaRecorder.getMaxAmplitude().
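
On Android, that feedback can be driven by a pure function mapping getMaxAmplitude() values (0..32767) to an animation level. The dB conversion and the -60 dB floor below are our own assumptions, not part of the Android API:

```kotlin
import kotlin.math.log10
import kotlin.math.max

// Maps a raw amplitude from MediaRecorder.getMaxAmplitude() to 0.0..1.0,
// suitable for scaling a waveform or pulsing the microphone icon.
// 0 dB corresponds to full scale (32767); anything below floorDb reads as 0.
fun amplitudeToLevel(amplitude: Int, floorDb: Double = -60.0): Double {
    if (amplitude <= 0) return 0.0
    val db = 20.0 * log10(amplitude / 32767.0)   // decibels relative to full scale
    return max(0.0, 1.0 - db / floorDb)          // clamp quiet input to 0.0
}

fun main() {
    println(amplitudeToLevel(32767))  // full scale
    println(amplitudeToLevel(0))      // silence
}
```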

After transcription, the search query goes through normalization: filler words are removed, the text is lowercased, and transcription variants are handled, so that "iPhone" and "айфон" (the Russian spelling) produce the same result.
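
A minimal normalization sketch; the filler-word list and the alias map are illustrative assumptions, since a production version would be driven by the app's catalog and query logs:

```kotlin
// Illustrative dictionaries, not from a real project.
val fillerWords = setOf("um", "uh", "please", "like")
val aliases = mapOf("айфон" to "iphone")   // map spoken variants to canonical terms

// Lowercase, drop filler words, and replace known aliases.
fun normalizeQuery(raw: String): String =
    raw.lowercase()
        .split(Regex("\\s+"))
        .filter { it.isNotBlank() && it !in fillerWords }
        .map { aliases[it] ?: it }
        .joinToString(" ")
```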

Work process

Requirements analysis: languages, content type (free speech or commands), whether offline support is needed.

Implementation: permission requests, Speech API integration, partial-results handling, microphone level animation.

Query normalization and integration with the search backend.

Testing on real devices with different accents and in noisy conditions.

Timeline estimates

Basic native Speech API integration with UI takes 2–3 days. Multilingual support, an offline mode with a local model, or integration with a cloud ASR service brings the estimate to 1–1.5 weeks.