Voice Assistant Implementation in Mobile Applications
A voice assistant in a mobile app isn't just a microphone button. It's a pipeline of multiple components: VAD (Voice Activity Detection), STT, NLU/Intent Recognition, business logic processing, TTS. Each component adds latency. The goal is total latency from end of user speech to start of response ≤1.5 seconds. This is a technical constraint, not a marketing target.
Pipeline Architecture
Microphone → VAD → STT → NLU → Logic → TTS → Speaker
                    ↕       ↕
                Streaming  Intent DB
VAD — voice activity detection prevents sending silence to STT. WebRTCVAD (native library) or SileroVAD (ONNX/TFLite, ~1 MB). VAD reduces false positives and saves API calls.
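The gating principle behind VAD can be shown with a toy energy-threshold detector. This is illustrative only — production apps should use WebRTCVAD or SileroVAD as the text recommends, and the threshold and hangover values here are arbitrary:

```python
import struct

def frame_energy(frame: bytes) -> float:
    """Mean absolute amplitude of a 16-bit mono PCM frame."""
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    return sum(abs(s) for s in samples) / max(len(samples), 1)

class EnergyVAD:
    """Toy energy gate: forwards a frame to STT only when it looks like speech."""
    def __init__(self, threshold: float = 500.0, hangover_frames: int = 10):
        self.threshold = threshold        # amplitude below this counts as silence
        self.hangover = hangover_frames   # keep streaming briefly after speech stops
        self._silence_run = 0

    def is_speech(self, frame: bytes) -> bool:
        if frame_energy(frame) >= self.threshold:
            self._silence_run = 0
            return True
        self._silence_run += 1
        # The hangover avoids clipping word endings and short pauses.
        return self._silence_run <= self.hangover
```

Only frames for which `is_speech` returns True are pushed into the STT stream; everything else is dropped, which is what saves the API calls.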
STT — speech-to-text conversion. Options: native SFSpeechRecognizer / Android STT for simple cases; OpenAI Whisper API or Yandex SpeechKit for Russian with high accuracy.
NLU — intent and entity extraction from text. Example: "add milk to shopping list" → intent: ADD_TO_LIST, entity: {item: milk, list: shopping}. Solutions:
- Rasa NLU — open source, self-hosted, trains on your data. Suitable for complex domains with many intents.
- Dialogflow ES/CX — Google cloud NLU, quick to start, good Russian language support. Paid at high volume.
- LLM-based classification — ChatGPT / Claude API with structured output (function calling). Flexible, no training data annotation needed, more expensive at high traffic.
- On-device BERT — MobileBERT TFLite, ~50 MB, classifies intents from a fixed set. Works offline.
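The intent/entity structure from the example above can be sketched with a trivial rule-based matcher. The patterns and intent names here are hypothetical; an on-device model such as MobileBERT would replace the regexes with learned classification, but the output shape is the same:

```python
import re

# Hypothetical fixed intent set; a real on-device classifier would
# replace these regexes with a learned model over the same label set.
INTENT_PATTERNS = {
    "ADD_TO_LIST": re.compile(r"add (?P<item>.+) to (?P<list>\w+) list"),
    "NAVIGATE_TO": re.compile(r"(?:navigate|take me) to (?P<destination>.+)"),
}

def classify(text: str) -> dict:
    """Return {"intent": ..., "entities": {...}} or a fallback intent."""
    for intent, pattern in INTENT_PATTERNS.items():
        m = pattern.search(text.lower())
        if m:
            return {"intent": intent, "entities": m.groupdict()}
    return {"intent": "UNKNOWN", "entities": {}}
```

So "add milk to shopping list" maps to intent ADD_TO_LIST with entities {item: milk, list: shopping}, matching the example in the text.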
Intent Recognition: What Actually Works
For applications with a limited domain (smart home, online banking, navigation) — Rasa NLU or Dialogflow with explicit intents. 50–200 training examples per intent are usually sufficient.
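Those training examples are plain annotated utterances. A fragment in Rasa's NLU training data format might look like this (intent and entity names are illustrative — check the current Rasa documentation for the exact schema of your version):

```yaml
version: "3.1"
nlu:
- intent: add_to_list
  examples: |
    - add [milk](item) to my [shopping](list_name) list
    - put [bread](item) on the [shopping](list_name) list
    - add [batteries](item) to the [hardware](list_name) list
```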
For open domain — LLM with system prompt describing available actions. LLM returns JSON via function calling:
{
  "intent": "navigate_to",
  "destination": "Pushkin restaurant",
  "time": null
}
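To make the model return exactly that JSON, the available actions are declared as a tool/function schema in the request. A sketch in the OpenAI-style tools format (the exact wire format varies by provider and version — verify against the provider's API reference; Claude's tools declaration differs slightly):

```python
import json

# OpenAI-style tool declaration for the navigate_to intent from the example.
NAVIGATE_TOOL = {
    "type": "function",
    "function": {
        "name": "navigate_to",
        "description": "Start navigation to a destination.",
        "parameters": {
            "type": "object",
            "properties": {
                "destination": {"type": "string"},
                "time": {"type": ["string", "null"],
                         "description": "Departure time, or null for now"},
            },
            "required": ["destination"],
        },
    },
}

def parse_tool_call(arguments_json: str) -> dict:
    """The model returns function arguments as a JSON string; validate minimally."""
    args = json.loads(arguments_json)
    if "destination" not in args:
        raise ValueError("model omitted required field 'destination'")
    return {"intent": "navigate_to", **args}
```

Validating the parsed arguments matters: even with function calling, the model can occasionally drop a required field, and a voice UI needs a graceful re-prompt rather than a crash.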
LLM request latency: 400–800 ms for gpt-4o-mini, 200–400 ms for Claude Haiku. Add STT (800–1500 ms cloud) and TTS (~300 ms to first audio), and the total comes to roughly 1.3–2.6 seconds — right at the edge of the comfortable range.
Optimization: start LLM request in parallel during last 200 ms of STT (before final result), cache frequent intents locally.
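The overlap trick can be sketched with asyncio: fire the NLU request speculatively on the interim transcript, then reconcile once STT finalizes. All function names and timings below are illustrative stand-ins, not a real STT/LLM client:

```python
import asyncio

async def stt_final(interim: str) -> str:
    # Stand-in for the last ~200 ms of streaming STT finalization.
    await asyncio.sleep(0.2)
    return interim  # assume here the final transcript matches the interim one

async def nlu(text: str) -> dict:
    # Stand-in for an LLM / NLU round trip (~400 ms).
    await asyncio.sleep(0.4)
    return {"intent": "navigate_to", "query": text}

async def recognize(interim: str) -> dict:
    # Start NLU speculatively while STT is still finalizing.
    nlu_task = asyncio.create_task(nlu(interim))
    final = await stt_final(interim)
    if final == interim:
        return await nlu_task   # speculation paid off: ~400 ms instead of ~600 ms
    nlu_task.cancel()           # transcript changed: redo NLU on the final text
    return await nlu(final)
```

When the interim transcript survives finalization (the common case), the 200 ms of STT tail latency disappears entirely from the critical path.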
Conversation Context
A voice assistant without context memory breaks on the second question: "Who is Gazprom's CEO?" gets an answer, but "What about his wife?" is unanswerable without context — whose wife? Context is an array of the last N messages, passed with each LLM request or Dialogflow session.
Mobile context management: ConversationStore — singleton with @Published message list. Maximum 10–15 recent messages (~2000 tokens context is sufficient for most dialogues).
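The truncation logic of such a store is simple; a Python sketch of what the Swift `ConversationStore` would do (in Swift the message list would be an `@Published` property, and the cap of 10–15 messages comes straight from the text):

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    role: str   # "user" or "assistant"
    text: str

@dataclass
class ConversationStore:
    """Keeps a sliding window of recent messages for the next LLM request."""
    max_messages: int = 15
    messages: list = field(default_factory=list)

    def append(self, role: str, text: str) -> None:
        self.messages.append(Message(role, text))
        # Drop the oldest turns beyond the window to stay within the token budget.
        del self.messages[:-self.max_messages]

    def as_prompt(self) -> list:
        """Shape the window for a chat-style LLM request."""
        return [{"role": m.role, "content": m.text} for m in self.messages]
```

A message-count cap is a coarse proxy for the ~2000-token budget; a stricter variant would trim by counted tokens instead of message count.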
Wake Word (Optional)
"Hey, [AppName]" without button press — works via PorcupineManager from Picovoice. On-device, custom wake word, ~500 KB model. Battery consumption — ~1.5% per hour on modern devices. On iOS requires background audio session, which Apple checks during review.
Case Study
Corporate assistant for field employees: voice task and CRM request creation without unlocking phone. Stack: SileroVAD on-device → Yandex SpeechKit streaming → Rasa NLU (self-hosted, 23 intents) → CRM REST API → Yandex SpeechKit TTS. Latency from end of speech to start of response: median 1.1 seconds, p95 2.3 seconds. Rasa NLU on own server provided full data control.
Timeline
Pipeline with STT + NLU with fixed intent set + TTS — 2–3 weeks. With wake word, conversation context, and business logic integration — 4–6 weeks. Cost is calculated individually.







