Implementing RAG (Retrieval-Augmented Generation) for an AI Bot in a Mobile Application
RAG solves a specific problem: the model doesn't know your product, your documentation, or your internal regulations. Fine-tuning is expensive and slow to update; RAG is cheaper, more current, and more transparent. The user asks a question → the system retrieves relevant documentation fragments → passes them into the model's context → the model answers based on real data.
RAG System Components and Where They Live
RAG is not a single function but a pipeline of stages:
Ingestion (loading and indexing):
- Split documents into chunks (chunking)
- Create embeddings for each chunk
- Store them in a vector DB

Retrieval (search):
- Embed the user query
- Run vector search (cosine similarity / ANN)
- Rerank results (optional)

Generation:
- Build the prompt with retrieved context
- Call the LLM
- Postprocess the answer
On mobile, all of Ingestion and most of Retrieval are server-side tasks. The client makes an API request and gets an answer with sources.
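A minimal sketch of that client-side request, using OkHttp on Android. The endpoint path and the JSON shape (`question`, `answer`, `sources`) are assumptions for illustration, not a fixed contract:

```kotlin
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody
import org.json.JSONObject

val client = OkHttpClient()

fun ask(question: String): JSONObject {
    val body = JSONObject().put("question", question).toString()
        .toRequestBody("application/json".toMediaType())
    val request = Request.Builder()
        .url("https://api.example.com/v1/assistant/ask") // hypothetical endpoint
        .post(body)
        .build()
    client.newCall(request).execute().use { response ->
        // Assumed response shape: { "answer": "...", "sources": [ { "title": "...", "url": "..." } ] }
        return JSONObject(response.body!!.string())
    }
}
```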
Chunking: The Most Underestimated Stage
RAG quality is determined by chunk quality. Bad chunking kills accuracy regardless of the model.
Fixed-size chunking (every 500 characters): don't. It breaks sentences and loses paragraph context.
Semantic chunking splits on semantic boundaries (headers, paragraphs, sentences) and works out of the box for Markdown and HTML. On Java/Kotlin, LangChain4j ships a recursive splitter (DocumentSplitters.recursive, the counterpart of LangChain's RecursiveCharacterTextSplitter) that falls back through delimiters such as ["\n\n", "\n", ". "], which is the correct approach.
Overlap of 10–20% between chunks: the last 50–100 tokens of the previous chunk are included at the start of the next one. This preserves context across boundaries.
Optimal chunk size depends on the document type: technical docs, 300–500 tokens; legal texts, 500–800 tokens; FAQ, one chunk per Q&A pair.
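For illustration, a minimal recursive splitter with overlap in plain Kotlin (no library). Sizes here are in characters for brevity; production code should count tokens with the embedding model's tokenizer:

```kotlin
// Split on paragraph boundaries first, fall back to lines, then sentences,
// and carry a tail of the previous chunk into the next one for overlap.
fun chunk(text: String, maxSize: Int = 1500, overlap: Int = 200): List<String> {
    val delimiters = listOf("\n\n", "\n", ". ")

    fun split(piece: String, level: Int): List<String> {
        if (piece.length <= maxSize) return listOf(piece)
        if (level >= delimiters.size) return piece.chunked(maxSize) // last resort: hard cut
        val parts = piece.split(delimiters[level])
        // Greedily pack parts back together while they still fit in maxSize
        val out = mutableListOf<String>()
        var current = StringBuilder()
        for (part in parts) {
            if (current.isNotEmpty() && current.length + part.length > maxSize) {
                out += split(current.toString(), level + 1)
                current = StringBuilder()
            }
            if (current.isNotEmpty()) current.append(delimiters[level])
            current.append(part)
        }
        if (current.isNotEmpty()) out += split(current.toString(), level + 1)
        return out
    }

    val chunks = split(text.trim(), 0)
    // Prepend the tail of the previous chunk to preserve context on boundaries
    return chunks.mapIndexed { i, c ->
        if (i == 0) c else chunks[i - 1].takeLast(overlap) + c
    }
}
```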
Embeddings: Model Choice
| Model | Dimensions | Context (tokens) | Cost | Best For |
|---|---|---|---|---|
| text-embedding-3-small | 1536 | 8192 | Cheap | General content |
| text-embedding-3-large | 3072 | 8192 | Medium | Technical docs |
| nomic-embed-text | 768 | 8192 | Free (self-host) | Private data |
| multilingual-e5-large | 1024 | 512 | Free (self-host) | Multilingual |
For a mobile app with sensitive data, use a self-hosted model: OpenAI Embeddings sends your documents to OpenAI's servers.
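A sketch of calling a self-hosted embedding model, assuming nomic-embed-text is served via Ollama, whose POST /api/embeddings endpoint takes a model name and a prompt and returns a vector; adapt it to your own serving stack:

```kotlin
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody
import org.json.JSONObject

// Returns the embedding vector for one chunk of text.
fun embed(client: OkHttpClient, text: String): FloatArray {
    val body = JSONObject()
        .put("model", "nomic-embed-text")
        .put("prompt", text)
        .toString()
        .toRequestBody("application/json".toMediaType())
    val request = Request.Builder()
        .url("http://localhost:11434/api/embeddings") // default Ollama port
        .post(body)
        .build()
    client.newCall(request).execute().use { response ->
        val vector = JSONObject(response.body!!.string()).getJSONArray("embedding")
        return FloatArray(vector.length()) { i -> vector.getDouble(i).toFloat() }
    }
}
```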
Retrieval: What Really Affects Quality
Hybrid search: combining vector search with keyword search (BM25-style ranking) gives better results than vectors alone. pgvector plus pg_trgm let you do this in PostgreSQL without separate infrastructure.
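A sketch of what the combination can look like over JDBC. The chunks table, column names, and the 0.7/0.3 weights are illustrative assumptions:

```kotlin
import java.sql.Connection

// Blend pgvector cosine similarity with pg_trgm keyword similarity.
fun hybridSearch(conn: Connection, queryText: String, queryVec: String, k: Int = 20): List<String> {
    val sql = """
        SELECT content,
               0.7 * (1 - (embedding <=> ?::vector))   -- vector: cosine similarity
             + 0.3 * similarity(content, ?)            -- keyword: trigram similarity
               AS score
        FROM chunks
        ORDER BY score DESC
        LIMIT ?
    """.trimIndent()
    conn.prepareStatement(sql).use { st ->
        st.setString(1, queryVec)  // pgvector text form, e.g. "[0.1, -0.2, ...]"
        st.setString(2, queryText)
        st.setInt(3, k)
        val rs = st.executeQuery()
        return buildList { while (rs.next()) add(rs.getString("content")) }
    }
}
```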
Reranking: after the vector search, take the top-20 results, run them through a cross-encoder model (cross-encoder/ms-marco-MiniLM-L-6-v2), and return the top-5. This significantly improves relevance. The Cohere Rerank API is an option if you don't want to self-host a model.
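A sketch of the reranking step, assuming a small self-hosted scoring service wraps the cross-encoder behind a hypothetical POST /rerank endpoint that returns one relevance score per (query, document) pair:

```kotlin
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody
import org.json.JSONArray
import org.json.JSONObject

// Score candidates against the query and keep the best topK.
fun rerank(client: OkHttpClient, query: String, candidates: List<String>, topK: Int = 5): List<String> {
    val body = JSONObject()
        .put("query", query)
        .put("documents", JSONArray(candidates))
        .toString()
        .toRequestBody("application/json".toMediaType())
    val request = Request.Builder()
        .url("http://reranker.internal/rerank") // hypothetical internal service
        .post(body)
        .build()
    client.newCall(request).execute().use { response ->
        // Assumed response shape: { "scores": [0.91, 0.12, ...] }, aligned with candidates
        val scores = JSONObject(response.body!!.string()).getJSONArray("scores")
        return candidates.indices
            .sortedByDescending { scores.getDouble(it) }
            .take(topK)
            .map { candidates[it] }
    }
}
```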
Metadata filtering: if documents carry metadata (date, section, language, document type), filter on it before the vector search. Searching among 10 thousand relevant chunks instead of a million is both faster and more accurate.
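In SQL terms this is just ordinary WHERE clauses ahead of the ANN ordering (column names are illustrative):

```kotlin
// Narrow the candidate set with metadata before the vector ordering.
val filteredSearch = """
    SELECT content
    FROM chunks
    WHERE lang = ?                                     -- e.g. 'en'
      AND doc_type = ?                                 -- e.g. 'faq'
      AND updated_at > now() - interval '1 year'
    ORDER BY embedding <=> ?::vector
    LIMIT 20
""".trimIndent()
```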
Building the Prompt with Context
System: You are a product assistant. Answer ONLY based on the provided context.
If the answer is not in the context, say so directly.

Context:
[Chunk 1]: <text>
[Chunk 2]: <text>
[Chunk 3]: <text>

User: How do I set up two-factor authentication?
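Assembling that prompt from retrieved chunks is straightforward; a minimal sketch (the function name and message split are illustrative):

```kotlin
// Build (system, user) messages from retrieved chunks and the question.
fun buildPrompt(question: String, chunks: List<String>): Pair<String, String> {
    val context = chunks
        .mapIndexed { i, c -> "[Chunk ${i + 1}]: $c" }
        .joinToString("\n")
    val system = "You are a product assistant. Answer ONLY based on the provided context.\n" +
        "If the answer is not in the context, say so directly.\n\n" +
        "Context:\n$context"
    return system to question // pass as (system message, user message) to your LLM API
}
```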
Citing sources is good practice. On mobile, show the list of chunks/documents under the answer so the user can verify where the information came from. This reduces hallucination risk and builds trust.
Mobile UI for RAG Bot
Answer rendering specifics:
- Stream via SSE so the answer appears gradually
- Sources under the answer (a collapsible list)
- A "Searching knowledge base" indicator during Retrieval (100–300 ms)
- A "Didn't find an answer" button to escalate to a human operator
Flutter: flutter_markdown for answer rendering plus a custom widget for sources. iOS: UILabel with NSAttributedString, or UITextView / WKWebView for Markdown. Android: Markwon, the best Markdown renderer for RecyclerView.
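For the streaming piece on Android, a sketch using OkHttp's okhttp-sse module; the endpoint is hypothetical, and each SSE event is assumed to carry a text fragment of the answer:

```kotlin
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.Response
import okhttp3.sse.EventSource
import okhttp3.sse.EventSourceListener
import okhttp3.sse.EventSources

// Append each streamed fragment to the chat bubble via onToken.
fun streamAnswer(client: OkHttpClient, question: String, onToken: (String) -> Unit) {
    val request = Request.Builder()
        .url("https://api.example.com/v1/assistant/stream?q=$question") // hypothetical; URL-encode in real code
        .build()
    EventSources.createFactory(client).newEventSource(request, object : EventSourceListener() {
        override fun onEvent(eventSource: EventSource, id: String?, type: String?, data: String) {
            onToken(data) // dispatch to the UI thread before touching views
        }
        override fun onFailure(eventSource: EventSource, t: Throwable?, response: Response?) {
            // Show a retry affordance; SSE drops are common on mobile networks
        }
    })
}
```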
Stages and Timeline
Audit the document corpus → design the indexing schema → choose a vector DB → implement the ingestion pipeline → configure hybrid search + reranking → integrate with the LLM → build the mobile chat UI with sources → evaluate quality (RAGAS or manual review) → iterate on prompts and chunking.
A basic RAG bot over simple documentation takes 3–5 weeks. A production system with hybrid search, reranking, multilingual support, and quality evaluation takes 8–12 weeks.