RAG (Retrieval-Augmented Generation) AI Bot Implementation
RAG combines external knowledge retrieval with a language model: the bot first retrieves relevant documents or data, then generates an answer grounded in that context. For domain-specific Q&A, this yields better accuracy than a pure LLM.
RAG Architecture
User Query → Embed → Vector Search → Retrieved Context → LLM → Answer
Implementation
import OpenAI from 'openai';
import { QdrantClient } from '@qdrant/js-client-rest';
const openai = new OpenAI();
const qdrant = new QdrantClient({ url: 'http://localhost:6333' });
async function ragQuery(userQuestion) {
  // 1. Embed the question
  const questionEmbedding = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: userQuestion,
  });

  // 2. Retrieve context from the vector store
  const searchResults = await qdrant.search('knowledge-base', {
    vector: questionEmbedding.data[0].embedding,
    limit: 5,
    score_threshold: 0.7,
  });

  // search() returns the scored points as a plain array
  const context = searchResults
    .map(p => p.payload.text)
    .join('\n\n');

  // 3. Generate an answer grounded in the retrieved context
  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      {
        role: 'system',
        content: `Answer based on the provided context. If the answer is not in the context, say so.

Context:
${context}`,
      },
      { role: 'user', content: userQuestion },
    ],
    max_tokens: 500,
  });

  return {
    answer: response.choices[0].message.content,
    sources: searchResults.map(p => p.payload.source),
  };
}
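A quick way to exercise this end to end (the question string here is just a hypothetical example):

const { answer, sources } = await ragQuery('How do I rotate my API key?');
console.log(answer);
console.log('Sources:', sources);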
Knowledge Base Setup
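Before anything can be indexed, the Qdrant collection has to exist. A one-time setup sketch, assuming cosine distance and 1536-dimensional vectors (the default output size of text-embedding-3-small):

// One-time collection setup; run once before indexing
await qdrant.createCollection('knowledge-base', {
  vectors: { size: 1536, distance: 'Cosine' },
});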
// Index documents: chunk, embed, and upsert into Qdrant
async function indexDocument(doc) {
  const chunks = chunkText(doc.content, { size: 500, overlap: 100 });

  // The embeddings API accepts an array of inputs and returns
  // one embedding per chunk, in order
  const embeddings = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: chunks,
  });

  const points = chunks.map((text, i) => ({
    id: generateId(),
    vector: embeddings.data[i].embedding,
    payload: {
      text,
      source: doc.source,
      docId: doc.id,
    },
  }));

  await qdrant.upsert('knowledge-base', { points });
}
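The chunkText and generateId helpers above are assumed rather than provided by any library. A minimal sketch, with naive character-window chunking and UUIDs (Qdrant point IDs must be unsigned integers or UUIDs):

import { randomUUID } from 'node:crypto';

// Naive character-window chunker; production code would split on
// sentence or token boundaries instead
function chunkText(text, { size, overlap }) {
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}

// Qdrant accepts unsigned integers or UUIDs as point IDs
function generateId() {
  return randomUUID();
}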
Timeline
- Set up Qdrant + embeddings — 1–2 days
- Index knowledge base — 1 day
- RAG implementation — 2 days
- UI + streaming — 2–3 days (streaming sketch below)
- Quality assurance — 2–3 days
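For the streaming item above: the generation step in ragQuery can stream tokens to the UI instead of waiting for the full completion. A sketch, assuming the same messages array built in ragQuery:

// Stream tokens as they arrive rather than waiting for the full answer
const stream = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages,
  max_tokens: 500,
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}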