# AI Support Assistant Integration
An AI support assistant is a step above a regular chatbot. It knows your documentation, knowledge base, and specific user statuses. When a user asks "why doesn't export work", the assistant searches your Help Center, checks the user's plan, and provides a specific answer rather than a template.
## RAG Architecture for Support
With RAG (Retrieval-Augmented Generation), the model doesn't know your product up front; instead, on each request it is fed the relevant documentation chunks retrieved from a vector database:
```
User question
  ↓
Vectorize the question (embedding)
  ↓
Search for similar chunks in the vector DB
  ↓
LLM receives: question + documentation context
  ↓
Answer grounded in your content
```
**Stack for implementation:**
| Component | Options |
|---|---|
| Vector DB | Pinecone, Weaviate, Qdrant, pgvector |
| Embeddings | OpenAI text-embedding-3-small, Cohere, Ollama (self-hosted) |
| LLM | GPT-4o-mini, Claude 3.5 Haiku |
| Orchestration | LangChain.js, LlamaIndex, or framework-free |
## Knowledge Base Indexing
```javascript
import OpenAI from 'openai';
import { QdrantClient } from '@qdrant/js-client-rest';

const openai = new OpenAI();
const qdrant = new QdrantClient({ url: 'http://localhost:6333' });

// Prepare the collection (1536 = dimensionality of text-embedding-3-small)
await qdrant.createCollection('support-docs', {
  vectors: { size: 1536, distance: 'Cosine' },
});

// Index an article; splitIntoChunks and generateId are app-specific helpers
async function indexArticle(article) {
  // Split into chunks of ~500 tokens with 100-token overlap
  const chunks = splitIntoChunks(article.content, { size: 500, overlap: 100 });

  const embeddings = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: chunks.map(c => c.text),
  });

  const points = chunks.map((chunk, i) => ({
    id: generateId(), // Qdrant expects a UUID or an unsigned integer
    vector: embeddings.data[i].embedding,
    payload: {
      text: chunk.text,
      articleId: article.id,
      articleTitle: article.title,
      category: article.category,
      url: article.url,
    },
  }));

  await qdrant.upsert('support-docs', { points });
}
```
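The `splitIntoChunks` helper is left to your implementation. A minimal word-based sketch (approximating tokens by words, which is rough but workable for prose; a real tokenizer such as tiktoken would be more accurate) might look like:

```javascript
// Naive chunker: splits on whitespace and groups words into fixed-size
// windows with overlap, so neighboring chunks share context at the seams.
function splitIntoChunks(content, { size, overlap }) {
  const words = content.split(/\s+/).filter(Boolean);
  const chunks = [];
  const step = size - overlap; // how far each window advances

  for (let start = 0; start < words.length; start += step) {
    chunks.push({ text: words.slice(start, start + size).join(' ') });
    if (start + size >= words.length) break; // last window reached the end
  }
  return chunks;
}
```

The overlap matters: without it, an answer that straddles a chunk boundary can become unretrievable because neither half scores well on its own.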
## Search and Answer Generation
```javascript
async function answerQuestion(userId, question) {
  // User context from the DB
  const user = await db.users.findById(userId);
  const userContext = `
User: ${user.name}
Plan: ${user.plan}
Registration date: ${user.createdAt}
Last 3 tickets: ${user.recentTickets.join(', ')}
`;

  // Vector search
  const queryEmbedding = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: question,
  });

  const results = await qdrant.search('support-docs', {
    vector: queryEmbedding.data[0].embedding,
    limit: 4,
    score_threshold: 0.75, // Ignore irrelevant chunks
  });

  const context = results.map(r =>
    `[${r.payload.articleTitle}](${r.payload.url})\n${r.payload.text}`
  ).join('\n\n---\n\n');

  // Generate the answer
  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    stream: true,
    messages: [
      {
        role: 'system',
        content: `You are a technical support assistant.
Answer ONLY based on the provided documentation.
If the answer is not in the documentation, state this explicitly and offer to create a ticket.
Always cite the source (link to the article).

User context:
${userContext}`,
      },
      {
        role: 'user',
        content: `Question: ${question}\n\nRelevant documentation:\n${context}`,
      },
    ],
    max_tokens: 600,
    temperature: 0.2,
  });

  return {
    stream: response,
    // Top match score doubles as a rough confidence signal for escalation
    confidence: results[0]?.score ?? 0,
    sources: results.map(r => ({ title: r.payload.articleTitle, url: r.payload.url })),
  };
}
```
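Because `answerQuestion` returns the raw completion stream, the caller decides what to do with it: forward tokens to the browser (e.g. as server-sent events) or collect them into a string. A small drain helper, assuming the chunk shape the OpenAI SDK yields for `stream: true`:

```javascript
// Drain an OpenAI-style chat stream into plain text. Works on any async
// iterable of chunks shaped like { choices: [{ delta: { content } }] }.
async function streamToText(stream) {
  let text = '';
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? '';
  }
  return text;
}
```

For a streaming UI, the same `for await` loop would instead write each token out as an SSE `data:` frame so the browser renders the answer as it arrives.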
## Handoff to Live Agent

When the bot can't help, escalate to a human agent:
```javascript
const ESCALATION_TRIGGERS = [
  'want to talk to a person',
  'operator',
  'complaint',
  'refund',
  'delete account',
];

function shouldEscalate(message, confidenceScore) {
  const lowerMessage = message.toLowerCase();
  const hasKeyword = ESCALATION_TRIGGERS.some(t => lowerMessage.includes(t));
  const lowConfidence = confidenceScore < 0.6;
  return hasKeyword || lowConfidence;
}
```
```javascript
async function handleMessage(userId, message) {
  const { stream, confidence, sources } = await answerQuestion(userId, message);

  if (shouldEscalate(message, confidence)) {
    const ticket = await createSupportTicket(userId, message);
    return {
      type: 'escalation',
      message: 'Forwarding your request to an agent. Average response time is 2 hours.',
      ticketId: ticket.id,
    };
  }

  // Collect the streamed completion into a single string for logging
  let answer = '';
  for await (const chunk of stream) {
    answer += chunk.choices[0]?.delta?.content ?? '';
  }

  await logConversation(userId, message, answer);
  return { type: 'answer', content: answer, sources };
}
```
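`createSupportTicket` depends on your helpdesk; a minimal in-memory sketch (the array stands in for your DB or a helpdesk API such as Zendesk or Intercom, and the field names are assumptions) could look like:

```javascript
// In-memory ticket store for illustration only; swap for real storage.
const tickets = [];

async function createSupportTicket(userId, message) {
  const ticket = {
    id: `T-${tickets.length + 1}`,          // hypothetical ID scheme
    userId,
    subject: message.slice(0, 80),          // short preview as the subject line
    body: message,
    source: 'ai-assistant-escalation',      // lets agents filter bot handoffs
    status: 'open',
    createdAt: new Date(),
  };
  tickets.push(ticket);
  return ticket;
}
```

Tagging the ticket's `source` is worth keeping in a real implementation: it lets you measure how often the assistant fails to resolve conversations on its own.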
## Knowledge Base Updates

When documentation is updated, re-index it:
```javascript
// Webhook from the CMS on article update
app.post('/webhooks/docs-updated', async (req, res) => {
  const { articleId, action } = req.body;

  // Drop this article's existing chunks in all cases
  await qdrant.delete('support-docs', {
    filter: { must: [{ key: 'articleId', match: { value: articleId } }] },
  });

  // Re-index unless the article was deleted
  if (action !== 'delete') {
    const article = await fetchArticle(articleId);
    await indexArticle(article);
  }

  res.json({ ok: true });
});
```
## Analytics and Improvements

Log all conversations and collect feedback:
```jsx
// After each response, offer a rating
function renderFeedback(messageId) {
  return (
    <div className="feedback">
      <button onClick={() => submitFeedback(messageId, 'helpful')}>Helpful</button>
      <button onClick={() => submitFeedback(messageId, 'not-helpful')}>Not helpful</button>
    </div>
  );
}
```
```sql
-- Weekly report: top 20 unanswered questions
SELECT question, COUNT(*) AS count
FROM support_conversations
WHERE feedback = 'not-helpful'
GROUP BY question
ORDER BY count DESC
LIMIT 20;
```
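On the backend, the `submitFeedback` call needs a handler that records the rating against the logged conversation. A sketch with an in-memory map standing in for an `UPDATE` on `support_conversations` (the storage and rating values mirror the query above; everything else is an assumption):

```javascript
// Record a thumbs-up / thumbs-down against a logged conversation row.
const feedbackStore = new Map();

function submitFeedback(messageId, rating) {
  // Only the two ratings the UI emits are valid
  if (rating !== 'helpful' && rating !== 'not-helpful') {
    throw new Error(`Unknown rating: ${rating}`);
  }
  feedbackStore.set(messageId, { rating, at: new Date() });
  return { ok: true };
}
```

On the client, `submitFeedback` would simply POST `{ messageId, rating }` to an endpoint that runs this logic.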
## Timeline
- RAG assistant with knowledge base (up to 500 articles) — 5–7 days
- Personalization by user context — plus 1–2 days
- Handoff to agent + ticketing system — plus 2–3 days
- Conversation analytics and dashboard — plus 3–4 days