Implementing Guardrails (Response Limits) for AI Assistant in Mobile App
Production AI assistant without guardrails is an open vulnerability. User asks off-domain question, attempts prompt injection via user content, or model drifts into unwanted territory. Guardrails aren't one filter — they're layered validation system for input and output.
Protection Layers: What and Where to Check
Input guardrails — validate user request before sending to LLM.
Topic filter: determine if question relates to app domain. Simple: embeddings + cosine similarity with approved topics. Reliable: separate fast classifier (GPT-4o-mini with basic prompt, ~200ms latency).
Prompt injection detection: if app processes user content (notes, documents) passed to LLM context, check for injections like "Ignore previous instructions...". Basic protection — pattern search. More reliable — specialized classifiers like rebuff or lakera-guard.
Output guardrails — validate model response before showing user.
Format and business rules validation when assistant returns structured data. Each response validated before rendering. Length and tone checks — some models generate unexpectedly long responses. Hard max_tokens in request + client-side length check before rendering.
Libraries and Ready Solutions
Guardrails AI (guardrails-ai Python) — declarative validation rules with automatic retry. Server-side applicable. NeMo Guardrails from NVIDIA — heavier enterprise solution, supports dialogue flows and topical rails.
For small apps, custom server middleware with rule set sufficient. Critical: implement on server, not client — guardrails must work server-side or can be bypassed via direct API call.
Timeline Estimates
Basic input/output filters — 1–2 days. Topic classifier with test coverage — 2–3 days. Full layered system with violation logging — 4–5 days.







