Prompt Engineering Setup for AI Assistant in Mobile App
Without a proper prompt configuration, GPT-4o answers one question verbosely, changes tone on the next, and returns JSON instead of text on the third. Prompt engineering isn't "write good instructions": it's managing model-behavior determinism via the system prompt, few-shot examples, and context window control.
System Prompt: Structure Matters
Bad system prompt: "You're a helpful assistant for our app. Answer briefly and relevantly."
A working system prompt contains four zones:
Role and domain constraints. "You are a personal finance app assistant. Answer only questions about budgeting, expense categorization, and financial planning. For off-topic questions say: 'I only help with personal finance questions.'"
Output format. If the assistant returns structured data, describe the schema directly in the system prompt with an example. The model adheres to the format far more reliably when it sees a concrete sample.
Tone and style. "Answer briefly — max 3 sentences. No bullet lists in conversational responses. Don't start with 'Of course!' or 'Great question!'"
User context. Dynamic information injected at request time: user name, current app section, recent actions.
// iOS: system prompt construction with user context
func buildSystemPrompt(user: User, currentScreen: AppScreen) -> String {
    return """
    You are the financial assistant for the MoneyMap app.
    User: \(user.name), currency: \(user.currency).
    Current section: \(currentScreen.description).
    Monthly budget: \(user.monthlyBudget). Spent: \(user.spent).
    Answer briefly in English without lists.
    """
}
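The "output format" zone deserves its own snippet. A minimal sketch, assuming a transaction-categorization feature: the schema and one concrete sample are embedded verbatim in the system prompt. The field names (`category`, `confidence`) are illustrative, not from the original spec.

```swift
// Hypothetical output-format zone: schema plus one concrete sample,
// appended to the system prompt built above.
func outputFormatZone() -> String {
    return """
    When asked to categorize a transaction, reply with JSON only:
    {"category": "<string>", "confidence": <number between 0 and 1>}
    Example: {"category": "groceries", "confidence": 0.92}
    For all other questions, reply in plain text.
    """
}
```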
Few-shot Examples and Context Window Management
Few-shot: 2–5 "question → correct answer" pairs at the start of the dialogue. They work as behavior templates. Critical: the examples must cover edge cases, not just "ideal" scenarios.
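A sketch of what such a few-shot block might look like for the finance assistant, prepended right after the system prompt. The `ChatMessage` type and the example contents are hypothetical; note that the second pair covers an edge case (an off-topic question), not just the happy path.

```swift
// Hypothetical few-shot examples sent before the real user message.
struct ChatMessage {
    let role: String   // "system", "user", or "assistant"
    let content: String
}

let fewShotExamples: [ChatMessage] = [
    ChatMessage(role: "user",
                content: "I spent $40 at a gas station"),
    ChatMessage(role: "assistant",
                content: "Logged $40 under Transport. $210 left in that category this month."),
    // Edge case: off-topic question, assistant must refuse per the system prompt.
    ChatMessage(role: "user",
                content: "What's the best crypto to buy?"),
    ChatMessage(role: "assistant",
                content: "I only help with personal finance questions."),
]
```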
A core mobile-assistant problem is the context window limit in long sessions. gpt-4o-mini supports 128K tokens, but cost grows linearly with history length. History management strategies:
- Sliding window: keep only the last N messages (usually 10–20). Cheap, but the assistant "forgets" the start of the conversation
- Summary compression: periodically condense older history into a summary ("User discussed expense categorization, added 3 transactions") that replaces 10+ messages
- Retrieval-augmented memory: important facts are saved to a vector store and retrieved by relevance. More complex, but scales best
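The sliding-window strategy is simple enough to sketch directly. This is a minimal illustration, assuming a `Message` type like the one used elsewhere in the app; the key detail is that the system prompt is pinned and never trimmed away with the old turns.

```swift
// Minimal sliding-window sketch: keep the system prompt pinned,
// trim the dialogue to the last `windowSize` messages before each request.
struct Message {
    let role: String
    let content: String
}

func trimmedHistory(system: Message,
                    history: [Message],
                    windowSize: Int = 12) -> [Message] {
    // suffix(_:) drops the oldest turns; the system prompt stays first.
    let recent = history.suffix(windowSize)
    return [system] + recent
}
```

With windowSize = 12, a 30-message history is cut to 12 turns plus the system prompt, so the request payload stays bounded regardless of session length.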
Temperature, top_p and When to Adjust
temperature=0: effectively greedy decoding, the model picks the highest-probability token (output is near-deterministic, though not guaranteed identical across API calls). For structured answers (JSON, numbers, classification), set 0 or 0.1. For stylistic text generation, 0.7–0.9.
top_p=0.9 with temperature=0.7 is the standard conversational-assistant combo. Don't tune both simultaneously: their interaction is hard to predict.
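One practical way to keep these rules enforced in code is a pair of named presets, so call sites never hand-pick raw numbers. A sketch, assuming an OpenAI-style API that accepts `temperature` and `top_p`; the type and preset names are hypothetical.

```swift
// Hypothetical sampling presets matching the guidance above:
// deterministic for structured tasks, standard combo for chat.
struct SamplingConfig {
    let temperature: Double
    let topP: Double
}

extension SamplingConfig {
    // JSON extraction, classification: determinism first.
    static let structured = SamplingConfig(temperature: 0.0, topP: 1.0)
    // Conversational replies: temperature=0.7 + top_p=0.9.
    static let conversational = SamplingConfig(temperature: 0.7, topP: 0.9)
}
```

Routing every request through one of these two presets also makes it impossible to accidentally tune both knobs at once.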
Timeline Estimates
System prompt design and testing: 2–4 days. Context window management implementation: 1–2 days. Total: 3–5 working days for the basics. Iterative improvement post-launch is a continuous process driven by user feedback.