AI-Powered Expense Analysis and Transaction Categorization
Manual transaction categorization is something users do for the first week, then abandon. Automatic categorization via rules ("if the merchant name contains 'LENTA', it's groceries") works for major retailers but breaks on names like "LLC PERSPECTIVE" or "IP Ivanov A.V.". An ML categorizer with LLM analysis on top delivers a different level of quality.
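The rule-based baseline the section starts from can be sketched like this (the merchant strings and category names are illustrative, not a real rule set):

```python
from typing import Optional

# Substring rules: fine for big chains, useless for opaque legal names.
MERCHANT_RULES = {
    "LENTA": "groceries",
    "MAGNIT": "groceries",
    "YANDEX.TAXI": "transport",
}

def categorize_by_rules(description: str) -> Optional[str]:
    upper = description.upper()
    for keyword, category in MERCHANT_RULES.items():
        if keyword in upper:
            return category
    return None  # "LLC PERSPECTIVE" falls through -> needs ML/LLM
```

Returning None is the signal to hand the transaction to the next stage of the pipeline.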
Categorization: Model vs LLM
ML classifier (TF-IDF + LightGBM or DistilBERT). Trained on historical categorized transactions. Inference under 10 ms, works offline, costs nothing after training. Accuracy on the top 100 merchants is 95%+; on the long tail (sole proprietors, small companies) it drops to 60–70%.
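To make the confidence-based routing concrete, here is a toy token-counting stand-in for the real TF-IDF + LightGBM model (a sketch, not the production classifier): known merchant tokens yield a confident prediction, unknown ones yield zero confidence.

```python
from collections import Counter, defaultdict
from typing import List, NamedTuple, Tuple

class Prediction(NamedTuple):
    category: str
    confidence: float

class TokenClassifier:
    """Toy classifier: counts how often each description token
    co-occurs with each category in labeled history."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # token -> Counter(category)

    def fit(self, samples: List[Tuple[str, str]]) -> None:
        for description, category in samples:
            for token in description.upper().split():
                self.token_counts[token][category] += 1

    def predict(self, description: str) -> Prediction:
        votes = Counter()
        for token in description.upper().split():
            votes.update(self.token_counts.get(token, Counter()))
        if not votes:
            # No known tokens: exactly the "tail" case that goes to the LLM
            return Prediction("unknown", 0.0)
        category, count = votes.most_common(1)[0]
        return Prediction(category, count / sum(votes.values()))
```

A real model replaces the vote counting with learned weights, but the interface (category plus a confidence score to route on) stays the same.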
LLM fallback for the rest. Predictions the classifier is unsure about (confidence below 0.75) are routed to an LLM. GPT-4o-mini, temperature=0, a single prompt with the category list and examples: a decision in 300–500 ms, with 80–90% accuracy on non-standard names.
# Server-side categorization pipeline
async def categorize_transaction(transaction: Transaction) -> CategoryResult:
    # 1. Fast classifier first
    ml_result = classifier.predict(transaction.description)
    if ml_result.confidence >= 0.75:
        return CategoryResult(
            category=ml_result.category,
            confidence=ml_result.confidence,
            method="ml_classifier",
        )

    # 2. LLM fallback for uncertain predictions
    llm_category = await llm_categorize(
        description=transaction.description,
        amount=transaction.amount,
        merchant=transaction.merchant_name,
    )
    return CategoryResult(
        category=llm_category,
        confidence=0.85,  # fixed heuristic confidence assigned to LLM answers
        method="llm_fallback",
    )
The hybrid approach means 85–90% of transactions are handled by the fast classifier for free, and only 10–15% hit the LLM. At 1,000 transactions per day per user, the LLM cost is pennies.
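The "pennies" claim is easy to check with back-of-envelope arithmetic. The prices and token counts below are assumptions for illustration; check current GPT-4o-mini pricing before relying on the numbers.

```python
# Assumed per-million-token prices (USD); verify against current pricing.
PRICE_PER_INPUT_TOKEN = 0.15 / 1_000_000
PRICE_PER_OUTPUT_TOKEN = 0.60 / 1_000_000

def daily_llm_cost(transactions_per_day: int,
                   llm_share: float = 0.15,
                   input_tokens: int = 400,
                   output_tokens: int = 10) -> float:
    """Estimated daily LLM spend: only `llm_share` of transactions
    reach the LLM; each call sends a short prompt, returns a category."""
    llm_calls = transactions_per_day * llm_share
    return llm_calls * (input_tokens * PRICE_PER_INPUT_TOKEN +
                        output_tokens * PRICE_PER_OUTPUT_TOKEN)

# 1,000 transactions/day, 15% routed to the LLM -> about one cent per day
```

Even an order-of-magnitude error in the assumed prices keeps the per-user cost negligible.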
Merchant Data Enrichment
Merchant names in bank statements are dirty data: "MAGNIT COSMETIC 0001" and "MAGNIT KOSMETIK" are the same merchant. Normalizing via merchant databases (Clearbit, Plaid Enrich, or a custom mapping) significantly improves classifier accuracy.
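A minimal normalization step might look like this (the alias table is a hypothetical hand-curated mapping; in production it would come from an enrichment service):

```python
import re

# Hypothetical alias table mapping cleaned statement names
# to a canonical merchant id.
ALIASES = {
    "MAGNIT COSMETIC": "magnit",
    "MAGNIT KOSMETIK": "magnit",
}

def normalize_merchant(raw: str) -> str:
    # Drop trailing store numbers ("... 0001"), uppercase, trim
    cleaned = re.sub(r"\s*\d+$", "", raw.upper()).strip()
    return ALIASES.get(cleaned, cleaned.lower())
```

Feeding the canonical id into the classifier instead of the raw string collapses dozens of spelling variants into one feature value.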
Another signal is the MCC (Merchant Category Code) the bank sends with the transaction: MCC 5411 is grocery stores, MCC 5812 is restaurants. Adding MCC as a classifier feature gives +5–10% accuracy.
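Using the MCC as a soft prior rather than a hard rule could look like this (the table below is a tiny illustrative subset; the full ISO 18245 list has hundreds of codes):

```python
from typing import Optional

# Illustrative subset of MCC codes -> category hints
MCC_CATEGORY = {
    5411: "groceries",
    5812: "restaurants",
    5541: "fuel",
}

def mcc_prior(mcc: Optional[int]) -> Optional[str]:
    """Category hint from the MCC, fed to the classifier as a feature
    rather than used as a hard rule (banks sometimes send wrong MCCs)."""
    return MCC_CATEGORY.get(mcc)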
AI Pattern Analysis on Top
Categorization is only the first step. AI analysis over the categorized data is what turns the app from a tracker into an advisor.
// iOS (Swift): LLM request for monthly expense analysis
func generateExpenseInsights(transactions: [CategorizedTransaction]) async -> [Insight] {
    let summary = Dictionary(grouping: transactions, by: \.category)
        .mapValues { txs in (count: txs.count, total: txs.map(\.amount).reduce(0, +)) }
        .map { "\($0.key): \($0.value.total) RUB (\($0.value.count) transactions)" }
        .joined(separator: "\n")

    let prompt = """
    Analyze the user's monthly expenses and provide 2-3 specific observations.
    Not generic advice: specific patterns from the data.

    Expense breakdown:
    \(summary)
    """

    let response = await llmClient.complete(prompt, maxTokens: 300, temperature: 0.4)
    return parseInsights(response)
}
The LLM sees "food delivery spending rose from 3,200 to 8,700 rubles versus last month" and generates a specific insight, not a generic "watch your food spending".
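For the LLM to produce that kind of insight, the prompt has to contain the concrete numbers. A sketch of the month-over-month delta computation that would feed it (function name and threshold are assumptions):

```python
from typing import Dict, List, Tuple

def category_deltas(prev: Dict[str, float],
                    curr: Dict[str, float],
                    threshold: float = 0.5) -> List[Tuple[str, float, float]]:
    """Return (category, last_month, this_month) for categories whose
    spend changed by more than `threshold` (50%) month over month."""
    deltas = []
    for category, amount in curr.items():
        before = prev.get(category, 0)
        if before and abs(amount - before) / before > threshold:
            deltas.append((category, before, amount))
    return deltas

# category_deltas({"delivery": 3200}, {"delivery": 8700})
# surfaces the jump the LLM then phrases as an observation
```

Precomputing deltas keeps the prompt short and stops the LLM from having to do arithmetic, which it does unreliably.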
Learning from User Corrections
Users correcting wrong categories is gold for retraining: each correction is a new labeled example. Once enough corrections accumulate (50–100 per user), a personalized model can be fine-tuned or user-specific rules added:
// Android: save user correction
fun saveUserCorrection(transactionId: String, correctedCategory: Category) {
    val transaction = getTransaction(transactionId)
    val correction = UserCorrection(
        transactionDescription = transaction.description,
        merchantId = transaction.merchantId,
        correctedCategory = correctedCategory,
        timestamp = System.currentTimeMillis()
    )
    localDatabase.saveCorrection(correction)

    // Sync to server for retraining
    syncService.scheduleCorrectionUpload(correction)
}
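On the server side, the "user-specific rules" path can be as simple as pinning an override once the same correction repeats. A sketch under assumed names and an assumed threshold of 3 repeats:

```python
from collections import Counter, defaultdict
from typing import Optional

class CorrectionStore:
    """Once a user corrects the same merchant to the same category
    `min_corrections` times, pin an override that bypasses the model."""

    def __init__(self, min_corrections: int = 3):
        self.min_corrections = min_corrections
        self.history = defaultdict(Counter)  # merchant_id -> Counter(category)
        self.overrides = {}                  # merchant_id -> pinned category

    def record(self, merchant_id: str, corrected_category: str) -> None:
        self.history[merchant_id][corrected_category] += 1
        category, count = self.history[merchant_id].most_common(1)[0]
        if count >= self.min_corrections:
            self.overrides[merchant_id] = category

    def lookup(self, merchant_id: str) -> Optional[str]:
        return self.overrides.get(merchant_id)  # None -> fall back to model
```

Overrides are checked before the classifier, so a merchant the user has repeatedly recategorized never gets mislabeled again, regardless of what the model thinks.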
Timeframe Estimates
A rule-based classifier with MCC takes 3–5 days. An ML classifier with LLM fallback: 1–2 weeks. A complete system with pattern analysis, insights, and learning from corrections: 2–4 weeks.