AI-Powered Borrower Scoring for FinTech Mobile Applications
Traditional credit scoring via NBKI/OKB works from history: no credit history means no score. AI scoring adds alternative signals: in-app behavior, transactional patterns, indirect socioeconomic markers. For fintech apps with their own wallet or credit product, this opens up audiences that traditional banks ignore.
Alternative Scoring Data Sources
Use only data users have explicitly approved (consent is mandatory under personal data law):
Transactional patterns. Regular income flow (salary vs chaotic), income-to-expense ratio, month-end balance, credit vs debit instrument usage. The most reliable source: the data comes from your own app and is hard to manipulate.
Behavioral signals. App usage frequency, session completion rate (user opened the app and completed an action vs just opened it), use of long-term planning features. These markers correlate with financial discipline, though more weakly than transactional patterns.
Sociodemographic markers. Region, device type (an indirect income signal), app usage tenure. Extreme caution is required: the model must not discriminate based on legally protected categories.
ML Pipeline Architecture
The scoring model lives on the server; on-device models are not suitable for this task. The mobile app collects and sends events, and the server computes the score on request.
# Feature engineering pipeline — server
import pandas as pd
import lightgbm as lgb  # model: lgb.LGBMClassifier trained offline; db is the app's storage layer

def extract_features(user_id: str, window_days: int = 90) -> dict:
    transactions = db.get_transactions(user_id, days=window_days)
    df = pd.DataFrame(transactions)
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    df = df.sort_values("timestamp")
    income = df[df.amount > 0]
    return {
        # Income stability: coefficient of variation (lower = more regular)
        "income_regularity": income.amount.std() / income.amount.mean(),
        # Expense to income ratio
        "expense_to_income_ratio": abs(df[df.amount < 0].amount.sum()) / income.amount.sum(),
        # Days with negative balance
        "negative_balance_days": calculate_negative_balance_days(df),
        # Month-end balance stability
        "month_end_balance_stability": calculate_eom_balance_stability(df),
        # Number of unique income sources
        "income_source_diversity": income.merchant.nunique(),
        # Average days between transactions
        "avg_days_between_transactions": df.timestamp.diff().dt.days.mean(),
    }

def predict_score(user_id: str) -> dict:
    features = extract_features(user_id)
    feature_vector = pd.DataFrame([features])
    # Classifier (LightGBM, XGBoost or CatBoost): probability of class 1 = default
    probability = model.predict_proba(feature_vector)[0][1]
    return {
        "score": int(300 + (1 - probability) * 550),  # map PD onto 300–850, FICO analogue
        "probability_of_default": float(probability),
        "confidence": calculate_confidence(features),
        "feature_contributions": get_shap_values(feature_vector),  # Explainability
    }
Score Explainability (XAI)
Central banks and regulatory trends require that credit decisions be explainable. SHAP (SHapley Additive exPlanations) is the de facto standard for explaining tree-based model decisions. A SHAP result reads like: "Score affected by: income stability (+120 points), high entertainment spending (−45 points), short app tenure (−30 points)".
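Point contributions like those in the example can be derived from raw SHAP values with standard scorecard scaling. A minimal sketch, assuming the classifier's SHAP values are log-odds contributions and a points-to-double-the-odds (PDO) of 50; the PDO and the sample values are illustrative assumptions, not production parameters:

```python
import math

PDO = 50                    # assumed: points needed to double the odds
FACTOR = PDO / math.log(2)  # ≈ 72.13 points per unit of log-odds

def shap_to_points(shap_values: dict[str, float]) -> dict[str, int]:
    """Scale log-odds SHAP contributions to signed score points.

    A positive SHAP value is assumed to push towards default,
    so it subtracts points from the score.
    """
    return {feature: round(-value * FACTOR) for feature, value in shap_values.items()}

contributions = shap_to_points({
    "income_regularity": -1.66,       # lowers PD, adds points
    "expense_to_income_ratio": 0.62,  # raises PD, subtracts points
})
# contributions == {"income_regularity": 120, "expense_to_income_ratio": -45}
```

The same scaling keeps the server-side explanation and the user-facing breakdown consistent: the points shown in the app sum (together with the base score) to the final score.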
When credit is denied, the mobile app must show this to the user, not in technical terms but in human language:
// iOS — translate SHAP values to user-friendly text
func localizeScoreExplanation(_ contributions: [FeatureContribution]) -> [String] {
    return contributions
        .sorted { abs($0.value) > abs($1.value) }
        .prefix(3)
        .map { contribution in
            switch contribution.feature {
            case "expense_to_income_ratio" where contribution.value < 0:
                return "High spending relative to income"
            case "income_regularity" where contribution.value > 0:
                return "Stable regular income"
            case "negative_balance_days" where contribution.value < 0:
                return "Periods with insufficient balance"
            default:
                return contribution.defaultDescription
            }
        }
}
Model Retraining and Quality Monitoring
The model degrades over time: economic conditions change, user patterns shift. Required:
- Population Stability Index (PSI)—monitor feature drift. PSI > 0.25 signals retraining need
- Gini coefficient on fresh data—monthly check of model discriminative power
- Retrospective analysis of predictions after 90 days (default confirmation period)
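The first two checks can be sketched in pure Python; the bin count, the 1e-4 floor, and the thresholds follow common scorecard conventions rather than any specific library:

```python
import math

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between a baseline feature sample and a fresh one."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def share(sample: list[float]) -> list[float]:
        counts = [0] * bins
        for x in sample:
            counts[sum(x > e for e in edges)] += 1
        return [max(c / len(sample), 1e-4) for c in counts]  # floor avoids log(0)

    e, a = share(expected), share(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

def gini(y_true: list[int], y_score: list[float]) -> float:
    """Gini = 2*AUC - 1, computed from pairwise ranking of defaults vs non-defaults."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return 2 * (wins / (len(pos) * len(neg))) - 1
```

In monitoring, psi() runs per feature against the training-time distribution; gini() runs monthly on the cohort whose 90-day default outcomes have matured.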
Compliance and Restrictions
The scoring model must not use protected attributes: gender, nationality, religion, birthplace. A pre-production disparate impact audit checks that the model does not indirectly discriminate against demographic groups through proxy features (e.g., region or device type correlating with a protected attribute). Fairness testing tools: fairlearn or aequitas.
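The core disparate impact metric is simple enough to sketch directly: the ratio of approval rates between a protected group and a reference group. This is a minimal illustration; fairlearn and aequitas provide far more complete audits, and the 0.8 threshold is the common "four-fifths rule" heuristic:

```python
def disparate_impact_ratio(approved: list[bool], group: list[str],
                           protected: str, reference: str) -> float:
    """Approval-rate ratio of the protected group vs the reference group."""
    def approval_rate(g: str) -> float:
        flags = [a for a, grp in zip(approved, group) if grp == g]
        return sum(flags) / len(flags)

    return approval_rate(protected) / approval_rate(reference)

# A ratio below 0.8 is a red flag under the four-fifths rule and
# warrants investigating which proxy features drive the gap.
```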
Data storage: Russian users' personal data may be stored only on servers located in the Russian Federation (personal data law). Transactional data used for scoring must not be shared with third parties without separate consent.
Development Process
Audit available data and obtain legal opinion → design feature space → develop ETL pipeline → train baseline model (logistic regression as benchmark) → gradient boosting with tuning → SHAP explanations → A/B test vs baseline → production monitoring.
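The baseline step can be sketched as follows: logistic regression sets the benchmark Gini that the boosted model must beat to justify its complexity. The synthetic features here are stand-ins, not the production feature set:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for scoring features; default driven by a noisy linear rule
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 3))
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=1.0, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
baseline = make_pipeline(StandardScaler(), LogisticRegression())
baseline.fit(X_tr, y_tr)

# Benchmark Gini on held-out data; the LightGBM model must clear this bar
gini = 2 * roc_auc_score(y_te, baseline.predict_proba(X_te)[:, 1]) - 1
print(f"baseline Gini: {gini:.2f}")
```

In the A/B step, the same held-out cohort yields a Gini for the boosted model; the uplift over this baseline is what gets presented to risk and compliance.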
Timeframe Estimates
An MVP with logistic regression and basic transactional features takes 3–4 weeks. A production system with LightGBM, SHAP, monitoring, and a compliance audit takes 2–3 months. Without a ready data pipeline, add 2–4 weeks for event collection and storage development.