Developing an AI system for predicting player churn
Churn prediction is the foundation of retention marketing in the gaming industry. Mobile games lose 70% of new players within the first seven days. A good model identifies players at risk of churning before they uninstall the app, so you can retain them with personalized retention campaigns.
Specifics of gaming churn
Defining churn: unlike subscription services, games have no contract whose cancellation marks the moment a player leaves. The binary "churned/retained" label is therefore set by an inactivity threshold:
- Mobile games: 7-14 days without login
- MMORPG: 30 days without login
- Casual games: 3-5 days
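The choice of threshold affects both class balance and how early you can intervene: a short threshold labels more players as churned and flags them sooner, but also catches players who are only taking a break.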
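A minimal sketch of snapshot labeling, assuming you know each player's last login date. The `label_churn` and `churn_rate` helpers and the sample dates are hypothetical; the point is to show how the threshold shifts the share of the churned class:

```python
from datetime import date, timedelta


def label_churn(last_login: dict[str, date], as_of: date, threshold_days: int) -> dict[str, bool]:
    """Mark a player as churned if inactive for at least `threshold_days` days as of `as_of`."""
    return {pid: (as_of - last).days >= threshold_days for pid, last in last_login.items()}


def churn_rate(labels: dict[str, bool]) -> float:
    """Share of players labeled as churned (the positive-class ratio)."""
    return sum(labels.values()) / len(labels) if labels else 0.0


# Hypothetical last-login dates for five players
as_of = date(2024, 6, 30)
last_login = {pid: as_of - timedelta(days=d)
              for pid, d in [("p1", 1), ("p2", 4), ("p3", 8), ("p4", 15), ("p5", 40)]}

for threshold in (3, 7, 14, 30):
    rate = churn_rate(label_churn(last_login, as_of, threshold))
    print(f"threshold={threshold:>2}d  churn rate={rate:.0%}")
```

With these five sample players, the churned share falls from 80% at a 3-day threshold to 20% at a 30-day threshold.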
Early, mid-term and late churn:
- Early churn (D1-D7): onboarding problems, an overly complex tutorial, unmet expectations
- Mid-term churn (D7-D30): fading interest in the content, stalled progression
- Late churn (D30+): content exhaustion, burnout, switching to a competing game
Each type requires a different retention strategy.
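Because each stage calls for a different strategy, it can help to tag churned players by their tenure at the moment of churn. A sketch, assuming tenure is measured in days since install; the stage boundaries follow the list above, and `churn_stage` is a hypothetical helper:

```python
from collections import Counter


def churn_stage(days_since_install: int) -> str:
    """Map tenure at churn (days since install) to an early/mid/late stage."""
    if days_since_install < 0:
        raise ValueError("tenure cannot be negative")
    if days_since_install <= 7:
        return "early"  # onboarding, tutorial complexity, unmet expectations
    if days_since_install <= 30:
        return "mid"    # fading interest, stalled progression
    return "late"       # content exhaustion, burnout, competing games


# Hypothetical tenures of recently churned players
tenures = [0, 2, 7, 8, 21, 30, 45, 120]
print(Counter(churn_stage(t) for t in tenures))
```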
Feature Engineering from Game Logs
Engagement, progression and social features (each value is a per-player aggregate computed upstream from the game logs):
```python
engagement_features = {
    # Engagement and RFM (recency, frequency, monetary)
    'sessions_last_7d': session_count_7d,
    'avg_session_length_min': avg_session_duration,
    # Guard against division by zero when the previous 3-day window had no sessions
    'session_frequency_trend': sessions_last_3d / max(sessions_prev_3d, 1),
    'days_since_last_session': recency,
    'total_days_played': frequency,
    'total_revenue': monetary,
    # Game progress
    'player_level': current_level,
    'level_progression_rate': levels_gained_per_day,
    'progression_delta': level_now - level_7d_ago,
    'features_unlocked': len(unlocked_features),
    # Social
    'guild_membership': is_guild_member,
    'friends_count': friend_list_size,
    'pvp_matches_7d': pvp_count,
    'chat_messages_7d': messages_count,
}
```
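The aggregates above have to be computed from raw session logs. A stdlib-only sketch for a subset of the engagement features, assuming a hypothetical `Session` record with a start time and a duration. It uses +1 smoothing on the trend ratio, which is one choice among several for handling empty windows:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Session:
    start: datetime
    duration_min: float


def engagement_features(sessions: list[Session], now: datetime) -> dict:
    """A subset of the engagement features for one player (hypothetical log schema)."""

    def count_in(days_ago_start: int, days_ago_end: int) -> int:
        # Sessions that started in the window (now - days_ago_start, now - days_ago_end]
        lo, hi = now - timedelta(days=days_ago_start), now - timedelta(days=days_ago_end)
        return sum(lo < s.start <= hi for s in sessions)

    last_7d = [s for s in sessions if now - timedelta(days=7) < s.start <= now]
    last_3d, prev_3d = count_in(3, 0), count_in(6, 3)
    return {
        "sessions_last_7d": len(last_7d),
        "avg_session_length_min": (sum(s.duration_min for s in last_7d) / len(last_7d)
                                   if last_7d else 0.0),
        # +1 smoothing keeps the ratio defined when the previous window is empty
        "session_frequency_trend": (last_3d + 1) / (prev_3d + 1),
        "days_since_last_session": ((now - max(s.start for s in sessions)).days
                                    if sessions else None),
        "total_days_played": len({s.start.date() for s in sessions}),
    }


now = datetime(2024, 6, 30, 12, 0)
sessions = [Session(now - timedelta(days=d, hours=1), minutes)
            for d, minutes in [(0, 20), (1, 30), (4, 10), (5, 15), (10, 40)]]
print(engagement_features(sessions, now))
```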
Monetization features:
```python
from statistics import mean

monetization_features = {
    'payer_flag': has_ever_paid,
    'days_since_last_purchase': recency_purchase,
    'ltv_to_date': total_revenue,
    'purchase_count': total_transactions,
    'avg_purchase_value': mean(transaction_values) if transaction_values else 0.0,
    'subscription_active': has_active_subscription,
    'ad_views_7d': rewarded_ad_count,  # rewarded ad views, relevant for free-to-play titles
}
```
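A sketch of computing these features from a raw purchase log, assuming a hypothetical `Purchase` record and argument names. Non-payers have no last purchase, so their recency is set to NaN, which gradient-boosting libraries such as XGBoost and LightGBM treat as a missing value by default:

```python
import math
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Purchase:
    ts: datetime
    amount_usd: float


def monetization_features(purchases: list[Purchase], now: datetime,
                          subscription_active: bool, rewarded_ads_7d: int) -> dict:
    """Monetization features for one player (hypothetical schema and argument names)."""
    amounts = [p.amount_usd for p in purchases]
    return {
        "payer_flag": bool(purchases),
        # NaN for non-payers, so tree-based models see the value as missing
        "days_since_last_purchase": ((now - max(p.ts for p in purchases)).days
                                     if purchases else math.nan),
        "ltv_to_date": sum(amounts),
        "purchase_count": len(purchases),
        "avg_purchase_value": mean(amounts) if amounts else 0.0,
        "subscription_active": subscription_active,
        "ad_views_7d": rewarded_ads_7d,
    }


now = datetime(2024, 6, 30)
payer = [Purchase(datetime(2024, 6, 1), 4.99), Purchase(datetime(2024, 6, 20), 9.99)]
print(monetization_features(payer, now, subscription_active=False, rewarded_ads_7d=3))
print(monetization_features([], now, subscription_active=False, rewarded_ads_7d=12))
```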
Models by segment
Rather than one model for all players, train a separate model for each segment:
- Payers: the most valuable segment. XGBoost with financial features. Use a lower decision threshold, because losing a payer is too costly.
- High-engagement non-payers: candidates for conversion to paying. LightGBM with engagement features.
- Casual players: the majority of the player base. A simpler model tuned for high recall.
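Segment routing with per-segment thresholds might look like this. The segmentation rules, the 10-session cutoff and the threshold values are placeholders to be tuned on validation data, and the lambdas merely stand in for the trained per-segment models:

```python
from dataclasses import dataclass
from typing import Callable

# Placeholder decision thresholds on the predicted churn probability.
# Payers get the lowest one: a missed churning payer costs more than a wasted bonus.
THRESHOLDS = {"payer": 0.25, "engaged_non_payer": 0.40, "casual": 0.40}


@dataclass
class Player:
    player_id: str
    ltv_to_date: float
    sessions_last_7d: int


def segment(p: Player, engaged_min_sessions: int = 10) -> str:
    """Hypothetical segmentation rules: any spend makes a player a payer."""
    if p.ltv_to_date > 0:
        return "payer"
    if p.sessions_last_7d >= engaged_min_sessions:
        return "engaged_non_payer"
    return "casual"


def score(p: Player, models: dict[str, Callable[[Player], float]]) -> tuple[str, float, bool]:
    """Route the player to their segment's model and apply that segment's threshold."""
    seg = segment(p)
    prob = models[seg](p)
    return seg, prob, prob >= THRESHOLDS[seg]


# Stand-ins for trained models (e.g. XGBoost for payers, LightGBM for engaged non-payers)
models = {"payer": lambda p: 0.30, "engaged_non_payer": lambda p: 0.30, "casual": lambda p: 0.30}
for p in [Player("a", 19.99, 2), Player("b", 0.0, 14), Player("c", 0.0, 1)]:
    print(p.player_id, score(p, models))
```

With the same 0.30 score from every model, only the payer is flagged as at risk.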
Cohort-aware model: a player's D7 behavior is normalized by the average for their cohort, defined by acquisition channel and install date:
```python
features['d7_sessions_normalized'] = player_d7_sessions / cohort_avg_d7_sessions
```
This controls for seasonality and for systematic differences between cohorts.
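A stdlib-only sketch of this normalization, assuming each player record carries hypothetical `channel`, `install_date` and `d7_sessions` fields. Cohorts whose average is zero map to 1.0, since the player then equals the cohort average:

```python
from collections import defaultdict
from datetime import date


def normalize_by_cohort(players: list[dict], key: str = "d7_sessions") -> list[dict]:
    """Add f'{key}_normalized' = player value / mean value of the player's cohort."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for p in players:
        cohort = (p["channel"], p["install_date"])
        totals[cohort] += p[key]
        counts[cohort] += 1

    out = []
    for p in players:
        cohort = (p["channel"], p["install_date"])
        avg = totals[cohort] / counts[cohort]
        # If the whole cohort is at zero, the player equals the cohort average
        out.append({**p, f"{key}_normalized": p[key] / avg if avg > 0 else 1.0})
    return out


install = date(2024, 6, 1)
players = [
    {"id": "o1", "channel": "organic", "install_date": install, "d7_sessions": 10},
    {"id": "o2", "channel": "organic", "install_date": install, "d7_sessions": 20},
    {"id": "u1", "channel": "paid_ua", "install_date": install, "d7_sessions": 4},
    {"id": "u2", "channel": "paid_ua", "install_date": install, "d7_sessions": 6},
]
for row in normalize_by_cohort(players):
    print(row["id"], round(row["d7_sessions_normalized"], 2))
```

Here a paid-UA player with 6 sessions scores above their cohort (1.2), while an organic player with 10 sessions scores below theirs (about 0.67).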
Survival Analysis for Games
An alternative formulation: instead of "will the player churn within 14 days?", ask "how many days until the player churns?":
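```python
from lifelines import WeibullAFTFitter

# AFT (accelerated failure time) model. player_data holds numeric covariates plus:
#   days_until_churn: days observed (until churn, or until today for active players); must be positive
#   churned: 1 if churn was observed, 0 if the player is still active (right-censored)
aft = WeibullAFTFitter()
aft.fit(player_data, duration_col='days_until_churn', event_col='churned')

# Predicted median time to churn for the players in player_features (same covariate columns)
predicted_retention = aft.predict_median(player_features)
```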
This gives a richer signal: not just the churn risk, but the expected time until churn.
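To see how censored players are handled without installing lifelines, here is a Kaplan-Meier estimator in plain Python. Note that this is a different, non-parametric technique: it estimates a single survival curve for a group of players and takes no covariates, unlike the Weibull AFT model above. The sample durations are made up:

```python
from collections import Counter
from typing import Optional


def kaplan_meier(durations: list[int], churned: list[bool]) -> list[tuple[int, float]]:
    """Kaplan-Meier estimate of S(t), the probability a player is still active after day t.

    durations: days observed per player (until churn, or until today if still active)
    churned:   True if churn was observed, False if right-censored (still active)
    Returns (day, S(day)) for each day on which at least one churn occurred.
    """
    churns = Counter(d for d, e in zip(durations, churned) if e)
    exits = Counter(durations)  # players leaving the risk set each day (churned or censored)
    at_risk, surv, curve = len(durations), 1.0, []
    for t in sorted(exits):
        if churns[t]:
            surv *= 1 - churns[t] / at_risk  # conditional survival through day t
            curve.append((t, surv))
        at_risk -= exits[t]  # censored players still count as at risk on their last day
    return curve


def median_survival(curve: list[tuple[int, float]]) -> Optional[int]:
    """First day at which S(t) falls to 0.5 or below; None if it never does."""
    return next((t for t, s in curve if s <= 0.5), None)


durations = [2, 3, 3, 5, 7, 8, 10, 12]
churned = [True, True, False, True, False, True, True, False]
curve = kaplan_meier(durations, churned)
print(curve)
print(median_survival(curve))
```

For this sample, S(t) first drops to 0.5 or below on day 8, so the estimated median time to churn is 8 days.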
Retention Actions
D0-D3, tutorial intervention: if tutorial completion is below 80%, send a push notification with simplified help.
D1-D7, progression intervention: if the player's progress is below the cohort median, grant a temporary buff or a resource gift.
D7-D30, engagement intervention:
- High-risk payers: a personal email from the "developers" with a unique bonus
- High-risk non-payers: a retargeting ad campaign with a deep link into the game
Win-back (after leaving):
- Email/push after 3/7/14/30 days of inactivity
- Special "We miss you!" offers
- Announcement of new content via push notifications
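The timing rules above can be expressed as a priority-ordered policy. A sketch with a hypothetical `PlayerState` record and placeholder action names; the overlapping day ranges (D0-D3 vs. D1-D7, and D7 appearing in two ranges) are resolved by rule order, which is an assumption rather than part of the original scheme:

```python
from dataclasses import dataclass
from typing import Optional

WINBACK_DAYS = {3, 7, 14, 30}  # days of inactivity that trigger a win-back message


@dataclass
class PlayerState:
    days_since_install: int
    days_inactive: int
    tutorial_completion: float        # share of tutorial steps completed, 0.0-1.0
    progress_vs_cohort_median: float  # player progress / cohort median progress
    is_payer: bool
    high_risk: bool                   # e.g. churn score above the segment threshold


def choose_action(s: PlayerState) -> Optional[str]:
    """Hypothetical policy: the first matching rule wins."""
    if s.days_inactive in WINBACK_DAYS:
        return "winback_we_miss_you_offer"
    if s.days_since_install <= 3 and s.tutorial_completion < 0.8:
        return "push_simplified_tutorial_help"
    if 1 <= s.days_since_install <= 7 and s.progress_vs_cohort_median < 1.0:
        return "temporary_buff_or_resource_gift"
    if 7 <= s.days_since_install <= 30 and s.high_risk:
        return "personal_email_with_bonus" if s.is_payer else "retargeting_ad_with_deep_link"
    return None


print(choose_action(PlayerState(2, 0, 0.5, 1.2, is_payer=False, high_risk=False)))
print(choose_action(PlayerState(20, 1, 1.0, 1.0, is_payer=True, high_risk=True)))
```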
Lift measurement and A/B testing
```python
# Incrementality test: high_risk_players is a pandas DataFrame with a boolean
# is_active_14d_later column. Only the treatment group receives the intervention.
treatment = high_risk_players.sample(frac=0.5, random_state=42)
control = high_risk_players.drop(treatment.index)

# 14 days later: share of each group that is still active
treatment_retention = treatment['is_active_14d_later'].mean()
control_retention = control['is_active_14d_later'].mean()
uplift = treatment_retention - control_retention
print(f"Retention uplift from intervention: {uplift:.1%}")
```
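A difference in retention rates can be pure noise. A sketch of a two-proportion z-test to check whether the uplift is statistically significant; the retained counts in the example are made up:

```python
from math import erfc, sqrt


def two_proportion_ztest(retained_t: int, n_t: int,
                         retained_c: int, n_c: int) -> tuple[float, float, float]:
    """Return (uplift, z, two-sided p-value) for treatment vs. control retention rates."""
    p_t, p_c = retained_t / n_t, retained_c / n_c
    pooled = (retained_t + retained_c) / (n_t + n_c)
    # Standard error of the difference under H0: both groups share one retention rate
    se = sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    if se == 0:  # everyone (or no one) retained in both groups
        return p_t - p_c, 0.0, 1.0
    z = (p_t - p_c) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p_t - p_c, z, p_value


uplift, z, p = two_proportion_ztest(540, 1000, 500, 1000)
print(f"uplift={uplift:.1%}  z={z:.2f}  p={p:.3f}")
```

With 540 of 1,000 treated players retained vs. 500 of 1,000 controls, the 4-point uplift gives z ≈ 1.79 and p ≈ 0.073, which is not significant at the 5% level.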
Timeline: a basic churn model on game logs (LightGBM) takes 3-4 weeks. A full system with cohort-aware features, segmented models, survival analysis and A/B measurement takes 3-4 months.