How do music platforms keep users subscribed in a competitive streaming market?
Using the KKBox churn dataset, I combined machine learning and causal inference to explore not just who is likely to churn, but why.
Problem
Subscription churn is a major revenue challenge for streaming platforms.
The business question: Can we predict which users are at risk, and identify actionable drivers of churn?
Approach
- Data Preparation – Cleaned & merged demographics, transactions, and listening logs (~1M users).
- Modeling – Logistic Regression (baseline, weak) → XGBoost (best, AUC ≈ 0.98).
- Explainability – SHAP values to interpret feature importance.
- Causal Inference – Propensity scores (Local Effect Analysis) + mediation analysis to test whether auto-renewal truly prevents churn.
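The modeling step above can be sketched as a baseline-versus-boosted-trees comparison scored by AUC. This is a minimal illustration, not the project's actual pipeline: the data is synthetic, the two feature columns are hypothetical, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so the example needs only NumPy and scikit-learn. The synthetic churn signal is deliberately built from an interaction, which is one possible reason a plain logistic baseline can lag behind tree ensembles.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the merged feature table (NOT the KKBox data).
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))  # hypothetical columns: scaled gap days, listening time

# Churn depends on an interaction between the two features, which a
# linear model without interaction terms cannot represent.
p_churn = 1.0 / (1.0 + np.exp(-3.0 * X[:, 0] * X[:, 1]))
y = rng.binomial(1, p_churn)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(),
    # Stand-in for XGBoost; both are gradient-boosted tree ensembles.
    "boosted_trees": GradientBoostingClassifier(random_state=0),
}

auc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # AUC is computed from predicted churn probabilities, not hard labels.
    scores = model.predict_proba(X_test)[:, 1]
    auc[name] = roc_auc_score(y_test, scores)
    print(f"{name}: AUC = {auc[name]:.3f}")
```

On real data, the same loop would take the engineered feature matrix and churn label in place of `X` and `y`; the AUC values printed here say nothing about the KKBox results.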
Results
- Prediction: XGBoost separated churners from non-churners well (AUC ≈ 0.98), outperforming the weak logistic regression baseline.
- Explainability: Auto-renewal and inactivity gaps ranked as top predictors.
- Causality: Higher engagement (more listening activity, fewer gap days) directly reduced churn; auto-renewal was predictive but not causal.
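How a feature can be predictive without being causal is easiest to see on simulated data. The sketch below assumes a single confounder (a hypothetical `engagement` score) that drives both auto-renewal and churn, while auto-renewal itself has zero effect by construction. It then compares the naive difference in churn rates with an inverse-probability-weighted (IPW) estimate. All variable names and coefficients are illustrative assumptions, not values from the project.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Simulated data (NOT the KKBox data). By construction, auto-renewal has
# no causal effect on churn; engagement confounds the two.
rng = np.random.default_rng(0)
n = 20_000
engagement = rng.normal(size=n)                              # confounder
auto_renew = rng.binomial(1, sigmoid(engagement))            # treatment
churn = rng.binomial(1, sigmoid(-1.0 - 1.5 * engagement))    # outcome

# Naive comparison: raw churn-rate gap between the two groups.
naive_diff = churn[auto_renew == 1].mean() - churn[auto_renew == 0].mean()

# Propensity score: estimated P(auto_renew = 1 | engagement).
ps_model = LogisticRegression().fit(engagement.reshape(-1, 1), auto_renew)
propensity = ps_model.predict_proba(engagement.reshape(-1, 1))[:, 1]
propensity = np.clip(propensity, 0.01, 0.99)  # guard against extreme weights

# Normalized (Hajek) IPW estimate of the average treatment effect.
w_treated = auto_renew / propensity
w_control = (1 - auto_renew) / (1 - propensity)
ipw_ate = np.average(churn, weights=w_treated) - np.average(churn, weights=w_control)

print(f"naive difference: {naive_diff:+.3f}")
print(f"IPW estimate:     {ipw_ate:+.3f}")
```

In this simulation the naive gap is clearly negative, because auto-renewing users are also the more engaged ones, while the weighted estimate sits near zero. The adjustment only works to the extent that the confounders are measured and included in the propensity model.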
Impact
- Predictive models enable early churn detection for proactive outreach.
- Causal insights show engagement is the lever to target, not billing settings.
- Recommendation: focus retention strategy on shortening activity gaps and boosting listening habits.
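To make "activity gaps" concrete, here is one way such a feature could be derived from listening logs with pandas. The table layout and the column names `user_id` and `date` are assumptions for illustration; the project's actual feature definitions may differ.

```python
import pandas as pd

# Hypothetical listening log: one row per user per active day.
logs = pd.DataFrame(
    {
        "user_id": ["a", "a", "a", "b", "b"],
        "date": pd.to_datetime(
            ["2017-03-01", "2017-03-02", "2017-03-10", "2017-03-05", "2017-03-06"]
        ),
    }
)

logs = logs.sort_values(["user_id", "date"])

# Days elapsed since the same user's previous active day
# (NaN for each user's first record).
logs["gap_days"] = logs.groupby("user_id")["date"].diff().dt.days

# Per-user summaries that could serve as model features.
gap_features = logs.groupby("user_id")["gap_days"].agg(max_gap="max", mean_gap="mean")
print(gap_features)
```

A long `max_gap` flags a user who went quiet for a stretch, which is the kind of signal a retention campaign could act on before the subscription lapses.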
Skills & Tools
- Python (pandas, NumPy, scikit-learn, XGBoost, statsmodels, SHAP)
- Data cleaning & feature engineering
- Model explainability
- Causal inference (propensity scores, IPW, mediation analysis)
- Visualization (matplotlib, seaborn)
