Do riders behave differently on weekends compared to weekdays? Using millions of trip records from Bike Share Toronto, I analyzed how ridership patterns and trip duration change on weekends.
Instead of relying on naïve averages, I applied causal inference techniques to isolate the weekend effect while controlling for confounders such as time of day, season (a rough proxy for weather), user type, and station location.
This project demonstrates real-world causal inference workflows, including propensity score modeling, inverse probability weighting (IPW), double machine learning (DML), and Causal Forests.
Problem
Weekend riding often differs from weekday riding because of user intent (commuting vs. leisure), but raw differences are confounded by seasonality, trip locations, and rider mix (casual riders vs. members).
Business question:
How much does riding on a weekend increase trip duration after adjusting for confounders?
And does this effect vary across rider segments (CATE)?
A reliable causal estimate can help transportation teams:
- anticipate demand for staffing and rebalancing,
- adjust pricing strategies (weekend premiums/discounts),
- design targeted engagement campaigns.
Approach
Data Preparation
- Processed ~5M ride records and engineered features:
  - trip duration (min),
  - hour of day,
  - weekday/weekend indicator,
  - season (derived from file date),
  - user type,
  - station IDs.
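Below is a minimal pandas sketch of this feature engineering. The column names (`start_time`, `end_time`) are hypothetical and may not match the raw Bike Share Toronto exports. Deriving season from the trip's start month is an assumption that stands in for the file-date approach described above.

```python
import pandas as pd

# Assumption: meteorological seasons keyed by calendar month.
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "fall", 10: "fall", 11: "fall"}

def engineer_features(trips: pd.DataFrame) -> pd.DataFrame:
    """Add duration, hour, weekend flag and season columns to raw trip records.

    Hypothetical input columns: start_time, end_time (timestamps or strings).
    """
    df = trips.copy()
    df["start_time"] = pd.to_datetime(df["start_time"])
    df["end_time"] = pd.to_datetime(df["end_time"])
    # Trip duration in minutes.
    df["duration_min"] = (df["end_time"] - df["start_time"]).dt.total_seconds() / 60
    df["hour"] = df["start_time"].dt.hour
    # dayofweek: Monday=0 ... Sunday=6, so 5 and 6 are Saturday and Sunday.
    df["weekend"] = (df["start_time"].dt.dayofweek >= 5).astype(int)
    # Assumption: season from the start month, not from the source file date.
    df["season"] = df["start_time"].dt.month.map(SEASONS)
    return df
```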
Exploratory Analysis
- Explored demand trends across hour, day-of-week, and season.
- Identified strong weekend patterns: leisure spikes, late-evening demand, and longer average trip duration.
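A week has five weekdays but only two weekend days. To compare hourly demand fairly, one option is to normalize trip counts by the number of days in each group. The sketch below assumes hypothetical `start_time` (datetime), `hour`, and `weekend` (0/1) columns.

```python
import pandas as pd

def hourly_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Average trips per day at each hour, with weekday (0) and weekend (1) columns."""
    df = df.assign(date=df["start_time"].dt.date)
    # Total trips in each (weekend, hour) cell.
    trips = df.groupby(["weekend", "hour"]).size()
    # Number of distinct calendar days observed in each group.
    n_days = df.groupby("weekend")["date"].nunique()
    # Divide each cell by its group's day count, broadcasting on the "weekend" level.
    per_day = trips.div(n_days, level="weekend")
    # Hours with no trips in a group become 0 rather than NaN.
    return per_day.unstack("weekend").fillna(0)
```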
Causal Inference Workflow
1. Naïve Estimation
Compared mean duration between weekends and weekdays.
Result: weekend trips appear ~2.8 minutes longer, but this raw difference is biased by confounding.
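The naïve estimate is simply a difference in group means. The sketch below assumes hypothetical `weekend` (0/1) and `duration_min` columns. Nothing here adjusts for confounding. For example, if casual riders make up a larger share of weekend trips, part of the gap reflects rider mix rather than the weekend itself.

```python
import pandas as pd

def naive_difference(df: pd.DataFrame) -> float:
    """Unadjusted difference in mean trip duration (minutes): weekend minus weekday."""
    means = df.groupby("weekend")["duration_min"].mean()
    return float(means.loc[1] - means.loc[0])
```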
2. Propensity Score Modeling
- Modeled the probability of being a weekend trip using:
  - one-hot encoded day-of-week,
  - user type,
  - season,
  - hour,
  - station distributions.
- Checked overlap (common support) between the weekend and weekday propensity distributions to confirm that the two groups are comparable.
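Here is a sketch of a propensity model and overlap check using scikit-learn's `LogisticRegression`. It assumes a hypothetical DataFrame with a 0/1 `weekend` column. Covariates are cast to strings before one-hot encoding, so hour and station ID are treated as categories. Day-of-week is deliberately left out of this sketch: it determines the weekend flag exactly, so including it would push every propensity to 0 or 1 and leave no overlap to check.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def fit_propensity(df: pd.DataFrame, covariates: list) -> np.ndarray:
    """Estimate P(weekend = 1 | covariates) with a logistic regression on one-hot features."""
    # Cast to str so numeric codes such as hour or station ID become categories.
    X = pd.get_dummies(df[covariates].astype(str), drop_first=True, dtype=float)
    model = LogisticRegression(max_iter=1000)
    model.fit(X, df["weekend"])
    return model.predict_proba(X)[:, 1]

def overlap_summary(e: np.ndarray, weekend: np.ndarray) -> pd.DataFrame:
    """Quantiles of the estimated propensity for weekday (0) and weekend (1) trips."""
    s = pd.DataFrame({"propensity": e, "weekend": weekend})
    # Substantially overlapping ranges suggest comparable groups;
    # scores piled up near 0 or 1 signal poor overlap.
    return s.groupby("weekend")["propensity"].describe()[["min", "25%", "50%", "75%", "max"]]
```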
3. IPW (Inverse Probability Weighting)
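Weighted each observation by the inverse of its estimated probability of the treatment it actually received: 1/e for weekend trips and 1/(1 − e) for weekday trips, where e is the propensity score.
Effect: ~2.7 minutes longer on weekends, close to the naïve estimate but now adjusted for the measured confounders.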
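The weighting step takes only a few lines of NumPy. This sketch uses the normalized (Hajek) form of IPW and clips propensities away from 0 and 1. Both are common choices, but they are assumptions here and not necessarily what the original analysis used.

```python
import numpy as np

def ipw_ate(y, t, e, clip: float = 0.01) -> float:
    """Normalized (Hajek) IPW estimate of the average treatment effect."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    # Clip propensities so a few near-0 or near-1 scores cannot dominate the weights.
    e = np.clip(np.asarray(e, dtype=float), clip, 1 - clip)
    w1 = t / e              # weekend trips weighted by 1 / e
    w0 = (1 - t) / (1 - e)  # weekday trips weighted by 1 / (1 - e)
    # Weighted mean outcome under each condition, then their difference.
    return float(np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0))
```

Normalizing by the sum of the weights keeps each weighted mean on the outcome's scale even when the estimated propensities are slightly off.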
4. Double Machine Learning (DML) with EconML
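Used gradient boosting to model the outcome and logistic regression to model treatment assignment.
ATE estimate: weekends increase trip duration by ~2.43 minutes after controlling for the measured confounders.
DML residualizes both the outcome and the treatment before estimating the effect (Neyman orthogonalization). As a result, the estimate is less sensitive to errors in either nuisance model and remains usable with high-dimensional features.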
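EconML's DML estimators automate this partialling-out procedure. To show what happens underneath, here is a sketch of a cross-fitted DML estimate of a constant effect, written with scikit-learn only. The function name `dml_ate` and the fold count are illustrative choices, not EconML's API.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def dml_ate(X, y, t, n_splits: int = 5, seed: int = 0) -> float:
    """Cross-fitted partialling-out DML estimate of a constant treatment effect."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=int)
    y_res = np.empty_like(y)
    t_res = np.empty_like(y)
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        # Outcome nuisance E[y | X], fit on the other folds only.
        m_y = GradientBoostingRegressor(random_state=seed).fit(X[train], y[train])
        # Treatment nuisance P(t = 1 | X), fit on the other folds only.
        m_t = LogisticRegression(max_iter=1000).fit(X[train], t[train])
        y_res[test] = y[test] - m_y.predict(X[test])
        t_res[test] = t[test] - m_t.predict_proba(X[test])[:, 1]
    # Final stage: regress outcome residuals on treatment residuals (no intercept).
    return float(np.sum(t_res * y_res) / np.sum(t_res ** 2))
```

Cross-fitting ensures that each residual comes from models that never saw that row, which limits overfitting bias in the final stage.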
5. CATE (Conditional Average Treatment Effect)
Computed subgroup effects:
- Casual riders: +2.9 min
- Annual members: +0.6 min
- Summer: +3.0 min
- Winter: +1.4 min
Weekend effects are much stronger for casual riders and summer months, reflecting leisure behavior.
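Given cross-fitted DML residuals (outcome minus its prediction, treatment minus its predicted propensity), a subgroup effect can be estimated by running the final-stage regression within each subgroup. The sketch below illustrates this idea. It is roughly what a linear final stage with subgroup indicators as effect modifiers computes. The function name and inputs are hypothetical.

```python
import numpy as np
import pandas as pd

def subgroup_effects(y_res, t_res, groups) -> pd.Series:
    """Per-subgroup effect: no-intercept regression of y_res on t_res within each group."""
    d = pd.DataFrame({"y_res": np.asarray(y_res, dtype=float),
                      "t_res": np.asarray(t_res, dtype=float),
                      "group": groups})
    num = (d["t_res"] * d["y_res"]).groupby(d["group"]).sum()
    den = (d["t_res"] ** 2).groupby(d["group"]).sum()
    # Slope for each group: sum(t_res * y_res) / sum(t_res^2).
    return num / den
```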
6. Causal Forests (Uplift Modeling)
Trained a Causal Forest (EconML’s CausalForestDML) to estimate heterogeneous treatment effects across individual riders.
Key insights:
- High-duration leisure riders show the strongest weekend uplift.
- Commuter-heavy stations show minimal weekend effect.
- Uplift visualization confirms meaningful heterogeneity across the city network.
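Once a causal forest has produced per-trip effect estimates (for example, via the `effect` method of EconML's `CausalForestDML`), station-level uplift reduces to an aggregation. The sketch below assumes a hypothetical `start_station_id` column and an array of per-trip estimates. The `min_trips` cutoff is an arbitrary, illustrative threshold.

```python
import numpy as np
import pandas as pd

def station_uplift(trips: pd.DataFrame, cate, min_trips: int = 100) -> pd.DataFrame:
    """Rank stations by their mean estimated weekend effect (minutes)."""
    d = trips[["start_station_id"]].assign(cate=np.asarray(cate, dtype=float))
    summary = d.groupby("start_station_id")["cate"].agg(mean_effect="mean", n_trips="size")
    # Drop thinly used stations whose averages would be noisy.
    summary = summary[summary["n_trips"] >= min_trips]
    return summary.sort_values("mean_effect", ascending=False)
```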
Results
Overall Effect
- A consistent +2.3 to +2.7 minute increase in trip duration on weekends.
- Effect is consistent across the IPW, DML, and Causal Forest estimators.
Subgroup Differences
- Casual riders → highest uplift, consistent with leisure use.
- Members → minimal change (commuters behave similarly across days).
- Summer months show the largest increase, aligning with warm-weather tourism and recreation.
Causal Forest Insights
- Reveals where and for whom the weekend effect is highest.
- Useful for targeted pricing or operational planning.
Impact
This analysis provides actionable insight for Bike Share operations teams:
- Predict weekend demand more accurately (especially for casual riders and tourist-heavy seasons).
- Optimize bike rebalancing strategies by identifying stations with strong weekend duration uplift.
- Support pricing or promotional decisions by segmenting users based on causal effects.
- Demonstrate analytic rigor by using causal inference instead of misleading raw comparisons.
Skills & Tools
- Python (pandas, NumPy, matplotlib, seaborn)
- Causal Inference: Propensity Scores, IPW, Double ML, Causal Forests, CATE estimation
- Machine Learning: Gradient Boosting, Logistic Regression, EconML, scikit-learn
- Experimentation Concepts: Overlap checks, confounding adjustment, treatment effect heterogeneity
- Data Engineering: Efficient preprocessing of millions of records
Check out the code on GitHub
