
Tech Stack
Databricks
Machine Learning
Agile
Project management
Description
This project focuses on building a data-driven machine learning pipeline to predict rail break risk from historical operational and condition data. I worked end-to-end across the analytics workflow: cleaning and validating raw datasets, designing informative features, training and tuning predictive models, and evaluating performance under a competitive leaderboard setting.
The key challenge was transforming noisy, real-world rail data into stable signals for prediction. Through iterative experimentation, systematic hyperparameter tuning, and careful evaluation, the final solution achieved strong leaderboard performance (Rank #2 in the recorded snapshot), demonstrating both model effectiveness and disciplined ML engineering practice.
- Built an end-to-end ML pipeline: data cleaning, preprocessing, feature engineering, training, and evaluation
- Experimented with and compared multiple models (baseline → advanced), selecting the best performer via metric-driven benchmarking
- Evaluated performance with accuracy, F1-score, and AUC-PR, emphasizing robustness to class imbalance
- Applied explainability methods such as SHAP and DiCE to interpret feature importance and support model validation
- Served as Scrum Master for one sprint
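To illustrate why F1-score and AUC-PR matter alongside accuracy when breaks are rare, here is a minimal sketch in plain Python. It is not the project's actual code: the labels, scores, 0.5 threshold, and helper names (`accuracy`, `f1_score`, `average_precision`) are all made up for illustration, and AUC-PR is approximated by average precision, assuming no tied scores.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_precision(y_true, scores):
    """Step-wise estimate of AUC-PR: mean precision at each rank holding a positive."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(y_true)
    if total_pos == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_pos


# Hypothetical toy data: 18 "no break" segments and 2 "break" segments.
y_true = [0] * 18 + [1, 1]
scores = [0.02, 0.05, 0.07, 0.10, 0.11, 0.13, 0.15, 0.18, 0.20, 0.22,
          0.25, 0.27, 0.30, 0.31, 0.33, 0.35, 0.40, 0.80, 0.90, 0.60]

baseline_pred = [0] * len(y_true)                  # always predicts "no break"
model_pred = [1 if s >= 0.5 else 0 for s in scores]  # illustrative threshold

print(f"baseline accuracy={accuracy(y_true, baseline_pred):.3f} "
      f"F1={f1_score(y_true, baseline_pred):.3f}")
# prints: baseline accuracy=0.900 F1=0.000
print(f"model    accuracy={accuracy(y_true, model_pred):.3f} "
      f"F1={f1_score(y_true, model_pred):.3f} "
      f"AP={average_precision(y_true, scores):.3f}")
# prints: model    accuracy=0.950 F1=0.800 AP=0.833
```

On this toy data the always-negative baseline still reaches 90% accuracy while its F1 is zero, which is the kind of gap that motivates reporting F1 and AUC-PR for imbalanced problems.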
Page Info
Rank

Submit History
