Predicting eBay Delivery Times
Description
Using a dataset of 15 million entries from eBay, my team set out to predict delivery times in two segments: handling time and shipment time. We tried several machine learning methods: decision-tree models (XGBoost and CatBoost) and a fully connected neural network. The project was very heavy on data cleaning, and the enormous size of the dataset made it difficult.
We used a number of tools across the different stages of the project. Most of our development was done in Jupyter notebooks with PyTorch and scikit-learn, and we ran our code on the Pomona High Performance Computing servers to get more GPU power. To infer which features were most important for predicting (a) the handling time and (b) the shipment time, we used a linear regression model with regularization penalties as well as XGBoost. We also compared those findings against a fully connected network. Finally, we used CatBoost to identify which hyperparameters to fine-tune for our final model. Ultimately, we used CatBoost for our final predictions because it outperformed the other models.
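As a rough illustration of the feature-importance step, here is a minimal sketch of ranking features with an L1-regularized linear regression in scikit-learn. It is not our actual pipeline: the data is synthetic, the feature names (`distance_km`, `weight_kg`, `seller_rating`, `noise`) are hypothetical stand-ins rather than the eBay schema, and the choice of Lasso with `alpha=0.1` is an assumption made for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; these feature names are hypothetical,
# not columns from the eBay dataset.
rng = np.random.default_rng(0)
feature_names = ["distance_km", "weight_kg", "seller_rating", "noise"]
X = rng.normal(size=(1000, len(feature_names)))

# The synthetic target depends only on the first two features.
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.1, size=1000)

# Standardize so coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)

# The L1 penalty shrinks coefficients of uninformative features toward zero.
model = Lasso(alpha=0.1).fit(X_scaled, y)

# Use absolute coefficient size as a simple importance score.
importance = dict(zip(feature_names, np.abs(model.coef_)))
ranked = sorted(importance, key=importance.get, reverse=True)

for name in ranked:
    print(f"{name}: {importance[name]:.3f}")
```

The same idea would be repeated once per target (handling time and shipment time). Tree-based models such as XGBoost expose their own importance scores, which can be compared against a ranking like this one.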
(I own no rights to the eBay logo)