TravelWatchAI is an end-to-end machine learning system that predicts flight ticket prices and issues a binary BUY / WAIT decision for travelers. Built as a course project for Cornell Tech INFO 5368, it trains and benchmarks multiple regression and classification models on a 300K-row Kaggle flight dataset, then deploys the best-performing combination as an interactive Streamlit web app.
The preprocessing pipeline drops missing values, label-encodes categorical fields (airline, source city, class), min-max normalises numeric features, and removes price outliers using both IQR and Z-score filtering, so that all models train on clean, comparable data.
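The preprocessing steps described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `preprocess`, the column names, the 1.5×IQR fence, and the Z-score cut-off of 3 are all assumptions. The sketch also assumes the target price is left unscaled and that a row must pass both outlier tests to be kept.

```python
import pandas as pd


def preprocess(df: pd.DataFrame, categorical: list[str], numeric: list[str],
               target: str = "price", z_max: float = 3.0) -> pd.DataFrame:
    """Clean a raw flight table (hypothetical helper; column names are assumed)."""
    out = df.dropna().copy()

    # Outlier removal on the target: keep rows that pass BOTH the IQR and Z-score tests.
    q1, q3 = out[target].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_iqr = out[target].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    z = (out[target] - out[target].mean()) / out[target].std(ddof=0)
    out = out[in_iqr & (z.abs() <= z_max)].copy()

    # Label-encode categorical fields as integer codes.
    for col in categorical:
        out[col] = out[col].astype("category").cat.codes

    # Min-max scale numeric features to [0, 1]; constant columns become 0.
    for col in numeric:
        lo, hi = out[col].min(), out[col].max()
        out[col] = (out[col] - lo) / (hi - lo) if hi > lo else 0.0

    return out.reset_index(drop=True)


# Tiny made-up table: one row with a missing airline, one extreme price.
raw = pd.DataFrame({
    "airline": ["Vistara", "Indigo", "Vistara", "Indigo", "Vistara", "Indigo", "Vistara", None],
    "source_city": ["Delhi", "Mumbai", "Delhi", "Mumbai", "Delhi", "Mumbai", "Delhi", "Mumbai"],
    "class": ["Economy", "Economy", "Business", "Economy", "Business", "Economy", "Economy", "Economy"],
    "duration": [2.0, 2.5, 3.0, 2.2, 2.8, 2.4, 2.6, 2.1],
    "price": [5000, 5200, 5100, 4900, 5300, 5050, 90000, 5000],
})
clean = preprocess(raw, categorical=["airline", "source_city", "class"], numeric=["duration"])
print(clean)
```

In this toy table the row with the missing airline is dropped first, and the 90000 fare is then rejected by the IQR fence. With so few rows its Z-score stays under 3, which is one reason to apply both filters rather than rely on either alone.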
Regression models predict the continuous ticket price; classification models then consume those predictions to label each flight BUY (price is low relative to historical patterns) or WAIT (price is likely to drop). BUY/WAIT labels are generated programmatically from the price distribution, removing the need for manual annotation.
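One way the programmatic labelling could work is to compare each predicted price with a quantile of the prices on the same route. The sketch below assumes that rule; the helper `label_buy_wait`, the `predicted_price` and `destination_city` columns, and the median cut-off are illustrative choices, not details confirmed by the project.

```python
import pandas as pd


def label_buy_wait(df: pd.DataFrame, price_col: str = "predicted_price",
                   group_cols: tuple[str, ...] = ("source_city", "destination_city"),
                   quantile: float = 0.5) -> pd.Series:
    """Label a row BUY when its price is at or below the route-level quantile."""
    # Broadcast each route's threshold back onto its rows.
    threshold = df.groupby(list(group_cols))[price_col].transform(
        lambda s: s.quantile(quantile)
    )
    return (df[price_col] <= threshold).map({True: "BUY", False: "WAIT"})


# Made-up predictions for two routes.
flights = pd.DataFrame({
    "source_city": ["Delhi", "Delhi", "Delhi", "Mumbai", "Mumbai"],
    "destination_city": ["Mumbai", "Mumbai", "Mumbai", "Delhi", "Delhi"],
    "predicted_price": [4000, 5000, 6000, 3000, 9000],
})
labels = label_buy_wait(flights)
print(flights.assign(decision=labels))
```

Because the threshold is derived from the data itself, no manual annotation is needed; changing `quantile` makes the BUY label stricter or looser.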
Model selection is principled rather than purely metric-driven: when Ridge and Lasso perform comparably, Ridge is preferred for its stability; when Logistic Regression and KNN are close, Logistic Regression wins on interpretability. This bias is baked into the agent's selection logic.
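The tie-breaking rule can be expressed as a small function: take the best-scoring model unless the preferred one is within some tolerance of it. The sketch below is an assumption about how such logic might look; `select_model`, the relative tolerance of 2%, and the scores are hypothetical and are not the project's reported results.

```python
def select_model(scores: dict[str, float], preferred: str,
                 rel_tol: float = 0.02, higher_is_better: bool = True) -> str:
    """Return `preferred` if it is within `rel_tol` of the best score, else the best model."""
    pick = max if higher_is_better else min
    best = pick(scores, key=scores.get)
    gap = abs(scores[best] - scores[preferred])
    return preferred if gap <= rel_tol * abs(scores[best]) else best


# Illustrative scores only (e.g. R² for regressors, F1 for classifiers).
regressor = select_model({"Ridge": 0.912, "Lasso": 0.913}, preferred="Ridge")
classifier = select_model({"Logistic Regression": 0.80, "KNN": 0.90},
                          preferred="Logistic Regression")
print(regressor)   # Ridge: within tolerance of Lasso, so the preference wins
print(classifier)  # KNN: the gap is too large for the preference to apply
```

Keeping the preference explicit in one place makes the bias auditable: the tolerance states how much metric performance is traded for stability or interpretability.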
Screenshots: Streamlit Prediction Interface; Model Performance Dashboard
Stage 1 — Price Regression
Four regression models are trained and evaluated side by side on MSE, RMSE, MAE, and R². Ridge Regression is selected as the production model: it matches Lasso on error metrics while being less sensitive to multicollinearity in the encoded flight features.
Models: Linear Regression, Polynomial Regression, Ridge ✓ (selected), Lasso
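A side-by-side benchmark of the four regressors might look like the scikit-learn sketch below. It runs on synthetic data rather than the Kaggle dataset, and the hyperparameters (`alpha=1.0`, polynomial degree 2) and the 80/20 split are assumptions chosen only to keep the example small.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the encoded, scaled flight features (NOT the real data).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 4))
y = 3000 + 4000 * X[:, 0] - 1500 * X[:, 1] + rng.normal(0, 200, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Polynomial Regression": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=1.0),
}

# Fit each model on the same split and collect the four metrics.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    results[name] = {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAE": mean_absolute_error(y_test, pred),
        "R2": r2_score(y_test, pred),
    }

for name, m in results.items():
    print(f"{name:<22} RMSE={m['RMSE']:.1f}  MAE={m['MAE']:.1f}  R2={m['R2']:.3f}")
```

Evaluating every model on one shared train/test split keeps the metrics directly comparable, which is what makes a near-tie between Ridge and Lasso meaningful.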
Stage 2 — BUY / WAIT Classification
Ridge-predicted prices are thresholded against route-level historical distributions to generate BUY/WAIT labels. Two classifiers are trained and compared on F1 score and AUC-ROC. Logistic Regression is chosen over KNN: comparable accuracy, faster inference, and interpretable coefficients that explain which features drive the decision.
Models: Logistic Regression ✓ (selected), KNN
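The classifier comparison can be sketched in the same style. The example below uses synthetic features and labels (1 = BUY, 0 = WAIT), and the feature names, `n_neighbors=5`, and the 75/25 split are assumptions. It also prints the Logistic Regression coefficients to show the kind of interpretability referred to above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data: labels depend on the first two features plus noise.
rng = np.random.default_rng(0)
feature_names = ["feature_0", "feature_1", "feature_2"]  # hypothetical names
X = rng.normal(size=(600, 3))
y = (2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0, 0.5, size=600) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

# F1 uses hard predictions; AUC-ROC uses the predicted probability of class 1 (BUY).
clf_results = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    proba_buy = clf.predict_proba(X_test)[:, 1]
    clf_results[name] = {
        "F1": f1_score(y_test, clf.predict(X_test)),
        "AUC-ROC": roc_auc_score(y_test, proba_buy),
    }

for name, m in clf_results.items():
    print(f"{name:<20} F1={m['F1']:.3f}  AUC-ROC={m['AUC-ROC']:.3f}")

# Signed coefficients indicate which features push a flight toward BUY or WAIT.
coefs = dict(zip(feature_names, classifiers["Logistic Regression"].coef_[0]))
print(coefs)
```

KNN offers no comparable per-feature summary, which is the interpretability argument for preferring Logistic Regression when the two score similarly.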
Figures: Regression Model Comparison; AUC-ROC Curve; BUY / WAIT Decision Output