TravelWatchAI

ML Pipeline, Decision Agent, Streamlit App

Year: 2026

Technology: Python / scikit-learn / Streamlit / pandas

Categories: ML, Decision Agent, Course Project (Cornell Tech INFO 5368)

Should You Buy That Flight Ticket Now — or Wait?

TravelWatchAI is an end-to-end machine learning system that predicts flight ticket prices and issues a binary BUY / WAIT decision for travelers. Built as a course project for Cornell Tech INFO 5368, it trains and benchmarks multiple regression and classification models on a 300K-row Kaggle flight dataset, then deploys the best-performing combination as an interactive Streamlit web app.

The preprocessing pipeline drops missing values, label-encodes categorical fields (airline, source city, class), min-max normalises numeric features, and removes price outliers using both IQR and Z-score filtering, so every model trains on the same clean, comparably scaled data.
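The cleaning steps can be sketched with pandas and scikit-learn as follows. The column names (`airline`, `source_city`, `class`, `duration`, `days_left`, `price`) are assumed from the public Kaggle flight-price dataset. The 1.5 × IQR fences and the Z-score cutoff of 3 are conventional defaults, not the project's confirmed settings. This sketch removes outliers before scaling so the scaler is fit only on retained rows, but the write-up does not say which order the project uses.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Assumed column names, modelled on the public Kaggle flight-price dataset.
CATEGORICAL = ["airline", "source_city", "class"]
NUMERIC = ["duration", "days_left"]
TARGET = "price"


def preprocess(df: pd.DataFrame, z_max: float = 3.0) -> pd.DataFrame:
    """Drop NaNs, label-encode categoricals, filter price outliers, min-max scale."""
    out = df.dropna().copy()

    # Label-encode each categorical column to integer codes 0..k-1.
    for col in CATEGORICAL:
        out[col] = LabelEncoder().fit_transform(out[col])

    # IQR filter: keep prices inside the 1.5 * IQR fences (conventional default).
    q1, q3 = out[TARGET].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_iqr = out[TARGET].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

    # Z-score filter: keep prices within z_max standard deviations of the mean.
    z = (out[TARGET] - out[TARGET].mean()) / out[TARGET].std()
    out = out[in_iqr & (z.abs() <= z_max)].copy()

    # Min-max scale numeric features to [0, 1], fit on the retained rows only.
    out[NUMERIC] = MinMaxScaler().fit_transform(out[NUMERIC])
    return out


# Tiny illustrative frame: one row has a missing airline, one has an extreme price.
raw = pd.DataFrame({
    "airline": ["A", "B", "A", "C", "B", "A", "C", None],
    "source_city": ["Delhi", "Mumbai", "Delhi", "Delhi", "Mumbai", "Mumbai", "Delhi", "Delhi"],
    "class": ["Economy"] * 7 + ["Business"],
    "duration": [2.0, 3.0, 2.5, 4.0, 3.5, 2.2, 3.1, 5.0],
    "days_left": [1, 10, 20, 30, 5, 15, 25, 3],
    "price": [5000, 5200, 5100, 5300, 4900, 5050, 90000, 6000],
})
clean = preprocess(raw)
```

In this toy frame the Z-score test alone cannot reject the 90,000 fare, because with seven rows no |z| can exceed about 2.27. The IQR fence is what removes that fare here.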

Regression models predict the continuous ticket price; classification models then consume those predictions to label each flight BUY (price is low relative to historical patterns) or WAIT (price is likely to drop). BUY/WAIT labels are generated programmatically from the price distribution, removing the need for manual annotation.
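One way to generate the labels programmatically is to compare each price against a per-route quantile of historical prices. The sketch below makes three illustrative assumptions, none of which is the project's documented rule. A route is the (`source_city`, `destination_city`) pair. The threshold is the route median (`q=0.5`). Being above that median stands in for "likely to drop". The function name `buy_wait_labels` and the `compare_col` parameter are hypothetical. `compare_col` applies the same threshold to a predicted price instead of the observed one.

```python
import numpy as np
import pandas as pd


def buy_wait_labels(df, price_col="price", compare_col=None,
                    route_cols=("source_city", "destination_city"), q=0.5):
    """Return "BUY" where the compared price is at or below its route's q-quantile, else "WAIT"."""
    compare_col = compare_col or price_col
    # Per-row threshold: q-quantile of the historical prices on that row's route.
    threshold = df.groupby(list(route_cols))[price_col].transform(lambda s: s.quantile(q))
    return pd.Series(np.where(df[compare_col] <= threshold, "BUY", "WAIT"), index=df.index)


flights = pd.DataFrame({
    "source_city": ["Delhi"] * 4 + ["Mumbai"] * 3,
    "destination_city": ["Mumbai"] * 4 + ["Delhi"] * 3,
    "price": [100, 200, 300, 400, 1000, 2000, 3000],
    "pred_price": [120, 260, 240, 500, 900, 2100, 1500],  # e.g. regression output
})
flights["label"] = buy_wait_labels(flights)
flights["label_from_pred"] = buy_wait_labels(flights, compare_col="pred_price")
```

The 1,000 Mumbai-to-Delhi fare is labelled BUY even though it is ten times the cheapest Delhi-to-Mumbai fare. This is because every comparison is made relative to the flight's own route.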

Model selection is principled rather than purely metric-driven: when Ridge and Lasso perform comparably, Ridge is preferred for its stability; when Logistic Regression and KNN are close, Logistic Regression wins on interpretability. These tie-breaking preferences are built into the agent's selection logic.
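The tie-breaking rule can be phrased as "take the most-preferred model whose score is within a tolerance of the best score". The function below is a hypothetical sketch. The name `select_model`, the default `tol=0.01` and the preference lists are assumptions. A model absent from the preference list wins only when no listed model is within `tol` of the best.

```python
def select_model(scores, preference, higher_is_better=True, tol=0.01):
    """Return the most-preferred model whose score is within `tol` of the best.

    scores: {model_name: metric value}; preference: names ordered from most to
    least preferred (hypothetical tie-break order, e.g. Ridge before Lasso).
    """
    pick = max if higher_is_better else min
    best_name = pick(scores, key=scores.get)
    best = scores[best_name]
    for name in preference:
        if name in scores and abs(scores[name] - best) <= tol:
            return name  # close enough to the best: prefer the more stable/interpretable model
    return best_name  # no preferred model is competitive: fall back to the metric winner


# Illustrative scores only, not the project's results.
reg_choice = select_model({"Ridge": 0.910, "Lasso": 0.912}, ["Ridge", "Lasso"])
clf_choice = select_model({"Logistic Regression": 0.880, "KNN": 0.885},
                          ["Logistic Regression", "KNN"])
```

For error metrics such as RMSE, pass `higher_is_better=False`. In that case `tol` is expressed in the metric's own units (for example, price units).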

Streamlit Prediction Interface

Model Performance Dashboard

Two-Stage ML Pipeline

Stage 1 — Price Regression
Four regression models are trained and evaluated side-by-side on MSE, RMSE, MAE, and R². Ridge Regression is selected as the production model: it matches Lasso on error metrics while being less sensitive to multicollinearity in the encoded flight features.
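A side-by-side comparison of the four regressors on the same four metrics might look like the sketch below. It runs on synthetic data, not the Kaggle set. The single 80/20 hold-out split, polynomial degree 2 and the Ridge/Lasso `alpha` values are all assumptions rather than the project's tuned settings. RMSE is taken as the square root of MSE, which avoids depending on a version-specific keyword argument.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def compare_regressors(X, y, seed=42):
    """Fit four regressors on one hold-out split and report MSE, RMSE, MAE and R²."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    models = {
        "Linear Regression": LinearRegression(),
        # Degree-2 feature expansion followed by ordinary least squares.
        "Polynomial Regression": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
        "Ridge": Ridge(alpha=1.0),   # assumed regularisation strength
        "Lasso": Lasso(alpha=0.01),  # assumed regularisation strength
    }
    results = {}
    for name, model in models.items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        mse = mean_squared_error(y_te, pred)
        results[name] = {
            "MSE": mse,
            "RMSE": float(np.sqrt(mse)),
            "MAE": mean_absolute_error(y_te, pred),
            "R2": r2_score(y_te, pred),
        }
    return results


# Synthetic stand-in for encoded, scaled flight features.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 4))
y = 3 * X[:, 0] - 2 * X[:, 1] + X[:, 2] ** 2 + rng.normal(0, 0.1, size=500)
results = compare_regressors(X, y)
for name, m in results.items():
    print(f"{name:22s} RMSE={m['RMSE']:.3f} R2={m['R2']:.3f}")
```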

Linear Regression · Polynomial Regression · Ridge ✓ · Lasso

Stage 2 — BUY / WAIT Classification
Ridge-predicted prices are thresholded against route-level historical distributions to generate BUY/WAIT labels. Two classifiers are trained and compared on F1 score and AUC-ROC. Logistic Regression is chosen over KNN: comparable accuracy, faster inference, and interpretable coefficients that explain which features drive the decision.
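The Stage 2 comparison can be sketched as follows. Both classifiers are fit on the same stratified split. Each is then scored with F1 on its hard BUY/WAIT predictions and with AUC-ROC on its predicted BUY probabilities. The synthetic features, `n_neighbors=15` and the encoding 1 = BUY / 0 = WAIT are assumptions. The returned Logistic Regression coefficients illustrate the interpretability argument: each coefficient's sign shows whether increasing that feature raises or lowers the log-odds of BUY.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def compare_classifiers(X, y, seed=42):
    """Fit Logistic Regression and KNN on one split; report F1 and AUC-ROC (y: 1 = BUY, 0 = WAIT)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y)
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(n_neighbors=15),  # assumed k
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        scores[name] = {
            # F1 on hard labels, with BUY (1) as the positive class.
            "F1": f1_score(y_te, model.predict(X_te)),
            # AUC-ROC needs a continuous score: the predicted probability of BUY.
            "AUC": roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]),
        }
    # One coefficient per feature: positive values raise the log-odds of BUY.
    coefs = models["Logistic Regression"].coef_.ravel()
    return scores, coefs


rng = np.random.default_rng(1)
X = rng.normal(size=(600, 3))
# Synthetic rule: BUY is more likely when feature 0 is low and feature 1 is high.
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.5, size=600) < 0).astype(int)
scores, coefs = compare_classifiers(X, y)
```

KNN has no comparable per-feature weights. That is the practical gap behind preferring Logistic Regression when the two score similarly.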

Logistic Regression ✓ · KNN

Regression Model Comparison

AUC-ROC Curve

BUY / WAIT Decision Output