GitHub - rohit-chandra/taxi_demand_predictor_old: End-to-end ML project that predicts taxi demand in NYC

Taxi Demand Predictor Service 🚕

This repo aims to make it easy to start playing with and learning about MLOps.
My interest in this project was sparked by Uber's blog post "Demand and ETR Forecasting at Airports".

Quick Setup

Install Poetry:
```
$ curl -sSL https://install.python-poetry.org | python3 -
```

Then cd into the project folder and run
```
$ poetry install
```
Activate the virtual environment you just created with
```
$ poetry shell
```

Problem Statement

You work as a data scientist 👨‍🔬👩‍🔬 at a ride-sharing app company 🚗 (e.g. Uber).
Your job is to help the operations team keep the fleet as busy as possible.

Supply 🚕 and demand 👨‍💼

Data Processing

Step 1 - Data Validation ✔️ ❎
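
A minimal sketch of this validation step, assuming each raw file holds one calendar month of rides and each ride is a record with a `pickup_datetime` field (a hypothetical record shape). It keeps only rides whose pickup time actually falls inside that month. `month_bounds` and `validate_rides` are illustrative names, not the project's code, and the real validation may check more than this.

```python
from __future__ import annotations

from datetime import datetime


def month_bounds(year: int, month: int) -> tuple[datetime, datetime]:
    """Return the [start, end) range of a calendar month."""
    start = datetime(year, month, 1)
    end = datetime(year + 1, 1, 1) if month == 12 else datetime(year, month + 1, 1)
    return start, end


def validate_rides(rides: list[dict], year: int, month: int) -> list[dict]:
    """Keep only rides whose pickup time falls inside the file's month.

    Hypothetical record shape: {"pickup_datetime": datetime, "pickup_location_id": int}
    """
    start, end = month_bounds(year, month)
    return [r for r in rides if start <= r["pickup_datetime"] < end]


rides = [
    {"pickup_datetime": datetime(2022, 1, 5, 8, 15), "pickup_location_id": 43},
    {"pickup_datetime": datetime(2021, 12, 31, 23, 50), "pickup_location_id": 43},  # previous month
    {"pickup_datetime": datetime(2022, 2, 1, 0, 5), "pickup_location_id": 161},     # next month
]
valid = validate_rides(rides, 2022, 1)
print(len(valid))  # 1
```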

Step 2 - Raw data into time-series data
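
A sketch of this step, assuming the validated ride records from Step 1: count rides per (pickup_hour, pickup_location_id) and fill hours with no pickups with zero, so every location gets one row per hour. The function name and record shape are hypothetical; at real data volumes a pandas groupby would be the more natural tool.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime, timedelta


def to_time_series(rides: list[dict], start: datetime, end: datetime) -> list[dict]:
    """Aggregate rides into hourly ride counts per location over [start, end).

    `start` is assumed to fall exactly on the hour. Hours in which a location
    had no pickups are filled with 0, so every location gets one row per hour.
    """
    # Round each pickup down to the start of its hour and count rides per (hour, location).
    counts = Counter(
        (r["pickup_datetime"].replace(minute=0, second=0, microsecond=0), r["pickup_location_id"])
        for r in rides
    )
    locations = sorted({r["pickup_location_id"] for r in rides})
    rows = []
    hour = start
    while hour < end:
        for loc in locations:
            rows.append({
                "pickup_hour": hour,
                "pickup_location_id": loc,
                "no_of_rides": counts.get((hour, loc), 0),
            })
        hour += timedelta(hours=1)
    return rows


rides = [
    {"pickup_datetime": datetime(2022, 1, 5, 8, 15), "pickup_location_id": 43},
    {"pickup_datetime": datetime(2022, 1, 5, 8, 40), "pickup_location_id": 43},
    {"pickup_datetime": datetime(2022, 1, 5, 10, 5), "pickup_location_id": 161},
]
ts = to_time_series(rides, datetime(2022, 1, 5, 8), datetime(2022, 1, 5, 11))
print(len(ts))  # 6 rows: 3 hours x 2 locations
```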

Step 3 - Time-series data into (features, target) data
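
One common way to do this, sketched here as an assumption rather than the project's exact code, is a sliding window over each location's hourly series: the previous `n_features` hourly counts become the features and the count for the following hour becomes the target. `make_features_and_targets`, `n_features` and `step` are hypothetical names.

```python
from __future__ import annotations


def make_features_and_targets(series: list[int], n_features: int, step: int = 1):
    """Slide a window over one location's hourly ride counts.

    Each sample uses `n_features` consecutive hours as features and the
    next hour's count as the target; `step` is how far the window moves.
    """
    features, targets = [], []
    for i in range(0, len(series) - n_features, step):
        features.append(series[i : i + n_features])
        targets.append(series[i + n_features])
    return features, targets


hourly_rides = [3, 5, 2, 8, 6, 4]
X, y = make_features_and_targets(hourly_rides, n_features=3)
print(X)  # [[3, 5, 2], [5, 2, 8], [2, 8, 6]]
print(y)  # [8, 6, 4]
```

A larger `step` produces fewer, less overlapping samples; `step=1` extracts every possible window.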

Step 4 - From raw data to training data

Step 5 - Explore and visualize the final dataset

Model training
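
Before training anything sophisticated, a simple baseline helps judge whether a model adds value. A minimal sketch, assuming the (features, target) samples from the data-processing steps and mean absolute error (MAE) as the metric: the baseline predicts that the next hour will have as many rides as the last observed hour. This is an illustrative choice, not necessarily the model used in this project.

```python
from __future__ import annotations


def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def predict_previous_hour(X: list[list[float]]) -> list[float]:
    """Baseline: predict the most recent hour's ride count for each sample."""
    return [row[-1] for row in X]


X = [[3, 5, 2], [5, 2, 8], [2, 8, 6]]
y = [8, 6, 4]
preds = predict_previous_hour(X)                  # [2, 8, 6]
print(round(mean_absolute_error(y, preds), 2))    # 3.33
```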

MLOps

Batch-scoring system 🤹

A batch-scoring system is a sequence of computing and storage steps that maps recent data to predictions the business can use.

Step 1 - Prepare data

First pipeline: the data preparation pipeline, or feature pipeline. This component runs every hour.
For example, every hour we extract raw data from an external service, such as a data warehouse or wherever the most recent data lives.
Once we have fetched the raw data, we build a tabular dataset of features and targets and store it in the feature store.
This is also called the data ingestion pipeline.
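
Each hourly run has to decide which window of raw data to fetch. A minimal sketch, assuming the run re-fetches a fixed look-back window (28 days here, a hypothetical value) that ends at the last complete hour. `fetch_window` is a hypothetical helper, not part of the project.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def fetch_window(now: datetime, lookback: timedelta = timedelta(days=28)) -> tuple[datetime, datetime]:
    """Return the [start, end) range of raw data one hourly run should fetch.

    `end` is `now` rounded down to the hour, so the run only includes
    complete hours; `start` is `lookback` earlier.
    """
    end = now.replace(minute=0, second=0, microsecond=0)
    return end - lookback, end


start, end = fetch_window(datetime(2023, 3, 10, 14, 37, tzinfo=timezone.utc))
print(start, end)  # 2023-02-10 14:00:00+00:00 2023-03-10 14:00:00+00:00
```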

Step 2 - Train ML Model

Second pipeline: the model training pipeline.
ML models in real-world systems are retrained regularly, so this pipeline retrains the model.
In this project, training is on demand: whenever I want to train the model, I trigger this pipeline, and it automatically trains a new model and saves it back to the model registry.
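
The "train, then save back to the registry" flow can be illustrated with a local, file-based stand-in for a model registry. Everything here is hypothetical (`save_to_registry`, the directory layout, the JSON metrics file); a real setup would use the registry's own client, and Hopsworks, for example, also offers a model registry. The sketch only shows the idea that each on-demand training run produces a new numbered model version instead of overwriting the previous one.

```python
from __future__ import annotations

import json
import pickle
import tempfile
from pathlib import Path


def save_to_registry(model, metrics: dict, registry_dir: Path, name: str) -> int:
    """Save `model` as the next version of `name` and return that version number.

    Hypothetical layout: registry_dir/name/v<N>/{model.pkl, metrics.json}
    """
    model_dir = registry_dir / name
    model_dir.mkdir(parents=True, exist_ok=True)
    # Find the highest existing version number (0 if there is none yet).
    existing = [int(p.name[1:]) for p in model_dir.glob("v*") if p.name[1:].isdigit()]
    version = max(existing, default=0) + 1
    version_dir = model_dir / f"v{version}"
    version_dir.mkdir()
    with open(version_dir / "model.pkl", "wb") as f:
        pickle.dump(model, f)
    (version_dir / "metrics.json").write_text(json.dumps(metrics))
    return version


with tempfile.TemporaryDirectory() as tmp:
    registry = Path(tmp)
    # Any picklable object stands in for a trained model here.
    v1 = save_to_registry({"kind": "baseline"}, {"mae": 3.33}, registry, "taxi_demand_predictor")
    v2 = save_to_registry({"kind": "baseline"}, {"mae": 3.10}, registry, "taxi_demand_predictor")
    print(v1, v2)  # 1 2
```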

Step 3 - Generate predictions on recent data

Third pipeline: the prediction pipeline.
It uses the most recent features and the model currently in production to generate predictions.
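
A sketch of the prediction step, assuming features are stored as hourly rows with the pickup_hour, pickup_location_id and no_of_rides fields listed under Feature Store, and that the production model maps a location's last `n_features` hourly counts to a prediction. `predict_next_hour` is a hypothetical name, and the last-hour function below is only a stand-in for the real model.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime, timedelta
from typing import Callable


def predict_next_hour(rows: list[dict], n_features: int,
                      model: Callable[[list[int]], float]) -> dict[int, float]:
    """Predict next-hour demand for every location from its latest features.

    Assumes each location's rows cover consecutive hours with no gaps.
    """
    by_location = defaultdict(list)
    for r in sorted(rows, key=lambda r: r["pickup_hour"]):
        by_location[r["pickup_location_id"]].append(r["no_of_rides"])
    # Use only the most recent n_features hours; skip locations without enough history.
    return {
        loc: model(counts[-n_features:])
        for loc, counts in by_location.items()
        if len(counts) >= n_features
    }


def last_hour_model(window: list[int]) -> int:
    """Stand-in for the production model: repeat the last observed hour."""
    return window[-1]


start = datetime(2023, 3, 10, 11)
rows = [
    {"pickup_hour": start + timedelta(hours=h), "pickup_location_id": loc, "no_of_rides": n}
    for loc, series in {43: [3, 5, 2], 161: [7, 9, 12]}.items()
    for h, n in enumerate(series)
]
print(predict_next_hour(rows, n_features=3, model=last_hour_model))  # {43: 2, 161: 12}
```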

Serverless MLOps tools

Hopsworks as our feature store
- It's a serverless platform that provides the infrastructure to manage and run the feature store for us
- It's easier to manage than GCP or Azure, where we have to set up the different components first
GitHub Actions to schedule and run jobs
- We automate the feature pipeline so that it ingests data every hour
- Every hour, the feature pipeline notebook runs automatically: it fetches a batch of recent data, transforms it and saves it into the feature store
- I created a YAML configuration file under .github/workflows
- Its cron schedule triggers the job every hour
- The command below executes the notebook from the command line

```
$ poetry run jupyter nbconvert --to notebook --execute notebooks/12_feature_pipeline.ipynb
```

Feature Store

The feature store holds the features, which can be used either to train models or to make predictions.
The features saved in the feature store are:
- pickup_hour
- no_of_rides
- pickup_location_id
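
Each row in the feature store can be thought of as one record per location and hour. A minimal sketch, assuming (pickup_location_id, pickup_hour) uniquely identifies a row: the store is modeled as an in-memory dictionary keyed on that pair, so when an hourly run re-inserts an hour an earlier run already wrote, the row is overwritten rather than duplicated. `FeatureRow` and `upsert` are hypothetical names, and this is not the Hopsworks API.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FeatureRow:
    pickup_hour: datetime
    no_of_rides: int
    pickup_location_id: int

    def key(self) -> tuple[int, datetime]:
        """Assumed primary key: one row per location and hour."""
        return (self.pickup_location_id, self.pickup_hour)


def upsert(store: dict, rows: list[FeatureRow]) -> None:
    """Insert rows, overwriting any existing row with the same key."""
    for row in rows:
        store[row.key()] = row


store: dict = {}
hour = datetime(2023, 3, 10, 13)
upsert(store, [FeatureRow(hour, 4, 43)])
upsert(store, [FeatureRow(hour, 6, 43), FeatureRow(hour, 1, 161)])  # overlapping re-run
print(len(store))                     # 2
print(store[(43, hour)].no_of_rides)  # 6
```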

Backfill the Feature Store

- Fetch the raw data files for 2022
- Transform the raw data into time-series data
- Save the result in the feature store
- Repeat for 2023 and so on
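
The backfill loop can be sketched as iterating over (year, month) pairs from January 2022 up to a chosen end month. The three steps are passed in as functions because their real implementations (downloading the raw files, the time-series transformation, the feature store insert) are not shown here; `months`, `backfill`, `fetch`, `transform` and `insert` are hypothetical names.

```python
from __future__ import annotations

from typing import Callable, Iterator


def months(start: tuple[int, int], end: tuple[int, int]) -> Iterator[tuple[int, int]]:
    """Yield (year, month) pairs from `start` to `end`, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield year, month
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def backfill(end: tuple[int, int], fetch: Callable, transform: Callable,
             insert: Callable, start: tuple[int, int] = (2022, 1)) -> int:
    """Fetch, transform and insert one month at a time; return months processed."""
    processed = 0
    for year, month in months(start, end):
        raw = fetch(year, month)      # e.g. download that month's raw file
        insert(transform(raw))        # time-series rows into the feature store
        processed += 1
    return processed


inserted = []
n = backfill(
    end=(2023, 2),
    fetch=lambda y, m: f"{y}-{m:02d}",  # stand-in: returns a label instead of raw data
    transform=lambda raw: [raw],
    insert=inserted.extend,
)
print(n, inserted[0], inserted[-1])  # 14 2022-01 2023-02
```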

Live Demo

Work in progress.
