Skip to content

Taxi Fare predictor using Random Forest and GridSearchCV for Hyperparameter Tuning

Notifications You must be signed in to change notification settings

abhi23shek/TaxiFarePredictor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Taxi Fare Predictor

Project Overview

The Taxi Fare Predictor project aims to predict the fare of a taxi ride in New York City based on various features such as pickup and dropoff locations, the number of passengers, and the date and time of the ride. The project uses a machine learning model to perform the predictions, with a focus on data preprocessing, feature engineering, and model training.

Dataset

The dataset used in this project is TaxiFare.csv, which includes the following columns:

  • unique_id: Unique identifier for each ride (dropped during preprocessing)
  • amount: The fare amount for the ride
  • date_time_of_pickup: The date and time when the ride was initiated
  • longitude_of_pickup: The longitude of the pickup location
  • latitude_of_pickup: The latitude of the pickup location
  • longitude_of_dropoff: The longitude of the dropoff location
  • latitude_of_dropoff: The latitude of the dropoff location
  • no_of_passenger: The number of passengers in the ride

Requirements

The following Python libraries are required to run the notebook:

  • pandas
  • numpy
  • matplotlib
  • seaborn
  • scikit-learn

You can install the necessary libraries using:

pip install pandas numpy matplotlib seaborn scikit-learn

Project Structure

The project is structured as follows:

  1. Data Loading and Initial Exploration:

    • Import necessary libraries.
    • Load the dataset and perform initial exploration.
  2. Data Cleaning and Preprocessing:

    • Drop unnecessary columns.
    • Check the dataset's shape and data types.
  3. Feature Engineering and Model Training:

    • Perform feature engineering to prepare data for model training.
    • Train a RandomForestRegressor model to predict taxi fares.

Running the Project

  1. Clone the repository or download the TaxiFarePredictor.ipynb notebook.
  2. Ensure you have the required libraries installed.
  3. Run the notebook cells sequentially to load the data, preprocess it, and train the model.
  4. The model's performance metrics will be displayed towards the end of the notebook.

Conclusion

This project demonstrates how to build a machine learning model to predict taxi fares using various features. It involves data cleaning, preprocessing, feature engineering, and model training using scikit-learn's RandomForestRegressor.

Future Work

  • Explore additional features that could improve the model's accuracy, such as weather conditions or traffic data.
  • Experiment with other machine learning algorithms and compare their performance.
  • Deploy the model as a web service for real-time fare prediction.

Author

Abhishek Kumar


Feel free to customize the README further based on additional details or personal preferences.