
This project analyzes a dataset of student performance and fits a logistic regression model to predict whether a student will pass or fail based on various factors. The models are evaluated, and a pass or fail outcome is then predicted for a new student from their information.


Cbovell20/Predicting-Student-Performance-A-Machine-Learning


Predicting-Student-Performance-A-Machine-Learn

About Me:

I am a second-year economics student at the University of Waterloo, working toward a career as a Data Scientist. I know Python, R, MySQL, Tableau, and Excel. Currently, I am interested in learning more machine learning and deep learning models to use in the new projects I develop.

Additional information on me:

  • University Track Athlete
  • Governance Analyst at the OLG for my first co-op
  • Love reading and writing
  • Technical analysis and fundamental analysis for stocks

I hope to keep developing and improving! Enjoy my projects, and be sure to visit my LinkedIn below as well.

Project Overview

  • Overview of the project: In this project, we will analyze the "Students Performance in Exams" dataset to identify factors that impact student performance in exams.

  • Motivation behind the project: Understanding factors that impact student performance can help educators identify areas where students need additional support and improve their academic outcomes.

  • Objectives of the project: Our objectives are to analyze the dataset using statistical and machine learning techniques to identify key factors that influence student performance.
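The project description mentions a logistic regression that predicts whether a student passes or fails. A minimal sketch of that idea with scikit-learn is shown here. The data is synthetic, and the column names, the pass cutoff of 50 on the math score, and the choice of features are all assumptions made for illustration, not the settings used in this repository.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the Kaggle dataset); column names are assumed.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "reading score": rng.integers(30, 100, n),
    "writing score": rng.integers(30, 100, n),
})
# Hypothetical math score that depends on the other two scores plus noise.
math_score = 0.5 * df["reading score"] + 0.4 * df["writing score"] + rng.normal(0, 8, n)
# Assumed rule: a student "passes" when the math score is at least 50.
df["passed"] = (math_score >= 50).astype(int)

X = df[["reading score", "writing score"]]
y = df["passed"]

# Hold out 25% of the rows to evaluate the fitted model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))

# Predict the outcome for one new (hypothetical) student.
new_student = pd.DataFrame([{"reading score": 75, "writing score": 70}])
prediction = int(model.predict(new_student)[0])
pass_probability = float(model.predict_proba(new_student)[0, 1])

print(f"Held-out accuracy: {accuracy:.2f}")
print(f"New student predicted to pass: {bool(prediction)} (p = {pass_probability:.2f})")
```

With the real dataset, the synthetic frame would be replaced by the loaded CSV, and the categorical columns would need to be encoded before fitting.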

OUTLINE FOR THE PROJECT INCLUDES:

Data Collection and Preprocessing

  • Description of the "Students Performance in Exams" dataset: This is a dataset of student performance in math, reading, and writing exams, along with demographic and other factors that may impact student performance.
  • Downloading the dataset from Kaggle: We will obtain the dataset from the Kaggle website.
  • Exploring the dataset: We will examine the structure of the data, identify missing values, and perform data cleaning and feature engineering as necessary.
  • Handling missing and duplicate data: We will use various methods to handle missing and duplicate data, such as dropping rows, filling in missing values, and removing duplicate entries.
  • Handling outliers: We will examine the distribution of data and use methods such as z-score or Tukey's method to identify and remove outliers.
  • Converting data into the appropriate format: We will transform categorical variables into numerical variables and convert data into the appropriate data types for analysis.
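The preprocessing steps above can be sketched with pandas as follows. The rows are made up, the column names are assumed to match the Kaggle file, and the helper names `clean` and `tukey_mask` are hypothetical. Median imputation and the 1.5 × IQR fence are one reasonable choice among several, not necessarily what the project uses.

```python
import pandas as pd

# Hypothetical rows that mimic the dataset's layout; column names are assumed.
raw = pd.DataFrame({
    "gender": ["female", "male", "male", "female", "male", "female", "female"],
    "lunch": ["standard", "standard", "free/reduced", "standard",
              "standard", "free/reduced", "standard"],
    "math score": [72, 69, 47, 72, None, 71, 5],
    "reading score": [72, 90, 57, 72, 78, 83, 70],
})
score_cols = ["math score", "reading score"]


def clean(df, cols):
    """Drop exact duplicate rows, then fill missing scores with the column median."""
    out = df.drop_duplicates().copy()
    for col in cols:
        out[col] = out[col].fillna(out[col].median())
    return out


def tukey_mask(series, k=1.5):
    """Return True for values inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.between(q1 - k * iqr, q3 + k * iqr)


cleaned = clean(raw, score_cols)

# Keep only rows whose scores fall inside the fences for every score column.
inside = pd.concat([tukey_mask(cleaned[c]) for c in score_cols], axis=1).all(axis=1)
filtered = cleaned[inside]

# One-hot encode the categorical columns; drop_first avoids a redundant column.
encoded = pd.get_dummies(filtered, columns=["gender", "lunch"],
                         drop_first=True, dtype=int)
print(encoded)
```

Dropping outliers is a judgment call: on real exam data a very low score may be a genuine result rather than an error, so it is worth inspecting flagged rows before removing them.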

Descriptive Statistics

  • Mean, Median, Mode: Calculate the measures of central tendency to describe the distribution of the data.
  • Range, Variance, Standard Deviation: Calculate the measures of dispersion to describe the spread of the data.
  • Skewness, Kurtosis: Calculate the measures of shape to describe the symmetry and tail heaviness of the distribution of the data.
  • Percentiles, Quartiles: Calculate the measures of position to describe the location of a value relative to the rest of the data.
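As an illustration, the measures listed above can be computed with pandas in a few lines. The scores are made-up values and the series name is an assumption. Note that pandas reports the sample variance and standard deviation (divisor n - 1) and the excess kurtosis, which is 0 for a normal distribution.

```python
import pandas as pd

# Hypothetical scores for illustration only.
scores = pd.Series([72, 69, 90, 47, 76, 71, 88, 40, 64, 72], name="math score")

summary = {
    # Central tendency
    "mean": scores.mean(),
    "median": scores.median(),
    "mode": scores.mode().tolist(),  # a list, since there can be several modes
    # Dispersion
    "range": scores.max() - scores.min(),
    "variance": scores.var(),        # sample variance (ddof=1)
    "std": scores.std(),             # sample standard deviation (ddof=1)
    # Shape
    "skewness": scores.skew(),
    "kurtosis": scores.kurt(),       # excess kurtosis
    # Position
    "quartiles": scores.quantile([0.25, 0.5, 0.75]).to_dict(),
    "p90": scores.quantile(0.9),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

For a quick first look, `scores.describe()` returns the count, mean, standard deviation, minimum, quartiles, and maximum in one call.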

Visualization

  • Scatter plots
  • Bar charts
  • Heat maps
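One possible way to produce the three chart types with matplotlib is sketched here. The data is invented, the column names are assumed to follow the Kaggle file, and the choice of which variables to plot against each other is only an example.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows for illustration; column names are assumed.
df = pd.DataFrame({
    "test preparation course": ["none", "completed", "none", "completed",
                                "none", "none", "completed", "none"],
    "math score": [72, 69, 90, 47, 76, 71, 88, 40],
    "reading score": [72, 90, 95, 57, 78, 83, 95, 43],
    "writing score": [74, 88, 93, 44, 75, 78, 92, 39],
})
score_cols = ["math score", "reading score", "writing score"]

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Scatter plot: relationship between two scores.
axes[0].scatter(df["reading score"], df["writing score"])
axes[0].set_xlabel("reading score")
axes[0].set_ylabel("writing score")

# Bar chart: mean math score per test-preparation group.
means = df.groupby("test preparation course")["math score"].mean()
axes[1].bar(list(means.index), means.to_numpy())
axes[1].set_ylabel("mean math score")

# Heat map: pairwise correlations between the score columns.
corr = df[score_cols].corr()
image = axes[2].imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
axes[2].set_xticks(range(len(score_cols)))
axes[2].set_xticklabels(score_cols, rotation=45, ha="right")
axes[2].set_yticks(range(len(score_cols)))
axes[2].set_yticklabels(score_cols)
fig.colorbar(image, ax=axes[2])

fig.tight_layout()
```

To view the result, call `fig.savefig(...)` with a file name of your choice, or remove the `Agg` backend line and call `plt.show()` in an interactive session.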

Conclusion

  • Summarize the findings of the analysis
  • Discuss the potential impact of the project
  • Identify areas for future work and improvements
