I am a second-year economics student at the University of Waterloo, working toward a career as a data scientist. I know Python, R, MySQL, Tableau, and Excel. I am currently interested in learning more machine learning and deep learning models to use in the new projects I develop.
Additional information on me:
- University Track Athlete
- Governance Analyst at the OLG for my first co-op
- Love reading and writing
- Technical and fundamental analysis of stocks
I hope to keep developing and improving! Enjoy my projects, and be sure to visit my LinkedIn below as well.
- LinkedIn: https://www.linkedin.com/in/cbovell/
- Email: [email protected]
- Overview of the project: In this project, we will analyze the "Students Performance in Exams" dataset to identify factors that impact student performance in exams.
- Motivation behind the project: Understanding factors that impact student performance can help educators identify areas where students need additional support and improve their academic outcomes.
- Objectives of the project: Our objective is to analyze the dataset with statistical and machine learning techniques and identify the key factors that influence student performance.
- Description of the "Students Performance in Exams" dataset: This is a dataset of student performance in math, reading, and writing exams, along with demographic and other factors that may impact student performance.
- Downloading the dataset from Kaggle: We will obtain the dataset from the Kaggle website.
- Exploring the dataset: We will examine the structure of the data, identify missing values, and perform data cleaning and feature engineering as necessary.
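A first look at the data could be sketched as follows. This is a minimal sketch, not the project's actual code: the column names are assumed to match the Kaggle file, the four rows are made-up stand-ins for the real CSV, and `summarize` is a hypothetical helper.

```python
import io

import pandas as pd

# Made-up sample standing in for the downloaded Kaggle CSV.
# Column names are an assumption about the real file.
SAMPLE_CSV = """gender,lunch,math score,reading score,writing score
female,standard,72,72,74
male,free/reduced,47,,44
female,standard,90,95,93
female,standard,90,95,93
"""


def summarize(df):
    """Collect basic structural facts about a DataFrame (hypothetical helper)."""
    return {
        "shape": df.shape,                           # (rows, columns)
        "dtypes": df.dtypes.astype(str).to_dict(),   # data type of each column
        "missing": df.isna().sum().to_dict(),        # missing values per column
        "duplicates": int(df.duplicated().sum()),    # fully repeated rows
    }


# With the real file this would be pd.read_csv("<path to the Kaggle CSV>").
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
report = summarize(df)
print(report)
```

On the real dataset, the same report shows at a glance whether any cleaning is needed before analysis.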
- Handling missing and duplicate data: We will use various methods to handle missing and duplicate data, such as dropping rows, filling in missing values, and removing duplicate entries.
- Handling outliers: We will examine the distribution of data and use methods such as z-score or Tukey's method to identify and remove outliers.
- Converting data into the appropriate format: We will transform categorical variables into numerical variables and convert data into the appropriate data types for analysis.
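The preprocessing steps above might look like the sketch below. It assumes median imputation for missing scores, Tukey's fences with the conventional multiplier of 1.5 for outliers, and one-hot encoding for categorical columns. These are illustrative choices rather than the project's confirmed method, the data is made up, and `clean` and `tukey_mask` are hypothetical helpers.

```python
import pandas as pd


def clean(df, score_cols):
    """Drop duplicate rows, then fill missing scores with the column median."""
    out = df.drop_duplicates().copy()
    for col in score_cols:
        out[col] = out[col].fillna(out[col].median())
    return out


def tukey_mask(series, k=1.5):
    """Return True for values inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.between(q1 - k * iqr, q3 + k * iqr)


# Made-up rows: one duplicate, one missing score, one extreme value.
raw = pd.DataFrame({
    "gender": ["female", "male", "male", "female", "male", "female", "female"],
    "math score": [70, 65, 65, 72, None, 68, 5],
})

cleaned = clean(raw, ["math score"])
kept = cleaned[tukey_mask(cleaned["math score"])]

# Convert the categorical column into numeric indicator columns.
encoded = pd.get_dummies(kept, columns=["gender"], dtype=int)
print(encoded)
```

Whether an outlier should be removed or kept is a judgment call; an unusually low score may be a real result rather than a data error.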
- Mean, Median, Mode: Calculate the measures of central tendency to describe the distribution of the data.
- Range, Variance, Standard Deviation: Calculate the measures of dispersion to describe the spread of the data.
- Skewness, Kurtosis: Calculate the measures of shape to describe the symmetry and tail heaviness (often called peakedness) of the distribution of the data.
- Percentiles, Quartiles: Calculate the measures of position to describe the location of the data relative to the rest of the data.
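The descriptive statistics listed above can be computed with pandas, as in this minimal sketch. The scores are made up and `describe_scores` is a hypothetical helper. Note that pandas reports sample variance and standard deviation (dividing by n - 1), and that `kurt()` returns excess kurtosis, which is 0 for a normal distribution.

```python
import pandas as pd


def describe_scores(s):
    """Central tendency, dispersion, shape, and position for one score column."""
    return {
        "mean": s.mean(),
        "median": s.median(),
        "mode": s.mode().tolist(),          # may contain more than one value
        "range": s.max() - s.min(),
        "variance": s.var(),                # sample variance (n - 1)
        "std": s.std(),                     # sample standard deviation
        "skewness": s.skew(),
        "kurtosis": s.kurt(),               # excess kurtosis
        "quartiles": s.quantile([0.25, 0.5, 0.75]).tolist(),
        "p90": s.quantile(0.90),            # 90th percentile
    }


# Made-up scores for illustration.
scores = pd.Series([55, 60, 60, 70, 75, 80, 90], name="math score")
stats = describe_scores(scores)
for name, value in stats.items():
    print(f"{name}: {value}")
```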
- Scatter plots
- Bar charts
- Heat maps
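The three chart types above could be drawn with matplotlib roughly as follows. This is a sketch under assumptions: the column names are assumed to match the Kaggle file, the rows are made up, the heat map shows the correlation matrix of the three scores, and the output file name is hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows; column names are an assumption about the real file.
df = pd.DataFrame({
    "gender": ["female", "male", "female", "male", "female", "male"],
    "math score": [72, 47, 90, 76, 71, 40],
    "reading score": [72, 57, 95, 78, 83, 43],
    "writing score": [74, 44, 93, 75, 78, 39],
})
score_cols = ["math score", "reading score", "writing score"]

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Scatter plot: relationship between two scores.
axes[0].scatter(df["reading score"], df["writing score"])
axes[0].set_xlabel("reading score")
axes[0].set_ylabel("writing score")

# Bar chart: mean math score per group.
means = df.groupby("gender")["math score"].mean()
axes[1].bar(list(means.index), means.values)
axes[1].set_ylabel("mean math score")

# Heat map: pairwise correlation between the three scores.
corr = df[score_cols].corr()
im = axes[2].imshow(corr.values, vmin=-1, vmax=1, cmap="coolwarm")
axes[2].set_xticks(range(len(score_cols)))
axes[2].set_xticklabels(score_cols, rotation=45, ha="right")
axes[2].set_yticks(range(len(score_cols)))
axes[2].set_yticklabels(score_cols)
fig.colorbar(im, ax=axes[2])

fig.tight_layout()
fig.savefig("student_scores_overview.png")  # hypothetical output file name
```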
- Summarize the findings of the analysis
- Discuss the potential impact of the project
- Identify areas for future work and improvements