Soccer match predictions using machine learning techniques, including data preprocessing and cleanup, feature extraction and derivation, and model optimization. Done as a final project for CS4840 - Intro to Machine Learning.

aliAljaffer/soccer-match-predictor

Soccer Match Predictor

A machine learning model that predicts soccer match outcomes using Random Trees Embedding with Extra Trees. See ./docs for the project presentation and the project report.

Abstract

This project examines whether the winner of a soccer match between two international teams can be predicted with machine learning, using features derived from the two teams' game performance. Logistic Regression (LR), Support Vector Machine (SVM), and Gradient Boosting (GB) served as baselines for comparison with the main model, Random Trees Embedding with an Extra Trees Classifier (RTE). RTE outperformed every baseline, reaching an accuracy of 0.6453 and an average F1-score of 0.65. GB came next with an accuracy of 0.5445 and an average F1-score of 0.54. LR and SVM performed similarly, with accuracies of 0.4925±0.0005 and average F1-scores of 0.483±0.003.
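The main model described above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: the data is synthetic, the three classes are only assumed to stand for home win, draw, and away win, the hyperparameters are arbitrary, and the "average" F1-score is assumed here to be the macro average.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomTreesEmbedding
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the match features (assumption: 3 outcome classes).
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# RandomTreesEmbedding maps each sample to a sparse one-hot encoding of the
# leaves it falls into; ExtraTreesClassifier is then fit on that embedding.
# Hyperparameters are illustrative, not the tuned values from the project.
model = make_pipeline(
    RandomTreesEmbedding(n_estimators=50, max_depth=5, random_state=0),
    ExtraTreesClassifier(n_estimators=200, random_state=0),
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
macro_f1 = f1_score(y_test, y_pred, average="macro")  # assumed averaging mode
print(f"accuracy={accuracy:.4f}  macro F1={macro_f1:.4f}")
```

The scores printed for this synthetic data are unrelated to the results reported in the abstract; swapping in any of the baseline classifiers as the final pipeline step gives a like-for-like comparison on the same split.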

Keywords: Soccer predictions, Multi-class Classification, Random Trees Embedding, Random Forest, Feature Extraction, Data Resampling.

Included Files

  • ./docs/Ali-Aljaffer-final.ipynb: The Jupyter notebook containing the code for the project
  • datasets folder: Contains the datasets used by the notebook
    • results.csv: International Match Results dataset
    • fifa_ranking-2023-07-20.csv: FIFA rankings dataset
    • rank_per_yr_T_sorted.csv: A CSV generated from fifa_ranking-2023-07-20.csv, with one row per country and columns giving the year and that year's points.
      • Can be generated by running create_rank_at_year.py
  • create_rank_at_year.py: The script used to extract per-year ranking information from the rankings dataset and produce rank_per_yr_T_sorted.csv.
  • ./docs/Ali-Aljaffer-final.pptx: The presentation file
  • ./docs/Ali-Aljaffer-FinalProjectReport.pdf: The full report file
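
The reshaping that produces rank_per_yr_T_sorted.csv can be illustrated roughly as below. This is a hypothetical sketch, not the contents of create_rank_at_year.py: the column names (country_full, rank_date, total_points), the choice to keep the last ranking published in each year, and the sample rows with invented teams and points are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows; column names are assumptions about the rankings CSV.
SAMPLE = """country_full,rank_date,total_points
Team A,2022-03-31,1800.0
Team A,2022-12-22,1810.0
Team A,2023-07-20,1820.0
Team B,2023-07-20,1815.0
"""


def points_per_year(csv_file):
    """Return {country: {year: points}}, keeping each year's latest ranking."""
    latest = {}  # (country, year) -> (rank_date, points)
    for row in csv.DictReader(csv_file):
        key = (row["country_full"], int(row["rank_date"][:4]))
        entry = (row["rank_date"], float(row["total_points"]))
        # ISO-formatted dates sort correctly when compared as strings.
        if key not in latest or entry[0] > latest[key][0]:
            latest[key] = entry
    table = defaultdict(dict)
    for (country, year), (_, points) in latest.items():
        table[country][year] = points
    return dict(sorted(table.items()))  # rows sorted by country name


table = points_per_year(io.StringIO(SAMPLE))
years = sorted({year for per_year in table.values() for year in per_year})

# Write one row per country and one column per year; missing years stay blank.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["country", *years])
for country, per_year in table.items():
    writer.writerow([country, *(per_year.get(year, "") for year in years)])
print(out.getvalue())
```

To run this against the real file, replace io.StringIO(SAMPLE) with an open file handle and adjust the column names to match the dataset.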

Dataset Sources
