Time Series data analysis for the Euromilhões lottery draws
This project develops an exercise from an assignment for Time Series and Forecast, part of the Data Science M.Sc. course at UP - Universidade do Porto (Porto University, Portugal).
The project is described on this page
The project covers the full workflow, from extracting the data to producing the analysis and reports. The data covers the draws for the Portuguese part of the EuroMillions draw.
This project was developed using Python v3.10 and several Python libraries, running locally in a virtual environment managed by Python Poetry. The operating system used was Ubuntu v23, a GNU/Linux distribution.
To run this project's code, set up Python Poetry and install the project.
The dataset for the Euromilhões draw is not directly available for download, but the data can be fetched through HTTP from the Portal Jogos Santa Casa website.
A web scraping module was built in Python using libraries available for the language. The code for this module can be found in the setup directory.
For the data extraction module, several third-party libraries were used along with some built-in ones. The main ones are listed below:
A Makefile is available with recipes. Check the file or simply run the following to extract, parse and store:
make extract-raw-data && make parse-and-store
Scraping all the data fetches 414.8 MB; the extraction stores 1600+ HTML files.
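The extraction step can be pictured with a minimal sketch like the one below. It is not the project's actual module: the URL template, the function names and the one-file-per-draw layout are all assumptions made for illustration, and only the Python standard library is used. Skipping files that already exist means an interrupted run does not have to download everything again.

```python
from pathlib import Path
from typing import Callable, Iterable
from urllib.request import urlopen

# Hypothetical URL template: the real address scraped by the project is not shown here.
URL_TEMPLATE = "https://example.invalid/draws/{draw_id}"


def fetch_html(draw_id: str) -> bytes:
    """Download one results page over HTTP (assumed URL layout)."""
    with urlopen(URL_TEMPLATE.format(draw_id=draw_id), timeout=30) as response:
        return response.read()


def store_raw_pages(
    draw_ids: Iterable[str],
    out_dir: Path,
    fetch: Callable[[str], bytes] = fetch_html,
) -> list[Path]:
    """Save one HTML file per draw, skipping files that already exist."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for draw_id in draw_ids:
        target = out_dir / f"{draw_id}.html"
        if target.exists():
            continue  # already downloaded in a previous run
        target.write_bytes(fetch(draw_id))
        written.append(target)
    return written
```

Passing the `fetch` callable as a parameter keeps the storage logic testable without network access: a stub that returns fixed bytes can stand in for the real download.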
Data loading is part of the main Python module. The stored data is loaded and fed to the system to generate reports and plots. Check the data directory for the dataset and the generated plots and reports.
For the data visualization module, several third-party libraries were used along with some built-in ones. The main ones are listed below:
A Makefile is available with recipes. Check the file or simply run the following to create the plots:
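As a rough illustration of the loading step, the sketch below reads stored draws and counts how often each main number appears. The CSV layout (columns `date`, `n1` to `n5`, `s1`, `s2`), the `Draw` type and both function names are assumptions, not the project's actual format or API; the sample rows are made up purely to show the input shape. The only fact relied on is that a EuroMillions draw has five main numbers and two stars.

```python
import csv
import io
from collections import Counter
from datetime import date
from typing import NamedTuple, TextIO


class Draw(NamedTuple):
    day: date
    numbers: tuple[int, ...]
    stars: tuple[int, ...]


def load_draws(handle: TextIO) -> list[Draw]:
    """Read draws from CSV text with the assumed columns date,n1..n5,s1,s2."""
    draws = []
    for row in csv.DictReader(handle):
        draws.append(
            Draw(
                day=date.fromisoformat(row["date"]),
                numbers=tuple(int(row[f"n{i}"]) for i in range(1, 6)),
                stars=tuple(int(row[f"s{i}"]) for i in range(1, 3)),
            )
        )
    # Chronological order is what time series analysis expects.
    return sorted(draws, key=lambda d: d.day)


def number_frequency(draws: list[Draw]) -> Counter:
    """Count how often each main number was drawn."""
    return Counter(n for d in draws for n in d.numbers)


# Made-up rows, only to show the assumed shape of the input.
SAMPLE = """date,n1,n2,n3,n4,n5,s1,s2
2023-01-06,3,11,24,32,45,2,9
2023-01-03,3,7,19,32,50,1,12
"""

draws = load_draws(io.StringIO(SAMPLE))
freq = number_frequency(draws)
```

Sorting by date on load means later steps, such as plotting counts over time, can assume the draws are already in order.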
make create-plots