Time Series data analysis for the Euromilhões lottery prizes for 2020 (Covid year)

nandoabreu/euromilhoes

euromilhoes

Time Series data analysis for the Euromilhões lottery draws

This project develops an exercise from an assignment for the Time Series and Forecast course unit of the Data Science M.Sc. at UP - Universidade do Porto (University of Porto, Portugal).

The assignment

The project is described on this page

The project

The project covers the full workflow, from extracting the data to developing the analysis and reports. The data covers the draws for the Portuguese part of the EuroMillions draw.

Technology

This project was developed using Python v3.10 and several Python libraries, running locally in a virtual environment managed by Python Poetry. Development was done on GNU/Linux, using the Ubuntu v23 distribution.

To run this project's code, set up Python Poetry and install the project.

Data extraction

The dataset for the Euromilhões draw is not directly available for download, but the data can be fetched through HTTP. It is available through the Portal Jogos Santa Casa website.

A web scraping module was built in Python using libraries available for the language. The code for this module can be found in the setup directory.
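As an illustration of the fetch step only, here is a minimal sketch that uses just the standard library. It is not the code from the setup directory: `BASE_URL`, `http_get` and `fetch_draw_page` are hypothetical names, and the URL pattern is a placeholder, since the real address on the Portal Jogos Santa Casa website is not given here.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlopen

# Hypothetical placeholder: the real URL pattern of the website is not documented here.
BASE_URL = "https://example.invalid/euromilhoes/draw?id={draw_id}"


def http_get(url: str) -> bytes:
    """Fetch a URL over HTTP and return the raw response body."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def fetch_draw_page(
    draw_id: int,
    out_dir: Path,
    fetch: Callable[[str], bytes] = http_get,
) -> Path:
    """Download one draw page and store it as raw HTML.

    Files already on disk are not fetched again, so an interrupted
    run can be resumed without repeating earlier requests.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"draw_{draw_id}.html"
    if not target.exists():
        target.write_bytes(fetch(BASE_URL.format(draw_id=draw_id)))
    return target
```

Passing the fetch function as a parameter keeps the storage logic testable without network access. Skipping files that already exist may matter when a full run downloads hundreds of megabytes.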

Python libraries

For the data extraction module, along with some built-in libraries, several third-party libraries were used. The main ones are listed below:

Instructions

A Makefile is available with recipes. Check the file or simply run the following to extract, parse and store:

make extract-raw-data && make parse-and-store

Scraping all the data fetches 414.8 MB, and the extraction stores 1600+ HTML files.
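The parse-and-store step could look roughly like the sketch below, which reads every stored HTML file and writes one CSV row per draw. It is an assumption-laden illustration rather than the project's parser: the `<li class="ball">` markup, the CSV column names, and the names `DrawNumbersParser`, `parse_draw` and `store_draws` are all hypothetical. It uses only the standard library instead of the third-party libraries the project relies on.

```python
import csv
from html.parser import HTMLParser
from pathlib import Path


class DrawNumbersParser(HTMLParser):
    """Collect integers found inside <li class="ball"> elements (hypothetical markup)."""

    def __init__(self) -> None:
        super().__init__()
        self.numbers: list[int] = []
        self._in_ball = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "li" and ("class", "ball") in attrs:
            self._in_ball = True

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_ball = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_ball and text.isdigit():
            self.numbers.append(int(text))


def parse_draw(html: str) -> list[int]:
    """Return the drawn numbers found in one HTML document."""
    parser = DrawNumbersParser()
    parser.feed(html)
    parser.close()
    return parser.numbers


def store_draws(raw_dir: Path, csv_path: Path) -> int:
    """Parse every stored HTML file and write one CSV row per file; return the row count."""
    rows = 0
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["source_file", "numbers"])
        for html_file in sorted(raw_dir.glob("*.html")):
            numbers = parse_draw(html_file.read_text(encoding="utf-8"))
            writer.writerow([html_file.name, " ".join(map(str, numbers))])
            rows += 1
    return rows
```

Keeping the raw HTML on disk and parsing it in a separate pass means the parser can be corrected and re-run without fetching the pages again.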

Data load and Plots creation

Data loading is part of the main Python module. The stored data is loaded and used to generate reports and plots. Check the data directory for the dataset and the generated plots and reports.
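To make the load step concrete, the sketch below reads a stored CSV and aggregates prize amounts per month for one year, which gives a small series ready to hand to a plotting library. The column names `date` and `prize_eur` and the function names are assumptions for illustration; the actual dataset layout in the data directory may differ.

```python
import csv
from collections import defaultdict
from datetime import date
from pathlib import Path


def load_draws(csv_path: Path) -> list[tuple[date, float]]:
    """Load (draw date, prize amount) pairs from a CSV file, sorted by date.

    Assumes hypothetical columns: `date` (ISO format) and `prize_eur`.
    """
    with csv_path.open(newline="", encoding="utf-8") as handle:
        rows = [
            (date.fromisoformat(row["date"]), float(row["prize_eur"]))
            for row in csv.DictReader(handle)
        ]
    return sorted(rows)


def monthly_totals(draws: list[tuple[date, float]], year: int) -> dict[int, float]:
    """Sum prize amounts per calendar month for the given year."""
    totals: dict[int, float] = defaultdict(float)
    for day, amount in draws:
        if day.year == year:
            totals[day.month] += amount
    return dict(sorted(totals.items()))
```

A call such as `monthly_totals(load_draws(Path("draws.csv")), 2020)` returns a mapping from month number to total, where `draws.csv` is a placeholder file name.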

Python libraries

For the data visualization module, along with some built-in libraries, several third-party libraries were used. The main ones are listed below:

Instructions

A Makefile is available with recipes. Check the file or simply run the following to create the plots:

make create-plots
