The final project for the course Algorithms for Massive Datasets, part of the Data Science and Economics master's degree at the University of Milan.

DrCohomology/Find-similar-documents

Find similar documents

Scalable PySpark implementation of an algorithm to retrieve similar documents in a corpus.
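To make the task concrete, here is a minimal, single-machine sketch of similar-document retrieval in plain Python. This README does not say which algorithm the notebook uses, so everything below is an assumption chosen for illustration: word shingling, Jaccard similarity, and a MinHash estimate are one common approach, and the function names (`shingles`, `jaccard`, `minhash_signature`, `estimate_jaccard`, `similar_pairs`) are hypothetical, not taken from the project.

```python
import hashlib
from itertools import combinations


def shingles(text, k=2):
    """Return the set of k-word shingles of a text (assumed tokenisation: whitespace)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def minhash_signature(shingle_set, num_hashes=64):
    """MinHash signature: for each seed, the smallest hash over all shingles.

    Hypothetical helper; seeding MD5 with a prefix stands in for a family
    of independent hash functions.
    """
    if not shingle_set:
        raise ValueError("cannot build a signature for an empty shingle set")
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(hashlib.md5(f"{seed}:{s}".encode()).digest()[:8], "big")
            for s in shingle_set
        ))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates the Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def similar_pairs(docs, threshold=0.5, k=2):
    """Return (id_a, id_b, similarity) for every pair at or above the threshold.

    Compares all pairs exactly, which is quadratic in the number of documents;
    a scalable version would replace this loop with a distributed or
    locality-sensitive-hashing step.
    """
    sets = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    pairs = []
    for a, b in combinations(sorted(sets), 2):
        sim = jaccard(sets[a], sets[b])
        if sim >= threshold:
            pairs.append((a, b, sim))
    return pairs


if __name__ == "__main__":
    corpus = {
        "a": "the quick brown fox jumps over the lazy dog",
        "b": "the quick brown fox jumps over the sleepy dog",
        "c": "spark distributes work across a cluster of machines",
    }
    print(similar_pairs(corpus, threshold=0.5))
```

In this toy corpus, documents `a` and `b` share 6 of their 10 distinct two-word shingles, so they are the only pair reported at a threshold of 0.5. The all-pairs loop is where a distributed implementation would differ most, since comparing every pair does not scale to a large corpus.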

This project was submitted as the final assignment for the Algorithms for Massive Data class, MSc in Data Science and Economics, University of Milan.

The notebook was run on Google Colab. Commenter privileges have been granted to anyone accessing the notebook via the link.
