`Curriculum Vitae Clustering using k-means`

A repository of curriculum vitae (CV) datasets, together with an example of clustering CVs/resumes using k-means. If you are only interested in the datasets, read this README.md file.

Requirement:

Description:

The data/word directory contains the curriculum vitae files in Word (.docx) format.
resource/common_cities_state_countries_names.txt contains common city, state, and country names. The main routine adds the contents of this file to the stopwords.
resource/human_names.txt contains common human names. The main routine adds the contents of this file to the stopwords.
resource/specific_stopwords.txt contains user-defined stopwords. The main routine adds the contents of this file to the stopwords.
The main routine places the .docx files of each cluster in a new folder within the output directory. Each new folder is named after the 15 most frequent features of its cluster.
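
The stopword handling and folder naming described above can be sketched as follows. This is a minimal illustration, not the code in main.py: the function names `load_stopwords` and `cluster_folder_names`, the use of raw term counts as "features", and the underscore separator in folder names are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def load_stopwords(paths):
    """Read one stopword per line from each file and return a lowercase set."""
    words = set()
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            words.update(line.strip().lower() for line in handle if line.strip())
    return words


def cluster_folder_names(doc_tokens, labels, stopwords=frozenset(), top_n=15):
    """Map each cluster label to a name built from its most frequent terms.

    Assumption: "features" are approximated here by raw term counts, and
    the terms are joined with underscores (hypothetical naming scheme).
    """
    counts = defaultdict(Counter)
    for tokens, label in zip(doc_tokens, labels):
        # Count every token of the document that is not a stopword.
        counts[label].update(
            t.lower() for t in tokens if t.lower() not in stopwords
        )
    return {
        label: "_".join(term for term, _ in counter.most_common(top_n))
        for label, counter in counts.items()
    }
```

Given tokenized documents and the cluster label assigned to each one, `cluster_folder_names(doc_tokens, labels, stopwords)` returns one folder name per cluster; the files of that cluster could then be moved into the folder with `shutil.move`.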

Run:

`python main.py` clusters the .docx files in data/word.
`python clean_resources.py` removes duplicate words from resource/human_names.txt and resource/common_cities_state_countries_names.txt.
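
The duplicate removal performed on the resource files can be sketched as below. This is a hedged illustration rather than the contents of clean_resources.py: the helper names `unique_lines` and `dedupe_file` are hypothetical, and comparing words case-insensitively while keeping the first spelling seen is an assumption.

```python
from pathlib import Path


def unique_lines(lines):
    """Return stripped, non-empty lines without duplicates, in first-seen order."""
    seen = set()
    result = []
    for line in lines:
        word = line.strip()
        key = word.lower()  # assumption: "John" and "john" count as duplicates
        if word and key not in seen:
            seen.add(key)
            result.append(word)
    return result


def dedupe_file(path):
    """Rewrite a one-word-per-line file in place without duplicate entries."""
    path = Path(path)
    words = unique_lines(path.read_text(encoding="utf-8").splitlines())
    path.write_text("\n".join(words) + "\n", encoding="utf-8")
    return len(words)
```

Calling `dedupe_file("resource/human_names.txt")` would rewrite that file in place and return the number of unique entries kept.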


arefinnomi/Curriculum-Vitae-Clustering-using-KMeans
