Topic_modeling_with_gensim

Gensim is licensed under the GNU LGPLv2.1, which requires that modifications to Gensim (if any, and if distributed to others) be open-sourced. For more information, see https://radimrehurek.com/gensim/about.html

What is this repository for?

Quick summary: Topic modeling on a collection of documents or texts fetched from a database, written in Python 3 using the Gensim library. The current version writes two CSV files to the 'output' directory in the project: one listing the retrieved topics with their corresponding words, and one containing the topic distribution for the raw texts (unseen/new documents can also be used). A future version will add a word cloud showing the prominent words for each topic.
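As a rough illustration of the flow summarized above, the sketch below trains an LDA model on already tokenized texts and writes the two CSVs. The function names, column headers, and parameter values are assumptions made for illustration and may not match what Main_file.py does; only the Gensim calls (`Dictionary`, `doc2bow`, `LdaModel`) are real library APIs.

```python
import csv


def train_lda(tokenized_texts, num_topics=2, passes=10):
    """Train an LDA model and return (model, dictionary).

    num_topics and passes are placeholder values, not the project's settings.
    """
    # Imported here so the CSV helpers below can be used without Gensim installed.
    from gensim import corpora, models

    dictionary = corpora.Dictionary(tokenized_texts)
    corpus = [dictionary.doc2bow(tokens) for tokens in tokenized_texts]
    lda = models.LdaModel(corpus=corpus, id2word=dictionary,
                          num_topics=num_topics, passes=passes, random_state=0)
    return lda, dictionary


def write_topics_csv(topics, path):
    """Write one row per topic.

    topics has the shape returned by lda.show_topics(formatted=False):
    [(topic_id, [(word, weight), ...]), ...]
    """
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["topic_id", "words"])
        for topic_id, word_weights in topics:
            writer.writerow([topic_id, " ".join(word for word, _ in word_weights)])


def write_doc_topics_csv(raw_texts, distributions, path):
    """Write one row per (text, topic) pair.

    distributions holds one [(topic_id, probability), ...] list per text,
    the shape returned by lda.get_document_topics(bow).
    """
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["text", "topic_id", "probability"])
        for text, dist in zip(raw_texts, distributions):
            for topic_id, prob in dist:
                writer.writerow([text, topic_id, round(float(prob), 4)])
```

With Gensim installed, calling `train_lda(texts)` and then passing `lda.show_topics(num_topics=-1, formatted=False)` to `write_topics_csv` would produce a topics CSV along these lines. An unseen document can be scored by converting it with `dictionary.doc2bow(tokens)` and calling `lda.get_document_topics` on the result.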

NOTE: If the texts or documents used to train the model change, delete the contents of the data folder (which stores the dictionary and corpus).
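The cleanup can also be scripted. This is a minimal sketch, assuming the data folder is passed in as a path; the helper name `clear_data_dir` is hypothetical and not part of the project.

```python
import shutil
from pathlib import Path


def clear_data_dir(data_dir):
    """Delete everything inside data_dir but keep the directory itself.

    Returns the number of top-level entries removed.
    """
    data_path = Path(data_dir)
    if not data_path.is_dir():
        return 0
    removed = 0
    for entry in data_path.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a sub-directory and its contents
        else:
            entry.unlink()  # remove a regular file or a symlink
        removed += 1
    return removed
```

Calling `clear_data_dir("data")` from the project root before retraining would empty the folder, assuming that is where the dictionary and corpus are stored.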

Version: 1.1. The next version will use a word cloud to display the prominent words for each topic.

How do I get set up?

Summary of setup: A basic Python 3 setup with the required modules installed (I used PyCharm Community Edition).

Main_file.py is the main file of the project.

Utils.py contains utility methods such as reading config sections from config.properties.
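For readers unfamiliar with section-based config files, a helper of this kind might look like the sketch below. It assumes config.properties uses INI-style `[Section]` headers that Python's `configparser` can parse; the function name `read_section` is hypothetical and the actual code in Utils.py may differ.

```python
import configparser


def read_section(config_path, section):
    """Return the key/value pairs of one section as a plain dict.

    Note: configparser lowercases keys by default and returns values as strings.
    """
    parser = configparser.ConfigParser()
    # parser.read() silently skips missing files, so check what was loaded.
    if not parser.read(config_path, encoding="utf-8"):
        raise FileNotFoundError(f"Config file not found: {config_path}")
    if not parser.has_section(section):
        raise KeyError(f"Section [{section}] missing in {config_path}")
    return dict(parser.items(section))
```

Raising on a missing file or section surfaces configuration mistakes early instead of failing later with an unclear error.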

Configure the config.properties file to connect to your database and modify the SQL query accordingly. (Only PostgreSQL is supported for now; MySQL connectivity will be added in later versions.)
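To show what the configured SQL query is expected to deliver, here is a minimal sketch of fetching texts through a DB-API 2.0 connection. It assumes the query returns the text in its first column; the function name, table, and column are hypothetical. The demo uses the standard library's `sqlite3` purely as a stand-in so it runs without a server; the project itself connects to PostgreSQL through psycopg2.

```python
import sqlite3  # stand-in driver for the demo only


def fetch_texts(connection, sql_query):
    """Run sql_query and return the first column of every row as strings.

    Works with any DB-API 2.0 connection; NULL values are skipped.
    """
    cursor = connection.cursor()
    try:
        cursor.execute(sql_query)
        return [str(row[0]) for row in cursor.fetchall() if row[0] is not None]
    finally:
        cursor.close()


def demo():
    """Build a throwaway in-memory table and fetch its texts."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE documents (body TEXT)")
        conn.executemany("INSERT INTO documents VALUES (?)",
                         [("first text",), (None,), ("second text",)])
        return fetch_texts(conn, "SELECT body FROM documents ORDER BY rowid")
    finally:
        conn.close()
```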

Configuration: The config file includes sections for the database, the SQL query, and the LDA model parameters.

Dependencies: Gensim, psycopg2, nltk and stop_words. Use pip to install gensim, psycopg2 and stop_words; for nltk, follow the installation instructions in the NLTK documentation.
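A quick way to confirm the dependencies are importable before running the project is sketched below. The helper name `missing_modules` is hypothetical; the import names in `REQUIRED` are the module names of the packages listed above.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names if find_spec(name) is None]


# Top-level import names of the project's dependencies.
REQUIRED = ["gensim", "psycopg2", "nltk", "stop_words"]
```

Calling `missing_modules(REQUIRED)` returns an empty list when everything is installed, and otherwise the names still to be installed.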

Database configuration: DB_Section in the config.properties file under the conf directory.
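The sketch below shows how values from DB_Section could be mapped to a PostgreSQL connection. The key names (`host`, `port`, `database`, `user`, `password`) and the sample values are assumptions; check the actual config.properties for the names the project uses. The keyword arguments passed to `psycopg2.connect` are ones that function accepts.

```python
import configparser

# Hypothetical contents of conf/config.properties; real key names may differ.
SAMPLE_CONFIG = """
[DB_Section]
host = localhost
port = 5432
database = mydb
user = myuser
password = secret
"""


def db_connect_kwargs(config_text, section="DB_Section"):
    """Map the section's keys to keyword arguments for psycopg2.connect()."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    db = parser[section]
    return {
        "host": db["host"],
        "port": db.getint("port"),
        "dbname": db["database"],
        "user": db["user"],
        "password": db["password"],
    }


def connect(config_text):
    """Open a PostgreSQL connection; needs psycopg2 and a reachable server."""
    import psycopg2  # imported lazily so the helper above works without the driver
    return psycopg2.connect(**db_connect_kwargs(config_text))
```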

manu-chauhan/Topic_modeling_with_gensim
