Article Sentiment Analyzer is a Python program that uses TensorFlow, Keras, the Natural Language Toolkit (NLTK), and NumPy to estimate the emotions present in an article. It breaks an article down into its component sentences, runs each sentence through a deep learning model to predict the emotion it expresses, and sums those predictions to get the overall sentiment of the article. It then normalizes the resulting sentiment vector so that articles can be compared with one another in the six-dimensional emotion space (Anger, Fear, Sadness, Joy, Surprise, Love).
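The sum-and-normalize step described above can be sketched in a few lines. This is a minimal illustration, not the code from this repo: the function name `article_sentiment`, the index order of the emotions, and the choice of scaling to unit (L2) length are all assumptions, and the per-sentence scores are made up rather than produced by the model.

```python
import math

# Emotion order follows the list above; the model's real index order is an assumption.
EMOTIONS = ["anger", "fear", "sadness", "joy", "surprise", "love"]


def article_sentiment(sentence_scores):
    """Sum per-sentence emotion scores and scale the total to unit length.

    Hypothetical helper: assumes each sentence yields one score per emotion
    and that "normalize" means dividing by the Euclidean (L2) norm.
    """
    totals = [0.0] * len(EMOTIONS)
    for scores in sentence_scores:
        for i, value in enumerate(scores):
            totals[i] += value
    norm = math.sqrt(sum(t * t for t in totals))
    if norm == 0.0:
        return totals  # no sentences, or every score was zero
    return [t / norm for t in totals]


# Made-up scores for a three-sentence article (stand-ins for model output):
# two joyful sentences and one sad one.
example_scores = [
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
]
vector = article_sentiment(example_scores)
```

Because every article ends up as a vector of the same length, a long article and a short one can be compared directly by their emotion mix rather than by how many sentences they contain.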
It provides 5 files to serve different purposes:
- main.py is used to train the model
  - The model is saved to the folder and files specified on line 63
  - The word vectors used are the Common Crawl 300-dimension, 2 million word vectors; this can be changed on line 47
  - Train, val, and test data (test is only used for making the tokenizer) are specified on lines 18-20
  - If the training classes are changed, line 21 must also be changed
  - If the model is not going to be used with an NVIDIA GPU, `cuda_optimized_gru` should be set to `False` on line 47
  - Note: the model runs and trains roughly 10x faster on NVIDIA GPUs with CUDA enabled
  - If the machine has too little memory (system or GPU) to train the model, reduce `batch_size` on line 50 and `gru_output_size` on helper.py line 64
- cli.py is used to estimate the emotion of individual statements; it exists mainly for testing purposes
- predict_all_urls.py grabs the most recent reviews from ign.com, estimates the sentiment of each one, and then outputs the most joyful one
- analyze_article_cli.py is a command-line interface like cli.py, but it takes article links instead of statements
  - As in cli.py, the model file should be specified on line 11
  - Only IGN articles are supported at this time, but this is only a limitation of the web scraper; it could be expanded to work with other sites
  - avatar-the-way-of-water-review.svg and star-wars-knights-of-the-old-republic-review-2.svg are included as sample chart outputs from this script
- test.py is used to analyze the performance of the model
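As a rough illustration of the last step of predict_all_urls.py, picking the most joyful article amounts to taking the maximum over the Joy component of each normalized sentiment vector. The sketch below is not taken from the repo: `most_joyful`, the `JOY` index, the URLs, and the vectors are all hypothetical.

```python
# Index of Joy in (Anger, Fear, Sadness, Joy, Surprise, Love); assumed order.
JOY = 3


def most_joyful(sentiments_by_url):
    """Return the URL whose sentiment vector has the largest Joy component."""
    return max(sentiments_by_url, key=lambda url: sentiments_by_url[url][JOY])


# Hypothetical URLs with made-up unit-length vectors, for illustration only.
sample = {
    "https://example.com/review-a": [0.0, 0.0, 0.6, 0.8, 0.0, 0.0],
    "https://example.com/review-b": [0.6, 0.0, 0.8, 0.0, 0.0, 0.0],
}
best = most_joyful(sample)
```

Comparing a single component like this only makes sense because the vectors have been normalized first; otherwise longer articles would tend to win simply by having more sentences.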
There are also 2 helper files:
- helper.py contains a number of functions that are used by multiple files
- ModelProcessing.py defines a class that stores the model, tokenizer, and other necessary data, and has functions to read and write them to files for later use
`model-cuda.yaml` and its related files are included in this repo so that users can test the program without having to train a model themselves. Note that this model was trained on an RTX 2070 Super; it may use more memory than some GPUs have and will likely run slowly on a CPU alone. On my machine it runs at a rate of 608 predictions per second with an accuracy of 92.85%.
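For readers who want to collect comparable numbers on their own hardware, the following is one possible way to measure accuracy and predictions per second. It is a sketch under stated assumptions, not the logic of test.py: `evaluate` is a hypothetical helper, it assumes a `predict(text)` callable that returns a label, and `toy_predict` is a keyword rule standing in for the real model.

```python
import time


def evaluate(predict, samples, labels):
    """Return (accuracy, predictions_per_second) for a predict(text) -> label callable."""
    start = time.perf_counter()
    predictions = [predict(text) for text in samples]
    elapsed = time.perf_counter() - start
    correct = sum(p == y for p, y in zip(predictions, labels))
    accuracy = correct / len(labels)
    # Guard against a zero reading on very fast runs or coarse timers.
    rate = len(samples) / elapsed if elapsed > 0 else float("inf")
    return accuracy, rate


# Toy stand-in for the trained model: a keyword rule, NOT the real network.
def toy_predict(text):
    return "joy" if "great" in text else "sadness"


accuracy, rate = evaluate(
    toy_predict,
    ["a great game", "a dull sequel", "great fun", "great letdown"],
    ["joy", "sadness", "joy", "sadness"],
)
```

A real measurement would normally feed the model batches rather than one statement at a time, so throughput measured this way is likely to understate what a GPU can do.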