

Twitter Sentiment Analysis

Binary classification experiments for the Twitter dataset

Notebook viewer

‼️ Because of memory restrictions, GitHub and browsers cannot always open large Jupyter notebooks. For this reason, every notebook is linked to the ✔️ Jupyter nbviewer ✔️ in the following table. If you have any problems opening a notebook, follow its link.

| Notebook | Link to jupyter nbviewer | Link to Colab |
| --- | --- | --- |
| BiRNN_LSTM_GRU-BestModel.ipynb | nbviewer | Open In Colab |
| BiRNN_LSTM_GRU-Experiments.ipynb | nbviewer | Open In Colab |
| FeedForwardNN_GloVe.ipynb | nbviewer | Open In Colab |
| FeedForwardNN_TfiDf.ipynb | nbviewer | Open In Colab |
| LogisticRegression.ipynb | nbviewer | Open In Colab |

Logistic regression

Developed a sentiment classifier using logistic regression for the Twitter sentiment classification dataset available at this link, using the Scikit-Learn toolkit.

Vectorization: Tf-Idf

Tf-Idf vectorization of the tweets; no pretrained vectors are used.
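A minimal sketch of this setup with Scikit-Learn; the toy tweets, labels, and hyperparameters below are illustrative stand-ins, not the repository's actual data or settings:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in data; the real notebook loads the Twitter dataset instead
tweets = ["I love this!", "great day", "This is awful...", "worst ever"]
labels = [1, 1, 0, 0]

# Tf-Idf features (no pretrained vectors) feeding a logistic regression classifier
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(tweets, labels)
print(model.predict(["what a great day"]))
```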

Evaluation

Evaluation metrics: F1 score, Recall, and Precision

Visualization: Confusion matrices
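These metrics can be computed with Scikit-Learn; a sketch, assuming `y_test` and `y_pred` hold the true and predicted 0/1 labels from a held-out split:

```python
from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix

# y_test / y_pred are assumed variables: true and predicted binary labels
print("Precision:", precision_score(y_test, y_pred))
print("Recall:   ", recall_score(y_test, y_pred))
print("F1 score: ", f1_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))  # rows: true class, columns: predicted class
```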


Feed-Forward Neural Net

Developed two sentiment classifiers using feed-forward neural networks (PyTorch) for the Twitter sentiment analysis dataset.

Experimented with (see the sketch after this list):

  • the number of hidden layers, and the number of their units
  • the activation functions (only the ones presented in the lectures)
  • the loss function
  • the optimizer, etc.
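A minimal PyTorch sketch of such a feed-forward classifier; the layer sizes, ReLU activation, BCE-with-logits loss, and Adam optimizer here are illustrative assumptions, not the exact configuration of the experiments:

```python
import torch
import torch.nn as nn

class FeedForwardClassifier(nn.Module):
    """Feed-forward net over fixed-size tweet vectors (Tf-Idf or averaged GloVe)."""

    def __init__(self, input_dim, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),                 # one of the activations to vary
            nn.Linear(hidden_dim, 1),  # a single logit for binary sentiment
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

model = FeedForwardClassifier(input_dim=300)
criterion = nn.BCEWithLogitsLoss()     # sigmoid + binary cross-entropy in one step
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative training step on random stand-in data
x = torch.randn(32, 300)               # batch of 32 tweet vectors
y = torch.randint(0, 2, (32,)).float() # binary sentiment labels
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```

Varying the depth, widths, activation, loss, and optimizer of this skeleton covers the experiments listed above.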

Vectorization-1: Tf-Idf

Tf-Idf vectorization of the tweets; no pretrained vectors are used.

Vectorization-2: Pre-trained word embedding vectors - GloVe

Vectorization with GloVe (Stanford's pre-trained embeddings).
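One common way to feed tweets into a feed-forward net is to average the GloVe vectors of their tokens; the sketch below assumes that approach and the standard plain-text format of the public glove.6B download:

```python
import numpy as np

# Load GloVe vectors from the plain-text format: "word v1 v2 ... vN" per line
embeddings = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip().split(" ")
        embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)

def tweet_vector(tweet, dim=100):
    """Average the GloVe vectors of a tweet's known tokens (zeros if none match)."""
    vectors = [embeddings[tok] for tok in tweet.lower().split() if tok in embeddings]
    return np.mean(vectors, axis=0) if vectors else np.zeros(dim, dtype=np.float32)

x = tweet_vector("I love this movie")  # a 100-dimensional input for the network
```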

Evaluation

Evaluation metrics: F1 score, Recall, and Precision

Visualization: ROC curves, Loss vs. Epochs, Accuracy vs. Epochs, and Confusion matrix
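The ROC curve can be drawn from the network's predicted probabilities; a sketch with Scikit-Learn and Matplotlib, where `y_test` and `y_scores` are assumed variables holding the true labels and the sigmoid outputs:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# y_test: true 0/1 labels; y_scores: predicted probabilities (assumed variables)
fpr, tpr, _ = roc_curve(y_test, y_scores)
plt.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.3f}")
plt.plot([0, 1], [0, 1], linestyle="--")  # chance-level diagonal
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```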


Bidirectional stacked RNN with LSTM/GRU cells

Experimented with:

  • the number of stacked RNNs,
  • the number of hidden layers,
  • type of cells,
  • skip connections,
  • gradient clipping and
  • dropout probability

Used the Adam optimizer and the binary cross-entropy loss function, and transformed the predicted logits into probabilities with a sigmoid function.
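A minimal PyTorch sketch of such a model; the dimensions, dropout rate, and clipping threshold are illustrative, and `glove_weights` stands in for the real pretrained-vector matrix:

```python
import torch
import torch.nn as nn

class BiRNNClassifier(nn.Module):
    """Bidirectional stacked LSTM over GloVe-initialized embeddings."""

    def __init__(self, glove_weights, hidden_dim=128, num_layers=2, dropout=0.3):
        super().__init__()
        self.embedding = nn.Embedding.from_pretrained(glove_weights, freeze=False)
        self.rnn = nn.LSTM(                # swap in nn.GRU to try GRU cells
            input_size=glove_weights.size(1),
            hidden_size=hidden_dim,
            num_layers=num_layers,         # number of stacked RNN layers
            bidirectional=True,
            dropout=dropout,               # dropout between stacked layers
            batch_first=True,
        )
        self.fc = nn.Linear(2 * hidden_dim, 1)

    def forward(self, token_ids):
        output, _ = self.rnn(self.embedding(token_ids))
        return self.fc(output[:, -1, :]).squeeze(-1)  # logit from the last time step

glove_weights = torch.randn(10_000, 100)    # stand-in for the real GloVe matrix
model = BiRNNClassifier(glove_weights)
criterion = nn.BCEWithLogitsLoss()          # sigmoid + binary cross-entropy combined
optimizer = torch.optim.Adam(model.parameters())

# One illustrative training step with gradient clipping
batch = torch.randint(0, 10_000, (32, 40))  # 32 tweets, 40 token ids each
targets = torch.randint(0, 2, (32,)).float()
optimizer.zero_grad()
loss = criterion(model(batch), targets)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping
optimizer.step()

probabilities = torch.sigmoid(model(batch))  # logits -> probabilities at inference
```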

Vectorization: GloVe

Pre-trained word embeddings (GloVe) are used as the initial embeddings fed into the models.

Evaluation

Evaluation metrics: F1 score, Recall, and Precision

Visualization: ROC curves, Loss vs. Epochs, Accuracy vs. Epochs, and Confusion matrix


© Konstantinos Nikoletos | 2020 - 2021