sharnam19/Document-Classification-Using-RNN: RNN model-based document classification

# Document Classifier Using a Vanilla Recurrent Neural Network

The dataset consists of text documents, each belonging to one of 8 categories.

## Data Preprocessing Steps

1. Tokenize the sentences into words.
2. Remove tokens that are stop words.
3. If a document has more than 20 words, keep only its first 20 words.
4. Convert every word to its unique identification number.
5. Pad documents with fewer than 20 words with 0s so that every sequence has length 20.
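A minimal pure-Python sketch of the five steps above follows. The regex tokenizer, the tiny `STOP_WORDS` set, and the function names are hypothetical stand-ins (the README does not say which tokenizer or stop-word list was used). IDs start at 1 so that 0 stays free for padding, and truncation happens before ID conversion, matching the order of the steps.

```python
import re

MAX_LEN = 20  # fixed sequence length from the README
PAD_ID = 0    # ID reserved for padding

# Hypothetical, deliberately tiny stop-word list (not the repository's list).
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in", "on", "for"}


def tokenize(text):
    """Lowercase and split on runs of letters/digits (a simple stand-in tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def preprocess(docs, vocab=None):
    """Turn raw documents into fixed-length lists of word IDs.

    Returns (sequences, vocab), where vocab maps each word to its ID.
    """
    if vocab is None:
        vocab = {}
    sequences = []
    for doc in docs:
        # Steps 1-2: tokenize and drop stop words.
        tokens = [tok for tok in tokenize(doc) if tok not in STOP_WORDS]
        # Step 3: keep at most the first MAX_LEN words.
        tokens = tokens[:MAX_LEN]
        # Step 4: map each word to a unique ID (new words get the next ID).
        ids = []
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
            ids.append(vocab[tok])
        # Step 5: right-pad with PAD_ID up to MAX_LEN.
        ids += [PAD_ID] * (MAX_LEN - len(ids))
        sequences.append(ids)
    return sequences, vocab
```

For example, `preprocess(["The cat sat on the mat."])` drops "the" and "on" and yields `[1, 2, 3]` followed by 17 zeros. Calling `preprocess` on test documents with the training `vocab` would still assign new IDs to unseen words; a real pipeline needs a policy for those (dropping them or mapping them to an unknown-word ID), which the README does not describe.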

The sequence length was fixed at 20, with shorter documents padded with 0s, so that every batch has the same shape; this made training faster.
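Because every padded document has exactly 20 IDs, the whole corpus fits in one `(num_docs, 20)` integer matrix, and a batch of 1000 documents becomes a single `(1000, 20)` array that the RNN can process with one matrix operation per time step instead of looping over documents of different lengths. The NumPy sketch below stacks padded sequences and yields mini-batches of the listed batch size; the function names are hypothetical, and shuffling each epoch is an assumption, not something the README states.

```python
import numpy as np

SEQ_LEN = 20
BATCH_SIZE = 1000


def stack_sequences(sequences):
    """Stack equal-length ID lists into a (num_docs, SEQ_LEN) int64 matrix."""
    X = np.asarray(sequences, dtype=np.int64)
    if X.ndim != 2 or X.shape[1] != SEQ_LEN:
        raise ValueError("all sequences must be padded to SEQ_LEN")
    return X


def iterate_minibatches(X, y, batch_size=BATCH_SIZE, rng=None):
    """Yield (ids, labels) batches; shuffle row order first if an rng is given.

    X: (num_docs, SEQ_LEN) padded word IDs; y: (num_docs,) class labels.
    The last batch may be smaller than batch_size.
    """
    order = np.arange(len(X))
    if rng is not None:
        rng.shuffle(order)
    for start in range(0, len(X), batch_size):
        batch = order[start:start + batch_size]
        yield X[batch], y[batch]
```

With 2500 documents this yields batches of shapes `(1000, 20)`, `(1000, 20)` and `(500, 20)`.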

- Learning rate = 1e-2
- Epochs = 4050
- Word embedding dimension = 100
- Hidden state dimension = 128
- Truncated backpropagation through time (BPTT) length = 4
- Training sequence length = 20
- Batch size = 1000
- Weights initialized from a Gaussian distribution with mean 0.0 and standard deviation 1
- Biases initialized to zero
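The settings above describe a vanilla (tanh) RNN. The NumPy sketch below is one plausible reading of them, not the repository's code: word IDs are looked up in a 100-dimensional embedding, a recurrence with 128 hidden units runs over the 20 positions, and the final hidden state is mapped to 8 class scores with a softmax cross-entropy loss. Weights are drawn from N(0, 1) and biases start at zero, as listed. Several pieces are assumptions: classifying from the last hidden state, plain gradient descent as the optimizer, the parameter names (`E`, `Wx`, `Wh`, ...), and the hypothetical `VOCAB_SIZE`. Truncated backpropagation is interpreted here as carrying the gradient back through only the last 4 time steps; the original may instead split sequences into chunks.

```python
import numpy as np

# Values from the hyperparameter list; VOCAB_SIZE is a hypothetical placeholder.
VOCAB_SIZE = 5000
EMBED_DIM = 100
HIDDEN_DIM = 128
NUM_CLASSES = 8
SEQ_LEN = 20
TBPTT_LEN = 4
LEARNING_RATE = 1e-2


def init_params(rng):
    """Weights ~ N(0, 1) and zero biases, as described in the README."""
    return {
        "E": rng.normal(0.0, 1.0, (VOCAB_SIZE, EMBED_DIM)),     # word embeddings
        "Wx": rng.normal(0.0, 1.0, (EMBED_DIM, HIDDEN_DIM)),    # input-to-hidden
        "Wh": rng.normal(0.0, 1.0, (HIDDEN_DIM, HIDDEN_DIM)),   # hidden-to-hidden
        "bh": np.zeros(HIDDEN_DIM),
        "Wy": rng.normal(0.0, 1.0, (HIDDEN_DIM, NUM_CLASSES)),  # hidden-to-class
        "by": np.zeros(NUM_CLASSES),
    }


def forward(params, ids):
    """Run the tanh RNN over (batch, SEQ_LEN) word IDs; return logits and caches."""
    x = params["E"][ids]                      # (batch, SEQ_LEN, EMBED_DIM)
    h = np.zeros((ids.shape[0], HIDDEN_DIM))  # initial hidden state
    hs = [h]
    for t in range(ids.shape[1]):
        h = np.tanh(x[:, t] @ params["Wx"] + h @ params["Wh"] + params["bh"])
        hs.append(h)
    logits = h @ params["Wy"] + params["by"]  # classify from the last hidden state
    return logits, x, hs


def loss_and_grads(params, ids, labels, bptt_len=TBPTT_LEN):
    """Softmax cross-entropy loss and gradients, backpropagating bptt_len steps."""
    logits, x, hs = forward(params, ids)
    n = ids.shape[0]
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(n), labels]).mean()

    grads = {k: np.zeros_like(v) for k, v in params.items()}
    dlogits = probs.copy()
    dlogits[np.arange(n), labels] -= 1.0
    dlogits /= n
    grads["Wy"] = hs[-1].T @ dlogits
    grads["by"] = dlogits.sum(axis=0)

    dh = dlogits @ params["Wy"].T
    T = ids.shape[1]
    # Walk backwards through at most bptt_len time steps, then stop (truncation).
    for t in range(T - 1, max(T - bptt_len, 0) - 1, -1):
        dpre = dh * (1.0 - hs[t + 1] ** 2)  # derivative of tanh
        grads["Wx"] += x[:, t].T @ dpre
        grads["Wh"] += hs[t].T @ dpre
        grads["bh"] += dpre.sum(axis=0)
        # Scatter-add into embedding rows; repeated IDs accumulate correctly.
        np.add.at(grads["E"], ids[:, t], dpre @ params["Wx"].T)
        dh = dpre @ params["Wh"].T          # gradient w.r.t. previous hidden state
    return loss, grads


def sgd_step(params, grads, lr=LEARNING_RATE):
    """Plain gradient-descent update, applied in place."""
    for k in params:
        params[k] -= lr * grads[k]
```

A training step would call `loss_and_grads` on one `(1000, 20)` batch and then `sgd_step`. Passing `bptt_len=SEQ_LEN` gives full backpropagation through time, which is handy for checking the gradients numerically.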

The model reaches a test set accuracy of 74.22%.
Training took about 6 hours.

The trained model can be downloaded from here.
