Dear Reader and Judge,
This document explains how to run the Protego code and how to access the prototype demo.
The "Protego_backend_alpha.py" script is part of the backend of PROTEGO's dashboard for trend analysis and reputation management.
The prototype demo can be seen at: http://www.robots.ox.ac.uk/~favour/protego
Method: GradientBoostingClassifier
Why: It performed the best in our parameter analysis.
Others Tested: AdaBoost, XGBoost
Why Abandoned: They were outperformed by GradientBoostingClassifier on various parameters in our analysis.
Insight: It is easy to get the performance up to 74% and difficult to get it a lot higher. It is similarly hard to exceed 80% on the training set with the methods we attempted.
Our Assumptions: We restricted ourselves to using the data given. We also restricted ourselves to ML approaches that can be trained in a short amount of time, e.g. we ruled out Deep Learning as too GPU-intensive.
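As an illustration of the kind of comparison described above, the following is a minimal sketch and not the analysis we actually ran. It assumes scikit-learn is installed, uses synthetic stand-in data from make_classification instead of the provided dataset, and leaves out XGBoost because it needs a separate package. The sample sizes, the 5-fold setting and the variable names are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption): the real features come from the provided dataset.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=8, random_state=0
)

# Two of the candidate methods named above, with default hyperparameters.
candidates = {
    "GradientBoostingClassifier": GradientBoostingClassifier(random_state=0),
    "AdaBoostClassifier": AdaBoostClassifier(random_state=0),
}

scores = {}
for name, model in candidates.items():
    # Mean accuracy over 5 cross-validation folds.
    fold_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    scores[name] = fold_scores.mean()
    print(f"{name}: {scores[name]:.3f}")

# Pick the method with the highest mean accuracy on this toy data.
best = max(scores, key=scores.get)
print("Best on this toy data:", best)
```

Which method comes out ahead on the toy data says nothing about the real dataset; the point is only the shape of the comparison.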
- Open a Python 3 environment with the following libraries installed: tqdm, sklearn, numpy, scipy, nltk
- Run the Python 3 file "Protego_backend_alpha.py". It scans through the training data and evaluates on the test data.
- This gives a model that can classify the relationship between the header and the body of a text.
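Before running the script, the library requirements listed in the steps above can be checked with a short snippet such as the one below. This is a hedged sketch and not part of "Protego_backend_alpha.py"; the helper name missing_modules is hypothetical. Note that the import name sklearn is provided by the scikit-learn package.

```python
import importlib.util
import sys

# Import names from the setup step. "sklearn" is installed as "scikit-learn".
REQUIRED = ["tqdm", "sklearn", "numpy", "scipy", "nltk"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The script targets Python 3, so report the interpreter version as well.
print("Python version:", sys.version_info.major, sys.version_info.minor)

missing = missing_modules(REQUIRED)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are available.")
```

If the snippet reports missing libraries, install them in the same environment before running the script.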