
elisa-aleman/EntropyBasedSVM


Entropy-Based-SVM

Entropy-based binary SVM classifier library

I use entropy for positive and negative emotion classification with SVMs in many projects. These are general methods that can be applied in many circumstances.

  • Bag_of_concepts: Given a dictionary of clustered words, it returns a Bag of Concepts vector for a corpus.

  • Bag_of_words: Given a list of words, it returns a Bag of Words vector for a corpus.

  • Corpus_preprocessing: I use gensim to preprocess the corpus into lemmatized, tokenized versions of the texts.

  • Entropy: I use scikit-learn and gensim to calculate the entropy of each word in the positive documents and in the negative documents. Comparing a word's two entropies shows which words are probabilistically evenly distributed in one category but not in the other, which aids in classification.

  • Posi-Nega-Neutra_Tagged-Sentence-Parsing: When creating my training data, I tagged the text with XML tags marking positive, negative and neutral sentences. This method parses that tagged text into a Python list.

  • SVM_Methods: Methods I use to analyze SVM training results, from k-fold cross-validation to weight analysis.

  • Model_methods: Methods that work not only with SVM but also when I want to use other machine learning approaches.

  • Model_metrics: I use scikit-learn to write my own k-fold method that returns F1, accuracy, precision and recall, since the included methods usually return only accuracy or F1.

  • Kaomoji: A library to detect kaomoji in text and convert them into numbered tags before the text is segmented by a parser (needed for languages written without spaces, such as Chinese).

  • ProjectPaths: Paths to folders inside the project, such as "data", "logs", etc. for organization.

  • UsefulMethods: A few methods I use constantly.

  • Best_SVM_selection: I use this constantly to compare different SVM parameters and choose the best set, so I made it easier to import.
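A minimal sketch of the Bag_of_concepts idea, assuming the corpus is already tokenized and that the "dictionary of clustered words" is a plain dict mapping each word to an integer cluster id. The function name `bag_of_concepts` and its parameters are hypothetical, not the module's actual API.

```python
from collections import Counter


def bag_of_concepts(docs, word_to_concept, n_concepts):
    """Return one concept-count vector per tokenized document.

    word_to_concept: hypothetical dict mapping a word to its cluster id
    (0 .. n_concepts - 1), e.g. from clustering word embeddings.
    Words that are not in the dictionary are ignored.
    """
    vectors = []
    for tokens in docs:
        # Count how many tokens of this document fall into each concept.
        counts = Counter(word_to_concept[t] for t in tokens if t in word_to_concept)
        vectors.append([counts.get(c, 0) for c in range(n_concepts)])
    return vectors
```

For example, with `{"happy": 0, "glad": 0, "sad": 1, "angry": 1}` the documents `["happy", "glad", "day"]` and `["sad", "angry", "sad"]` map to `[2, 0]` and `[0, 3]`.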
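The Bag_of_words item can be sketched the same way: a fixed word list defines the vector positions, and each tokenized document becomes a count vector. The name `bag_of_words` and its signature are hypothetical.

```python
def bag_of_words(docs, vocabulary):
    """Return one count vector per tokenized document over a fixed vocabulary.

    Position i of each vector counts occurrences of vocabulary[i];
    tokens outside the vocabulary are ignored.
    """
    index = {word: i for i, word in enumerate(vocabulary)}
    vectors = []
    for tokens in docs:
        vec = [0] * len(vocabulary)
        for token in tokens:
            i = index.get(token)
            if i is not None:
                vec[i] += 1
        vectors.append(vec)
    return vectors
```

scikit-learn's `CountVectorizer(vocabulary=...)` provides the same fixed-vocabulary counting for raw (untokenized) strings.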
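One plausible reading of the Entropy item, sketched with only the standard library rather than scikit-learn and gensim: within one category, a word w with count c_d(w) in document d gets p_d = c_d(w) / sum over d of c_d(w), and its entropy is H(w) = -sum p_d * log2(p_d). The log base, the lack of normalization, treating a word absent from a category as entropy 0, and comparing the two entropies by a simple difference are all assumptions; the function names are hypothetical.

```python
import math
from collections import Counter


def word_entropies(docs):
    """Entropy of each word's distribution over the documents of one category.

    High entropy: the word is spread evenly across documents.
    Zero entropy: the word occurs in a single document only.
    """
    per_doc = [Counter(tokens) for tokens in docs]
    totals = Counter()
    for counts in per_doc:
        totals.update(counts)
    entropies = {}
    for word, total in totals.items():
        h = 0.0
        for counts in per_doc:
            c = counts[word]
            if c:
                p = c / total
                h -= p * math.log2(p)
        entropies[word] = h
    return entropies


def entropy_difference(pos_docs, neg_docs):
    """Positive-category entropy minus negative-category entropy per word.

    Assumption: a word absent from a category is given entropy 0 there.
    """
    h_pos = word_entropies(pos_docs)
    h_neg = word_entropies(neg_docs)
    vocab = set(h_pos) | set(h_neg)
    return {w: h_pos.get(w, 0.0) - h_neg.get(w, 0.0) for w in vocab}
```

Under this reading, a large positive difference marks a word spread evenly over positive documents but concentrated (or absent) in negative ones, and a large negative difference marks the reverse.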
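A sketch of parsing tagged sentences into a Python list. The README does not show the actual tag names, so `posi`, `nega` and `neutra` below are hypothetical placeholders borrowed from the module name. It assumes the tagged text is otherwise well-formed XML (characters such as `&` and `<` escaped) and wraps it in a root element so `xml.etree.ElementTree` can parse it.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names; the real ones are not given in the README.
TAG_LABELS = {"posi": "positive", "nega": "negative", "neutra": "neutral"}


def parse_tagged_sentences(text):
    """Return a list of (sentence, label) tuples from XML-tagged text.

    Untagged text between tags and unknown tags are ignored.
    """
    root = ET.fromstring(f"<root>{text}</root>")
    return [
        ("".join(element.itertext()).strip(), TAG_LABELS[element.tag])
        for element in root  # direct children only
        if element.tag in TAG_LABELS
    ]
```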
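For the weight analysis mentioned under SVM_Methods, one common approach with a linear SVM is to rank features by their learned weights. This is a sketch under the assumption of a fitted binary linear model with a dense `coef_`, such as scikit-learn's `LinearSVC`; the helper `top_weighted_features` is hypothetical.

```python
import numpy as np


def top_weighted_features(model, feature_names, k=5):
    """Return the k highest- and k lowest-weighted (name, weight) pairs.

    Assumes model.coef_ is a dense array of shape (1, n_features), as with
    a binary LinearSVC. Positive weights push toward model.classes_[1].
    """
    weights = np.asarray(model.coef_).ravel()
    order = np.argsort(weights)  # ascending: most negative first
    most_negative = [(feature_names[i], float(weights[i])) for i in order[:k]]
    most_positive = [(feature_names[i], float(weights[i])) for i in order[::-1][:k]]
    return most_positive, most_negative
```

With Bag of Words features, `feature_names` would be the vocabulary list, so the output reads as the words most associated with each class.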
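A sketch of a k-fold routine in the spirit of Model_metrics, returning mean accuracy, precision, recall and F1 across folds. The function name and the use of stratified folds are assumptions; precision, recall and F1 use scikit-learn's default binary averaging, so the positive class is assumed to be labeled `1`.

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import StratifiedKFold


def kfold_metrics(model, X, y, n_splits=5, random_state=0):
    """Mean accuracy, precision, recall and F1 over stratified k folds."""
    X = np.asarray(X)
    y = np.asarray(y)
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    scores = {"accuracy": [], "precision": [], "recall": [], "f1": []}
    for train_idx, test_idx in folds.split(X, y):
        fold_model = clone(model)  # fresh, unfitted copy for each fold
        fold_model.fit(X[train_idx], y[train_idx])
        y_true, y_pred = y[test_idx], fold_model.predict(X[test_idx])
        scores["accuracy"].append(accuracy_score(y_true, y_pred))
        scores["precision"].append(precision_score(y_true, y_pred, zero_division=0))
        scores["recall"].append(recall_score(y_true, y_pred, zero_division=0))
        scores["f1"].append(f1_score(y_true, y_pred, zero_division=0))
    return {name: float(np.mean(values)) for name, values in scores.items()}
```

Stratified folds keep the positive/negative ratio similar in every fold, which matters when one sentiment class is much rarer than the other.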
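A sketch of the Kaomoji step: replace each kaomoji with a numbered placeholder so a word segmenter does not split it apart. Detection here is a plain lookup against a given list of known kaomoji, which is an assumption; the README does not say how the library detects them. The `tag_kaomoji` name and the `KAOMOJI<n>` tag format are hypothetical.

```python
import re


def tag_kaomoji(text, known_kaomoji):
    """Replace known kaomoji with numbered tags such as KAOMOJI0.

    Returns (tagged_text, mapping) where mapping is tag -> original kaomoji.
    The same kaomoji always receives the same tag.
    """
    # Longest first, so a kaomoji that contains another one wins the match.
    ordered = sorted(set(known_kaomoji), key=lambda k: (-len(k), k))
    if not ordered:
        return text, {}
    pattern = re.compile("|".join(re.escape(k) for k in ordered))
    tags = {}

    def replace(match):
        kaomoji = match.group(0)
        if kaomoji not in tags:
            tags[kaomoji] = f"KAOMOJI{len(tags)}"
        return tags[kaomoji]

    tagged = pattern.sub(replace, text)
    return tagged, {tag: k for k, tag in tags.items()}
```

The mapping lets the original kaomoji be restored after segmentation.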
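Comparing SVM parameters, as Best_SVM_selection does, can be sketched with scikit-learn's `GridSearchCV`. The `best_svm` name, the parameter grid, and F1 as the selection score are illustrative assumptions, not the module's actual settings.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def best_svm(X, y, cv=5, scoring="f1"):
    """Cross-validate a grid of SVC settings and return the best ones.

    Returns (best_params, best_mean_cv_score).
    """
    # Illustrative grid: a linear kernel and an RBF kernel with a few C/gamma values.
    param_grid = [
        {"kernel": ["linear"], "C": [0.1, 1, 10]},
        {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]},
    ]
    search = GridSearchCV(SVC(), param_grid, scoring=scoring, cv=cv)
    search.fit(X, y)
    return search.best_params_, search.best_score_
```

`search.cv_results_` also keeps the score of every combination, which is handy when the goal is to compare settings rather than only pick the winner.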
