Cuupa/classificator

classificator

Documentation work in progress!

About this project

This project is a simple classification engine written in Kotlin, using Spring Boot as its framework.

This project is released under the MIT license and is free of charge. If you want to support me, you can buy me a beer or a coffee. If you want to participate, feel free to create pull requests, fork this project or hit me up with suggestions or code reviews.

THIS IS A WORK IN PROGRESS and done in my spare time.

How to contribute

If you want to participate, feel free to create pull requests, fork this project, create new issues or hit me up with suggestions. When creating an issue or a pull request, please be as detailed as possible.

"I want to participate, but I know nothing about programming 😔"

  • No problem. You can contribute by providing topic definitions, giving feedback or making suggestions. If you want to improve an existing topic, open a new issue describing your suggested changes and include any text you have tested with. If you want to create or fine-tune a topic, create a pull request and I'll give it a shot.

If you think this project is awesome, you can buy me a beer or a coffee.

BuyMeACoffee

Direct link

How it works

Currently, it is just a keyword classification engine with some tweaks. It uses the Levenshtein distance to tolerate spelling and OCR errors.
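To illustrate the idea, here is a minimal sketch of fuzzy keyword matching with the Levenshtein distance. It is not the project's actual code: the function names `levenshtein` and `countFuzzyMatches`, the `maxDistance` tolerance and the tokenization are all assumptions made for this example.

```kotlin
// Illustrative sketch only. Function names, the tolerance parameter and the
// tokenization are assumptions, not the project's actual implementation.

/** Levenshtein distance, computed with two rolling rows of the DP table. */
fun levenshtein(a: String, b: String): Int {
    if (a.isEmpty()) return b.length
    if (b.isEmpty()) return a.length

    var previous = IntArray(b.length + 1) { it } // distances from "" to prefixes of b
    var current = IntArray(b.length + 1)

    for (i in 1..a.length) {
        current[0] = i
        for (j in 1..b.length) {
            val substitutionCost = if (a[i - 1] == b[j - 1]) 0 else 1
            current[j] = minOf(
                previous[j] + 1,                   // deletion
                current[j - 1] + 1,                // insertion
                previous[j - 1] + substitutionCost // substitution
            )
        }
        // Reuse the two arrays instead of allocating a new row each time.
        val swap = previous
        previous = current
        current = swap
    }
    return previous[b.length]
}

/** Counts the words in [text] that are within [maxDistance] edits of [keyword]. */
fun countFuzzyMatches(text: String, keyword: String, maxDistance: Int = 1): Int {
    val needle = keyword.lowercase()
    return text.lowercase()
        .split(Regex("[^\\p{L}\\p{N}]+")) // split on anything that is not a letter or digit
        .filter { it.isNotEmpty() }
        .count { levenshtein(it, needle) <= maxDistance }
}
```

With these definitions, `countFuzzyMatches("Your lnvoice and a second invoice", "invoice")` returns 2: the exact match plus "lnvoice", which simulates an OCR error one edit away from the keyword.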

It tries to match the topics, senders and metadata provided in the 7zip archive "knowledgebase/kb-{version number}.db". If no sender matches, it tries to determine the sender via a regular expression: it removes candidates with more than six words, counts how often each remaining candidate occurs in the text and scores it as the number of occurrences multiplied by the length of the string.
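The fallback sender heuristic could be sketched roughly as follows. This is a hedged illustration, not the project's implementation: the candidate pattern (capitalized words ending in a legal form such as "GmbH"), the names `guessSender` and `countOccurrences`, and the assumption that the highest score wins are all invented for this example.

```kotlin
// Illustrative sketch only. The regex, the function names and the
// "highest score wins" rule are assumptions, not the project's actual code.

// Hypothetical candidate pattern: capitalized words followed by a legal form.
private val candidatePattern =
    Regex("""\b(?:\p{Lu}[\p{L}-]* )+(?:GmbH|AG|Ltd|Inc|LLC)\b""")

/** Counts non-overlapping occurrences of [term] in [text]. */
fun countOccurrences(text: String, term: String): Int {
    if (term.isEmpty()) return 0
    var count = 0
    var index = text.indexOf(term)
    while (index >= 0) {
        count++
        index = text.indexOf(term, index + term.length)
    }
    return count
}

/**
 * Returns the best sender candidate, or null if none is found.
 * Candidates with more than [maxWords] words are dropped; the rest are
 * scored as (occurrences in the text) * (length of the candidate string).
 */
fun guessSender(text: String, maxWords: Int = 6): String? =
    candidatePattern.findAll(text)
        .map { it.value.trim() }
        .distinct()
        .filter { it.split(Regex("\\s+")).size <= maxWords }
        .maxByOrNull { countOccurrences(text, it) * it.length }
```

For a text that mentions "Acme Insurance GmbH" twice and "Pay AG" once, the first candidate scores 2 * 19 = 38 and the second 1 * 6 = 6, so `guessSender` returns "Acme Insurance GmbH". Multiplying by the length favors longer, more specific names over short ones that happen to repeat.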

There is a simple test GUI at http://addressofyour.server/

Currently, I'm working on lemmatization and support for different languages.

Documentation

You can read the latest documentation here.