narVidhai/tamil-nlp-catalog

Tamil Deep Learning Awesome List

A curated catalog of open-source resources for Tamil NLP & AI.

The worldwide Tamiḻ-speaking population is estimated at around 80-85 million, close to the population of Germany. It is therefore crucial to work on natural language processing for தமிழ் (Tamiḻ) and to develop tools in order to ensure that the language is digitally well represented.

This list will serve as a catalog for all resources related to Tamil NLP.

Note:

  • Please use GitHub Issues for queries/feedback or to contribute resources/links.
  • If you find this useful, please star the repo on GitHub to help keep this list active.
    • To follow the latest updates to this catalog, press the "Watch" button at the top right of this repo.
  • Share this awesome website if you liked it! :-)

Tools, Libraries, Models

General

Also check Ezhil Foundation's Awesome-Tamil for many more resources!

Word Embeddings

Transformers, BERT

Translation

Online translation libraries

Transliteration

OCR

Speech

Grammar

Miscellaneous


Datasets

Monolingual Corpus

Government Raw Text

Translation

Note: You can also use the MTData library to automatically download parallel data from many of the above sources.
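
As a rough illustration of the note above, the sketch below builds and runs an MTData command line that lists the Tamil-English datasets the tool knows about. It assumes MTData is installed (for example via `pip install mtdata`) and that its `mtdata list -l <pair>` subcommand is available; the helper names `mtdata_list_command` and `list_parallel_datasets` are hypothetical, and `tam-eng` is an assumed language-pair code.

```python
import subprocess


def mtdata_list_command(src: str = "tam", tgt: str = "eng") -> list[str]:
    """Build the argv for listing datasets of one language pair (hypothetical helper)."""
    # Assumption: MTData accepts language pairs written as "<src>-<tgt>".
    return ["mtdata", "list", "-l", f"{src}-{tgt}"]


def list_parallel_datasets(src: str = "tam", tgt: str = "eng") -> str:
    """Run the listing command and return its raw text output.

    Requires the `mtdata` executable to be on PATH; raises if the command fails.
    """
    result = subprocess.run(
        mtdata_list_command(src, tgt),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `list_parallel_datasets()` returns the listing as plain text. The dataset IDs it prints are what MTData's download subcommand expects; check the MTData documentation for the exact download options.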

Government parallel data

Papers

Transliteration

Speech, Audio

Speech-To-Text

Speech Translation

Text-to-Speech (TTS)

Audio

Named Entity Recognition

Text Classification

OCR

Character-level datasets

Scene-Text Detection / Recognition

Document OCR

Part-Of-Speech (POS) Tagging

Sentiment and Abuse Analysis

Lexical Resources

Natural Language Generation

Benchmarks

Miscellaneous NLP Datasets


Other Important Resources