Automatic Transliterations of Baybayin Texts using Block-Level OCR

@rbp0803 rbp0803 released this 15 Apr 13:29
· 22 commits to main since this release
2312953

In this work, we introduce an algorithm that distinguishes Baybayin script from Latin script at the block level in a text image and transliterates the identified Baybayin writing into its Latin equivalent(s). Script identification relies on the method proposed in (1), with some of the MATLAB code and functions in (2) modified for this project. Classification is applied at the character level: each character's recognition result contributes +1 for Latin or -1 for Baybayin, and the sign of the sum over a word determines which script the word belongs to. The Baybayin transliterations are carried out using the method presented in (3), with the MATLAB code and functions in (4) used to generate the corresponding Latin form of each detected Baybayin word.
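The sign-of-sum voting rule described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the project's MATLAB code; the function name `identify_script`, the constants, and the handling of a zero sum (returned here as "undecided") are assumptions made for illustration, since the description does not say how ties are resolved.

```python
from typing import Sequence

# Per-character votes as described in the text: +1 for Latin, -1 for Baybayin.
LATIN_VOTE = 1
BAYBAYIN_VOTE = -1


def identify_script(char_votes: Sequence[int]) -> str:
    """Decide a word's script from the sign of the sum of its character votes.

    Hypothetical helper: the tie case (sum == 0) is an assumption,
    because the source does not specify it.
    """
    for vote in char_votes:
        if vote not in (LATIN_VOTE, BAYBAYIN_VOTE):
            raise ValueError(f"vote must be +1 or -1, got {vote!r}")

    total = sum(char_votes)
    if total > 0:
        return "Latin"
    if total < 0:
        return "Baybayin"
    return "undecided"


# Example: three characters classified as Baybayin, one as Latin.
votes = [BAYBAYIN_VOTE, BAYBAYIN_VOTE, LATIN_VOTE, BAYBAYIN_VOTE]
result = identify_script(votes)
```

In this example the votes sum to -2, so the word is labelled Baybayin even though one character was classified as Latin; a single misclassified character does not flip the word-level decision.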

We tested the system on the dataset provided in (5) and obtained an impressive script recognition accuracy. We also discuss the strengths and limitations of the system, along with recommendations for further research.

The complete system files can be downloaded below with the filename Automatic.Transliterations.of.Baybayin.Texts.using.Block-Level.OCR.zip.

Links cited:
(1) https://peerj.com/articles/cs-360/
(2) https://github.com/rbp0803/An-OCR-System-for-Baybayin-Scripts-using-SVM
(3) https://peerj.com/articles/cs-596/
(4) https://github.com/rbp0803/A-Baybayin-Word-Recognition-System
(5) https://www.kaggle.com/rodneypino/baybayin-and-latin-text-images