Handwritten OCR for Kanji Characters

An open-source research project developing a dataset and a CNN-based OCR (optical character recognition) model for identifying handwritten kanji and other Japanese characters.

Dataset

The dataset is located at Machine_Learning/data. Each file is a PNG image of the character named in its filename. The original SVG dataset from which these images were processed contains 7,000 images of handwritten kanji characters.
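Because each label is carried by the filename, the (image, label) pairs can be indexed without a separate annotation file. The sketch below is one way to do that, assuming the filename stem is exactly the character (for example 日.png); the exact naming scheme is not documented here, and build_label_index is a hypothetical helper, not part of this repo.

```python
from pathlib import Path


def build_label_index(data_dir):
    """Pair every PNG in data_dir with the label taken from its filename.

    Assumption: the filename stem (name without ".png") is the character.
    Returns (samples, class_to_idx), where samples is a list of
    (path, label) tuples and class_to_idx maps each label to an integer id.
    """
    samples = [(path, path.stem) for path in sorted(Path(data_dir).glob("*.png"))]
    classes = sorted({label for _, label in samples})
    class_to_idx = {label: idx for idx, label in enumerate(classes)}
    return samples, class_to_idx


if __name__ == "__main__":
    samples, class_to_idx = build_label_index("Machine_Learning/data")
    print(f"{len(samples)} images, {len(class_to_idx)} distinct characters")
```

The integer ids in class_to_idx are assigned here by sorted label order, which may not match the class order the released weights were trained with.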

OCR Model

The model architecture is located at Machine_Learning/architecture.py. To import the model and load its weights in PyTorch, run the following from inside the cloned repo:

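import torch
from Machine_Learning.architecture import KanjiNet

# Build the network, then load the trained weights shipped with the repo.
model = KanjiNet()
model.load_state_dict(torch.load('Machine_Learning/weights.pth'))
model.eval()  # switch to inference mode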
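Once the weights are loaded, a prediction could look roughly like the sketch below. It assumes the model maps a single image tensor (with a batch dimension added) to one row of class logits, and that you have a list of class labels in the same order the model was trained with; neither the input shape expected by KanjiNet nor its class ordering is documented here, and predict_top_k is a hypothetical helper.

```python
import torch
from torch import nn


@torch.no_grad()
def predict_top_k(model: nn.Module, image: torch.Tensor, classes: list, k: int = 3):
    """Return the k most likely (label, probability) pairs for one image.

    Assumptions: `image` is a single unbatched tensor already preprocessed
    the way the model expects, and the model returns logits of shape
    (1, num_classes) for a batch of one.
    """
    model.eval()
    logits = model(image.unsqueeze(0))               # add the batch dimension
    probs = torch.softmax(logits, dim=1).squeeze(0)  # logits -> probabilities
    top = torch.topk(probs, k=min(k, len(classes)))
    return [(classes[i], p) for p, i in zip(top.values.tolist(), top.indices.tolist())]
```

Returning several candidates rather than only the argmax is a design choice: visually similar kanji are easy to confuse, so the runner-up labels are often useful.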

Next Steps and Contributing

If you have improvements for data processing, training, or the architecture, please feel free to submit a pull request. Any changes are welcome!

Next steps:

Apply random perspective transformations to training images, both to upsample the dataset and to support unconstrained recognition of distorted characters
Add segmentation and RNN to model for multi-character prediction

License

The Kanji-OCR project is open-source and is licensed under the MIT License.

Jdka1/Kanji-OCR
