Udacity Computer Vision ND Image Captioning Project

Image captioning is the process of generating a textual description of an image. In this project, I implemented a deep learning model inspired by this paper and this paper, using the Microsoft COCO dataset, and trained the network for nearly 10 hours on a GPU.

The architecture consists of:

1. CNN encoder based on the ResNet architecture, which encodes the images into embedded feature vectors

2. RNN decoder consisting of LSTM units, which translates the feature vector into a sequence of tokens

images
0_Dataset.ipynb
1_Preliminaries.ipynb
2_Training.ipynb
3_Inference.ipynb
README.md
data_loader.py
data_loader_val.py
model.py
training_log.txt
vocab.pkl
vocabulary.py


Output results
