assem-khaled/Image-Captioning

Udacity Computer Vision ND Image Captioning Project

Image captioning is the process of generating a textual description of an image. In this project, I implemented a deep learning model inspired by this paper and this paper, using the COCO dataset by Microsoft, and trained the network for nearly 10 hours on a GPU.

The architecture consists of:

1. A CNN encoder based on the ResNet architecture, which encodes each image into an embedded feature vector

2. An RNN decoder consisting of LSTM units, which translates the feature vector into a sequence of tokens

Output results
