Automatic video captioning, a final project for CS5422 Neural Networks and Deep Learning

ardiankr/Video-Captioning-CS5422

Video-Captioning-CS5422

Automatic video captioning, a final project for CS5422 Neural Networks and Deep Learning. This project uses a neural network to produce a simple, fixed three-word caption (<noun> <verb> <noun>) for each sequence of video frames.
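
To illustrate the fixed caption format, here is a minimal sketch of turning three predicted word indices into a caption string. The vocabulary `VOCAB` and the helper `format_caption` are hypothetical names made up for this example; the project's actual vocabulary and decoding code are not shown in this README.

```python
# Hypothetical shared vocabulary; the real word list is not given in the README.
VOCAB = ["person", "dog", "ball", "throws", "chases", "holds"]


def format_caption(word_ids):
    """Join exactly three word indices into a '<noun> <verb> <noun>' string."""
    if len(word_ids) != 3:
        raise ValueError("a caption must contain exactly three words")
    return " ".join(VOCAB[i] for i in word_ids)


print(format_caption([0, 3, 2]))  # person throws ball
```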

Neural network architecture

The neural network is composed of 4 main components:

  • Feature extractor using pretrained EfficientNet
  • Object classifier, a linear layer whose sole purpose is to capture the presence of objects in each frame
  • Encoder, a linear layer which helps capture the action happening in each frame
  • Decoder, an RNN which produces the caption
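
The way these components could fit together can be sketched in PyTorch as follows. This is an illustrative sketch under several assumptions, not the project's actual code: the class name `VideoCaptioner`, all layer sizes, the mean pooling over frames, the use of a GRU cell, and the shared output vocabulary are choices made here. To keep the example self-contained, it takes precomputed per-frame features as input instead of running EfficientNet itself; 1280 is the feature width of EfficientNet-B0, and the README does not say which variant is used.

```python
import torch
from torch import nn


class VideoCaptioner(nn.Module):
    """Sketch: object classifier + encoder feeding a 3-step RNN decoder."""

    def __init__(self, feat_dim=1280, num_objects=10, enc_dim=128,
                 hidden_dim=256, vocab_size=20):
        # All sizes above are placeholder assumptions, not the project's values.
        super().__init__()
        # Linear layer scoring the presence of each object in a frame.
        self.object_classifier = nn.Linear(feat_dim, num_objects)
        # Linear layer producing per-frame action features.
        self.encoder = nn.Linear(feat_dim, enc_dim)
        # Recurrent decoder, unrolled for three steps (one per caption word).
        self.decoder = nn.GRUCell(num_objects + enc_dim, hidden_dim)
        self.word_head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, frame_features):
        # frame_features: (batch, frames, feat_dim), e.g. EfficientNet outputs.
        object_logits = self.object_classifier(frame_features)
        action_feats = torch.relu(self.encoder(frame_features))
        # Assumption: summarise the clip by averaging over the frame axis.
        per_frame = torch.cat([object_logits.sigmoid(), action_feats], dim=-1)
        context = per_frame.mean(dim=1)
        hidden = context.new_zeros((context.size(0), self.decoder.hidden_size))
        word_logits = []
        for _ in range(3):  # <noun> <verb> <noun>
            hidden = self.decoder(context, hidden)
            word_logits.append(self.word_head(hidden))
        # Shapes: (batch, 3, vocab_size) and (batch, frames, num_objects).
        return torch.stack(word_logits, dim=1), object_logits


model = VideoCaptioner()
clip = torch.randn(2, 30, 1280)  # 2 clips, 30 frames each, random stand-in features
word_logits, object_logits = model(clip)
print(word_logits.shape, object_logits.shape)
```

Returning the per-frame object logits alongside the word logits would allow the object classifier to be supervised with its own loss, which fits the description of it as a layer dedicated to object presence; whether the project trains it that way is not stated.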

image
