Skip to content

A project re-evaluating 2D CNNs and comparing them on the basis of the number of Floating Point Operations Per Second with deep 3D CNNs for video classification.

Notifications You must be signed in to change notification settings

saswethrajanarayanan/video-classification

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 

Repository files navigation

video-classification

A project re-evaluating 2D CNNs for activity recognition.

I thank PyImageSearch for this interesting tutorial on Video Classification using 2D CNNs which has been a major inspiration for this work. The tutorial can be found here: https://www.pyimagesearch.com/2019/07/15/video-classification-with-keras-and-deep-learning/

The custom dataset: This work makes use of 2D CNNs and hypothesises that 2D CNNs can be trained on images representing different facets of human activities. The reason 2D CNNs are not used for video classification tasks is that they are not capable of learning or extracting spatio-temporal features owing to the absence of 3D computational kernels. This work however counters this fact using rolling prediction averaging. The average of confidence scores of a series of frames is taken and the output label having the highest confidence score is displayed as output. The training of the models is therefore done on images that respresent different facets of human activities.

The models that were tested are VGG-16, ResNet 50 and MobileNetv2, all of which are ImageNet pretrained models.

The models have ImageNet weights. Transfer Learning was used to train these models on the custom dataset. Rolling prediction averaging is done while testing the models on real-time videos. The number of frames that need to be averaged upon is upto the user. Higher the number of frames taken into the buffer for prediction averaging, closer is the model's performance to deep 3D CNNs trained on video datasets for activity recognition. There are certainly areas where 3D CNNs are just the right choice for video classification.

The custom dataset has not been added here. However, usesrs can download different images pertaining to an activity like cricket, online via a simple Google search. Images of different activities need to be put inside separate folders named after the activity. The number of neurons in the softmax layer will have to updated to be the same as the number of activities that are part of your dataset.

About

A project re-evaluating 2D CNNs and comparing them on the basis of the number of Floating Point Operations Per Second with deep 3D CNNs for video classification.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages