
An implementation of the paper "Less is More", which was used to attempt the VizWiz visual question answering and answerability challenge tasks.


MohEsmail143/vizwiz-visual-question-answering


VizWiz Visual Question Answering

This project is a replication of the paper Less Is More: Linear Layers on CLIP Features as Powerful VizWiz Model, which uses OpenAI's CLIP model (without fine-tuning) as a feature extractor, along with a linear layer, to attempt the VizWiz Visual Question Answering and Answerability Challenges.

The dataset is available on the VizWiz website, and the Kaggle notebook used in this attempt is also available.

This project was developed as part of the course Pattern Recognition in the Spring 2023 semester at the Faculty of Engineering, Alexandria University, under the Computer and Communications Engineering department, supervised by Dr. Marwan Torki.

Steps

1- Examine the VizWiz dataset.

Sample from the VizWiz dataset
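As a starting point for this step, the sketch below summarizes VizWiz-style annotation records using only the standard library. It assumes the annotation JSON is a list of records with `image`, `question`, `answers` (crowd answers, each with `answer` and `answer_confidence`), `answer_type`, and `answerable` fields. Check these names against the files you download. The two inline records are made-up placeholders, not real dataset entries, and the helper names are hypothetical.

```python
import json
from collections import Counter

# Two made-up records in the assumed VizWiz annotation layout. They are
# placeholders, not real dataset entries, and carry 3 answers instead of 10.
SAMPLE_JSON = """
[
  {"image": "example_0.jpg", "question": "What is in this can?",
   "answers": [{"answer": "soup", "answer_confidence": "yes"},
               {"answer": "soup", "answer_confidence": "maybe"},
               {"answer": "beans", "answer_confidence": "yes"}],
   "answer_type": "other", "answerable": 1},
  {"image": "example_1.jpg", "question": "What color is this shirt?",
   "answers": [{"answer": "unanswerable", "answer_confidence": "no"},
               {"answer": "unanswerable", "answer_confidence": "no"},
               {"answer": "blue", "answer_confidence": "maybe"}],
   "answer_type": "unanswerable", "answerable": 0}
]
"""


def majority_answer(record):
    """Most frequent crowd answer for one question (ties go to the first seen)."""
    return Counter(a["answer"] for a in record["answers"]).most_common(1)[0][0]


def summarize(records):
    """Return simple statistics over a list of annotation records."""
    return {
        "num_questions": len(records),
        "answer_types": dict(Counter(r["answer_type"] for r in records)),
        "answerable_fraction": sum(r["answerable"] for r in records) / len(records),
        "majority_answers": dict(Counter(majority_answer(r) for r in records)),
    }


records = json.loads(SAMPLE_JSON)
print(summarize(records))
```

To run it on real data, replace `SAMPLE_JSON` with the contents of the downloaded annotation file (for example via `json.load` on an open file handle).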

2- Build the model.

VizWiz model
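The architecture described above can be sketched without any framework as follows. This is a hypothetical illustration, not the notebook's code. It assumes the image and the question have already been encoded by a frozen CLIP model into fixed-length vectors, concatenates them, and applies two linear heads: one scoring a fixed answer vocabulary and one producing an answerability logit. The tiny dimensions and hand-set weights are placeholders; in practice the weights are learned (for example with PyTorch) and the vectors come from CLIP. Layer sizes, any normalization or dropout, and how the two tasks share parameters in the actual notebook are not reproduced here.

```python
import math
from dataclasses import dataclass


@dataclass
class Linear:
    """Dense layer y = W x + b on plain Python lists (hypothetical helper)."""
    weight: list  # out_features rows, each with in_features entries
    bias: list    # out_features entries

    def __call__(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class LinearVQAHead:
    """Concatenate frozen image and question features, then apply linear heads."""

    def __init__(self, answer_head, answerability_head, vocab):
        self.answer_head = answer_head                # scores each answer in vocab
        self.answerability_head = answerability_head  # single answerability logit
        self.vocab = vocab

    def forward(self, image_feat, text_feat):
        x = list(image_feat) + list(text_feat)  # feature concatenation
        answer_logits = self.answer_head(x)
        answerable_prob = sigmoid(self.answerability_head(x)[0])
        return answer_logits, answerable_prob

    def predict(self, image_feat, text_feat):
        logits, answerable_prob = self.forward(image_feat, text_feat)
        best = max(range(len(logits)), key=logits.__getitem__)
        return self.vocab[best], answerable_prob


# Toy usage: 2-d "image" and 2-d "question" features stand in for CLIP embeddings.
head = LinearVQAHead(
    answer_head=Linear(weight=[[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 0, 0]],
                       bias=[0, 0, 0]),
    answerability_head=Linear(weight=[[1, 1, 1, 1]], bias=[-1]),
    vocab=["yes", "no", "unanswerable"],
)
answer, p_answerable = head.predict([1.0, 0.0], [0.5, 0.0])
print(answer, round(p_answerable, 3))
```

Keeping the CLIP encoder frozen means the features can be computed once and cached, so only the small linear heads need to be trained.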

3- Evaluate the accuracy of the model according to the metrics defined by the competition.
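VizWiz VQA uses the accuracy metric from the VQA benchmark, under which an answer counts as fully correct if at least three annotators gave it. The sketch below reimplements that rule, averaged over leave-one-out subsets of annotators as the standard VQA evaluation script does. It applies only lowercasing and trimming as answer normalization, whereas the official script also normalizes punctuation, articles and number words, so scores may differ slightly from the official evaluation. The answerability task is scored separately (with average precision) and is not covered here.

```python
def normalize(answer):
    """Minimal normalization: lowercase and trim. The official script does more."""
    return answer.strip().lower()


def vqa_accuracy(prediction, human_answers):
    """Accuracy of one prediction against the crowd answers for one question.

    For each annotator left out in turn, score min(1, matches among the rest / 3),
    then average over all leave-one-out subsets.
    """
    pred = normalize(prediction)
    gts = [normalize(a) for a in human_answers]
    scores = []
    for i in range(len(gts)):
        others = gts[:i] + gts[i + 1:]
        matches = sum(1 for a in others if a == pred)
        scores.append(min(1.0, matches / 3))
    return sum(scores) / len(scores)


def mean_accuracy(predictions, references):
    """Dataset-level score: mean per-question accuracy."""
    per_question = [vqa_accuracy(p, r) for p, r in zip(predictions, references)]
    return sum(per_question) / len(per_question)


humans = ["red"] * 3 + ["blue"] * 7  # ten made-up crowd answers
print(vqa_accuracy("blue", humans))  # every subset still has >= 3 "blue" matches
print(vqa_accuracy("Red", humans))   # (3 * 2/3 + 7 * 1) / 10 = 0.9
```

The leave-one-out averaging is why a prediction matching exactly three annotators scores 0.9 rather than 1.0.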

4- Try your own example.

Try it yourself
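When trying your own example, the trained model's raw outputs still need to be turned into a readable answer. The sketch below assumes the model returns logits over a fixed answer vocabulary plus an answerability probability; the helper names, the 0.5 threshold and the fallback to "unanswerable" are illustrative choices, not taken from the notebook. Encoding your own image and question with CLIP and running the trained head is done in the notebook and is not shown here, so the logits below are made up.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe_prediction(answer_logits, vocab, answerable_prob, threshold=0.5, k=3):
    """Turn raw model outputs into a readable result (hypothetical helper).

    Falls back to "unanswerable" when the answerability probability is below
    the threshold; otherwise reports the highest-scoring answer.
    """
    probs = softmax(answer_logits)
    ranked = sorted(zip(vocab, probs), key=lambda t: t[1], reverse=True)[:k]
    answer = ranked[0][0] if answerable_prob >= threshold else "unanswerable"
    return {"answer": answer, "top_k": ranked, "answerable_prob": answerable_prob}


# Made-up outputs for one custom image/question pair.
vocab = ["soup", "beans", "unanswerable", "yes"]
logits = [2.0, 1.0, 0.5, -1.0]
print(describe_prediction(logits, vocab, answerable_prob=0.8))
print(describe_prediction(logits, vocab, answerable_prob=0.3)["answer"])
```

Showing the top-k answers alongside the answerability probability makes it easier to see whether a wrong answer came from the answer head or from an image that is genuinely hard to answer.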

Prerequisites

This project was developed in the following environment:

  • Jupyter Notebook
  • Miniconda
  • Python 3.11.5
  • PyTorch
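Before opening the notebook, the stdlib-only check below can confirm that the interpreter and key packages are present. Treating Python 3.11 as a minimum is an assumption, since the project only states it was developed on 3.11.5. `torch` is PyTorch's import name, and `notebook` is assumed to be the import name of the classic Jupyter Notebook package. Miniconda is not a Python module and is not checked.

```python
import importlib.util
import sys

# Assumption: 3.11 as the minimum version; "notebook" as Jupyter's import name.
MIN_PYTHON = (3, 11)
MODULES = ("torch", "notebook")


def check_environment(modules=MODULES, min_python=MIN_PYTHON):
    """Report whether the interpreter version and each module are available."""
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in modules:
        # find_spec returns None for a missing top-level module instead of raising.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'OK' if ok else 'MISSING'}")
```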

Installing

1- Clone the repository to your local machine:

git clone https://github.com/MohEsmail143/vizwiz-visual-question-answering.git

2- Open Jupyter Notebook.

3- Open the Jupyter notebook visual-question-answering.ipynb.

License

This project is licensed under the MIT License - see the LICENSE.md file for details.