
Epic: Image Classifier #1

Closed
6 tasks done
nelsonic opened this issue May 25, 2023 · 9 comments · Fixed by #2
Assignees
Labels
  • documentation: Improvements or additions to documentation
  • elixir: Pull requests that update Elixir code
  • enhancement: New feature or enhancement of existing functionality
  • good first issue: Good for newcomers
  • priority-2: Second highest priority, should be worked on as soon as the Priority-1 issues are finished
  • research: Research required; be specific
  • T1d: Time Estimate 1 Day
  • technical: A technical issue that requires understanding of the code, infrastructure or dependencies

Comments

@nelsonic
Member

nelsonic commented May 25, 2023

Once we are uploading images dwyl/imgup#51,
we want to classify the images and suggest meta tags describing them so that they become "searchable".
That means pulling any text out of the images using OCR,
and attempting to find any detail in the images that could be useful.

We aren't going to build our own models from scratch, but we are going to ...

Todo

  • Research the available models and services/APIs we can send an image to for classification

  • Research available OCR services or models.

    • If there is an Open Source OCR model we can run on our own infra, e.g. for ~€20/month on Fly.io, please share!
  • Images that are uploaded from a Camera or Smart Phone contain metadata including camera type/model, location (where the photo was taken), ISO, shutter speed, focal length, original resolution, etc. We want to capture this and feed it into the classifier (see the metadata/OCR sketch after this list). Feat: Store Metadata and Image Classification/Info #3

  • The objective of the classifier is to attempt to describe the image and return a few keywords.

  • If it makes more sense to have this as a standalone app (separate from imgup) then feel free to create a new repo! Then just send the data to the standalone app and receive JSON data in response. 💭
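
To make the OCR and metadata items above more concrete, here is a minimal Python sketch (assuming Pillow and pytesseract are installed, plus a Tesseract binary on the host) of what reading EXIF data and pulling text out of an uploaded image could look like. The `photo.jpg` path is a placeholder:

```python
# Minimal sketch: extract EXIF metadata and run OCR on an uploaded image.
# Assumes Pillow and pytesseract are installed, plus the Tesseract binary on the host.
from PIL import Image
from PIL.ExifTags import TAGS
import pytesseract


def extract_metadata(path: str) -> dict:
    """Return the image's EXIF tags (camera model, ISO, focal length, GPS, ...) as a dict."""
    image = Image.open(path)
    exif = image.getexif()
    return {TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}


def extract_text(path: str) -> str:
    """Run OCR over the image and return any text found in it."""
    return pytesseract.image_to_string(Image.open(path))


if __name__ == "__main__":
    print(extract_metadata("photo.jpg"))  # "photo.jpg" is a placeholder path
    print(extract_text("photo.jpg"))
```

Whatever stack we end up with, these map cleanly onto "extract metadata" and "extract text" steps that can run before the classifier. 💭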

@LuchoTurtle please leave comments with your research. 🙏

Context

We want to be able to upload images in our App and have them become an item of content.
i.e. I take a photo of a messy kitchen and it becomes "Tidy The Kitchen" with a small thumbnail of the image.
If I tap on the thumbnail I see the full-screen image, but the text is the important part.

The reason we want to have a "Visual Todo List" is that it becomes easy for people who don't yet read (think toddlers) or people who don't read well (people who only have basic literacy) to follow instructions.

@nelsonic nelsonic added the enhancement label May 25, 2023
@LuchoTurtle
Member

Stumbled upon these two, which might be relevant to revisit at a later stage:
https://github.com/bentoml/OpenLLM
https://github.com/showlab/Image2Paragraph

@nelsonic
Member Author

Yeah, saw OpenLLM on HN this morning:
(screenshot: OpenLLM at the top of Hacker News)
https://news.ycombinator.com/item?id=36388219
Looks good. BentoML is what OpenAI could have been but they chose to go closed (MSFT) ... 🙄

@LuchoTurtle
Member

I've thought about the best way of doing this and found a fair share of resources that should help us get something close to what we want.

Image Captioning models

Most common LLMs, such as Llama 2 or Claude 2, only accept text input. I took a gander at https://github.com/bentoml/OpenLLM, as stated in the comment above. However, it's not really useful to us, as these LLMs do not understand image inputs (though some may be able to work with vector representations of images). Therefore, we have to forgo these more "mainstream" LLMs for this use case.

There are, however, computer-vision models we can definitely use. I started my dive in https://github.com/salesforce/LAVIS#image-captioning, which led me to discover BLIP-2, a zero-shot image-to-text generation model we can use for image captioning.

I'm not going to explain how BLIP-2 works, but you can find more info about it at https://huggingface.co/blog/blip-2. The good thing is that it's available in Hugging Face Transformers, which lets us download and run BLIP-2 as a pre-trained model quite easily, even if it's just for testing purposes.

You can find a demo at https://huggingface.co/spaces/Salesforce/BLIP2.
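
For anyone who wants to poke at it locally, a minimal sketch of calling BLIP-2 through the Transformers image-to-text pipeline might look like the following. The checkpoint is one of the BLIP-2 variants on the Hub (several GB to download), and the image URL is just an example:

```python
# Sketch: zero-shot image captioning with BLIP-2 via Hugging Face Transformers.
# The checkpoint name and image URL are illustrative; the model download is large.
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip2-opt-2.7b")

# The pipeline accepts a local path, a PIL image, or a URL.
result = captioner("https://example.com/messy-kitchen.jpg")
print(result)  # e.g. [{"generated_text": "a kitchen with dishes on the counter"}]
```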

Langchain 🦜

I had been hearing about Langchain for a few months, and how it makes it easy to create LLM-based applications and chain different models together to yield a given output for whatever use case. The fact that you can easily deploy it to fly.io is a big plus.

I was thinking of using BLIP-2 and chaining it with an open-source LLM like Llama 2, to get a more descriptive caption of the image so we could extract keywords afterwards.
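
As a rough sketch of that chaining idea (caption the image, then ask an LLM for keywords), something along these lines could work. The LangChain API moves quickly, so treat the LLMChain usage below as illustrative only; the prompt wording, model choices and image path are all assumptions:

```python
# Rough sketch of the chaining idea: image -> caption -> keywords.
# The caption step reuses the BLIP-2 pipeline from the sketch above; the keyword
# step could be any LLM (shown with the classic LangChain LLMChain API, which
# has since been reworked, so treat this as an illustration rather than a recipe).
from transformers import pipeline
from langchain.llms import OpenAI          # requires an OPENAI_API_KEY in the env
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

captioner = pipeline("image-to-text", model="Salesforce/blip2-opt-2.7b")
caption = captioner("kitchen.jpg")[0]["generated_text"]  # placeholder image path

prompt = PromptTemplate(
    input_variables=["caption"],
    template="Extract 3-5 short keywords from this image caption: {caption}",
)
chain = LLMChain(llm=OpenAI(), prompt=prompt)
print(chain.run(caption=caption))  # e.g. "kitchen, dishes, counter, mess"
```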

Image2Paragraph

However, I realised I was recreating something similar to Image2Paragraph, which does this but adds two more models: GRIT and Segment Anything, which provide contextual descriptions of images. The output of all three models (BLIP-2, GRIT, Segment Anything) is then fed to an LLM (GPT, in this case) to generate a text paragraph describing the image.

Here's how the pipeline works:

(diagram of the Image2Paragraph pipeline)

So what to use?

You should give Image2Paragraph a whirl (I already tried it on Hugging Face but the Space isn't working: https://huggingface.co/spaces/Awiny/Image2Paragraph), but I don't see a clear way of using it to receive an image URL, output the paragraph, and deploy this on fly.io. If I can only have this on localhost, there's no point in pursuing it.

So I wonder if using only BLIP-2, or the vit-gpt2-image-captioning model from Hugging Face, is easier and more "doable" for what we want.

(The latter seems like a highly plausible option using transformers. See https://ankur3107.github.io/blogs/the-illustrated-image-captioning-using-transformers/).
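
That model is a much smaller download than BLIP-2 and works with the same Transformers pipeline; a minimal sketch (placeholder image path) would be:

```python
# Sketch: image captioning with the nlpconnect/vit-gpt2-image-captioning model,
# the checkpoint described in the blog post linked above.
from transformers import pipeline

captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
print(captioner("photo.jpg"))  # "photo.jpg" is a placeholder path
```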

@nelsonic
Member Author

Good research/summary. Thanks. 👌

@LuchoTurtle
Member

As @nelsonic suggested, we can give https://github.com/elixir-image/image a whirl, as well.

@nelsonic
Member Author

nelsonic commented Sep 19, 2023

@LuchoTurtle I've lowered the priority on this issue to reflect the fact that it's a very "nice to have" feature but isn't "core" to the experience of our App for the time being. We need to focus on the WYSIWYG editor and getting the "core" functionality done and then shipping the Flutter App to the App Store ASAP. ⏳

Ref: dwyl/product-roadmap#40 we need to work on the Flutter App as our exclusive focus until we have feature parity with the Elixir/Phoenix MVP. I want to be using the Flutter App on my phone ASAP. 🙏

@nelsonic
Member Author

Having said that, when you take "breaks" from the Flutter work and want to do research for image classifying, please do it. I know that AI/ML is an area of interest/focus for you so definitely research and capture what you learn. 🔍 🧑‍💻 ✍️ ✅

@nelsonic
Member Author

nelsonic commented Sep 19, 2023

It will be an awesome enhancement to add image recognition to the images people upload in the Flutter App.
But if we don't yet have a Flutter App deployed to the App Store dwyl/app#342 or Google Play dwyl/app#346 we are a "Default Dead" company.

@nelsonic
Member Author

@LuchoTurtle given that we are BLOCKED on both the iOS App Store dwyl/app#342 (comment) and Google Play dwyl/app#346, both assigned to @iteles 🔥
Please take a look at this issue today.
We should create a new repo for it: https://github.com/dwyl/image-classifier 🆕 ✅
Feel free to use Python for it if you think you can do it faster. 🐍
Otherwise if you can use Elixir, it will be easier for us to maintain longer-term. 💧

@nelsonic nelsonic transferred this issue from dwyl/imgup Oct 25, 2023
@nelsonic nelsonic added the documentation, good first issue, elixir, research, T1d, technical and priority-2 labels Oct 25, 2023
LuchoTurtle added commits that referenced this issue Oct 30, 2023
nelsonic added a commit that referenced this issue Nov 13, 2023