
CLIP for image and text embeddings #14311

Open
txhno opened this issue Jun 5, 2024 · 7 comments

@txhno

txhno commented Jun 5, 2024

Link to the documentation pages (if available)

https://github.com/patrickjohncyh/fashion-clip
https://huggingface.co/patrickjohncyh/fashion-clip

How could the documentation be improved?

It's a fine-tune of CLIP trained on a 500K-item fashion dataset. I would like to use it via the Hugging Face API, or alternatively via their wrapper package.
If it can be done, please let me know how. Thanks! :)

@maziyarpanahi
Member

Interesting, it seems to use the CLIPModel architecture. @DevinTDHa can we do that currently, or should we put it on the roadmap?

@DevinTDHa
Member

Seems like it should work, if the underlying model doesn't have any architectural changes.

I'll try it out and report back!

@DevinTDHa
Member

The model works with no problems in Spark NLP! Just follow this notebook to import the model properly:

https://github.com/JohnSnowLabs/spark-nlp/blob/master/examples/python/transformers/onnx/HuggingFace_ONNX_in_Spark_NLP_CLIP.ipynb

If you change the model name to patrickjohncyh/fashion-clip, it should work. Let me know if you have any other questions.
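For reference, here is a condensed sketch of the flow the notebook describes, with the model name swapped to patrickjohncyh/fashion-clip. This assumes Spark NLP is installed, an ONNX export of the model has already been saved to `EXPORT_PATH` (the notebook covers the export itself), and that you are running a Spark session; the path and candidate labels below are illustrative placeholders, not values from the notebook.

```python
# Sketch only: assumes the ONNX export step from the notebook is done,
# and that EXPORT_PATH points at the exported fashion-clip model.
import sparknlp
from sparknlp.base import ImageAssembler
from sparknlp.annotator import CLIPForZeroShotClassification
from pyspark.ml import Pipeline

spark = sparknlp.start()

EXPORT_PATH = "onnx_models/fashion-clip"  # hypothetical path

# Assemble raw images into the annotation format Spark NLP expects.
image_assembler = (
    ImageAssembler()
    .setInputCol("image")
    .setOutputCol("image_assembler")
)

# Load the exported CLIP model; labels here are made-up examples.
clip = (
    CLIPForZeroShotClassification.loadSavedModel(EXPORT_PATH, spark)
    .setInputCols(["image_assembler"])
    .setOutputCol("label")
    .setCandidateLabels(["a photo of a dress", "a photo of shoes"])
)

pipeline = Pipeline(stages=[image_assembler, clip])
images_df = spark.read.format("image").load("path/to/images")
result = pipeline.fit(images_df).transform(images_df)
```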

@txhno
Author

txhno commented Jun 7, 2024

@DevinTDHa @maziyarpanahi Thanks a lot! I will definitely try it out. I'll contact you if something pops up. :)

@txhno
Author

txhno commented Jun 11, 2024

@DevinTDHa @maziyarpanahi I have a different question. Is it possible to use Spark NLP to compute CLIP embeddings directly, instead of just using the ZeroShotClassification? My use case is taking a folder of images, computing the embeddings of all images in the folder with Spark NLP, and storing them in a vector store for later retrieval or similarity-search tasks.

Could I do something like this?

image_assembler = (
    ImageAssembler()
    .setInputCol("image")
    .setOutputCol("image_assembler")
)

CLIP = (
    CLIPForZeroShotClassification.loadSavedModel(EXPORT_PATH, spark)
    .setInputCols(["image_assembler"])
    .setOutputCol("embedding")
)

And could I do it without setting labels via CLIP.setCandidateLabels?
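Once per-image embedding vectors are available (by whatever means), the downstream similarity search this use case describes could look like the following minimal NumPy sketch. This is not Spark NLP output or API; the toy 4-dimensional vectors are placeholders for real CLIP embeddings (which are 512- or 768-dimensional depending on the variant), and `build_index`/`search` are names made up for illustration.

```python
import numpy as np

def build_index(embeddings):
    """Stack embeddings into a matrix and L2-normalize each row,
    so that a plain dot product equals cosine similarity."""
    mat = np.vstack(embeddings).astype(np.float32)
    norms = np.linalg.norm(mat, axis=1, keepdims=True)
    return mat / np.clip(norms, 1e-12, None)

def search(index, query, top_k=3):
    """Return the indices of the top_k rows most similar to `query`."""
    q = np.asarray(query, dtype=np.float32)
    q = q / max(np.linalg.norm(q), 1e-12)
    scores = index @ q
    return np.argsort(-scores)[:top_k]

# Toy "embeddings" standing in for CLIP image vectors.
vecs = [np.array([1.0, 0.0, 0.0, 0.0]),
        np.array([0.9, 0.1, 0.0, 0.0]),
        np.array([0.0, 1.0, 0.0, 0.0])]
index = build_index(vecs)
print(search(index, [1.0, 0.05, 0.0, 0.0], top_k=2))  # → [0 1]
```

A production vector store (FAISS, pgvector, etc.) replaces the brute-force `argsort` with an approximate-nearest-neighbor index, but the normalize-then-dot-product pattern is the same.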

@DevinTDHa
Member

Hi @txhno,

Sadly, this is currently not possible; we would have to create it as a new feature. I don't think it would take much time, as most of it is already implemented. @maziyarpanahi perhaps we could fit this into one of the next releases?

@maziyarpanahi
Member

Thanks @txhno and @DevinTDHa. In fact, the idea was always to extend our CLIP support with one annotator that converts images to embeddings and another that converts text to embeddings.

We will add this to our roadmap, and I will change this issue into a feature request ticket.

@maziyarpanahi maziyarpanahi changed the title Can I use "fashion-clip" along with spark-nlp? CLIP for image and text embeddings Jun 11, 2024