
CLIP for image and text embeddings #14311

Open
txhno opened this issue Jun 5, 2024 · 7 comments

@txhno

txhno commented Jun 5, 2024

Link to the documentation pages (if available)

https://github.com/patrickjohncyh/fashion-clip
https://huggingface.co/patrickjohncyh/fashion-clip

How could the documentation be improved?

It's a fine-tune of CLIP trained on a 500K-item fashion dataset. I would like to use it via the Hugging Face API, or alternatively via their wrapper package.
If it can be done, please let me know how. Thanks! :)

@maziyarpanahi
Member

Interesting, it seems to use the CLIPModel architecture. @DevinTDHa can we do that currently, or should we put it on the roadmap?

@DevinTDHa
Member

Seems like it should work, if the underlying model doesn't have any architectural changes.

I'll try it out and report back!

@DevinTDHa
Member

The model works with no problems in Spark NLP! Just follow this notebook to import the model properly:

https://github.com/JohnSnowLabs/spark-nlp/blob/master/examples/python/transformers/onnx/HuggingFace_ONNX_in_Spark_NLP_CLIP.ipynb

If you change the model name to patrickjohncyh/fashion-clip, it should work. Let me know if you have any other questions.
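For reference, here is a condensed sketch of the flow the notebook describes, with the model name swapped to patrickjohncyh/fashion-clip. This assumes Spark NLP is installed, an ONNX export of the model has already been saved to `EXPORT_PATH` (the notebook covers the export itself), and that you are running a Spark session; the path and candidate labels below are illustrative placeholders, not values from the notebook.

```python
# Sketch only: assumes the ONNX export step from the notebook is done,
# and that EXPORT_PATH points at the exported fashion-clip model.
import sparknlp
from sparknlp.base import ImageAssembler
from sparknlp.annotator import CLIPForZeroShotClassification
from pyspark.ml import Pipeline

spark = sparknlp.start()

EXPORT_PATH = "onnx_models/fashion-clip"  # hypothetical path

# Assemble raw images into the annotation format Spark NLP expects.
image_assembler = (
    ImageAssembler()
    .setInputCol("image")
    .setOutputCol("image_assembler")
)

# Load the exported CLIP model; labels here are made-up examples.
clip = (
    CLIPForZeroShotClassification.loadSavedModel(EXPORT_PATH, spark)
    .setInputCols(["image_assembler"])
    .setOutputCol("label")
    .setCandidateLabels(["a photo of a dress", "a photo of shoes"])
)

pipeline = Pipeline(stages=[image_assembler, clip])
images_df = spark.read.format("image").load("path/to/images")
result = pipeline.fit(images_df).transform(images_df)
```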

@txhno
Author

txhno commented Jun 7, 2024

@DevinTDHa @maziyarpanahi Thanks a lot! I will definitely try it out. I'll contact you if something pops up. :)

@txhno
Author

txhno commented Jun 11, 2024

@DevinTDHa @maziyarpanahi I have a different question. Is it possible to use Spark NLP to compute CLIP embeddings directly, instead of just using the ZeroShotClassification? My use case is taking a folder of images, computing the embeddings of all images in the folder with Spark NLP, and storing them in a vector store for later retrieval or similarity-search tasks.

Could I do something like this?

image_assembler = (
    ImageAssembler()
    .setInputCol("image")
    .setOutputCol("image_assembler")
)

CLIP = (
    CLIPForZeroShotClassification.loadSavedModel(EXPORT_PATH, spark)
    .setInputCols(["image_assembler"])
    .setOutputCol("embedding")
)

And could I do it without setting labels via CLIP.setCandidateLabels?
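Once per-image embedding vectors are available (by whatever means), the downstream similarity search this use case describes could look like the following minimal NumPy sketch. This is not Spark NLP output or API; the toy 4-dimensional vectors are placeholders for real CLIP embeddings (which are 512- or 768-dimensional depending on the variant), and `build_index`/`search` are names made up for illustration.

```python
import numpy as np

def build_index(embeddings):
    """Stack embeddings into a matrix and L2-normalize each row,
    so that a plain dot product equals cosine similarity."""
    mat = np.vstack(embeddings).astype(np.float32)
    norms = np.linalg.norm(mat, axis=1, keepdims=True)
    return mat / np.clip(norms, 1e-12, None)

def search(index, query, top_k=3):
    """Return the indices of the top_k rows most similar to `query`."""
    q = np.asarray(query, dtype=np.float32)
    q = q / max(np.linalg.norm(q), 1e-12)
    scores = index @ q
    return np.argsort(-scores)[:top_k]

# Toy "embeddings" standing in for CLIP image vectors.
vecs = [np.array([1.0, 0.0, 0.0, 0.0]),
        np.array([0.9, 0.1, 0.0, 0.0]),
        np.array([0.0, 1.0, 0.0, 0.0])]
index = build_index(vecs)
print(search(index, [1.0, 0.05, 0.0, 0.0], top_k=2))  # → [0 1]
```

A production vector store (FAISS, pgvector, etc.) replaces the brute-force `argsort` with an approximate-nearest-neighbor index, but the normalize-then-dot-product pattern is the same.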

@DevinTDHa
Member

Hi @txhno,

Sadly, this is currently not possible; we would have to create it as a new feature. I don't think it would take much time, as most of it is already implemented. @maziyarpanahi perhaps we could fit this into one of the next releases?

@maziyarpanahi
Member

Thanks @txhno and @DevinTDHa. In fact, the idea was always to extend our CLIP support with one annotator that converts images to embeddings and another that converts text to embeddings.

We will add this to our roadmap, and I will change this issue into a feature request ticket.

@maziyarpanahi maziyarpanahi changed the title Can I use "fashion-clip" along with spark-nlp? CLIP for image and text embeddings Jun 11, 2024