System Info

transformers version: 4.42.3

Who can help?

@sgugger @muellerzr

Tasks

- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
I'm following the example script to fine-tune CLIP on some custom data for zero-shot image classification, starting from the popular `openai/clip-vit-base-patch32` weights. I can get the script to run on my dataset and I can see my training loss decreasing, but...
During eval I want to add a custom accuracy metric, so I defined a `compute_metrics(eval_preds)` function that I pass to the trainer. For this model, the `eval_preds` passed to my custom metrics function should be a tuple whose first element is `logits_per_image`: an `(image_batch_size, text_batch_size)` tensor, as described here.
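For context, a minimal sketch of the kind of metrics function I'm passing in (simplified; the diagonal image-text pairing assumption and the names are illustrative):

```python
import numpy as np

def compute_metrics(eval_preds):
    # eval_preds.predictions should be a tuple whose first element is
    # logits_per_image with shape (image_batch_size, text_batch_size).
    logits_per_image = eval_preds.predictions[0]
    # Assuming the i-th image pairs with the i-th text (contrastive setup),
    # the correct "class" for row i is column i.
    targets = np.arange(logits_per_image.shape[0])
    top1 = (logits_per_image.argmax(axis=-1) == targets).mean()
    return {"top1_accuracy": float(top1)}
```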
Expected behavior
`logits_per_image` appears to have shape `(num_validation_examples, batch_size)`.
Similarly, the second element in `eval_preds` should be `logits_per_text` with the transposed dimensions, but it's also `(num_validation_examples, batch_size)`. So I really can't tell what's going on here; it's certainly not the behavior the docs describe.
Furthermore, `label_ids` is an empty list. I'd expect something like `arange(num_classes)` or the indices for the current batch.
In short, I'd like to compute top-1 and top-5 accuracy, but I can't tell what the extra rows in the logit tensors are there for.
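What I'm ultimately trying to compute is something like this (a sketch, assuming square logits with the correct match on the diagonal):

```python
import numpy as np

def topk_accuracy(logits, targets, k=5):
    """Fraction of rows whose target index is among the k largest logits."""
    # logits: (num_examples, num_texts); targets: (num_examples,)
    topk = np.argsort(logits, axis=-1)[:, -k:]
    return float(np.mean([target in row for target, row in zip(targets, topk)]))

# Illustrative usage, assuming the i-th image matches the i-th text:
# logits_per_image = eval_preds.predictions[0]
# targets = np.arange(logits_per_image.shape[0])
# print(topk_accuracy(logits_per_image, targets, k=5))
```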
Hi @npyoung, thanks for opening an issue! Could you share a minimal code reproducer?
I find it surprising that `logits_per_image` doesn't have the transposed dimensions of `logits_per_text`, as that is how they are defined in the model output and no further transformations are applied to the object.
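As a quick sanity check of the documented behavior, a standalone forward pass (using a blank placeholder image here) should show the transpose relationship directly:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.new("RGB", (224, 224))  # blank placeholder image
inputs = processor(text=["a cat", "a dog", "a car"], images=[image],
                   return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

print(outputs.logits_per_image.shape)  # torch.Size([1, 3]): (image_batch, text_batch)
print(outputs.logits_per_text.shape)   # torch.Size([3, 1]): transposed
assert torch.allclose(outputs.logits_per_text, outputs.logits_per_image.t())
```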