[pipeline] fix padding for 1-d tensors #31776

sanchit-gandhi · 2024-07-03T14:52:28Z

What does this PR do?

Currently on main, batched inference using the ASR pipeline fails:

from transformers import pipeline, AutoProcessor, WhisperForConditionalGeneration
from transformers.utils import is_accelerate_available
from datasets import load_dataset

processor = AutoProcessor.from_pretrained("openai/whisper-tiny.en")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en", low_cpu_mem_usage=is_accelerate_available())

pipe = pipeline("automatic-speech-recognition", model=model, feature_extractor=processor.feature_extractor, tokenizer=processor.tokenizer)

dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = dataset[:2]["audio"]

pipe(sample, batch_size=2)

Traceback

  File "/home/sanchit/transformers/src/transformers/pipelines/base.py", line 194, in inner
    padded[key] = _pad(items, key, _padding_value, padding_side)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sanchit/transformers/src/transformers/pipelines/base.py", line 100, in _pad
    max_length = max(item[key].shape[1] for item in items)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sanchit/transformers/src/transformers/pipelines/base.py", line 100, in <genexpr>
    max_length = max(item[key].shape[1] for item in items)
                     ~~~~~~~~~~~~~~~^^^
IndexError: tuple index out of range

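This is because the pipeline class attempts to pad num_frames, a 1-d tensor that we added in #30637 to correctly compute word-level timestamps. The _pad method reads shape[1] to find the maximum length in the batch, and a 1-d tensor has no second dimension, hence the IndexError. num_frames is added to the batch here: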

transformers/src/transformers/pipelines/automatic_speech_recognition.py, line 452 in 048f599

extra["num_frames"] = processed.pop("num_frames")

The simple fix is to handle padding of 1-d tensors explicitly in the private _pad method, which we implement here. We also add a slow test to confirm batched generation works following the fix.
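To illustrate the idea, here is a minimal sketch of a collate helper that treats 1-d entries separately from 2-d ones. It is not the actual _pad implementation from this PR: the name pad_batch is hypothetical, NumPy arrays stand in for torch tensors so the snippet runs without model dependencies, and concatenating the 1-d entries is an assumption about one reasonable way to batch them. The num_frames values are made up for the example.

```python
import numpy as np


def pad_batch(items, key, padding_value=0):
    """Hypothetical helper: collate items[i][key] into a single batch array."""
    arrays = [item[key] for item in items]

    if arrays[0].ndim == 1:
        # 1-d entries (e.g. one num_frames value per sample) have no sequence
        # axis, so shape[1] does not exist. Concatenate them instead of padding.
        return np.concatenate(arrays, axis=0)

    # 2-d entries of shape (1, seq_len): right-pad each one to the longest.
    max_length = max(a.shape[1] for a in arrays)
    batch = np.full((len(arrays), max_length), padding_value, dtype=arrays[0].dtype)
    for i, a in enumerate(arrays):
        batch[i, : a.shape[1]] = a[0]
    return batch


# Illustrative inputs only; real pipeline items carry model features.
items = [
    {"input_ids": np.array([[1, 2, 3]]), "num_frames": np.array([3000])},
    {"input_ids": np.array([[4, 5]]), "num_frames": np.array([2500])},
]

num_frames = pad_batch(items, "num_frames")  # shape (2,)
input_ids = pad_batch(items, "input_ids")    # shape (2, 3), right-padded with 0
```

Without the ndim check, the max_length line would raise the same IndexError as in the traceback above, since shape[1] is out of range for a 1-d array.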

HuggingFaceDocBuilderDev · 2024-07-03T15:12:14Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sanchit-gandhi added 3 commits July 3, 2024 15:41

[pipeline] fix padding for 1-d tensors

4898d73

add test

e9f2a0b

make style

29afb69
