Hi! I've been using spaCy over the last few weeks to fine-tune a roberta-base model for NER. So far, the experience has been great and I'm able to train and use the fine-tuned models without any issues.
I now wanted to enable mixed precision to speed up the training process. However, when I do that, I get the following error:
File "/usr/local/lib/python3.10/dist-packages/thinc/shims/pytorch_grad_scaler.py", line 171, in update
torch._amp_update_scale_(
RuntimeError: current_scale must be a float tensor.
Toggling mixed_precision back to false results in successful training.
Traceback
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
ℹ Saving to output directory: spacy_trained_pipeline_en
ℹ Using GPU: 0
=========================== Initializing pipeline ===========================
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
✔ Initialized pipeline
============================= Training pipeline =============================
ℹ Pipeline: ['transformer', 'ner']
ℹ Initial learn rate: 0.0
E # LOSS TRANS... LOSS NER ENTS_F ENTS_P ENTS_R SCORE
--- ------ ------------- -------- ------ ------ ------ ------
⚠ Aborting and saving the final best model. Encountered exception:
RuntimeError('current_scale must be a float tensor.')
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/dist-packages/spacy/__main__.py", line 4, in <module>
setup_cli()
File "/usr/local/lib/python3.10/dist-packages/spacy/cli/_util.py", line 87, in setup_cli
command(prog_name=COMMAND)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/typer/core.py", line 783, in main
return _main(
File "/usr/local/lib/python3.10/dist-packages/typer/core.py", line 225, in _main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/usr/local/lib/python3.10/dist-packages/spacy/cli/train.py", line 54, in train_cli
train(config_path, output_path, use_gpu=use_gpu, overrides=overrides)
File "/usr/local/lib/python3.10/dist-packages/spacy/cli/train.py", line 84, in train
train_nlp(nlp, output_path, use_gpu=use_gpu, stdout=sys.stdout, stderr=sys.stderr)
File "/usr/local/lib/python3.10/dist-packages/spacy/training/loop.py", line 135, in train
raise e
File "/usr/local/lib/python3.10/dist-packages/spacy/training/loop.py", line 118, in train
for batch, info, is_best_checkpoint in training_step_iterator:
File "/usr/local/lib/python3.10/dist-packages/spacy/training/loop.py", line 236, in train_while_improving
proc.finish_update(optimizer) # type: ignore[attr-defined]
File "spacy/pipeline/trainable_pipe.pyx", line 252, in spacy.pipeline.trainable_pipe.TrainablePipe.finish_update
File "/usr/local/lib/python3.10/dist-packages/thinc/model.py", line 342, in finish_update
shim.finish_update(optimizer)
File "/usr/local/lib/python3.10/dist-packages/thinc/shims/pytorch.py", line 180, in finish_update
self._grad_scaler.update()
File "/usr/local/lib/python3.10/dist-packages/thinc/shims/pytorch_grad_scaler.py", line 171, in update
torch._amp_update_scale_(
RuntimeError: current_scale must be a float tensor.
To me, this hints that the grad_scaler_config is somehow not being passed through to PyTorch correctly, but I'm not sure what I'm doing wrong.
I'm following the example config from spacy-transformers.TransformerModel.v3.
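For context on why the scale's dtype matters: AMP grad scalers keep the loss scale as a float tensor and update it dynamically after every optimizer step. Below is a simplified pure-Python sketch of that dynamic loss-scaling update; this is an illustration of the algorithm, not thinc's or PyTorch's actual implementation, and the parameter names simply mirror the grad_scaler_config keys:

```python
def update_scale(scale, found_inf, growth_tracker,
                 growth_factor=2.0, backoff_factor=0.5, growth_interval=2000):
    """Dynamic loss-scaling update as used by AMP grad scalers (simplified).

    If any scaled gradient overflowed (found_inf), back off the scale and
    reset the growth tracker; otherwise grow the scale after
    `growth_interval` consecutive clean steps.
    """
    if found_inf:
        return scale * backoff_factor, 0
    growth_tracker += 1
    if growth_tracker == growth_interval:
        return scale * growth_factor, 0
    return scale, growth_tracker

# Example: one overflow halves the scale; 2000 clean steps double it again.
scale, tracker = 2.0 ** 16, 0
scale, tracker = update_scale(scale, found_inf=True, growth_tracker=tracker)
print(scale)  # 32768.0
for _ in range(2000):
    scale, tracker = update_scale(scale, found_inf=False, growth_tracker=tracker)
print(scale)  # 65536.0
```

In the real grad scaler this update runs on the GPU via torch's fused kernel, which is why it insists that the scale be a float tensor rather than a plain number.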
My config file, trf_config.cfg
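The mixed-precision part follows the documented spacy-transformers.TransformerModel.v3 settings. A sketch of the relevant fragment (the grad_scaler_config keys mirror torch's GradScaler arguments; the init_scale value here is illustrative, not necessarily what I use):

```ini
[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v3"
name = "roberta-base"
mixed_precision = true

[components.transformer.model.grad_scaler_config]
init_scale = 32768
```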
How to reproduce the behaviour
I'm running the training on Google Colab, using a Tesla T4 runtime:
I've tried not executing the line !export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, but it doesn't make a difference. I've also made sure that I call spacy train with --gpu-id 0. Here are the exact steps of the Colab notebook I use:
Colab notebook
!nvcc --version
!pip install spacy[cuda12x,transformers] transformers[sentencepiece]
!python -m spacy download en_core_web_trf
Could you please give me a hand? Thanks a lot!
Info about spaCy