Cannot Load .pt model #31829

Open
4 tasks
ivanhe123 opened this issue Jul 7, 2024 · 1 comment

System Info

Python version 3.11

  • transformers version: 4.42.3
  • Platform: Windows-10-10.0.22631-SP0
  • Python version: 3.11.0
  • Huggingface_hub version: 0.23.4
  • Safetensors version: 0.4.2
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.1+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA GeForce RTX 4060 Laptop GPU

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Finetuned model using https://www.kaggle.com/code/chlorinecl/notebook4101d69eb6
  2. Downloaded the .pt model and loaded it using:
import torch
from transformers import AutoProcessor, SeamlessM4TModel
new_model = torch.load("./expt4_m4tM.pt")
processor = AutoProcessor.from_pretrained("seamless-m4t-medium")
model_seam = SeamlessM4TModel.from_pretrained("seamless-m4t-medium")
model_seam.load_state_dict(new_model)
model_seam.save_pretrained("./new_seamless-m4t-medium")
  3. Outputs:
D:\projects\GNNNER\venv\Lib\site-packages\transformers\deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "D:\projects\GNNNER\convert_bin_to_pt.py", line 6, in <module>
    model_seam.load_state_dict(new_model)
  File "D:\projects\GNNNER\venv\Lib\site-packages\torch\nn\modules\module.py", line 2189, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for SeamlessM4TModel:
	Missing key(s) in state_dict: "shared.weight", "text_encoder.embed_tokens.weight", "text_encoder.layers.0.self_attn.k_proj.weight", "text_encoder.layers.0.self_attn.k_proj.bias", "text_encoder.layers.0.self_attn.v_proj.weight", "text_encoder.layers.0.self_attn.v_proj.bias", "text_encoder.layers.0.self_attn.q_proj.weight", "text_encoder.layers.0.self_attn.q_proj.bias", "text_encoder.layers.0.self_attn.out_proj.weight", "text_encoder.layers.0.self_attn.out_proj.bias", "text_encoder.layers.0.self_attn_layer_norm.weight", "text_encoder.layers.0.self_attn_layer_norm.bias", "text_encoder.layers.0.ffn.fc1.weight", "text_encoder.layers.0.ffn.fc1.bias", "text_encoder.layers.0.ffn.fc2.weight", "text_encoder.layers.0.ffn.fc2.bias", "text_encoder.layers.0.ffn_layer_norm.weight", "text_encoder.layers.0.ffn_layer_norm.bias", "text_encoder.layers.1.self_attn.k_proj.weight", "text_encoder.layers.1.self_attn.k_proj.bias", "text_encoder.layers.1.self_attn.v_proj.weight", "text_encoder.layers.1.self_attn.v_proj.bias", "text_encoder.layers.1.self_attn.q_proj.weight", "text_encoder.layers.1.self_attn.q_proj.bias", "text_encoder.layers.1.self_attn.out_proj.weight", "text_encoder.layers.1.self_attn.out_proj.bias", "text_encoder.layers.1.self_attn_layer_norm.weight", "text_encoder.layers.1.self_attn_layer_norm.bias", "text_encoder.layers.1.ffn.fc1.weight", "text_encoder.layers.1.ffn.fc1.bias", "text_encoder.layers.1.ffn.fc2.weight", "text_encoder.layers.1.ffn.fc2.bias", "text_encoder.layers.1.ffn_layer_norm.weight", "text_encoder.layers.1.ffn_layer_norm.bias", "text_encoder.layers.2.self_attn.k_proj.weight", "text_encoder.layers.2.self_attn.k_proj.bias", "text_encoder.layers.2.self_attn.v_proj.weight", "text_encoder.layers.2.self_attn.v_proj.bias", "text_encoder.layers.2.self_attn.q_proj.weight", "text_encoder.layers.2.self_attn.q_proj.bias", "text_encoder.layers.2.self_attn.out_proj.weight", "text_encoder.layers.2.self_attn.out_proj.bias", 
"text_encoder.layers.2.self_attn_layer_norm.weight", "text_encoder.layers.2.self_attn_layer_norm.bias", "text_encoder.layers.2.ffn.fc1.weight", "text_encoder.layers.2.ffn.fc1.bias", "text_encoder.layers.2.ffn.fc2.weight", "text_encoder.layers.2.ffn.fc2.bias", "text_encoder.layers.2.ffn_layer_norm.weight", "text_encoder.layers.2.ffn_layer_norm.bias", "text_encoder.layers.3.self_attn.k_proj.weight", "text_encoder.layers.3.self_attn.k_proj.bias", "text_encoder.layers.3.self_attn.v_proj.weight", "text_encoder.layers.3.self_attn.v_proj.bias", "text_encoder.layers.3.self_attn.q_proj.weight", "text_encoder.layers.3.self_attn.q_proj.bias", "text_encoder.layers.3.self_attn.out_proj.weight", "text_encoder.layers.3.self_attn.out_proj.bias", "text_encoder.layers.3.self_attn_layer_norm.weight", "text_encoder.layers.3.self_attn_layer_norm.bias", "text_encoder.layers.3.ffn.fc1.weight", "text_encoder.layers.3.ffn.fc1.bias", "text_encoder.layers.3.ffn.fc2.weight", "text_encoder.layers.3.ffn.fc2.bias", "text_encoder.layers.3.ffn_layer_norm.weight", "text_encoder.layers.3.ffn_layer_norm.bias", "text_encoder.layers.4.self_attn.k_proj.weight", "text_encoder.layers.4.self_attn.k_proj.bias", "text_encoder.layers.4.self_attn.v_proj.weight", "text_encoder.layers.4.self_attn.v_proj.bias", "text_encoder.layers.4.self_attn.q_proj.weight", "text_encoder.layers.4.self_attn.q_proj.bias", "text_encoder.layers.4.self_attn.out_proj.weight", "text_encoder.layers.4.self_attn.out_proj.bias", "text_encoder.layers.4.self_attn_layer_norm.weight", "text_encoder.layers.4.self_attn_layer_norm.bias", "text_encoder.layers.4.ffn.fc1.weight", "text_encoder.layers.4.ffn.fc1.bias", "text_encoder.layers.4.ffn.fc2.weight", "text_encoder.layers.4.ffn.fc2.bias", "text_encoder.layers.4.ffn_layer_norm.weight", "text_encoder.layers.4.ffn_layer_norm.bias", "text_encoder.layers.5.self_attn.k_proj.weight", "text_encoder.layers.5.self_attn.k_proj.bias", "text_encoder.layers.5.self_attn.v_proj.weight", 
"text_encoder.layers.5.self_attn.v_proj.bias", "text_encoder.layers.5.self_attn.q_proj.weight", "text_encoder.layers.5.self_attn.q_proj.bias", "text_encoder.layers.5.self_attn.out_proj.weight", "text_encoder.layers.5.self_attn.out_proj.bias", "text_encoder.layers.5.self_attn_layer_norm.weight", "text_encoder.layers.5.self_attn_layer_norm.bias", "text_encoder.layers.5.ffn.fc1.weight", "text_encoder.layers.5.ffn.fc1.bias", "text_encoder.layers.5.ffn.fc2.weight", "text_encoder.layers.5.ffn.fc2.bias", "text_encoder.layers.5.ffn_layer_norm.weight", "text_encoder.layers.5.ffn_layer_norm.bias", "text_encoder.layers.6.self_attn.k_proj.weight", "text_encoder.layers.6.self_attn.k_proj.bias", "text_encoder.layers.6.self_attn.v_proj.weight", "text_encoder.layers.6.self_attn.v_proj.bias", "text_encoder.layers.6.self_attn.q_proj.weight", "text_encoder.layers.6.self_attn.q_proj.bias", "text_encoder.layers.6.self_attn.out_proj.weight", "text_encoder.layers.6.self_attn.out_proj.bias", "text_encoder.layers.6.self_attn_layer_norm.weight", "text_encoder.layers.6.self_attn_layer_norm.bias", "text_encoder.layers.6.ffn.fc1.weight", "text_encoder.layers.6.ffn.fc1.bias", "text_encoder.layers.6.ffn.fc2.weight", "text_encoder.layers.6.ffn.fc2.bias", "text_encoder.layers.6.ffn_layer_norm.weight", "text_encoder.layers.6.ffn_layer_norm.bias", "text_encoder.layers.7.self_attn.k_proj.weight", "text_encoder.layers.7.self_attn.k_proj.bias", "text_encoder.layers.7.self_attn.v_proj.weight", "text_encoder.layers.7.self_attn.v_proj.bias", "text_encoder.layers.7.self_attn.q_proj.weight", "text_encoder.layers.7.self_attn.q_proj.bias", "text_encoder.layers.7.self_attn.out_proj.weight", "text_encoder.layers.7.self_attn.out_proj.bias", "text_encoder.layers.7.self_attn_layer_norm.weight", "text_encoder.layers.7.self_attn_layer_norm.bias", "text_encoder.layers.7.ffn.fc1.weight", "text_encoder.layers.7.ffn.fc1.bias", "text_encoder.layers.7.ffn.fc2.weight", "text_encoder.layers.7.ffn.fc2.bias", 
"text_encoder.layers.7.ffn_layer_norm.weight", "text_encoder.layers.7.ffn_layer_norm.bias", "text_encoder.layers.8.self_attn.k_proj.weight", "text_encoder.layers.8.self_attn.k_proj.bias", "text_encoder.layers.8.self_attn.v_proj.weight", "text_encoder.layers.8.self_attn.v_proj.bias", "text_encoder.layers.8.self_attn.q_proj.weight", "text_encoder.layers.8.self_attn.q_proj.bias", "text_encoder.layers.8.self_attn.out_proj.weight", "text_encoder.layers.8.self_attn.out_proj.bias", "text_encoder.layers.8.self_attn_layer_norm.weight", "text_encoder.layers.8.self_attn_layer_norm.bias", "text_encoder.layers.8.ffn.fc1.weight", "text_encoder.layers.8.ffn.fc1.bias", "text_encoder.layers.8.ffn.fc2.weight", "text_encoder.layers.8.ffn.fc2.bias", "text_encoder.layers.8.ffn_layer_norm.weight", "text_encoder.layers.8.ffn_layer_norm.bias", "text_encoder.layers.9.self_attn.k_proj.weight", "text_encoder.layers.9.self_attn.k_proj.bias", "text_encoder.layers.9.self_attn.v_proj.weight", "text_encoder.layers.9.self_attn.v_proj.bias", "text_encoder.layers.9.self_attn.q_proj.weight", "text_encoder.layers.9.self_attn.q_proj.bias", "text_encoder.layers.9.self_attn.out_proj.weight", "text_encoder.layers.9.self_attn.out_proj.bias", "text_encoder.layers.9.self_attn_layer_norm.weight", "text_encoder.layers.9.self_attn_layer_norm.bias", "text_encoder.layers.9.ffn.fc1.weight", "text_encoder.layers.9.ffn.fc1.bias", "text_encoder.layers.9.ffn.fc2.weight", "text_encoder.layers.9.ffn.fc2.bias", "text_encoder.layers.9.ffn_layer_norm.weight", "text_encoder.layers.9.ffn_layer_norm.bias", "text_encoder.layers.10.self_attn.k_proj.weight", "text_encoder.layers.10.self_attn.k_proj.bias", "text_encoder.layers.10.self_attn.v_proj.weight", "text_encoder.layers.10.self_attn.v_proj.bias", "text_encoder.layers.10.self_attn.q_proj.weight", "text_encoder.layers.10.self_attn.q_proj.bias", "text_encoder.layers.10.self_attn.out_proj.weight", "text_encoder.layers.10.self_attn.out_proj.bias", 
"text_encoder.layers.10.self_attn_layer_norm.weight", "text_encoder.layers.10.self_attn_layer_norm.bias", "text_encoder.layers.10.ffn.fc1.weight", "text_encoder.layers.10.ffn.fc1.bias", "text_encoder.layers.10.ffn.fc2.weight", "text_encoder.layers.10.ffn.fc2.bias", "text_encoder.layers.10.ffn_layer_norm.weight", "text_encoder.layers.10.ffn_layer_norm.bias", "text_encoder.layers.11.self_attn.k_proj.weight", "text_encoder.layers.11.self_attn.k_proj.bias", "text_encoder.layers.11.self_attn.v_proj.weight", "text_encoder.layers.11.self_attn.v_proj.bias", "text_encoder.layers.11.self_attn.q_proj.weight", "text_encoder.layers.11.self_attn.q_proj.bias", "text_encoder.layers.11.self_attn.out_proj.weight", "text_encoder.layers.11.self_attn.out_proj.bias", "text_encoder.layers.11.self_attn_layer_norm.weight", "text_encoder.layers.11.self_attn_layer_norm.bias", "text_encoder.layers.11.ffn.fc1.weight", "text_encoder.layers.11.ffn.fc1.bias", "text_encoder.layers.11.ffn.fc2.weight", "text_encoder.layers.11.ffn.fc2.bias", "text_encoder.layers.11.ffn_layer_norm.weight", "text_encoder.layers.11.ffn_layer_norm.bias", "text_encoder.layer_norm.weight", "text_encoder.layer_norm.bias", "speech_encoder.feature_projection.layer_norm.weight", "speech_encoder.feature_projection.layer_norm.bias", "speech_encoder.feature_projection.projection.weight", "speech_encoder.feature_projection.projection.bias", "speech_encoder.encoder.layers.0.ffn1_layer_norm.weight", "speech_encoder.encoder.layers.0.ffn1_layer_norm.bias", "speech_encoder.encoder.layers.0.ffn1.intermediate_dense.weight", "speech_encoder.encoder.layers.0.ffn1.intermediate_dense.bias", "speech_encoder.encoder.layers.0.ffn1.output_dense.weight", "speech_encoder.encoder.layers.0.ffn1.output_dense.bias", "speech_encoder.encoder.layers.0.self_attn_layer_norm.weight", "speech_encoder.encoder.layers.0.self_attn_layer_norm.bias", "speech_encoder.encoder.layers.0.self_attn.pos_bias_u", "speech_encoder.encoder.layers.0.self_attn.pos_bias_v", 
"speech_encoder.encoder.layers.0.self_attn.linear_q.weight", "speech_encoder.encoder.layers.0.self_attn.linear_q.bias", "speech_encoder.encoder.layers.0.self_attn.linear_k.weight", "speech_encoder.encoder.layers.0.self_attn.linear_k.bias", "speech_encoder.encoder.layers.0.self_attn.linear_v.weight", "speech_encoder.encoder.layers.0.self_attn.linear_v.bias", "speech_encoder.encoder.layers.0.self_attn.linear_out.weight", "speech_encoder.encoder.layers.0.self_attn.linear_out.bias", "speech_encoder.encoder.layers.0.self_attn.linear_pos.weight", "speech_encoder.encoder.layers.0.conv_module.layer_norm.weight", "speech_encoder.encoder.layers.0.conv_module.layer_norm.bias", "speech_encoder.encoder.layers.0.conv_module.pointwise_conv1.weight", "speech_encoder.encoder.layers.0.conv_module.depthwise_conv.weight", "speech_encoder.encoder.layers.0.conv_module.batch_norm.weight", "speech_encoder.encoder.layers.0.conv_module.batch_norm.bias", "speech_encoder.encoder.layers.0.conv_module.batch_norm.running_mean", "speech_encoder.encoder.layers.0.conv_module.batch_norm.running_var", "speech_encoder.encoder.layers.0.conv_module.pointwise_conv2.weight", "speech_encoder.encoder.layers.0.ffn2_layer_norm.weight", "speech_encoder.encoder.layers.0.ffn2_layer_norm.bias", "speech_encoder.encoder.layers.0.ffn2.intermediate_dense.weight", "speech_encoder.encoder.layers.0.ffn2.intermediate_dense.bias", "speech_encoder.encoder.layers.0.ffn2.output_dense.weight", "speech_encoder.encoder.layers.0.ffn2.output_dense.bias", "speech_encoder.encoder.layers.0.final_layer_norm.weight", "speech_encoder.encoder.layers.0.final_layer_norm.bias", "speech_encoder.encoder.layers.1.ffn1_layer_norm.weight", "speech_encoder.encoder.layers.1.ffn1_layer_norm.bias", "speech_encoder.encoder.layers.1.ffn1.intermediate_dense.weight", "speech_encoder.encoder.layers.1.ffn1.intermediate_dense.bias", "speech_encoder.encoder.layers.1.ffn1.output_dense.weight", "speech_encoder.encoder.layers.1.ffn1.output_dense.bias", 
"speech_encoder.encoder.layers.1.self_attn_layer_norm.weight", "speech_encoder.encoder.layers.1.self_attn_layer_norm.bias", "speech_encoder.encoder.layers.1.self_attn.pos_bias_u", "speech_encoder.encoder.layers.1.self_attn.pos_bias_v", "speech_encoder.encoder.layers.1.self_attn.linear_q.weight", "speech_encoder.encoder.layers.1.self_attn.linear_q.bias", "speech_encoder.encoder.layers.1.self_attn.linear_k.weight", "speech_encoder.encoder.layers.1.self_attn.linear_k.bias", "speech_encoder.encoder.layers.1.self_attn.linear_v.weight", "speech_encoder.encoder.layers.1.self_attn.linear_v.bias", "speech_encoder.encoder.layers.1.self_attn.linear_out.weight", "speech_encoder.encoder.layers.1.self_attn.linear_out.bias", "speech_encoder.encoder.layers.1.self_attn.linear_pos.weight", "speech_encoder.encoder.layers.1.conv_module.layer_norm.weight", "speech_encoder.encoder.layers.1.conv_module.layer_norm.bias", "speech_encoder.encoder.layers.1.conv_module.pointwise_conv1.weight", "speech_encoder.encoder.layers.1.conv_module.depthwise_conv.weight", "speech_encoder.encoder.layers.1.conv_module.batch_norm.weight", "speech_encoder.encoder.layers.1.conv_module.batch_norm.bias", "speech_encoder.encoder.layers.1.conv_module.batch_norm.running_mean", "speech_encoder.encoder.layers.1.conv_module.batch_norm.running_var", "speech_encoder.encoder.layers.1.conv_module.pointwise_conv2.weight", "speech_encoder.encoder.layers.1.ffn2_layer_norm.weight", "speech_encoder.encoder.layers.1.ffn2_layer_norm.bias", "speech_encoder.encoder.layers.1.ffn2.intermediate_dense.weight", "speech_encoder.encoder.layers.1.ffn2.intermediate_dense.bias", "speech_encoder.encoder.layers.1.ffn2.output_dense.weight", "speech_encoder.encoder.layers.1.ffn2.output_dense.bias", "speech_encoder.encoder.layers.1.final_layer_norm.weight", "speech_encoder.encoder.layers.1.final_layer_norm.bias", "speech_encoder.encoder.layers.2.ffn1_layer_norm.weight", "speech_encoder.encoder.layers.2.ffn1_layer_norm.bias", 
"speech_encoder.encoder.layers.2.ffn1.intermediate_dense.weight", "speech_encoder.encoder.layers.2.ffn1.intermediate_dense.bias", "speech_encoder.encoder.layers.2.ffn1.output_dense.weight", "speech_encoder.encoder.layers.2.ffn1.output_dense.bias", "speech_encoder.encoder.layers.2.self_attn_layer_norm.weight", "speech_encoder.encoder.layers.2.self_attn_layer_norm.bias", "speech_encoder.encoder.layers.2.self_attn.pos_bias_u", "speech_encoder.encoder.layers.2.self_attn.pos_bias_v", "speech_encoder.encoder.layers.2.self_attn.linear_q.weight", "speech_encoder.encoder.layers.2.self_attn.linear_q.bias", "speech_encoder.encoder.layers.2.self_attn.linear_k.weight", "speech_encoder.encoder.layers.2.self_attn.linear_k.bias", "speech_encoder.encoder.layers.2.self_attn.linear_v.weight", "speech_encoder.encoder.layers.2.self_attn.linear_v.bias", "speech_encoder.encoder.layers.2.self_attn.linear_out.weight", "speech_encoder.encoder.layers.2.self_attn.linear_out.bias", "speech_encoder.encoder.layers.2.self_attn.linear_pos.weight", "speech_encoder.encoder.layers.2.conv_module.layer_norm.weight", "speech_encoder.encoder.layers.2.conv_module.layer_norm.bias", "speech_encoder.encoder.layers.2.conv_module.pointwise_conv1.weight", "speech_encoder.encoder.layers.2.conv_module.depthwise_conv.weight", "speech_encoder.encoder.layers.2.conv_module.batch_norm.weight", "speech_encoder.encoder.layers.2.conv_module.batch_norm.bias", "speech_encoder.encoder.layers.2.conv_module.batch_norm.running_mean", "speech_encoder.encoder.layers.2.conv_module.batch_norm.running_var", "speech_encoder.encoder.layers.2.conv_module.pointwise_conv2.weight", "speech_encoder.encoder.layers.2.ffn2_layer_norm.weight", "speech_encoder.encoder.layers.2.ffn2_layer_norm.bias", "speech_encoder.encoder.layers.2.ffn2.intermediate_dense.weight", "speech_encoder.encoder.layers.2.ffn2.intermediate_dense.bias", "speech_encoder.encoder.layers.2.ffn2.output_dense.weight", "speech_encoder.encoder.layers.2.ffn2.output_dense.bias", 
"speech_encoder.encoder.layers.2.final_layer_norm.weight", "speech_encoder.encoder.layers.2.final_layer_norm.bias", "speech_encoder.encoder.layers.3.ffn1_layer_norm.weight", "speech_encoder.encoder.layers.3.ffn1_layer_norm.bias", "speech_encoder.encoder.layers.3.ffn1.intermediate_dense.weight", "speech_encoder.encoder.layers.3.ffn1.intermediate_dense.bias", "speech_encoder.encoder.layers.3.ffn1.output_dense.weight", "speech_encoder.encoder.layers.3.ffn1.output_dense.bias", "speech_encoder.encoder.layers.3.self_attn_layer_norm.weight", "speech_encoder.encoder.layers.3.self_attn_layer_norm.bias", "speech_encoder.encoder.layers.3.self_attn.pos_bias_u", "speech_encoder.encoder.layers.3.self_attn.pos_bias_v", "speech_encoder.encoder.layers.3.self_attn.linear_q.weight", "speech_encoder.encoder.layers.3.self_attn.linear_q.bias", "speech_encoder.encoder.layers.3.self_attn.linear_k.weight", "speech_encoder.encoder.layers.3.self_attn.linear_k.bias", "speech_encoder.encoder.layers.3.self_attn.linear_v.weight", "speech_encoder.encoder.layers.3.self_attn.linear_v.bias", "speech_encoder.encoder.layers.3.self_attn.linear_out.weight", "speech_encoder.encoder.layers.3.self_attn.linear_out.bias", "speech_encoder.encoder.layers.3.self_attn.linear_pos.weight", "speech_encoder.encoder.layers.3.conv_module.layer_norm.weight", "speech_encoder.encoder.layers.3.conv_module.layer_norm.bias", "speech_encoder.encoder.layers.3.conv_module.pointwise_conv1.weight", "speech_encoder.encoder.layers.3.conv_module.depthwise_conv.weight", "speech_encoder.encoder.layers.3.conv_module.batch_norm.weight", "speech_encoder.encoder.layers.3.conv_module.batch_norm.bias", "speech_encoder.encoder.layers.3.conv_module.batch_norm.running_mean", "speech_encoder.encoder.layers.3.conv_module.batch_norm.running_var", "speech_encoder.encoder.layers.3.conv_module.pointwise_conv2.weight", "speech_encoder.encoder.layers.3.ffn2_layer_norm.weight", "speech_encoder.encoder.layers.3.ffn2_layer_norm.bias", 
"speech_encoder.encoder.layers.3.ffn2.intermediate_dense.weight", "speech_encoder.encoder.layers.3.ffn2.intermediate_dense.bias", "speech_encoder.encoder.layers.3.ffn2.output_dense.weight", "speech_encoder.encoder.layers.3.ffn2.output_dense.bias", "speech_encoder.encoder.layers.3.final_layer_norm.weight", "speech_encoder.encoder.layers.3.final_layer_norm.bias", "speech_encoder.encoder.layers.4.ffn1_layer_norm.weight", "speech_encoder.encoder.layers.4.ffn1_layer_norm.bias", "speech_encoder.encoder.layers.4.ffn1.intermediate_dense.weight", "speech_encoder.encoder.layers.4.ffn1.intermediate_dense.bias", "speech_encoder.encoder.layers.4.ffn1.output_dense.weight", "speech_encoder.encoder.layers.4.ffn1.output_dense.bias", "speech_encoder.encoder.layers.4.self_attn_layer_norm.weight", "speech_encoder.encoder.layers.4.self_attn_layer_norm.bias", "speech_encoder.encoder.layers.4.self_attn.pos_bias_u", "speech_encoder.encoder.layers.4.self_attn.pos_bias_v", "speech_encoder.encoder.layers.4.self_attn.linear_q.weight", "speech_encoder.encoder.layers.4.self_attn.linear_q.bias", "speech_encoder.encoder.layers.4.self_attn.linear_k.weight", "speech_encoder.encoder.layers.4.self_attn.linear_k.bias", "speech_encoder.encoder.layers.4.self_attn.linear_v.weight", "speech_encoder.encoder.layers.4.self_attn.linear_v.bias", "speech_encoder.encoder.layers.4.self_attn.linear_out.weight", "speech_encoder.encoder.layers.4.self_attn.linear_out.bias", "speech_encoder.encoder.layers.4.self_attn.linear_pos.weight", "speech_encoder.encoder.layers.4.conv_module.layer_norm.weight", "speech_encoder.encoder.layers.4.conv_module.layer_norm.bias", "speech_encoder.encoder.layers.4.conv_module.pointwise_conv1.weight", "speech_encoder.encoder.layers.4.conv_module.depthwise_conv.weight", "speech_encoder.encoder.layers.4.conv_module.batch_norm.weight", "speech_encoder.encoder.layers.4.conv_module.batch_norm.bias", "speech_encoder.encoder.layers.4.conv_module.batch_norm.running_mean", 
"speech_encoder.encoder.layers.4.conv_module.batch_norm.running_var", "speech_encoder.encoder.layers.4.conv_module.pointwise_conv2.weight", "speech_encoder.encoder.layers.4.ffn2_layer_norm.weight", "speech_encoder.encoder.layers.4.ffn2_layer_norm.bias", "speech_encoder.encoder.layers.4.ffn2.intermediate_dense.weight", "speech_encoder.encoder.layers.4.ffn2.intermediate_dense.bias", "speech_encoder.encoder.layers.4.ffn2.output_dense.weight", "speech_encoder.encoder.layers.4.ffn2.output_dense.bias", "speech_encoder.encoder.layers.4.final_layer_norm.weight", "speech_encoder.encoder.layers.4.final_layer_norm.bias", "speech_encoder.encoder.layers.5.ffn1_layer_norm.weight", "speech_encoder.encoder.layers.5.ffn1_layer_norm.bias", "speech_encoder.encoder.layers.5.ffn1.intermediate_dense.weight", "speech_encoder.encoder.layers.5.ffn1.intermediate_dense.bias", "speech_encoder.encoder.layers.5.ffn1.output_dense.weight", "speech_encoder.encoder.layers.5.ffn1.output_dense.bias", "speech_encoder.encoder.layers.5.self_attn_layer_norm.weight", "speech_encoder.encoder.layers.5.self_attn_layer_norm.bias", "speech_encoder.encoder.layers.5.self_attn.pos_bias_u", "speech_encoder.encoder.layers.5.self_attn.pos_bias_v", "speech_encoder.encoder.layers.5.self_attn.linear_q.weight", "speech_encoder.encoder.layers.5.self_attn.linear_q.bias", "speech_encoder.encoder.layers.5.self_attn.linear_k.weight", "speech_encoder.encoder.layers.5.self_attn.linear_k.bias", "speech_encoder.encoder.layers.5.self_attn.linear_v.weight", "speech_encoder.encoder.layers.5.self_attn.linear_v.bias", "speech_encoder.encoder.layers.5.self_attn.linear_out.weight", "speech_encoder.encoder.layers.5.self_attn.linear_out.bias", "speech_encoder.encoder.layers.5.self_attn.linear_pos.weight", "speech_encoder.encoder.layers.5.conv_module.layer_norm.weight", "speech_encoder.encoder.layers.5.conv_module.layer_norm.bias", "speech_encoder.encoder.layers.5.conv_module.pointwise_conv1.weight", 
"speech_encoder.encoder.layers.5.conv_module.depthwise_conv.weight", "speech_encoder.encoder.layers.5.conv_module.batch_norm.weight", "speech_encoder.encoder.layers.5.conv_module.batch_norm.bias", "speech_encoder.encoder.layers.5.conv_module.batch_norm.running_mean", "speech_encoder.encoder.layers.5.conv_module.batch_norm.running_var", "speech_encoder.encoder.layers.5.conv_module.pointwise_conv2.weight", "speech_encoder.encoder.layers.5.ffn2_layer_norm.weight", "speech_encoder.encoder.layers.5.ffn2_layer_norm.bias", "speech_encoder.encoder.layers.5.ffn2.intermediate_dense.weight", "speech_encoder.encoder.layers.5.ffn2.intermediate_dense.bias", "speech_encoder.encoder.layers.5.ffn2.output_dense.weight", "speech_encoder.encoder.layers.5.ffn2.output_dense.bias", "speech_encoder.encoder.layers.5.final_layer_norm.weight", "speech_encoder.encoder.layers.5.final_layer_norm.bias", "speech_encoder.encoder.layers.6.ffn1_layer_norm.weight", "speech_encoder.encoder.layers.6.ffn1_layer_norm.bias", "speech_encoder.encoder.layers.6.ffn1.intermediate_dense.weight", "speech_encoder.encoder.layers.6.ffn1.intermediate_dense.bias", "speech_encoder.encoder.layers.6.ffn1.output_dense.weight", "speech_encoder.encoder.layers.6.ffn1.output_dense.bias", "speech_encoder.encoder.layers.6.self_attn_layer_norm.weight", "speech_encoder.encoder.layers.6.self_attn_layer_norm.bias", "speech_encoder.encoder.layers.6.self_attn.pos_bias_u", "speech_encoder.encoder.layers.6.self_attn.pos_bias_v", "speech_encoder.encoder.layers.6.self_attn.linear_q.weight", "speech_encoder.encoder.layers.6.self_attn.linear_q.bias", "speech_encoder.encoder.layers.6.self_attn.linear_k.weight", "speech_encoder.encoder.layers.6.self_attn.linear_k.bias", "speech_encoder.encoder.layers.6.self_attn.linear_v.weight", "speech_encoder.encoder.layers.6.self_attn.linear_v.bias", "speech_encoder.encoder.layers.6.self_attn.linear_out.weight", "speech_encoder.encoder.layers.6.self_attn.linear_out.bias", 
"speech_encoder.encoder.layers.6.self_attn.linear_pos.weight", "speech_encoder.encoder.layers.6.conv_module.layer_norm.weight", "speech_encoder.encoder.layers.6.conv_module.layer_norm.bias", "speech_encoder.encoder.layers.6.conv_module.pointwise_conv1.weight", "speech_encoder.encoder.layers.6.conv_module.depthwise_conv.weight", "speech_encoder.encoder.layers.6.conv_module.batch_norm.weight", "speech_encoder.encoder.layers.6.conv_module.batch_norm.bias", "speech_encoder.encoder.layers.6.conv_module.batch_norm.running_mean", "speech_encoder.encoder.layers.6.conv_module.batch_norm.running_var", "speech_encoder.encoder.layers.6.conv_module.pointwise_conv2.weight", "speech_encoder.encoder.layers.6.ffn2_layer_norm.weight", "speech_encoder.encoder.layers.6.ffn2_layer_norm.bias", "speech_encoder.encoder.layers.6.ffn2.intermediate_dense.weight", "speech_encoder.encoder.layers.6.ffn2.intermediate_dense.bias", "speech_encoder.encoder.layers.6.ffn2.output_dense.weight", "speech_encoder.encoder.layers.6.ffn2.output_dense.bias", "speech_encoder.encoder.layers.6.final_layer_norm.weight", "speech_encoder.encoder.layers.6.final_layer_norm.bias", "speech_encoder.encoder.layers.7.ffn1_layer_norm.weight", "speech_encoder.encoder.layers.7.ffn1_layer_norm.bias", "speech_encoder.encoder.layers.7.ffn1.intermediate_dense.weight", "speech_encoder.encoder.layers.7.ffn1.intermediate_dense.bias", "speech_encoder.encoder.layers.7.ffn1.output_dense.weight", "speech_encoder.encoder.layers.7.ffn1.output_dense.bias", "speech_encoder.encoder.layers.7.self_attn_layer_norm.weight", "speech_encoder.encoder.layers.7.self_attn_layer_norm.bias", "speech_encoder.encoder.layers.7.self_attn.pos_bias_u", "speech_encoder.encoder.layers.7.self_attn.pos_bias_v", "speech_encoder.encoder.layers.7.self_attn.linear_q.weight", "speech_encoder.encoder.layers.7.self_attn.linear_q.bias", "speech_encoder.encoder.layers.7.self_attn.linear_k.weight", "speech_encoder.encoder.layers.7.self_attn.linear_k.bias", 
"speech_encoder.encoder.layers.7.self_attn.linear_v.weight", "speech_encoder.encoder.layers.7.self_attn.linear_v.bias", "speech_encoder.encoder.layers.7.self_attn.linear_out.weight", "speech_encoder.encoder.layers.7.self_attn.linear_out.bias", "speech_encoder.encoder.layers.7.self_attn.linear_pos.weight", "speech_encoder.encoder.layers.7.conv_module.layer_norm.weight", "speech_encoder.encoder.layers.7.conv_module.layer_norm.bias", "speech_encoder.encoder.layers.7.conv_module.pointwise_conv1.weight", "speech_encoder.encoder.layers.7.conv_module.depthwise_conv.weight", "speech_encoder.encoder.layers.7.conv_module.batch_norm.weight", "speech_encoder.encoder.layers.7.conv_module.batch_norm.bias", "speech_encoder.encoder.layers.7.conv_module.batch_norm.running_mean", "speech_encoder.encoder.layers.7.conv_module.batch_norm.running_var", "speech_encoder.encoder.layers.7.conv_module.pointwise_conv2.weight", "speech_encoder.encoder.layers.7.ffn2_layer_norm.weight", "speech_encoder.encoder.layers.7.ffn2_layer_norm.bias", "speech_encoder.encoder.layers.7.ffn2.intermediate_dense.weight", "speech_encoder.encoder.layers.7.ffn2.intermediate_dense.bias", "speech_encoder.encoder.layers.7.ffn2.output_dense.weight", "speech_encoder.encoder.layers.7.ffn2.output_dense.bias", "speech_encoder.encoder.layers.7.final_layer_norm.weight", "speech_encoder.encoder.layers.7.final_layer_norm.bias", "speech_encoder.encoder.layers.8.ffn1_layer_norm.weight", "speech_encoder.encoder.layers.8.ffn1_layer_norm.bias", "speech_encoder.encoder.layers.8.ffn1.intermediate_dense.weight", "speech_encoder.encoder.layers.8.ffn1.intermediate_dense.bias", "speech_encoder.encoder.layers.8.ffn1.output_dense.weight", "speech_encoder.encoder.layers.8.ffn1.output_dense.bias", "speech_encoder.encoder.layers.8.self_attn_layer_norm.weight", "speech_encoder.encoder.layers.8.self_attn_layer_norm.bias", "speech_encoder.encoder.layers.8.self_attn.pos_bias_u", "speech_encoder.encoder.layers.8.self_attn.pos_bias_v", 
"speech_encoder.encoder.layers.8.self_attn.linear_q.weight", "speech_encoder.encoder.layers.8.self_attn.linear_q.bias", "speech_encoder.encoder.layers.8.self_attn.linear_k.weight", "speech_encoder.encoder.layers.8.self_attn.linear_k.bias", "speech_encoder.encoder.layers.8.self_attn.linear_v.weight", "speech_encoder.encoder.layers.8.self_attn.linear_v.bias", "speech_encoder.encoder.layers.8.self_attn.linear_out.weight", "speech_encoder.encoder.layers.8.self_attn.linear_out.bias", "speech_encoder.encoder.layers.8.self_attn.linear_pos.weight", "speech_encoder.encoder.layers.8.conv_module.layer_norm.weight", "speech_encoder.encoder.layers.8.conv_module.layer_norm.bias", "speech_encoder.encoder.layers.8.conv_module.pointwise_conv1.weight", "speech_encoder.encoder.layers.8.conv_module.depthwise_conv.weight", "speech_encoder.encoder.layers.8.conv_module.batch_norm.weight", "speech_encoder.encoder.layers.8.conv_module.batch_norm.bias", "speech_encoder.encoder.layers.8.conv_module.batch_norm.running_mean", "speech_encoder.encoder.layers.8.conv_module.batch_norm.running_var", "speech_encoder.encoder.layers.8.conv_module.pointwise_conv2.weight", "speech_encoder.encoder.layers.8.ffn2_layer_norm.weight", "speech_encoder.encoder.layers.8.ffn2_layer_norm.bias", "speech_encoder.encoder.layers.8.ffn2.intermediate_dense.weight", "speech_encoder.encoder.layers.8.ffn2.intermediate_dense.bias", "speech_encoder.encoder.layers.8.ffn2.output_dense.weight", "speech_encoder.encoder.layers.8.ffn2.output_dense.bias", "speech_encoder.encoder.layers.8.final_layer_norm.weight", "speech_encoder.encoder.layers.8.final_layer_norm.bias", "speech_encoder.encoder.layers.9.ffn1_layer_norm.weight", "speech_encoder.encoder.layers.9.ffn1_layer_norm.bias", "speech_encoder.encoder.layers.9.ffn1.intermediate_dense.weight", "speech_encoder.encoder.layers.9.ffn1.intermediate_dense.bias", "speech_encoder.encoder.layers.9.ffn1.output_dense.weight", "speech_encoder.encoder.layers.9.ffn1.output_dense.bias", 
"speech_encoder.encoder.layers.9.self_attn_layer_norm.weight", "speech_encoder.encoder.layers.9.self_attn_layer_norm.bias", "speech_encoder.encoder.layers.9.self_attn.pos_bias_u", "speech_encoder.encoder.layers.9.self_attn.pos_bias_v", "speech_encoder.encoder.layers.9.self_attn.linear_q.weight", "speech_encoder.encoder.layers.9.self_attn.linear_q.bias", "speech_encoder.encoder.layers.9.self_attn.linear_k.weight", "speech_encoder.encoder.layers.9.self_attn.linear_k.bias", "speech_encoder.encoder.layers.9.self_attn.linear_v.weight", "speech_encoder.encoder.layers.9.self_attn.linear_v.bias", "speech_encoder.encoder.layers.9.self_attn.linear_out.weight", "speech_encoder.encoder.layers.9.self_attn.linear_out.bias", "speech_encoder.encoder.layers.9.self_attn.linear_pos.weight", "speech_encoder.encoder.layers.9.conv_module.layer_norm.weight", "speech_encoder.encoder.layers.9.conv_module.layer_norm.bias", "speech_encoder.encoder.layers.9.conv_module.pointwise_conv1.weight", "speech_encoder.encoder.layers.9.conv_module.depthwise_conv.weight", "speech_encoder.encoder.layers.9.conv_module.batch_norm.weight", "speech_encoder.encoder.layers.9.conv_module.batch_norm.bias", "speech_encoder.encoder.layers.9.conv_module.batch_norm.running_mean", "speech_encoder.encoder.layers.9.conv_module.batch_norm.running_var", "speech_encoder.encoder.layers.9.conv_module.pointwise_conv2.weight", "speech_encoder.encoder.layers.9.ffn2_layer_norm.weight", "speech_encoder.encoder.layers.9.ffn2_layer_norm.bias", "speech_encoder.encoder.layers.9.ffn2.intermediate_dense.weight", "speech_encoder.encoder.layers.9.ffn2.intermediate_dense.bias", "speech_encoder.encoder.layers.9.ffn2.output_dense.weight", "speech_encoder.encoder.layers.9.ffn2.output_dense.bias", "speech_encoder.encoder.layers.9.final_layer_norm.weight", "speech_encoder.encoder.layers.9.final_layer_norm.bias", "speech_encoder.encoder.layers.10.ffn1_layer_norm.weight", "speech_encoder.encoder.layers.10.ffn1_layer_norm.bias", 
"speech_encoder.encoder.layers.10.ffn1.intermediate_dense.weight", "speech_encoder.encoder.layers.10.ffn1.intermediate_dense.bias", "speech_encoder.encoder.layers.10.ffn1.output_dense.weight", "speech_encoder.encoder.layers.10.ffn1.output_dense.bias", "speech_encoder.encoder.layers.10.self_attn_layer_norm.weight", "speech_encoder.encoder.layers.10.self_attn_layer_norm.bias", "speech_encoder.encoder.layers.10.self_attn.pos_bias_u", "speech_encoder.encoder.layers.10.self_attn.pos_bias_v", "speech_encoder.encoder.layers.10.self_attn.linear_q.weight", "speech_encoder.encoder.layers.10.self_attn.linear_q.bias", "speech_encoder.encoder.layers.10.self_attn.linear_k.weight", "speech_encoder.encoder.layers.10.self_attn.linear_k.bias", "speech_encoder.encoder.layers.10.self_attn.linear_v.weight", "speech_encoder.encoder.layers.10.self_attn.linear_v.bias", "speech_encoder.encoder.layers.10.self_attn.linear_out.weight", "speech_encoder.encoder.layers.10.self_attn.linear_out.bias", "speech_encoder.encoder.layers.10.self_attn.linear_pos.weight", "speech_encoder.encoder.layers.10.conv_module.layer_norm.weight", "speech_encoder.encoder.layers.10.conv_module.layer_norm.bias", "speech_encoder.encoder.layers.10.conv_module.pointwise_conv1.weight", "speech_encoder.encoder.layers.10.conv_module.depthwise_conv.weight", "speech_encoder.encoder.layers.10.conv_module.batch_norm.weight", "speech_encoder.encoder.layers.10.conv_module.batch_norm.bias", "speech_encoder.encoder.layers.10.conv_module.batch_norm.running_mean", "speech_encoder.encoder.layers.10.conv_module.batch_norm.running_var", "speech_encoder.encoder.layers.10.conv_module.pointwise_conv2.weight", "speech_encoder.encoder.layers.10.ffn2_layer_norm.weight", "speech_encoder.encoder.layers.10.ffn2_layer_norm.bias", "speech_encoder.encoder.layers.10.ffn2.intermediate_dense.weight", "speech_encoder.encoder.layers.10.ffn2.intermediate_dense.bias", "speech_encoder.encoder.layers.10.ffn2.output_dense.weight", 
"speech_encoder.encoder.layers.10.ffn2.output_dense.bias", "speech_encoder.encoder.layers.10.final_layer_norm.weight", "speech_encoder.encoder.layers.10.final_layer_norm.bias", "speech_encoder.encoder.layers.11.ffn1_layer_norm.weight", "speech_encoder.encoder.layers.11.ffn1_layer_norm.bias", "speech_encoder.encoder.layers.11.ffn1.intermediate_dense.weight", "speech_encoder.encoder.layers.11.ffn1.intermediate_dense.bias", "speech_encoder.encoder.layers.11.ffn1.output_dense.weight", "speech_encoder.encoder.layers.11.ffn1.output_dense.bias", "speech_encoder.encoder.layers.11.self_attn_layer_norm.weight", "speech_encoder.encoder.layers.11.self_attn_layer_norm.bias", "speech_encoder.encoder.layers.11.self_attn.pos_bias_u", "speech_encoder.encoder.layers.11.self_attn.pos_bias_v", "speech_encoder.encoder.layers.11.self_attn.linear_q.weight", "speech_encoder.encoder.layers.11.self_attn.linear_q.bias", "speech_encoder.encoder.layers.11.self_attn.linear_k.weight", "speech_encoder.encoder.layers.11.self_attn.linear_k.bias", "speech_encoder.encoder.layers.11.self_attn.linear_v.weight", "speech_encoder.encoder.layers.11.self_attn.linear_v.bias", "speech_encoder.encoder.layers.11.self_attn.linear_out.weight", "speech_encoder.encoder.layers.11.self_attn.linear_out.bias", "speech_encoder.encoder.layers.11.self_attn.linear_pos.weight", "speech_encoder.encoder.layers.11.conv_module.layer_norm.weight", "speech_encoder.encoder.layers.11.conv_module.layer_norm.bias", "speech_encoder.encoder.layers.11.conv_module.pointwise_conv1.weight", "speech_encoder.encoder.layers.11.conv_module.depthwise_conv.weight", "speech_encoder.encoder.layers.11.conv_module.batch_norm.weight", "speech_encoder.encoder.layers.11.conv_module.batch_norm.bias", "speech_encoder.encoder.layers.11.conv_module.batch_norm.running_mean", "speech_encoder.encoder.layers.11.conv_module.batch_norm.running_var", "speech_encoder.encoder.layers.11.conv_module.pointwise_conv2.weight", 
"speech_encoder.encoder.layers.11.ffn2_layer_norm.weight", "speech_encoder.encoder.layers.11.ffn2_layer_norm.bias", "speech_encoder.encoder.layers.11.ffn2.intermediate_dense.weight", "speech_encoder.encoder.layers.11.ffn2.intermediate_dense.bias", "speech_encoder.encoder.layers.11.ffn2.output_dense.weight", "speech_encoder.encoder.layers.11.ffn2.output_dense.bias", "speech_encoder.encoder.layers.11.final_layer_norm.weight", "speech_encoder.encoder.layers.11.final_layer_norm.bias", "speech_encoder.encoder.layer_norm.weight", "speech_encoder.encoder.layer_norm.bias", "speech_encoder.intermediate_ffn.intermediate_dense.weight", "speech_encoder.intermediate_ffn.intermediate_dense.bias", "speech_encoder.intermediate_ffn.output_dense.weight", "speech_encoder.intermediate_ffn.output_dense.bias", "speech_encoder.adapter.layers.0.residual_layer_norm.weight", "speech_encoder.adapter.layers.0.residual_layer_norm.bias", "speech_encoder.adapter.layers.0.residual_conv.weight", "speech_encoder.adapter.layers.0.residual_conv.bias", "speech_encoder.adapter.layers.0.self_attn_layer_norm.weight", "speech_encoder.adapter.layers.0.self_attn_layer_norm.bias", "speech_encoder.adapter.layers.0.self_attn_conv.weight", "speech_encoder.adapter.layers.0.self_attn_conv.bias", "speech_encoder.adapter.layers.0.self_attn.linear_q.weight", "speech_encoder.adapter.layers.0.self_attn.linear_q.bias", "speech_encoder.adapter.layers.0.self_attn.linear_k.weight", "speech_encoder.adapter.layers.0.self_attn.linear_k.bias", "speech_encoder.adapter.layers.0.self_attn.linear_v.weight", "speech_encoder.adapter.layers.0.self_attn.linear_v.bias", "speech_encoder.adapter.layers.0.self_attn.linear_out.weight", "speech_encoder.adapter.layers.0.self_attn.linear_out.bias", "speech_encoder.adapter.layers.0.ffn_layer_norm.weight", "speech_encoder.adapter.layers.0.ffn_layer_norm.bias", "speech_encoder.adapter.layers.0.ffn.intermediate_dense.weight", "speech_encoder.adapter.layers.0.ffn.intermediate_dense.bias", 
"speech_encoder.adapter.layers.0.ffn.output_dense.weight", "speech_encoder.adapter.layers.0.ffn.output_dense.bias", "speech_encoder.inner_layer_norm.weight", "speech_encoder.inner_layer_norm.bias", "text_decoder.embed_tokens.weight", "text_decoder.layers.0.self_attn.k_proj.weight", "text_decoder.layers.0.self_attn.k_proj.bias", "text_decoder.layers.0.self_attn.v_proj.weight", "text_decoder.layers.0.self_attn.v_proj.bias", "text_decoder.layers.0.self_attn.q_proj.weight", "text_decoder.layers.0.self_attn.q_proj.bias", "text_decoder.layers.0.self_attn.out_proj.weight", "text_decoder.layers.0.self_attn.out_proj.bias", "text_decoder.layers.0.self_attn_layer_norm.weight", "text_decoder.layers.0.self_attn_layer_norm.bias", "text_decoder.layers.0.cross_attention.k_proj.weight", "text_decoder.layers.0.cross_attention.k_proj.bias", "text_decoder.layers.0.cross_attention.v_proj.weight", "text_decoder.layers.0.cross_attention.v_proj.bias", "text_decoder.layers.0.cross_attention.q_proj.weight", "text_decoder.layers.0.cross_attention.q_proj.bias", "text_decoder.layers.0.cross_attention.out_proj.weight", "text_decoder.layers.0.cross_attention.out_proj.bias", "text_decoder.layers.0.cross_attention_layer_norm.weight", "text_decoder.layers.0.cross_attention_layer_norm.bias", "text_decoder.layers.0.ffn.fc1.weight", "text_decoder.layers.0.ffn.fc1.bias", "text_decoder.layers.0.ffn.fc2.weight", "text_decoder.layers.0.ffn.fc2.bias", "text_decoder.layers.0.ffn_layer_norm.weight", "text_decoder.layers.0.ffn_layer_norm.bias", "text_decoder.layers.1.self_attn.k_proj.weight", "text_decoder.layers.1.self_attn.k_proj.bias", "text_decoder.layers.1.self_attn.v_proj.weight", "text_decoder.layers.1.self_attn.v_proj.bias", "text_decoder.layers.1.self_attn.q_proj.weight", "text_decoder.layers.1.self_attn.q_proj.bias", "text_decoder.layers.1.self_attn.out_proj.weight", "text_decoder.layers.1.self_attn.out_proj.bias", "text_decoder.layers.1.self_attn_layer_norm.weight", 
"text_decoder.layers.1.self_attn_layer_norm.bias", "text_decoder.layers.1.cross_attention.k_proj.weight", "text_decoder.layers.1.cross_attention.k_proj.bias", "text_decoder.layers.1.cross_attention.v_proj.weight", "text_decoder.layers.1.cross_attention.v_proj.bias", "text_decoder.layers.1.cross_attention.q_proj.weight", "text_decoder.layers.1.cross_attention.q_proj.bias", "text_decoder.layers.1.cross_attention.out_proj.weight", "text_decoder.layers.1.cross_attention.out_proj.bias", "text_decoder.layers.1.cross_attention_layer_norm.weight", "text_decoder.layers.1.cross_attention_layer_norm.bias", "text_decoder.layers.1.ffn.fc1.weight", "text_decoder.layers.1.ffn.fc1.bias", "text_decoder.layers.1.ffn.fc2.weight", "text_decoder.layers.1.ffn.fc2.bias", "text_decoder.layers.1.ffn_layer_norm.weight", "text_decoder.layers.1.ffn_layer_norm.bias", "text_decoder.layers.2.self_attn.k_proj.weight", "text_decoder.layers.2.self_attn.k_proj.bias", "text_decoder.layers.2.self_attn.v_proj.weight", "text_decoder.layers.2.self_attn.v_proj.bias", "text_decoder.layers.2.self_attn.q_proj.weight", "text_decoder.layers.2.self_attn.q_proj.bias", "text_decoder.layers.2.self_attn.out_proj.weight", "text_decoder.layers.2.self_attn.out_proj.bias", "text_decoder.layers.2.self_attn_layer_norm.weight", "text_decoder.layers.2.self_attn_layer_norm.bias", "text_decoder.layers.2.cross_attention.k_proj.weight", "text_decoder.layers.2.cross_attention.k_proj.bias", "text_decoder.layers.2.cross_attention.v_proj.weight", "text_decoder.layers.2.cross_attention.v_proj.bias", "text_decoder.layers.2.cross_attention.q_proj.weight", "text_decoder.layers.2.cross_attention.q_proj.bias", "text_decoder.layers.2.cross_attention.out_proj.weight", "text_decoder.layers.2.cross_attention.out_proj.bias", "text_decoder.layers.2.cross_attention_layer_norm.weight", "text_decoder.layers.2.cross_attention_layer_norm.bias", "text_decoder.layers.2.ffn.fc1.weight", "text_decoder.layers.2.ffn.fc1.bias", 
"text_decoder.layers.2.ffn.fc2.weight", "text_decoder.layers.2.ffn.fc2.bias", "text_decoder.layers.2.ffn_layer_norm.weight", "text_decoder.layers.2.ffn_layer_norm.bias", "text_decoder.layers.3.self_attn.k_proj.weight", "text_decoder.layers.3.self_attn.k_proj.bias", "text_decoder.layers.3.self_attn.v_proj.weight", "text_decoder.layers.3.self_attn.v_proj.bias", "text_decoder.layers.3.self_attn.q_proj.weight", "text_decoder.layers.3.self_attn.q_proj.bias", "text_decoder.layers.3.self_attn.out_proj.weight", "text_decoder.layers.3.self_attn.out_proj.bias", "text_decoder.layers.3.self_attn_layer_norm.weight", "text_decoder.layers.3.self_attn_layer_norm.bias", "text_decoder.layers.3.cross_attention.k_proj.weight", "text_decoder.layers.3.cross_attention.k_proj.bias", "text_decoder.layers.3.cross_attention.v_proj.weight", "text_decoder.layers.3.cross_attention.v_proj.bias", "text_decoder.layers.3.cross_attention.q_proj.weight", "text_decoder.layers.3.cross_attention.q_proj.bias", "text_decoder.layers.3.cross_attention.out_proj.weight", "text_decoder.layers.3.cross_attention.out_proj.bias", "text_decoder.layers.3.cross_attention_layer_norm.weight", "text_decoder.layers.3.cross_attention_layer_norm.bias", "text_decoder.layers.3.ffn.fc1.weight", "text_decoder.layers.3.ffn.fc1.bias", "text_decoder.layers.3.ffn.fc2.weight", "text_decoder.layers.3.ffn.fc2.bias", "text_decoder.layers.3.ffn_layer_norm.weight", "text_decoder.layers.3.ffn_layer_norm.bias", "text_decoder.layers.4.self_attn.k_proj.weight", "text_decoder.layers.4.self_attn.k_proj.bias", "text_decoder.layers.4.self_attn.v_proj.weight", "text_decoder.layers.4.self_attn.v_proj.bias", "text_decoder.layers.4.self_attn.q_proj.weight", "text_decoder.layers.4.self_attn.q_proj.bias", "text_decoder.layers.4.self_attn.out_proj.weight", "text_decoder.layers.4.self_attn.out_proj.bias", "text_decoder.layers.4.self_attn_layer_norm.weight", "text_decoder.layers.4.self_attn_layer_norm.bias", 
"text_decoder.layers.4.cross_attention.k_proj.weight", "text_decoder.layers.4.cross_attention.k_proj.bias", "text_decoder.layers.4.cross_attention.v_proj.weight", "text_decoder.layers.4.cross_attention.v_proj.bias", "text_decoder.layers.4.cross_attention.q_proj.weight", "text_decoder.layers.4.cross_attention.q_proj.bias", "text_decoder.layers.4.cross_attention.out_proj.weight", "text_decoder.layers.4.cross_attention.out_proj.bias", "text_decoder.layers.4.cross_attention_layer_norm.weight", "text_decoder.layers.4.cross_attention_layer_norm.bias", "text_decoder.layers.4.ffn.fc1.weight", "text_decoder.layers.4.ffn.fc1.bias", "text_decoder.layers.4.ffn.fc2.weight", "text_decoder.layers.4.ffn.fc2.bias", "text_decoder.layers.4.ffn_layer_norm.weight", "text_decoder.layers.4.ffn_layer_norm.bias", "text_decoder.layers.5.self_attn.k_proj.weight", "text_decoder.layers.5.self_attn.k_proj.bias", "text_decoder.layers.5.self_attn.v_proj.weight", "text_decoder.layers.5.self_attn.v_proj.bias", "text_decoder.layers.5.self_attn.q_proj.weight", "text_decoder.layers.5.self_attn.q_proj.bias", "text_decoder.layers.5.self_attn.out_proj.weight", "text_decoder.layers.5.self_attn.out_proj.bias", "text_decoder.layers.5.self_attn_layer_norm.weight", "text_decoder.layers.5.self_attn_layer_norm.bias", "text_decoder.layers.5.cross_attention.k_proj.weight", "text_decoder.layers.5.cross_attention.k_proj.bias", "text_decoder.layers.5.cross_attention.v_proj.weight", "text_decoder.layers.5.cross_attention.v_proj.bias", "text_decoder.layers.5.cross_attention.q_proj.weight", "text_decoder.layers.5.cross_attention.q_proj.bias", "text_decoder.layers.5.cross_attention.out_proj.weight", "text_decoder.layers.5.cross_attention.out_proj.bias", "text_decoder.layers.5.cross_attention_layer_norm.weight", "text_decoder.layers.5.cross_attention_layer_norm.bias", "text_decoder.layers.5.ffn.fc1.weight", "text_decoder.layers.5.ffn.fc1.bias", "text_decoder.layers.5.ffn.fc2.weight", "text_decoder.layers.5.ffn.fc2.bias", 
"text_decoder.layers.5.ffn_layer_norm.weight", "text_decoder.layers.5.ffn_layer_norm.bias", "text_decoder.layers.6.self_attn.k_proj.weight", "text_decoder.layers.6.self_attn.k_proj.bias", "text_decoder.layers.6.self_attn.v_proj.weight", "text_decoder.layers.6.self_attn.v_proj.bias", "text_decoder.layers.6.self_attn.q_proj.weight", "text_decoder.layers.6.self_attn.q_proj.bias", "text_decoder.layers.6.self_attn.out_proj.weight", "text_decoder.layers.6.self_attn.out_proj.bias", "text_decoder.layers.6.self_attn_layer_norm.weight", "text_decoder.layers.6.self_attn_layer_norm.bias", "text_decoder.layers.6.cross_attention.k_proj.weight", "text_decoder.layers.6.cross_attention.k_proj.bias", "text_decoder.layers.6.cross_attention.v_proj.weight", "text_decoder.layers.6.cross_attention.v_proj.bias", "text_decoder.layers.6.cross_attention.q_proj.weight", "text_decoder.layers.6.cross_attention.q_proj.bias", "text_decoder.layers.6.cross_attention.out_proj.weight", "text_decoder.layers.6.cross_attention.out_proj.bias", "text_decoder.layers.6.cross_attention_layer_norm.weight", "text_decoder.layers.6.cross_attention_layer_norm.bias", "text_decoder.layers.6.ffn.fc1.weight", "text_decoder.layers.6.ffn.fc1.bias", "text_decoder.layers.6.ffn.fc2.weight", "text_decoder.layers.6.ffn.fc2.bias", "text_decoder.layers.6.ffn_layer_norm.weight", "text_decoder.layers.6.ffn_layer_norm.bias", "text_decoder.layers.7.self_attn.k_proj.weight", "text_decoder.layers.7.self_attn.k_proj.bias", "text_decoder.layers.7.self_attn.v_proj.weight", "text_decoder.layers.7.self_attn.v_proj.bias", "text_decoder.layers.7.self_attn.q_proj.weight", "text_decoder.layers.7.self_attn.q_proj.bias", "text_decoder.layers.7.self_attn.out_proj.weight", "text_decoder.layers.7.self_attn.out_proj.bias", "text_decoder.layers.7.self_attn_layer_norm.weight", "text_decoder.layers.7.self_attn_layer_norm.bias", "text_decoder.layers.7.cross_attention.k_proj.weight", "text_decoder.layers.7.cross_attention.k_proj.bias", 
"text_decoder.layers.7.cross_attention.v_proj.weight", "text_decoder.layers.7.cross_attention.v_proj.bias", "text_decoder.layers.7.cross_attention.q_proj.weight", "text_decoder.layers.7.cross_attention.q_proj.bias", "text_decoder.layers.7.cross_attention.out_proj.weight", "text_decoder.layers.7.cross_attention.out_proj.bias", "text_decoder.layers.7.cross_attention_layer_norm.weight", "text_decoder.layers.7.cross_attention_layer_norm.bias", "text_decoder.layers.7.ffn.fc1.weight", "text_decoder.layers.7.ffn.fc1.bias", "text_decoder.layers.7.ffn.fc2.weight", "text_decoder.layers.7.ffn.fc2.bias", "text_decoder.layers.7.ffn_layer_norm.weight", "text_decoder.layers.7.ffn_layer_norm.bias", "text_decoder.layers.8.self_attn.k_proj.weight", "text_decoder.layers.8.self_attn.k_proj.bias", "text_decoder.layers.8.self_attn.v_proj.weight", "text_decoder.layers.8.self_attn.v_proj.bias", "text_decoder.layers.8.self_attn.q_proj.weight", "text_decoder.layers.8.self_attn.q_proj.bias", "text_decoder.layers.8.self_attn.out_proj.weight", "text_decoder.layers.8.self_attn.out_proj.bias", "text_decoder.layers.8.self_attn_layer_norm.weight", "text_decoder.layers.8.self_attn_layer_norm.bias", "text_decoder.layers.8.cross_attention.k_proj.weight", "text_decoder.layers.8.cross_attention.k_proj.bias", "text_decoder.layers.8.cross_attention.v_proj.weight", "text_decoder.layers.8.cross_attention.v_proj.bias", "text_decoder.layers.8.cross_attention.q_proj.weight", "text_decoder.layers.8.cross_attention.q_proj.bias", "text_decoder.layers.8.cross_attention.out_proj.weight", "text_decoder.layers.8.cross_attention.out_proj.bias", "text_decoder.layers.8.cross_attention_layer_norm.weight", "text_decoder.layers.8.cross_attention_layer_norm.bias", "text_decoder.layers.8.ffn.fc1.weight", "text_decoder.layers.8.ffn.fc1.bias", "text_decoder.layers.8.ffn.fc2.weight", "text_decoder.layers.8.ffn.fc2.bias", "text_decoder.layers.8.ffn_layer_norm.weight", "text_decoder.layers.8.ffn_layer_norm.bias", 
"text_decoder.layers.9.self_attn.k_proj.weight", "text_decoder.layers.9.self_attn.k_proj.bias", "text_decoder.layers.9.self_attn.v_proj.weight", "text_decoder.layers.9.self_attn.v_proj.bias", "text_decoder.layers.9.self_attn.q_proj.weight", "text_decoder.layers.9.self_attn.q_proj.bias", "text_decoder.layers.9.self_attn.out_proj.weight", "text_decoder.layers.9.self_attn.out_proj.bias", "text_decoder.layers.9.self_attn_layer_norm.weight", "text_decoder.layers.9.self_attn_layer_norm.bias", "text_decoder.layers.9.cross_attention.k_proj.weight", "text_decoder.layers.9.cross_attention.k_proj.bias", "text_decoder.layers.9.cross_attention.v_proj.weight", "text_decoder.layers.9.cross_attention.v_proj.bias", "text_decoder.layers.9.cross_attention.q_proj.weight", "text_decoder.layers.9.cross_attention.q_proj.bias", "text_decoder.layers.9.cross_attention.out_proj.weight", "text_decoder.layers.9.cross_attention.out_proj.bias", "text_decoder.layers.9.cross_attention_layer_norm.weight", "text_decoder.layers.9.cross_attention_layer_norm.bias", "text_decoder.layers.9.ffn.fc1.weight", "text_decoder.layers.9.ffn.fc1.bias", "text_decoder.layers.9.ffn.fc2.weight", "text_decoder.layers.9.ffn.fc2.bias", "text_decoder.layers.9.ffn_layer_norm.weight", "text_decoder.layers.9.ffn_layer_norm.bias", "text_decoder.layers.10.self_attn.k_proj.weight", "text_decoder.layers.10.self_attn.k_proj.bias", "text_decoder.layers.10.self_attn.v_proj.weight", "text_decoder.layers.10.self_attn.v_proj.bias", "text_decoder.layers.10.self_attn.q_proj.weight", "text_decoder.layers.10.self_attn.q_proj.bias", "text_decoder.layers.10.self_attn.out_proj.weight", "text_decoder.layers.10.self_attn.out_proj.bias", "text_decoder.layers.10.self_attn_layer_norm.weight", "text_decoder.layers.10.self_attn_layer_norm.bias", "text_decoder.layers.10.cross_attention.k_proj.weight", "text_decoder.layers.10.cross_attention.k_proj.bias", "text_decoder.layers.10.cross_attention.v_proj.weight", 
"text_decoder.layers.10.cross_attention.v_proj.bias", "text_decoder.layers.10.cross_attention.q_proj.weight", "text_decoder.layers.10.cross_attention.q_proj.bias", "text_decoder.layers.10.cross_attention.out_proj.weight", "text_decoder.layers.10.cross_attention.out_proj.bias", "text_decoder.layers.10.cross_attention_layer_norm.weight", "text_decoder.layers.10.cross_attention_layer_norm.bias", "text_decoder.layers.10.ffn.fc1.weight", "text_decoder.layers.10.ffn.fc1.bias", "text_decoder.layers.10.ffn.fc2.weight", "text_decoder.layers.10.ffn.fc2.bias", "text_decoder.layers.10.ffn_layer_norm.weight", "text_decoder.layers.10.ffn_layer_norm.bias", "text_decoder.layers.11.self_attn.k_proj.weight", "text_decoder.layers.11.self_attn.k_proj.bias", "text_decoder.layers.11.self_attn.v_proj.weight", "text_decoder.layers.11.self_attn.v_proj.bias", "text_decoder.layers.11.self_attn.q_proj.weight", "text_decoder.layers.11.self_attn.q_proj.bias", "text_decoder.layers.11.self_attn.out_proj.weight", "text_decoder.layers.11.self_attn.out_proj.bias", "text_decoder.layers.11.self_attn_layer_norm.weight", "text_decoder.layers.11.self_attn_layer_norm.bias", "text_decoder.layers.11.cross_attention.k_proj.weight", "text_decoder.layers.11.cross_attention.k_proj.bias", "text_decoder.layers.11.cross_attention.v_proj.weight", "text_decoder.layers.11.cross_attention.v_proj.bias", "text_decoder.layers.11.cross_attention.q_proj.weight", "text_decoder.layers.11.cross_attention.q_proj.bias", "text_decoder.layers.11.cross_attention.out_proj.weight", "text_decoder.layers.11.cross_attention.out_proj.bias", "text_decoder.layers.11.cross_attention_layer_norm.weight", "text_decoder.layers.11.cross_attention_layer_norm.bias", "text_decoder.layers.11.ffn.fc1.weight", "text_decoder.layers.11.ffn.fc1.bias", "text_decoder.layers.11.ffn.fc2.weight", "text_decoder.layers.11.ffn.fc2.bias", "text_decoder.layers.11.ffn_layer_norm.weight", "text_decoder.layers.11.ffn_layer_norm.bias", 
"text_decoder.layer_norm.weight", "text_decoder.layer_norm.bias", "lm_head.weight", "t2u_model.model.encoder.layers.0.self_attn.k_proj.weight", "t2u_model.model.encoder.layers.0.self_attn.k_proj.bias", "t2u_model.model.encoder.layers.0.self_attn.v_proj.weight", "t2u_model.model.encoder.layers.0.self_attn.v_proj.bias", "t2u_model.model.encoder.layers.0.self_attn.q_proj.weight", "t2u_model.model.encoder.layers.0.self_attn.q_proj.bias", "t2u_model.model.encoder.layers.0.self_attn.out_proj.weight", "t2u_model.model.encoder.layers.0.self_attn.out_proj.bias", "t2u_model.model.encoder.layers.0.self_attn_layer_norm.weight", "t2u_model.model.encoder.layers.0.self_attn_layer_norm.bias", "t2u_model.model.encoder.layers.0.ffn.fc1.weight", "t2u_model.model.encoder.layers.0.ffn.fc1.bias", "t2u_model.model.encoder.layers.0.ffn.fc2.weight", "t2u_model.model.encoder.layers.0.ffn.fc2.bias", "t2u_model.model.encoder.layers.0.ffn_layer_norm.weight", "t2u_model.model.encoder.layers.0.ffn_layer_norm.bias", "t2u_model.model.encoder.layers.1.self_attn.k_proj.weight", "t2u_model.model.encoder.layers.1.self_attn.k_proj.bias", "t2u_model.model.encoder.layers.1.self_attn.v_proj.weight", "t2u_model.model.encoder.layers.1.self_attn.v_proj.bias", "t2u_model.model.encoder.layers.1.self_attn.q_proj.weight", "t2u_model.model.encoder.layers.1.self_attn.q_proj.bias", "t2u_model.model.encoder.layers.1.self_attn.out_proj.weight", "t2u_model.model.encoder.layers.1.self_attn.out_proj.bias", "t2u_model.model.encoder.layers.1.self_attn_layer_norm.weight", "t2u_model.model.encoder.layers.1.self_attn_layer_norm.bias", "t2u_model.model.encoder.layers.1.ffn.fc1.weight", "t2u_model.model.encoder.layers.1.ffn.fc1.bias", "t2u_model.model.encoder.layers.1.ffn.fc2.weight", "t2u_model.model.encoder.layers.1.ffn.fc2.bias", "t2u_model.model.encoder.layers.1.ffn_layer_norm.weight", "t2u_model.model.encoder.layers.1.ffn_layer_norm.bias", 
	...
	Unexpected key(s) in state_dict: "model_name", "model". 

Expected behavior

Expected to load the new fine-tuned model and then save it to a new model file.

@amyeroberts
Collaborator

Hi @ivanhe123, thanks for opening this issue!

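From the error, it looks like the keys in the state dict new_model do not match those in the model SeamlessM4TModel. The unexpected keys "model_name" and "model" suggest that the .pt file holds a wrapper dictionary rather than a flat state dict, so the actual weights may be nested under "model". You can check the expected keys in the model with model_seam.state_dict().keys() and compare them against new_model.keys().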
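
One way to see the mismatch at a glance is to diff the two key sets. This is a minimal sketch: diff_state_dict_keys is a hypothetical helper, not part of transformers or PyTorch, and it only needs the two key collections.

```python
def diff_state_dict_keys(expected_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, similar to what load_state_dict reports."""
    expected = set(expected_keys)
    found = set(checkpoint_keys)
    # Keys the model needs but the checkpoint does not provide.
    missing = sorted(expected - found)
    # Keys the checkpoint provides but the model does not know.
    unexpected = sorted(found - expected)
    return missing, unexpected


# Usage with the objects from the issue (assumes both are already loaded):
# missing, unexpected = diff_state_dict_keys(
#     model_seam.state_dict().keys(), new_model.keys()
# )
# print(len(missing), unexpected)
```

If unexpected contains only a handful of top-level names while missing lists nearly every parameter, the checkpoint is probably wrapped or uses a different naming scheme.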

Note, it's not necessary for you to download and load in a pretrained checkpoint and then load in new weights. You can initialize a new model with the same architecture and empty weights by just downloading the config:

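import torch
from accelerate import init_empty_weights
from transformers import AutoConfig, SeamlessM4TModel

# Download only the config, not the pretrained weights.
config = AutoConfig.from_pretrained("facebook/hf-seamless-m4t-medium")

# Build the architecture on the meta device, so no memory is allocated for weights.
with init_empty_weights():
    model = SeamlessM4TModel(config)

new_model = torch.load("./expt4_m4tM.pt")
# assign=True (PyTorch >= 2.1) replaces the meta tensors with the loaded ones;
# copying into meta tensors in place would be a no-op.
model.load_state_dict(new_model, assign=True)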
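
Since the error reports "model_name" and "model" as unexpected keys, the file may be a wrapper dictionary with the weights nested inside. Below is a minimal sketch under that assumption. unwrap_checkpoint and its candidate key names are hypothetical, and the nested keys may still need renaming if the fine-tuning script saved them under a different naming scheme than SeamlessM4TModel uses.

```python
from collections.abc import Mapping


def unwrap_checkpoint(checkpoint, candidate_keys=("model", "state_dict")):
    """Return the nested state dict if `checkpoint` looks like a wrapper, else `checkpoint`."""
    if not isinstance(checkpoint, Mapping):
        raise TypeError(f"expected a mapping, got {type(checkpoint).__name__}")
    for key in candidate_keys:
        nested = checkpoint.get(key)
        # Tensors are not mappings, so a flat state dict is returned unchanged.
        if isinstance(nested, Mapping):
            return nested
    return checkpoint


# Usage with model_seam from the issue (assumed, not verified against the real file):
# state_dict = unwrap_checkpoint(torch.load("./expt4_m4tM.pt"))
# result = model_seam.load_state_dict(state_dict, strict=False)
# print(result.missing_keys, result.unexpected_keys)
```

Loading with strict=False here is only a diagnostic step: it reports which keys still fail to match instead of raising, so any remaining naming differences become visible.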
