add support for Qwen #129

Open · wants to merge 15 commits into main
Conversation

@cjw-d commented Jun 3, 2024

Convert:

python3 scripts/convert_qwen_from_huggingface_to_tencentpretrain.py --input_model_path $Qwen_1_8B_FOLDER --output_model_path models/qwen-1_8b.bin --layers_num 24

Test:

python3 scripts/generate_lm.py --load_model_path models/qwen-1_8b.bin \
                               --tokenizer qwen --vocab_path $Qwen_1_8B_FOLDER \
                               --test_path beginning.txt --prediction_path generated_sentence.txt \
                               --config_path models/qwen/1_8b_config.json

if freqs_cis is not None:
    query, key = apply_rotary_emb(query.transpose(1, 2), key.transpose(1, 2), freqs_cis=freqs_cis)
if use_dynamic_ntk:

Collaborator:

It would be better to encapsulate this part into a helper.
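
For illustration only, a minimal sketch of that kind of wrapper; the helper name and signature below are hypothetical and not part of this PR:

def _apply_position_encoding(self, query, key, freqs_cis=None, use_dynamic_ntk=False):
    # Hypothetical helper bundling the rotary / dynamic-NTK branches shown above,
    # so the attention forward pass makes a single call.
    if freqs_cis is not None:
        # same transpose as in the snippet above before applying rotary embeddings
        query, key = apply_rotary_emb(query.transpose(1, 2), key.transpose(1, 2), freqs_cis=freqs_cis)
    if use_dynamic_ntk:
        # inference-time dynamic-NTK handling would go here
        pass
    return query, key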

@@ -40,8 +49,8 @@ def __init__(self, args, layer_number=None):
         lora_params = args.lora_params

         self.self_attn = MultiHeadedAttention(
-            args.hidden_size, args.heads_num, attention_head_size, local_kv_heads_num, args.dropout, has_bias=has_bias,
-            with_scale=with_scale, lora_params=lora_params, layer_number=layer_number
+            args.hidden_size, args.heads_num, attention_head_size, local_kv_heads_num, args.dropout, self.max_seq_length, has_bias=has_bias, has_attention_bias = has_attention_bias,

Collaborator:

Previously has_bias also covered the attention bias. After this rename, has backward compatibility with existing models (e.g. T5) been considered?

Author:

> Previously has_bias also covered the attention bias. After this rename, has backward compatibility with existing models (e.g. T5) been considered?

When the q/k/v linear_layers are created, if attention_bias is not passed in, the value of has_bias is used instead, so this should remain compatible with previous models.
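
For context, a simplified sketch of that fallback; the class body is abridged and hypothetical, only the argument handling mirrors what is described above:

import torch.nn as nn

class MultiHeadedAttention(nn.Module):
    def __init__(self, hidden_size, heads_num, attention_head_size,
                 has_bias=True, has_attention_bias=None):
        super().__init__()
        # If the new argument is not supplied, fall back to has_bias, so callers
        # that only pass has_bias (e.g. the T5 configuration) behave as before.
        if has_attention_bias is None:
            has_attention_bias = has_bias
        inner_size = heads_num * attention_head_size
        # q/k/v projections use has_attention_bias; the output projection keeps has_bias.
        self.linear_layers = nn.ModuleList(
            [nn.Linear(hidden_size, inner_size, bias=has_attention_bias) for _ in range(3)]
        )
        self.final_linear = nn.Linear(inner_size, hidden_size, bias=has_bias)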

@@ -16,6 +17,13 @@ def __init__(self, args, layer_number=None):
         self.relative_position_embedding = args.relative_position_embedding
         self.rotary_position_embedding = args.rotary_position_embedding
         self.has_residual_attention = args.has_residual_attention
+        self.use_logn_attn = args.use_logn_attn
+        self.max_seq_length = args.max_seq_length
+        self.use_dynamic_ntk = args.use_dynamic_ntk

Collaborator:

NTK is not needed for training, only for inference. If only training is being considered, could this be simplified?
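
For reference, a rough sketch of the inference-only rescaling being discussed; the formula approximates Qwen's public implementation and is not taken from this PR:

import math

def dynamic_ntk_base(seq_length, max_seq_length, base=10000.0, dim=128):
    # Within the training context length nothing changes, which is why a
    # training-only code path could skip this entirely.
    if seq_length <= max_seq_length:
        return base
    # Stretch the rotary base once the prompt exceeds max_seq_length so that
    # positions beyond the training length remain well-behaved at inference time.
    context_value = math.log(seq_length / max_seq_length, 2) + 1
    ntk_alpha = max(2 ** math.ceil(context_value) - 1, 1)
    return base * ntk_alpha ** (dim / (dim - 2))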

Contributor:

It would also be good to provide the reverse conversion script:
convert_qwen_from_tencentpretrain_to_huggingface.py
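
A hypothetical skeleton of such a reverse script, for illustration only; the real key mapping must invert whatever convert_qwen_from_huggingface_to_tencentpretrain.py in this PR does, and the commented mapping line below is a placeholder:

import argparse
import collections
import torch

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--input_model_path", type=str, required=True)
    parser.add_argument("--output_model_path", type=str, required=True)
    parser.add_argument("--layers_num", type=int, required=True)
    args = parser.parse_args()

    tp_model = torch.load(args.input_model_path, map_location="cpu")
    hf_model = collections.OrderedDict()

    # Invert the forward script's renaming here, e.g. (placeholder, not the real mapping):
    # hf_model["transformer.wte.weight"] = tp_model["embedding.word.embedding.weight"]
    for i in range(args.layers_num):
        # per-layer attention / MLP weights would be remapped here
        pass

    torch.save(hf_model, args.output_model_path)

if __name__ == "__main__":
    main()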
