
Pretraining LLaMA-7B on a single machine with 2 GPUs fails with TypeError: an integer is required (got type NoneType) #112

Open
smallYellowCat opened this issue Nov 29, 2023 · 1 comment


@smallYellowCat

Training on a single machine with 2 GPUs fails with the following error:
Traceback (most recent call last):
  File "/home/d00620160/local/project/TencentPretrain/pretrain.py", line 139, in <module>
    main()
  File "/home/d00620160/local/project/TencentPretrain/pretrain.py", line 135, in main
    trainer.train_and_validate(args)
  File "/home/d00620160/local/project/TencentPretrain/tencentpretrain/trainer.py", line 147, in train_and_validate
    worker(args.local_rank, None, args)
  File "/home/d00620160/local/project/TencentPretrain/tencentpretrain/trainer.py", line 732, in worker
    trainer.train(args, local_rank, global_rank, train_loader, model_for_training, optimizer, scheduler)
  File "/home/d00620160/local/project/TencentPretrain/tencentpretrain/trainer.py", line 193, in train
    batch = list(next(loader_iter))
  File "/home/d00620160/local/project/TencentPretrain/tencentpretrain/utils/dataloader.py", line 187, in __iter__
    yield torch.LongTensor(src),
TypeError: an integer is required (got type NoneType)

The training command is as follows:

CUDA_VISIBLE_DEVICES=6,7 deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_zero3_config.json --enable_zero3 --pretrained_model_path models/llama2-7b.bin --dataset_path llama_support.pt --spm_model_path models/llama/tokenizer.model --config_path models/llama/7b_config.json --output_model_path models/llama_support_7b_dpw.bin --world_size 2 --gpu_ranks 0 1 --data_processor lm --deepspeed_checkpoint_activations --total_steps 300000 --save_checkpoint_steps 5000 --batch_size 1

Does this error mean there is a problem with the data, or with how the model is loaded?

@wmpscc
Contributor

wmpscc commented Dec 1, 2023

During data preprocessing, some changes are needed specifically for the LLaMA model.

  • In tencentpretrain/utils/constants.py, line 4, change special_tokens_map.json to llama_special_tokens_map.json
  • See llama-training for reference
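The reported error is consistent with unrecognized special tokens mapping to `None` token ids, which `torch.LongTensor` then refuses to convert. Below is a minimal pure-Python sketch of that failure mode; the `vocab` dictionary, token strings, and `tokens_to_ids` helper are hypothetical stand-ins for the tokenizer lookup, not TencentPretrain's actual code:

```python
# Hypothetical vocabulary standing in for the LLaMA tokenizer's vocab.
# A special-tokens map written for a different model may reference
# tokens such as [CLS]/[SEP] that do not exist in LLaMA's vocab.
vocab = {"<s>": 1, "</s>": 2, "hello": 100, "world": 101}

def tokens_to_ids(tokens, vocab):
    # dict.get returns None for tokens missing from the vocab,
    # mirroring how a mismatched special-tokens map yields None ids.
    return [vocab.get(t) for t in tokens]

# With the wrong special tokens, the id list contains None values;
# passing such a list to torch.LongTensor raises
# "TypeError: an integer is required (got type NoneType)".
bad = tokens_to_ids(["[CLS]", "hello", "world", "[SEP]"], vocab)
print(bad)   # [None, 100, 101, None]

# With matching special tokens, every id is an int and the
# tensor construction succeeds.
good = tokens_to_ids(["<s>", "hello", "world", "</s>"], vocab)
print(good)  # [1, 100, 101, 2]
```

This is why pointing constants.py at llama_special_tokens_map.json matters: the special tokens inserted during preprocessing must actually exist in the tokenizer's vocabulary.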
