Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support LLaVa #119

Open
wants to merge 18 commits into
base: main
Choose a base branch
from
Open

support LLaVa #119

wants to merge 18 commits into from

Conversation

JINGZIjingzi
Copy link
Contributor

@JINGZIjingzi JINGZIjingzi commented Jan 9, 2024

  1. change file tencentpretrain/utils/constants.py, modify models/special_tokens_map.json to models/llama_special_tokens_map.json.
  2. data preprocess
python3 preprocess.py \
                --corpus_path corpora/llava.json \
                --dataset_path datasets/llava.pt \
                --spm_model_path tokenizer.model \
                --processes_num 4 --data_processor llava \
                --seq_length 1024

The tokenizer.model is the same as the LLM pretrained model used for training LLaVa.
corpora/llava.json is the same format as official LLaVa datasets.
2. feature align:
To use pretrained models, we need convert the models first

python3 scripts/convert_llm_in_llava.py --input_model_path $origin_pretrained_model_path \
               --output_model_path $pretrained_model_path

python3 scripts/convert_model_add_prefix.py --input_model_path $origin_vision_model_path \
              --output_model_path $vision_model_path --prefix embedding.image_text.vision_
deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.json \
                      --pretrained_model_path $pretrained_model_path \
                      --vision_model_in_VL_emb_path $vision_model_path \
                      --dataset_path datasets/llava.pt \
                      --spm_model_path tokenizer.model  \
                      --config_path models/llava/7b_config.json \
                      --output_model_path models/llava_stage1 \
                      --world_size 8 --accumulation_steps 16 --batch_size 2 \
                      --learning_rate 1e-3 --report_steps 100 \
                      --total_steps 40000 --save_checkpoint_steps 10000 \
                      --freeze_exclude_by_name vision_language.projection \
                      --freeze_parameters embedding encoder target tgt_embedding \
                      --patch_size 14 --image_height 336 --image_width 336 \
                      --image_preprocess pad normalize 

$pretrained_model_path is the path of the pretrained LLM model. $vision_model_path is the path of the pretrained vision model. world_size * accumulation_steps * batch_size is the actual total batch size.
After training, convert the model into a .bin file

python3 models/llava_stage1/zero_to_fp32.py models/llava_stage1/ models/llava_stage1.bin
  1. instruction tuning
deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.json \
                      --pretrained_model_path models/llava_stage1.bin \
                      --dataset_path datasets/llava.pt \
                      --spm_model_path tokenizer.model \
                      --config_path models/llava/7b_config.json \
                      --output_model_path models/llava_stage2 \
                      --world_size 8 --accumulation_steps 16 --batch_size 1 \
                      --learning_rate 2e-5 --report_steps 100 \
                      --total_steps 60000 --save_checkpoint_steps 10000 \
                      --patch_size 14 --image_height 336 --image_width 336 \
                      --image_preprocess pad normalize

To save GPU graphics memory, you can use ZeRO3 by changing --deepspeed_config models/deepspeed_config.json to --deepspeed_config models/deepspeed_zero3_config.json. Note: it would be slower evidently.

After training, convert the model into a .bin file

python3 models/llava_stage2/zero_to_fp32.py models/llava_stage2/ models/llava_stage2.bin
  1. infer
deepspeed scripts/generate_lm_llava_deepspeed.py \
    --deepspeed --deepspeed_config models/deepspeed_config.json \
    --load_model_path models/llava_stage2.bin \
    --spm_model_path tokenizer.model \
    --config_path models/llava/7b_config.json \
    --test_path test.json \
    --prediction_path output.txt \
    --seq_length 1024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant