Converting gguf fp16 & bf16 to hf is not supported. #31762

Open

PenutChen opened this issue Jul 3, 2024 · 5 comments · May be fixed by #31783

Comments

@PenutChen
Contributor

PenutChen commented Jul 3, 2024

System Info

transformers==4.42.3
torch==2.3.0
numpy==1.26.4
gguf==0.6.0

Who can help?

@SunMarc

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import os
from transformers import AutoModelForCausalLM

gguf_path = "path/to/llama3-8b.fp16.gguf"  # or bf16
model_id = os.path.dirname(gguf_path)
gguf_file = os.path.basename(gguf_path)

model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=gguf_file)

Expected behavior

Loading FP16 and BF16 GGUF files should work. Currently, besides the quantized types, only F32 is implemented; FP16 and BF16 are not yet supported.

fp16 error log:

Converting and de-quantizing GGUF tensors...:   0%|                         | 0/291 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/data2/Penut/LLM-Backend/Testing.py", line 9, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=gguf_file)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/Penut/.miniconda/envs/Py311/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/Penut/.miniconda/envs/Py311/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3583, in from_pretrained
    state_dict = load_gguf_checkpoint(gguf_path, return_tensors=True)["tensors"]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/Penut/.miniconda/envs/Py311/lib/python3.11/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 146, in load_gguf_checkpoint
    weights = load_dequant_gguf_tensor(shape=shape, ggml_type=tensor.tensor_type, data=tensor.data)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/Penut/.miniconda/envs/Py311/lib/python3.11/site-packages/transformers/integrations/ggml.py", line 507, in load_dequant_gguf_tensor
    raise NotImplementedError(
NotImplementedError: ggml_type 1 not implemented - please raise an issue on huggingface transformers: https://github.com/huggingface/transformers/issues/new/choose

bf16 error log:

Traceback (most recent call last):
  File "/data2/Penut/LLM-Backend/Testing.py", line 9, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=gguf_file)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/Penut/.miniconda/envs/Py311/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 524, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/Penut/.miniconda/envs/Py311/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 965, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/Penut/.miniconda/envs/Py311/lib/python3.11/site-packages/transformers/configuration_utils.py", line 632, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/Penut/.miniconda/envs/Py311/lib/python3.11/site-packages/transformers/configuration_utils.py", line 719, in _get_config_dict
    config_dict = load_gguf_checkpoint(resolved_config_file, return_tensors=False)["config"]
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/Penut/.miniconda/envs/Py311/lib/python3.11/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 81, in load_gguf_checkpoint
    reader = GGUFReader(gguf_checkpoint_path)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/Penut/.miniconda/envs/Py311/lib/python3.11/site-packages/gguf/gguf_reader.py", line 116, in __init__
    self._build_tensors(offs, tensors_fields)
  File "/data2/Penut/.miniconda/envs/Py311/lib/python3.11/site-packages/gguf/gguf_reader.py", line 239, in _build_tensors
    ggml_type = GGMLQuantizationType(raw_dtype[0])
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/Penut/.miniconda/envs/Py311/lib/python3.11/enum.py", line 714, in __call__
    return cls.__new__(cls, value)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/Penut/.miniconda/envs/Py311/lib/python3.11/enum.py", line 1137, in __new__
    raise ve_exc
ValueError: 30 is not a valid GGMLQuantizationType

I tried to add F16 to GGML_TYPES:

GGML_TYPES = {
    "F32": 0,
    "F16": 1,
    # ...
}

def load_dequant_gguf_tensor(shape, ggml_type, data):
    if ggml_type == GGML_TYPES["F32"]:
        values = data
    elif ggml_type == GGML_TYPES["F16"]:
        values = data
    # ...

I'm not sure if this is correct, but after converting to HF, the perplexity (PPL) is over 1000.
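For context, the rough perplexity check is along these lines (an illustrative sketch only, not the exact evaluation used here; the paths and text are placeholders, and it assumes the tokenizer can also be loaded from the GGUF file):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to"               # directory containing the converted GGUF (placeholder)
gguf_file = "llama3-8b.fp16.gguf"  # placeholder file name

model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=gguf_file)
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=gguf_file)

text = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # For a causal LM, passing the input ids as labels gives the mean
    # cross-entropy loss; exp(loss) is the perplexity on this text.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print("PPL:", torch.exp(loss).item())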

@PenutChen
Contributor Author

PenutChen commented Jul 3, 2024

I found that the PPL issue is related to Llama3 or llama.cpp. It doesn't happen with TinyLlama. I'll create another issue to discuss if needed.

@PenutChen
Contributor Author

PenutChen commented Jul 3, 2024

It's easy to support GGUF FP16. Since NumPy has no native BF16 type, my current workaround is to reinterpret the BF16 data with PyTorch and upcast it to FP32, but it's not ideal to rely on PyTorch at this step.

Reference: main...PenutChen:transformers:main

def load_dequant_gguf_tensor(shape, ggml_type, data):
    if ggml_type == GGML_TYPES["F32"]:
        values = data
    elif ggml_type == GGML_TYPES["F16"]:
        values = data
    elif ggml_type == GGML_TYPES["BF16"]:
        import torch
        data_uint8 = data.view(np.uint8)
        tensor_uint8 = torch.from_numpy(data_uint8)
        values = tensor_uint8.view(torch.bfloat16).float().numpy()
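
A NumPy-only alternative (an untested sketch, not part of the linked branch) would be to widen the BF16 bit pattern into the upper half of a float32 word, which avoids the PyTorch dependency:

import numpy as np

def bf16_bytes_to_fp32(data: np.ndarray) -> np.ndarray:
    # Reinterpret the raw BF16 buffer as uint16, widen to uint32, and shift
    # the bits into the upper half of an IEEE 754 float32 (assumes little-endian).
    bits = data.view(np.uint16).astype(np.uint32) << 16
    return bits.view(np.float32)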

Note that BF16 support requires modifying some code in gguf-py. Since the latest version of gguf-py from the llama.cpp repo doesn't work with the current HF integration (#31725), I modified the version from PyPI as follows:

class GGMLQuantizationType(IntEnum):
    F32  = 0
    F16  = 1
    BF16 = 30
    # ...

GGML_QUANT_SIZES = {
    GGMLQuantizationType.F32:  (1, 4),
    GGMLQuantizationType.F16:  (1, 2),
    GGMLQuantizationType.BF16: (1, 2),
    # ...
}
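
With those additions in place, a quick sanity check (illustrative only; the file path is a placeholder) is to confirm that the reader now recognizes BF16 tensors:

from gguf.gguf_reader import GGUFReader

reader = GGUFReader("path/to/llama3-8b.bf16.gguf")
for tensor in reader.tensors[:3]:
    # Each entry should report GGMLQuantizationType.BF16 instead of raising
    # "30 is not a valid GGMLQuantizationType".
    print(tensor.name, tensor.tensor_type, tensor.shape)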

@LysandreJik
Member

Hey @SunMarc, would you have some bandwidth to take a look at this? :)

@SunMarc
Member

SunMarc commented Jul 3, 2024

Hey @PenutChen, thanks for your research! I think we should just support FP16 first, since supporting BF16 would require a new gguf release and the transformers GGUF integration is not compatible with it yet. LMK what you think! If you have some time, would you like to open a PR? Otherwise, I will do it!

@PenutChen
Contributor Author

PenutChen commented Jul 4, 2024

@SunMarc Sure, I will do the necessary checks and open a PR! By the way, gguf-py on PyPI has not been updated for a long time; most llama.cpp developers seem to use gguf-py from source. If we want to improve this integration, I think we should discuss it with the llama.cpp developers.

PenutChen linked a pull request (#31783) on Jul 4, 2024 that will close this issue.