Add description of fp8 function support in md file #8610

Open
wants to merge 8 commits into develop

Conversation

YZW-explorer

PR types

New features

PR changes

Docs

Description

FP8 PTQ is now supported. This PR adds the corresponding documentation and a usage example for the FP8 PTQ feature.

paddle-bot bot commented Jun 17, 2024

Thanks for your contribution!

codecov bot commented Jun 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 54.42%. Comparing base (65e721e) to head (a3e3862).
Report is 5 commits behind head on develop.

Current head a3e3862 differs from pull request most recent head bbfb24a

Please upload reports for the commit bbfb24a to get more accurate results.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8610      +/-   ##
===========================================
- Coverage    55.80%   54.42%   -1.39%     
===========================================
  Files          620      632      +12     
  Lines        96642    99475    +2833     
===========================================
+ Hits         53928    54135     +207     
- Misses       42714    45340    +2626     


llm/argument.py Outdated
@@ -177,8 +177,8 @@ class ModelArgument:
 @dataclass
 class QuantArgument:
     quant_type: str = field(
-        default="a8w8",
-        metadata={"help": "Quantization type. Supported values: a8w8, weight_only_int4, weight_only_int8"},
+        default="a8w8(int)",
Contributor

Don't use parentheses to distinguish int from float; you could add a new afp8wfp8 value to represent FP8 quantization instead.
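
For illustration, a minimal sketch of what this suggestion could look like in `llm/argument.py`, keeping plain string values and adding a separate `afp8wfp8` value instead of encoding the dtype in parentheses (field layout copied from the diff above; not the code actually merged in this PR):

```python
from dataclasses import dataclass, field


@dataclass
class QuantArgument:
    # "afp8wfp8" marks FP8 PTQ explicitly instead of overloading "a8w8" with "(int)"/"(fp8)" suffixes.
    quant_type: str = field(
        default="a8w8",  # default behavior left unchanged
        metadata={
            "help": "Quantization type. Supported values: a8w8, afp8wfp8, weight_only_int4, weight_only_int8"
        },
    )
```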

llm/quant.py Outdated
-    if quant_args.quant_type == "a8w8":
-        activation = AVGObserver(quant_bits=8)
-        weight = weight_observer(quant_bits=8)
+    if quant_args.quant_type == "a8w8(int)":
Contributor

You could add an enum type in paddlenlp to represent the quantization level.
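
A hedged sketch of such an enum. The member names follow the naming used later in this PR (WINT8AINT8, WFP8AFP8), but the exact string values and where the class would live inside paddlenlp are assumptions:

```python
from enum import Enum


class QuantType(str, Enum):
    # Quantization levels; inheriting from str keeps existing string-based configs working.
    WINT8AINT8 = "wint8aint8"              # INT8 activations + INT8 weights
    WFP8AFP8 = "wfp8afp8"                  # FP8 activations + FP8 weights
    WEIGHT_ONLY_INT4 = "weight_only_int4"  # weight-only INT4, WeightOnly inference
    WEIGHT_ONLY_INT8 = "weight_only_int8"  # weight-only INT8, WeightOnly inference


# llm/quant.py could then dispatch on the enum instead of raw strings such as "a8w8(int)".
quant_type = QuantType("wfp8afp8")  # parses a plain string coming from a JSON config
if quant_type is QuantType.WFP8AFP8:
    print("configure FP8 observers for activations and weights")
```

Because the members subclass str, comparisons such as `quant_type == "wfp8afp8"` keep working for configs that still pass plain strings.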

@@ -214,8 +221,8 @@ class ModelArgument:
 @dataclass
 class QuantArgument:
     quant_type: str = field(
-        default="a8w8",
-        metadata={"help": "Quantization type. Supported values: a8w8, weight_only_int4, weight_only_int8"},
+        default=QuantType.WFP8AFP8,
Contributor

Shouldn't the default still be WINT8AINT8? Don't change the default behavior for now.

Author

Got it.

@@ -77,7 +77,7 @@ python run_finetune.py ./config/llama/awq_argument.json

<summary>&emsp; Quantization arguments (QuantArgument)</summary><div>

-- `quant_type`: PTQ/QAT quantization type, defaults to A8W8. Supported values: A8W8, WINT4, WINT8. A8W8 quantizes activations (inputs) to INT8 and model weights to INT8; WINT4 quantizes only model weights to INT4, with WeightOnly used for subsequent inference; WINT8 quantizes only model weights to INT8, with WeightOnly used for subsequent inference.
+- `quant_type`: PTQ/QAT quantization type, defaults to WINT8AINT8. Supported values: WINT8AINT8, WFP8AFP8, WEIGHT_ONLY_INT4, WEIGHT_ONLY_INT8. WINT8AINT8 quantizes both activations (inputs) and model weights to INT8; WFP8AFP8 quantizes both activations (inputs) and model weights to FP8; WINT4 quantizes only model weights to INT4, with WeightOnly used for subsequent inference; WINT8 quantizes only model weights to INT8, with WeightOnly used for subsequent inference.
Contributor

How about changing these to Wint8Aint8 and Wfp8Afp8?

Author

Got it.
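
To tie the documentation change above to concrete usage, here is a hedged sketch of generating a PTQ config that exercises the new quant_type. Every key besides `quant_type` mirrors the existing llm quantization configs, and the key set, model identifier, and value casing are assumptions rather than content from this PR:

```python
import json
import os

# Hypothetical PTQ config for the FP8 path described in the updated docs.
ptq_config = {
    "model_name_or_path": "facebook/llama-7b",   # placeholder model identifier
    "output_dir": "./checkpoints/llama_ptq_fp8",
    "do_ptq": True,                              # run post-training quantization
    "quant_type": "wfp8afp8",                    # FP8 quantization of activations and weights
}

os.makedirs("./config/llama", exist_ok=True)
with open("./config/llama/ptq_fp8_argument.json", "w") as f:
    json.dump(ptq_config, f, indent=2)
```

The resulting file could then be passed to run_finetune.py in the same way as the ./config/llama/awq_argument.json example referenced in the documentation above.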
