AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
Advanced Quantization Algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs".
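For intuition, a toy sketch of that idea only, not the paper's code: learn a per-weight rounding offset in [-0.5, 0.5] with signed gradient descent so the quantized layer reproduces the float layer's outputs on calibration data (all tensors here are synthetic stand-ins).

    import torch

    torch.manual_seed(0)
    W = torch.randn(64, 64)      # one linear layer's float weights (stand-in)
    X = torch.randn(256, 64)     # synthetic calibration activations
    scale = W.abs().max() / 7    # 4-bit symmetric grid, levels in [-8, 7]

    # Learnable rounding offset, constrained to [-0.5, 0.5].
    V = torch.zeros_like(W, requires_grad=True)

    for _ in range(200):
        x = W / scale + V
        x_rounded = x + (torch.round(x) - x).detach()  # straight-through round
        W_q = scale * torch.clamp(x_rounded, -8, 7)
        # Match the layer's outputs, not the raw weights.
        loss = (X @ W_q.T - X @ W.T).pow(2).mean()
        loss.backward()
        with torch.no_grad():
            V -= 5e-3 * V.grad.sign()  # the "signed gradient descent" step
            V.clamp_(-0.5, 0.5)
            V.grad.zero_()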
Brevitas: neural network quantization in PyTorch
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
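For example, a minimal sketch of Optimum's ONNX Runtime path (the model id is a placeholder; the calls follow Optimum's documented quick-start and are worth verifying against current docs):

    from transformers import AutoTokenizer, pipeline
    from optimum.onnxruntime import ORTModelForSequenceClassification

    model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # placeholder
    # export=True converts the PyTorch checkpoint to ONNX on the fly.
    model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    clf = pipeline("text-classification", model=model, tokenizer=tokenizer)
    print(clf("Quantized and accelerated inference."))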
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Color quantization/palette generation for PNG images
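As a point of contrast with the weight-quantization projects above, color quantization reduces an image's palette to a fixed number of colors; a minimal sketch using Pillow rather than this repository's own API (file paths are placeholders):

    from PIL import Image

    # Reduce an RGB image to a 16-color palette; Pillow's default
    # quantization method for RGB input is median cut.
    img = Image.open("input.png").convert("RGB")   # placeholder path
    quantized = img.quantize(colors=16)
    quantized.save("output.png")                   # palettized PNG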
This is the official PyTorch implementation of "LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models"
Quantization of models: Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT)
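To make the PTQ side concrete, a minimal sketch using stock PyTorch dynamic quantization (the model is a stand-in, unrelated to the repository above):

    import torch
    import torch.nn as nn

    # Stand-in float model; any network with nn.Linear layers works.
    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
    model.eval()

    # Post-training dynamic quantization: weights become int8 ahead of
    # time, activations are quantized on the fly at inference.
    quantized = torch.ao.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    x = torch.randn(1, 128)
    print(quantized(x).shape)  # torch.Size([1, 10])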
Neural Network Compression Framework for enhanced OpenVINO™ inference
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
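A usage sketch following AutoGPTQ's published quick-start (model id and calibration text are placeholders; the API may have moved, so check the project README):

    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

    model_id = "facebook/opt-125m"  # placeholder model
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # 4-bit GPTQ with group-wise weight quantization.
    config = BaseQuantizeConfig(bits=4, group_size=128)
    model = AutoGPTQForCausalLM.from_pretrained(model_id, config)

    # GPTQ is post-training: it calibrates on a handful of samples.
    examples = [tokenizer("AutoGPTQ quantizes LLM weights to 4 bits.",
                          return_tensors="pt")]
    model.quantize(examples)
    model.save_quantized("opt-125m-4bit-gptq")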
Official implementation of Half-Quadratic Quantization (HQQ)
Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
A self-created tool for converting ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). Its purpose is to solve the problem of the massive number of extra Transpose ops that onnx-tensorflow (onnx-tf) inserts during conversion. I don't need a star, but give me a pull request.
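A minimal sketch of the conversion via the tool's Python API (file and folder names are placeholders; the parameter names follow the project's documented convert() call and should be checked against the current README):

    import onnx2tf

    # Convert an NCHW ONNX model to an NHWC TensorFlow SavedModel;
    # TFLite artifacts are written into the same output folder.
    onnx2tf.convert(
        input_onnx_file_path="model.onnx",   # placeholder path
        output_folder_path="saved_model",    # placeholder folder
    )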
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
On-device LLM Inference Powered by X-Bit Quantization