AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
Advanced Quantization Algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs".
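For intuition, a toy sketch of that idea only, not the paper's code: learn a per-weight rounding offset in [-0.5, 0.5] with signed gradient descent so the quantized layer reproduces the float layer's outputs on calibration data (all tensors here are synthetic stand-ins).

    import torch

    torch.manual_seed(0)
    W = torch.randn(64, 64)      # one linear layer's float weights (stand-in)
    X = torch.randn(256, 64)     # synthetic calibration activations
    scale = W.abs().max() / 7    # 4-bit symmetric grid, levels in [-8, 7]

    # Learnable rounding offset, constrained to [-0.5, 0.5].
    V = torch.zeros_like(W, requires_grad=True)

    for _ in range(200):
        x = W / scale + V
        x_rounded = x + (torch.round(x) - x).detach()  # straight-through round
        W_q = scale * torch.clamp(x_rounded, -8, 7)
        # Match the layer's outputs, not the raw weights.
        loss = (X @ W_q.T - X @ W.T).pow(2).mean()
        loss.backward()
        with torch.no_grad():
            V -= 5e-3 * V.grad.sign()  # the "signed gradient descent" step
            V.clamp_(-0.5, 0.5)
            V.grad.zero_()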
Brevitas: neural network quantization in PyTorch
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
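For example, a minimal sketch of Optimum's ONNX Runtime path (the model id is a placeholder; the calls follow Optimum's documented quick-start and are worth verifying against current docs):

    from transformers import AutoTokenizer, pipeline
    from optimum.onnxruntime import ORTModelForSequenceClassification

    model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # placeholder
    # export=True converts the PyTorch checkpoint to ONNX on the fly.
    model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    clf = pipeline("text-classification", model=model, tokenizer=tokenizer)
    print(clf("Quantized and accelerated inference."))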
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Color quantization/palette generation for PNG images
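As a point of contrast with the weight-quantization projects above, color quantization reduces an image's palette to a fixed number of colors; a minimal sketch using Pillow rather than this repository's own API (file paths are placeholders):

    from PIL import Image

    # Reduce an RGB image to a 16-color palette; Pillow's default
    # quantization method for RGB input is median cut.
    img = Image.open("input.png").convert("RGB")   # placeholder path
    quantized = img.quantize(colors=16)
    quantized.save("output.png")                   # palettized PNG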
This is the official PyTorch implementation of "LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models"
Quantization of models: Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT)
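To make the PTQ side concrete, a minimal sketch using stock PyTorch dynamic quantization (the model is a stand-in, unrelated to the repository above):

    import torch
    import torch.nn as nn

    # Stand-in float model; any network with nn.Linear layers works.
    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
    model.eval()

    # Post-training dynamic quantization: weights become int8 ahead of
    # time, activations are quantized on the fly at inference.
    quantized = torch.ao.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    x = torch.randn(1, 128)
    print(quantized(x).shape)  # torch.Size([1, 10])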
Neural Network Compression Framework for enhanced OpenVINO™ inference
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
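A usage sketch following AutoGPTQ's published quick-start (model id and calibration text are placeholders; the API may have moved, so check the project README):

    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

    model_id = "facebook/opt-125m"  # placeholder model
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # 4-bit GPTQ with group-wise weight quantization.
    config = BaseQuantizeConfig(bits=4, group_size=128)
    model = AutoGPTQForCausalLM.from_pretrained(model_id, config)

    # GPTQ is post-training: it calibrates on a handful of samples.
    examples = [tokenizer("AutoGPTQ quantizes LLM weights to 4 bits.",
                          return_tensors="pt")]
    model.quantize(examples)
    model.save_quantized("opt-125m-4bit-gptq")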
Official implementation of Half-Quadratic Quantization (HQQ)
Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
A self-created tool for converting ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). Its purpose is to solve the problem of the massive number of extra Transpose ops that onnx-tensorflow (onnx-tf) inserts during conversion. I don't need a star, but give me a pull request.
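A minimal sketch of the conversion via the tool's Python API (file and folder names are placeholders; the parameter names follow the project's documented convert() call and should be checked against the current README):

    import onnx2tf

    # Convert an NCHW ONNX model to an NHWC TensorFlow SavedModel;
    # TFLite artifacts are written into the same output folder.
    onnx2tf.convert(
        input_onnx_file_path="model.onnx",   # placeholder path
        output_folder_path="saved_model",    # placeholder folder
    )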
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
On-device LLM Inference Powered by X-Bit Quantization