FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme
Updated Jul 14, 2024 · CUDA
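The Ozaki scheme splits each FP64 operand into a sum of low-precision "digit" slices whose pairwise products are exact in integer arithmetic, computes all slice-pair GEMMs with int8 inputs and int32 accumulation (the operation Int8 Tensor Cores provide), and recombines the exact partial products in FP64. A minimal NumPy sketch of the idea — the slice count, bit width, and function names here are illustrative, not the repository's actual API:

```python
import numpy as np

def split_int8_slices(M, num_slices=4, bits=7):
    """Split M into int8 slices so that
    M ≈ scale * sum_k slices[k] * 2**(-bits * (k + 1))."""
    m = float(np.abs(M).max())
    scale = m * (1 << bits) / ((1 << bits) - 1) if m else 1.0
    R = M / scale                        # entries now within [-127/128, 127/128]
    slices = []
    for _ in range(num_slices):
        S = np.round(R * (1 << bits))    # fits in int8 by construction
        slices.append(S.astype(np.int8))
        R = R * (1 << bits) - S          # residual carried to the next slice
    return slices, scale

def ozaki_matmul(A, B, num_slices=4, bits=7):
    """FP64-like GEMM assembled from exact int8 x int8 -> int32 products."""
    As, sa = split_int8_slices(A, num_slices, bits)
    Bs, sb = split_int8_slices(B, num_slices, bits)
    C = np.zeros((A.shape[0], B.shape[1]))
    for i, Ai in enumerate(As):
        for j, Bj in enumerate(Bs):
            # int8 inputs accumulated in int32: each partial product is
            # exact, which is what makes the recombination accurate
            P = Ai.astype(np.int32) @ Bj.astype(np.int32)
            C += P * 2.0 ** (-bits * (i + j + 2))
    return C * (sa * sb)
```

With 4 slices of 7 bits each, roughly 28 mantissa bits per operand are captured; more slices buy more accuracy at the cost of more integer GEMMs.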
[ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
A tool for debugging and assessing floating point precision and reproducibility.
Microsoft Automatic Mixed Precision Library
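Automatic mixed precision libraries run the forward and backward passes in FP16 while keeping FP32 master weights, and use loss scaling so tiny gradients don't underflow in half precision. A toy NumPy illustration of why loss scaling helps — the function name and constants are illustrative, not any library's API:

```python
import numpy as np

def fp16_grad(g_true, loss_scale=1.0):
    """Simulate the gradient a half-precision backward pass produces.

    Scaling the loss by `loss_scale` scales every gradient by the same
    factor, lifting tiny values out of fp16's subnormal range; the
    gradient is then unscaled in fp32 before the optimizer step.
    """
    g_half = np.float16(loss_scale * g_true)   # value fp16 backprop stores
    return float(np.float32(g_half)) / loss_scale

# A gradient near fp16's subnormal threshold (~6e-8) is represented far
# more accurately with a large loss scale than without one.
```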
Build, customize, and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our Discord community: https://discord.gg/TgHXuSJEk6
Code repository for the Korean edition of "Deep Learning with Python, Second Edition" ("Deep Learning from the Creator of Keras, 2nd Edition")
🎯 Accumulated Gradients for TensorFlow 2
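Gradient accumulation simulates a large batch on limited memory: gradients from several micro-batches are summed, and the optimizer applies one averaged update. A framework-free NumPy sketch for a mean-squared-error linear model — the function name and the MSE objective are illustrative, not the repository's code:

```python
import numpy as np

def sgd_step_accumulated(w, X, y, lr=0.1, accum_steps=4):
    """One SGD step using gradients accumulated over micro-batches.

    Splits the batch into `accum_steps` micro-batches, sums their MSE
    gradients, and applies a single averaged update. With equal-sized
    micro-batches this is numerically equivalent to one full-batch step.
    """
    grad_sum = np.zeros_like(w)
    for Xm, ym in zip(np.array_split(X, accum_steps),
                      np.array_split(y, accum_steps)):
        err = Xm @ w - ym
        grad_sum += 2 * Xm.T @ err / len(ym)   # micro-batch MSE gradient
    return w - lr * grad_sum / accum_steps      # averaged update
```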
Fast SGEMM emulation on Tensor Cores
You Only Look Once: Unified, Real-Time Object Detection
Quantization library for PyTorch. Support low-precision and mixed-precision quantization, with hardware implementation through TVM.
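Low-precision quantization typically maps each tensor to int8 via an affine transform, x ≈ scale · (q − zero_point); mixed precision then means choosing bit widths per layer. A minimal sketch of the affine int8 transform — helper names are illustrative, not this library's API:

```python
import numpy as np

def quantize_int8(x):
    """Affine (asymmetric) int8 quantization: x ≈ scale * (q - zero_point)."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 or 1.0           # 256 representable levels
    zero_point = round(-lo / scale) - 128      # maps lo to about -128
    q = np.clip(np.round(x / scale + zero_point), -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Recover an approximation of the original tensor."""
    return scale * (q.astype(np.float32) - zero_point)
```

The reconstruction error per element is bounded by about one quantization step (`scale`), which is what calibration tries to minimize by picking good ranges.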
PDPU: An Open-Source Posit Dot-Product Unit for Deep Learning Applications
This is the open-source version of HPL-MXP; its performance has been verified on Frontier.
BitPack is a practical tool to efficiently save ultra-low precision/mixed-precision quantized models.
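Saving models at ultra-low precision ultimately comes down to packing several sub-byte codes into each stored byte. A small NumPy sketch of 2-bit packing, four codes per byte — the function names are illustrative, not BitPack's API:

```python
import numpy as np

def pack_2bit(values):
    """Pack an array of 2-bit codes (0..3) into bytes, four codes per byte."""
    v = np.asarray(values, dtype=np.uint8)
    assert v.size % 4 == 0 and v.max() <= 3
    v = v.reshape(-1, 4)
    # code k occupies bits 2k..2k+1 of its byte
    return (v[:, 0] | (v[:, 1] << 2) | (v[:, 2] << 4) | (v[:, 3] << 6))

def unpack_2bit(packed):
    """Inverse of pack_2bit: recover the flat array of 2-bit codes."""
    p = np.asarray(packed, dtype=np.uint8)
    out = np.empty((p.size, 4), dtype=np.uint8)
    for i in range(4):
        out[:, i] = (p >> (2 * i)) & 0b11
    return out.reshape(-1)
```

Relative to FP32 storage this is a 16x size reduction (0.25 bytes per weight instead of 4), before any scale/zero-point metadata.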
High Resolution Style Transfer in PyTorch with Color Control and Mixed Precision 🎨
Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation
Experiments in accelerating PyTorch training on GPU devices
A Post-Training Quantizer for the Design of Mixed Low-Precision DNNs with Dynamic Fixed-Point Representation for Efficient Hardware Acceleration on Edge Devices
Pretrained-model loading based on TensorFlow 1.x, with support for single-machine multi-GPU training, gradient accumulation, XLA acceleration, and mixed precision. Flexible training, validation, and prediction.
An implementation of the HPL-AI Mixed-Precision Benchmark based on hpl-2.3
Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP