Hybrid-Precision Analysis on CG Solver (H.A.C.S). Merging single and double precision to generate a fast yet accurate CG solver.
C++ · Updated May 29, 2020
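The hybrid single/double-precision idea behind solvers like the one above can be sketched with classic mixed-precision iterative refinement: do the expensive inner solve in float32, but compute residuals and accumulate the solution in float64. This is a generic NumPy sketch of the technique, not the H.A.C.S code (the direct `np.linalg.solve` stands in for the repo's CG inner solver):

```python
import numpy as np

def mixed_precision_refine(A, b, iters=5):
    """Iterative refinement: single-precision inner solves,
    double-precision residuals and accumulation."""
    A32 = A.astype(np.float32)            # cheap low-precision copy
    x = np.zeros_like(b, dtype=np.float64)
    for _ in range(iters):
        r = b - A @ x                     # residual in double precision
        d = np.linalg.solve(A32, r.astype(np.float32))  # correction in single
        x += d.astype(np.float64)         # accumulate update in double
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50)) + 50.0 * np.eye(50)  # well-conditioned system
x_true = rng.standard_normal(50)
b = A @ x_true
x = mixed_precision_refine(A, b)
err = np.max(np.abs(x - x_true))          # error reaches double-precision level
```

Each refinement sweep shrinks the error by roughly the single-precision solve accuracy, so a few iterations recover double-precision results at near single-precision cost.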
Fast SGEMM emulation on Tensor Cores
Experiments to accelerate PyTorch training on GPU devices
PyTorch RNet implementation with Distributed and Mixed-Precision training support.
A Post-Training Quantizer for the Design of Mixed Low-Precision DNNs with Dynamic Fixed-Point Representation for Efficient Hardware Acceleration on Edge Devices
You Only Look Once: Unified, Real-Time Object Detection
Deep learning solution for Cassava Leaf Disease Classification, a Kaggle Research Code Competition, using TensorFlow.
This repository contains notebooks showing how to perform mixed precision training in tf.keras 2.0
[ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme
PyCon SG 2019 Tutorial: Optimizing TensorFlow Performance
Extremely simple and understandable GPT2 implementation with minor tweaks
This is the open-source version of HPL-MXP; its performance has been verified on the Frontier supercomputer.
An implementation of HPL-AI Mixed-Precision Benchmark based on hpl-2.3
Let's train CIFAR-10 in PyTorch with half precision!
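A core pitfall in half-precision training like the CIFAR-10 experiment above is gradient underflow: values below float16's subnormal range silently become zero, which is why loss scaling is standard practice. A minimal NumPy sketch (no PyTorch; the gradient value is illustrative) of the problem and the fix:

```python
import numpy as np

# A tiny gradient that is representable in float64 but below
# float16's smallest subnormal (~6e-8):
grad = np.float64(1e-8)

naive = np.float16(grad)                 # direct cast underflows to 0.0

scale = np.float64(1024.0)               # loss scale factor
scaled = np.float16(grad * scale)        # scale first, then cast: survives
recovered = np.float64(scaled) / scale   # unscale back in higher precision
```

Multiplying the loss (and hence all gradients) by a constant before the float16 cast shifts small values back into representable range; the optimizer then divides the scale back out in higher precision.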
PDPU: An Open-Source Posit Dot-Product Unit for Deep Learning Applications
CMix-NN: Mixed Low-Precision CNN Library for Memory-Constrained Edge Devices
🎯 Accumulated Gradients for TensorFlow 2
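Gradient accumulation, as in the TensorFlow 2 repo above, sums gradients over several micro-batches and applies one optimizer step, emulating a larger batch than fits in memory. A framework-free sketch (plain NumPy; `sgd_step` and the gradient values are illustrative, not the repo's API):

```python
import numpy as np

def sgd_step(w, micro_grads, lr=0.1):
    """Average accumulated micro-batch gradients, then take one SGD step."""
    g = np.mean(micro_grads, axis=0)     # combine the accumulated gradients
    return w - lr * g

w = np.array([1.0, -2.0])
micro_grads = [np.array([0.2, 0.4])] * 4  # 4 identical micro-batches for clarity
w_accum = sgd_step(w, micro_grads)

# One step on the equivalent full batch produces the same update:
w_full = w - 0.1 * np.array([0.2, 0.4])
same = np.allclose(w_accum, w_full)
```

For plain SGD the accumulated update is mathematically identical to a full-batch step; with stateful optimizers (momentum, Adam) the point of accumulation is precisely to keep that equivalence by updating optimizer state only once per accumulated step.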
Pre-trained model invocation based on TensorFlow 1.x, supporting single-machine multi-GPU training, gradient accumulation, XLA acceleration, and mixed precision. Flexible training, validation, and prediction.
BitPack is a practical tool to efficiently save ultra-low precision/mixed-precision quantized models.
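The storage saving behind tools like BitPack comes from packing sub-byte quantized values densely instead of wasting a full byte each. A pure-Python sketch of the idea for 2-bit weights (hypothetical `pack2`/`unpack2` helpers, not BitPack's actual API), fitting four values per byte:

```python
def pack2(values):
    """Pack 2-bit integers (0..3) into bytes, four per byte, LSB first."""
    out = bytearray()
    for i in range(0, len(values), 4):
        byte = 0
        for j, v in enumerate(values[i:i + 4]):
            byte |= (v & 0b11) << (2 * j)   # place each value in its 2-bit slot
        out.append(byte)
    return bytes(out)

def unpack2(packed, n):
    """Recover n 2-bit values from the packed byte string."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]

vals = [0, 3, 1, 2, 2, 1]
packed = pack2(vals)                        # 2 bytes instead of 6
restored = unpack2(packed, len(vals))
```

The same shift-and-mask scheme generalizes to any bit width that divides 8, which is what makes mixed low-precision layouts (2/4/8-bit per layer) compact on disk.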