Experiments to accelerate PyTorch training on GPU devices
A Post-Training Quantizer for the Design of Mixed Low-Precision DNNs with Dynamic Fixed-Point Representation for Efficient Hardware Acceleration on Edge Devices
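As a minimal sketch of the dynamic fixed-point idea such quantizers build on, the snippet below picks a per-tensor power-of-two scale from the data range and rounds to signed 8-bit mantissas. The function names are illustrative, not this project's API:

```python
import numpy as np

def quantize_dfp(x, word_bits=8):
    """Quantize to dynamic fixed-point: signed integer mantissas sharing a
    per-tensor power-of-two scale chosen from the data range."""
    int_bits = int(np.ceil(np.log2(np.abs(x).max() + 1e-12)))  # bits for the integer part
    frac_bits = word_bits - 1 - int_bits                       # rest go to the fraction
    q = np.round(x * 2.0 ** frac_bits)
    q = np.clip(q, -(2 ** (word_bits - 1)), 2 ** (word_bits - 1) - 1)
    return q.astype(np.int8), frac_bits

def dequantize_dfp(q, frac_bits):
    return q.astype(np.float64) / 2.0 ** frac_bits

x = np.random.randn(4, 4)
q, fb = quantize_dfp(x)
print(np.abs(x - dequantize_dfp(q, fb)).max())  # worst-case rounding error
```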
Deep learning solution for Cassava Leaf Disease Classification, a Kaggle Research Code Competition, built with TensorFlow.
You Only Look Once: Unified, Real-Time Object Detection
PyTorch RNet implementation with Distributed and Mixed-Precision training support.
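For reference, mixed-precision training in recent PyTorch is typically done with the built-in AMP machinery; a minimal sketch, where the model, batch, and hyperparameters are placeholders:

```python
import torch

model = torch.nn.Linear(512, 10).cuda()           # placeholder model
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()              # scales the loss to avoid FP16 underflow

for step in range(100):
    x = torch.randn(32, 512, device="cuda")       # placeholder batch
    y = torch.randint(0, 10, (32,), device="cuda")
    opt.zero_grad()
    with torch.cuda.amp.autocast():               # ops run in FP16/FP32 as appropriate
        loss = torch.nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()                 # backward pass on the scaled loss
    scaler.step(opt)                              # unscales grads, skips step on inf/nan
    scaler.update()
```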
Hybrid-Precision Analysis on a CG Solver (H.A.C.S): merging single and double precision to produce a fast yet accurate CG solver
Fast SGEMM emulation on Tensor Cores
This repository contains notebooks showing how to perform mixed precision training in tf.keras 2.0
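In current tf.keras the equivalent is a one-line global dtype policy; a minimal sketch, with arbitrary layer sizes:

```python
import tensorflow as tf

# Compute in float16 on GPU while keeping variables in float32.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(784,)),
    # Keep the final softmax in float32 for numerical stability.
    tf.keras.layers.Dense(10, activation="softmax", dtype="float32"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

With this policy, Keras automatically wraps the optimizer with loss scaling, so no extra code is needed.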
This is the open-source version of HPL-MXP; its performance has been verified on Frontier.
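The pattern behind HPL-MXP/HPL-AI, and hybrid-precision solvers generally, is to factor or solve in low precision and recover FP64 accuracy through iterative refinement. Below is a minimal NumPy sketch of that pattern; it is illustrative only, since the benchmark itself reuses a low-precision LU factorization with GMRES-based refinement rather than re-solving from scratch each iteration:

```python
import numpy as np

def solve_with_refinement(A, b, iters=5):
    """Solve in float32, then recover float64 accuracy by iteratively
    solving for the residual correction."""
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                                   # residual in float64
        d = np.linalg.solve(A32, r.astype(np.float32))  # low-precision correction
        x += d.astype(np.float64)
    return x

rng = np.random.default_rng(0)
A = rng.random((200, 200)) + 200 * np.eye(200)  # diagonally dominant test matrix
b = rng.random(200)
print(np.linalg.norm(A @ solve_with_refinement(A, b) - b))
```

The same residual-correction loop applies to CG-style solvers: run the inner solve in single precision and accumulate corrections in double.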
Let's train CIFAR-10 in PyTorch with half precision!
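Unlike autocast-style mixed precision, pure half-precision training simply casts the model and inputs to FP16; a minimal sketch with a placeholder model and batch:

```python
import torch

model = torch.nn.Linear(3 * 32 * 32, 10).cuda().half()   # all weights in FP16
opt = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(64, 3 * 32 * 32, device="cuda").half()   # inputs cast to FP16 too
y = torch.randint(0, 10, (64,), device="cuda")
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()  # production setups add loss scaling or FP32 master weights
```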
Extremely simple and understandable GPT2 implementation with minor tweaks
[ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
PyCon SG 2019 Tutorial: Optimizing TensorFlow Performance
An implementation of HPL-AI Mixed-Precision Benchmark based on hpl-2.3
PDPU: An Open-Source Posit Dot-Product Unit for Deep Learning Applications
CMix-NN: Mixed Low-Precision CNN Library for Memory-Constrained Edge Devices
FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme
BitPack is a practical tool to efficiently save ultra-low precision/mixed-precision quantized models.
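The core trick such tools rely on is packing sub-byte codes densely into bytes; a hypothetical 4-bit example of the idea, not BitPack's actual API:

```python
import numpy as np

def pack_uint4(codes):
    """Pack 4-bit codes (values 0..15) two per byte."""
    codes = codes.astype(np.uint8).ravel()
    if codes.size % 2:                      # pad odd-length input
        codes = np.append(codes, np.uint8(0))
    return (codes[0::2] << 4) | codes[1::2]

def unpack_uint4(packed, n):
    out = np.empty(packed.size * 2, dtype=np.uint8)
    out[0::2] = packed >> 4
    out[1::2] = packed & 0x0F
    return out[:n]

w = np.random.randint(0, 16, size=1001)
assert (unpack_uint4(pack_uint4(w), w.size) == w).all()
```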
Accumulated Gradients for TensorFlow 2
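Gradient accumulation simulates a larger batch by summing gradients over several micro-batches before applying them; a minimal TF2 sketch, with a placeholder model and random data:

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
model.build((None, 64))
opt = tf.keras.optimizers.SGD(learning_rate=0.1)
ACCUM_STEPS = 4

accum = [tf.zeros_like(v) for v in model.trainable_variables]
for step in range(100):
    x = tf.random.normal((8, 64))                               # micro-batch
    y = tf.random.uniform((8,), maxval=10, dtype=tf.int32)
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        loss = tf.reduce_mean(
            tf.keras.losses.sparse_categorical_crossentropy(
                y, logits, from_logits=True))
    grads = tape.gradient(loss, model.trainable_variables)
    accum = [a + g / ACCUM_STEPS for a, g in zip(accum, grads)]  # running average
    if (step + 1) % ACCUM_STEPS == 0:                            # apply every N steps
        opt.apply_gradients(zip(accum, model.trainable_variables))
        accum = [tf.zeros_like(v) for v in model.trainable_variables]
```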
Code repository for the book <Deep Learning with Python, Second Edition> (Korean edition)