DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
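As a concrete illustration of how DeepSpeed is typically driven, here is a minimal sketch of a training config expressed as a Python dict. The key names (`train_batch_size`, `fp16`, `zero_optimization`) follow DeepSpeed's documented JSON config schema; the values are placeholders, not a tuned recipe.

```python
# Illustrative DeepSpeed-style config (keys follow DeepSpeed's documented
# JSON schema; the values here are placeholders, not a tuned recipe).
ds_config = {
    "train_batch_size": 32,
    "gradient_accumulation_steps": 4,
    "fp16": {"enabled": True},              # mixed-precision training
    "zero_optimization": {"stage": 2},      # ZeRO stage 2: shard optimizer state and gradients
}
```

Such a dict (or the equivalent JSON file) is what gets handed to DeepSpeed's engine initialization alongside the model.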
Making large AI models cheaper, faster and more accessible
NAACL '24 (Demo) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
Fast and easy distributed model training examples.
LiBai (李白): A Toolbox for Large-Scale Distributed Parallel Training
A curated list of awesome projects and papers for distributed training or inference
PaddlePaddle large-model development suite, providing a full-pipeline development toolchain for large language models, cross-modal large models, biocomputing large models, and other domains.
Distributed training (multi-node) of a Transformer model
Distributed training of DNNs • C++/MPI Proxies (GPT-2, GPT-3, CosmoFlow, DLRM)
Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
SC23 Deep Learning at Scale Tutorial Material
Serving distributed deep learning models with model parallel swapping.
pipeDejavu: Hardware-aware Latency Predictable, Differentiable Search for Faster Config and Convergence of Distributed ML Pipeline Parallelism
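To make the pipeline-parallelism idea behind projects like pipeDejavu concrete, here is a minimal, framework-free sketch of a GPipe-style forward schedule: the input batch is split into microbatches, and at each timestep stage `s` works on microbatch `t - s`, so stages overlap on different microbatches. This is a generic illustration of the scheduling pattern, not code from any of the listed repositories.

```python
def gpipe_forward_schedule(num_stages, num_microbatches):
    """Return, per timestep, the (stage, microbatch) pairs active in a
    GPipe-style forward pass: at timestep t, stage s holds microbatch t - s.
    The pipeline drains after num_stages + num_microbatches - 1 timesteps."""
    schedule = []
    for t in range(num_stages + num_microbatches - 1):
        active = [(s, t - s) for s in range(num_stages)
                  if 0 <= t - s < num_microbatches]
        schedule.append(active)
    return schedule
```

For 3 stages and 4 microbatches the schedule ramps up, runs all stages in parallel in the steady state, then drains, which is exactly the "bubble" structure that pipeline-schedule research tries to shrink.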
Official implementation of DynPartition: Automatic Optimal Pipeline Parallelism of Dynamic Neural Networks over Heterogeneous GPU Systems for Inference Tasks
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
Description of Framework for Efficient Fused-layer Cost Estimation, Legion (2021)
Adaptive Tensor Parallelism for Foundation Models
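As background for the tensor-parallelism entries above, the core trick can be shown in a few lines of plain Python: split a weight matrix column-wise across workers, let each worker compute its slice of the output, and concatenate the column blocks. This is a generic sketch of column-split tensor parallelism, not the API of any listed library.

```python
def matmul(a, b):
    """Naive dense matmul for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def column_parallel_matmul(x, w, num_shards):
    """Column-split tensor parallelism: each of `num_shards` workers holds a
    contiguous block of w's columns, computes x @ w_shard locally, and the
    column blocks are concatenated to form the full result."""
    cols = len(w[0])
    shard_width = cols // num_shards  # assumes cols divides evenly
    partials = []
    for r in range(num_shards):
        w_shard = [row[r * shard_width:(r + 1) * shard_width] for row in w]
        partials.append(matmul(x, w_shard))
    # concatenate the per-worker column blocks along the column axis
    return [sum((p[i] for p in partials), []) for i in range(len(x))]
```

Because each worker only stores a column slice of `w`, the weight memory per device shrinks linearly with the shard count, which is what makes this attractive for foundation-model layers.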
WIP. Veloce is a low-code, Ray-based parallelization library for efficient, heterogeneous machine learning computation.
The project focuses on parallelising pre-processing, measurement, and machine learning in the cloud, as well as evaluating and analysing cloud performance.
Model parallelism for NN architectures with skip connections (e.g. ResNets, UNets)
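Skip connections complicate model parallelism because the skip tensor crosses the partition boundary: the input to the first partition must also be shipped to the device holding the residual add. A minimal, framework-free sketch of this (with scalar "weights" standing in for layers, and a tuple standing in for the device-to-device transfer) looks like:

```python
def run_partitioned_resblock(x, w1, w2):
    """Minimal model-parallel residual block split across two 'devices'.
    Device 0 computes h = w1 * x; device 1 computes y = w2 * h + x.
    Because of the skip connection, x must be shipped to device 1
    alongside the boundary activation h."""
    # --- device 0 ---
    h = [w1 * v for v in x]
    boundary = (h, x)  # transfer both the activation and the skip tensor
    # --- device 1 ---
    h_recv, skip = boundary
    return [w2 * hv + sv for hv, sv in zip(h_recv, skip)]
```

The extra communication for the skip tensor is exactly the cost that skip-aware partitioning schemes try to place the cut points around.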