llm-serving

Here are 58 public repositories matching this topic...

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

amd cuda inference pytorch transformer llama gpt rocm model-serving tpu mlops xpu llm inferentia llmops llm-serving trainium

Updated Jul 17, 2024
Python

ray-project / ray

Star

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Updated Jul 17, 2024
Python

Multi-node production AI stack. Run the best of open source AI easily on your own servers. Create your own AI by fine-tuning open source models. Integrate LLMs with APIs. Run gptscript securely on the server

Updated Jul 16, 2024
Go

predibase / lorax

Star

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

transformers pytorch llama gpt lora model-serving fine-tuning llm llmops llm-serving llm-inference

Updated Jul 17, 2024
Python

superduper-io / superduper

Star

🔮 SuperDuper: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data. Including streaming inference, scalable model training and vector search.

Updated Jul 16, 2024
Python

bentoml / OpenLLM

Star

Run any open-source LLMs, such as Llama 2, Mistral, as OpenAI compatible API endpoint in the cloud.

Updated Jul 16, 2024
Python

bentoml / BentoML

Star

The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!

python machine-learning deep-learning model-serving multimodal mlops ml-engineering ai-inference llm generative-ai llmops llm-serving model-inference-service llm-inference inference-platform

Updated Jul 16, 2024
Python

skypilot-org / skypilot

Star

SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.

Updated Jul 16, 2024
Python

alibaba / rtp-llm

Star

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

inference llama gpt model-serving llm llmops llm-serving

Updated Jul 16, 2024
C++

AntonioGr7 / pratical-llms

Star

A collection of hand on notebook for LLMs practitioner

quantization llm llm-serving genai llm-training llm-inference llm-evaluation

Updated Jul 15, 2024
Jupyter Notebook

EmbeddedLLM / embeddedllm

Star

EmbeddedLLM: API server for Embedded Device Deployment. Currently support IpexLLM/DirectML./CPU

windows cpu llama gemma mistral directx-12 npu aipc directml llm model-inference llm-serving llm-inference open-source-llm phi-3 ipexllm

Updated Jul 16, 2024
Python

mosecorg / mosec

Star

A high-performance ML model serving framework, offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

python rust machine-learning deep-learning mxnet tensorflow gpu cv pytorch tts hacktoberfest model-serving nerual-network machine-learning-platform jax mlops llm llm-serving

Updated Jul 13, 2024
Python

microsoft / aici

Star

AICI: Prompts as (Wasm) Programs

rust ai wasm inference transformer language-model model-serving wasmtime llm llmops llm-serving llm-inference llm-framework

Updated Jul 11, 2024
Rust

liguodongiot / llm-action

Star

本项目旨在分享大模型相关技术原理以及实战经验。

llm llmops llm-serving llm-training llm-inference

Updated Jul 11, 2024
HTML

HPMLL / BurstGPT

Star

A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems

dataset mlsys llm llm-serving

Updated Jul 9, 2024
Python

France-Travail / happy_vllm

Star

A REST API for vLLM, production ready

production api-rest llm llm-serving vllm

Updated Jul 16, 2024
Python

galeselee / Awesome_LLM_System-PaperList

Star

Since the emergence of chatGPT in 2022, the acceleration of Large Language Model has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on inference acceleration, and related works will be gradually added in the future. Welcome contributions!

system papers paperlist llm-serving llm-inference