High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
Updated Jul 15, 2024 - C++
Fast Inference of MoE Models with CPU-GPU Orchestration
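The CPU-GPU orchestration behind such MoE serving can be sketched roughly as a hot/cold expert placement: frequently activated experts are pinned to limited GPU memory while rarely used ones stay on the CPU. This is a toy placement policy for illustration only, not the project's actual scheduler; the activation trace and slot count are made up.

```python
from collections import Counter

def place_experts(activations: list[int], gpu_slots: int) -> dict[int, str]:
    """Pin the most frequently activated experts to the GPU and leave
    the rest on the CPU (a toy hot/cold placement policy)."""
    counts = Counter(activations)
    hot = {expert for expert, _ in counts.most_common(gpu_slots)}
    return {expert: ("gpu" if expert in hot else "cpu") for expert in counts}

# Hypothetical expert-activation trace from routing a batch of tokens.
trace = [0, 2, 2, 5, 2, 0, 7, 2, 0]
placement = place_experts(trace, gpu_slots=2)
```

With this trace, experts 2 and 0 dominate and land on the GPU, while the cold experts 5 and 7 remain CPU-resident.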
Tool for testing different large language models without writing code.
LLM chatbot example using OpenVINO with RAG (Retrieval Augmented Generation).
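The RAG pattern such a chatbot follows can be illustrated in a few lines: retrieve the document most relevant to the query, then prepend it as context to the prompt handed to the language model. This is a minimal sketch with a toy word-overlap retriever and an assembled prompt string; it is not the repository's actual code, and a real pipeline (e.g. with OpenVINO) would use an embedding model for retrieval and a compiled LLM for generation.

```python
def retrieve(query: str, docs: list[str]) -> str:
    """Return the document sharing the most words with the query
    (a toy stand-in for embedding-based similarity search)."""
    query_words = set(query.lower().split())
    return max(docs, key=lambda d: len(query_words & set(d.lower().split())))

def build_prompt(query: str, context: str) -> str:
    """Assemble the augmented prompt the LLM would receive."""
    return f"Context: {context}\nQuestion: {query}\nAnswer:"

# Hypothetical document store.
docs = [
    "OpenVINO optimizes inference on Intel hardware.",
    "RAG grounds model answers in retrieved documents.",
]
query = "What does RAG do?"
prompt = build_prompt(query, retrieve(query, docs))
```

The retrieved context is what lets the model answer from the document store rather than from its parameters alone.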