Experimental HPC-accelerated deep learning research: a next-gen R&D AI project with a Scala API. 🚀
An automatically updated paper list.
React component library for crafting user-friendly and engaging conversational experiences
A toolkit for building AI agents that use devices
Official PyTorch implementation of the MICCAI 2024 paper (early accept, top 11%) Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal AI, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Chat with an AI that knows everything about you. An alternative to Rewind.ai: record your screens & mics 24/7, and you own your data. Written in Rust; a library for devs to build AI apps on top of all your life data.
Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
SDK for interacting with stability.ai APIs (e.g. stable diffusion inference)
The enterprise-grade, production-ready multi-agent orchestration framework. Join our community: https://discord.com/servers/agora-999382051935506503
A lightning-fast workflow builder that supports multimodal interaction and highly customizable extensions, and is intuitive to use even without any coding knowledge.
A paper list about multimodal and large language models, used to record papers I read from the daily arXiv for personal reference.
Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, with cleaned-up canonical references under the /Resources folder.
Open scripts and pipelines from the Multimodal Imaging and Connectome Analysis Lab at the Montreal Neurological Institute
WebLINX is a benchmark for building web navigation agents with conversational capabilities
A repository of LinkedIn posts about Generative AI: knowledge sharing, learning resources, and research explanations.
A web UI project for learning about large language models. Includes features such as chat, quantization, fine-tuning, prompt engineering templates, and multimodality.
Phi-3 for Mac: Locally-run Vision and Language Models for Apple Silicon
Seamlessly integrate state-of-the-art transformer models into robotics stacks
ms-swift: Use PEFT or full-parameter training to fine-tune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3, Llava-Video, Internvl2, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
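The PEFT approach mentioned above trains only small adapter matrices instead of the full weights. The following is a minimal numerical sketch of the LoRA idea behind such adapters (not ms-swift's actual API; all names here are illustrative):

```python
import numpy as np

# Sketch of the low-rank adapter (LoRA) idea used in PEFT fine-tuning:
# keep the pretrained weight W frozen and train two small factors A, B
# whose product is added to W's output.
rng = np.random.default_rng(0)

d_in, d_out, rank, alpha = 64, 64, 4, 8
W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection (zero init)

def lora_forward(x):
    # Frozen path plus low-rank update, scaled by alpha / rank.
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B initialized to zero, the adapter starts as an exact no-op.
assert np.allclose(lora_forward(x), W @ x)

# Only rank * (d_in + d_out) adapter parameters are trained,
# versus d_in * d_out for full-parameter fine-tuning.
print(A.size + B.size, "adapter params vs", W.size, "full params")
```

This is why PEFT scales to very large models: here the adapter has 512 trainable parameters against 4096 in the full matrix, and the ratio improves further as layer sizes grow.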