Experimental HPC-accelerated deep learning research: a next-gen R&D AI project with a Scala API. 🚀
An automatically updated paper list.
React component library for crafting user-friendly and engaging conversational experiences
A toolkit for building AI agents that use devices
Official PyTorch implementation of the MICCAI 2024 paper (early accept, top 11%) Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal AI, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Chat with an AI that knows everything about you. An alternative to Rewind.ai: record your screens & mics 24/7, and you own your data. Written in Rust; a library for devs to build AI apps on top of all your life data.
Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
SDK for interacting with stability.ai APIs (e.g. stable diffusion inference)
The enterprise-grade, production-ready multi-agent orchestration framework. Join our community: https://discord.com/servers/agora-999382051935506503
A lightning-fast workflow builder that supports multimodal interaction and highly customizable extensions, and is intuitive to use even without any coding knowledge.
A paper list about multimodal and large language models, used to record papers I read from the daily arXiv for personal reference.
Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, with cleaned-up canonical references under the /Resources folder.
Open scripts and pipelines from the Multimodal Imaging and Connectome Analysis Lab at the Montreal Neurological Institute
WebLINX is a benchmark for building web navigation agents with conversational capabilities
A repository of LinkedIn posts about Generative AI: knowledge sharing, learning resources, and research explanations.
A web UI project for learning about large language models. Includes features such as chat, quantization, fine-tuning, prompt engineering templates, and multimodality.
Phi-3 for Mac: Locally-run Vision and Language Models for Apple Silicon
Seamlessly integrate state-of-the-art transformer models into robotics stacks
ms-swift: Use PEFT or full-parameter training to fine-tune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3, Llava-Video, Internvl2, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
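The PEFT approach mentioned above trains only small adapter matrices instead of the full weights. The following is a minimal numerical sketch of the LoRA idea behind such adapters (not ms-swift's actual API; all names here are illustrative):

```python
import numpy as np

# Sketch of the low-rank adapter (LoRA) idea used in PEFT fine-tuning:
# keep the pretrained weight W frozen and train two small factors A, B
# whose product is added to W's output.
rng = np.random.default_rng(0)

d_in, d_out, rank, alpha = 64, 64, 4, 8
W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection (zero init)

def lora_forward(x):
    # Frozen path plus low-rank update, scaled by alpha / rank.
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B initialized to zero, the adapter starts as an exact no-op.
assert np.allclose(lora_forward(x), W @ x)

# Only rank * (d_in + d_out) adapter parameters are trained,
# versus d_in * d_out for full-parameter fine-tuning.
print(A.size + B.size, "adapter params vs", W.size, "full params")
```

This is why PEFT scales to very large models: here the adapter has 512 trainable parameters against 4096 in the full matrix, and the ratio improves further as layer sizes grow.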