# vision-and-language

Here are 228 public repositories matching this topic...

SHTUPLUS / GITM-MR

The official implementation for the ICCV 2023 paper "Grounded Image Text Matching with Mismatched Relation Reasoning".

vision-and-language vision-and-language-pre-training vision-language-dataset vision-language-model vision-language-learning

Updated Dec 8, 2023
Python

ahmdtaha / distributed_sigmoid_loss

Unofficial implementation of "Sigmoid Loss for Language Image Pre-Training"

python3 pytorch unsupervised-learning vision-and-language multimodal-deep-learning self-supervised-learning vision-language contrastive-learning distributed-data-parallel vision-transformer vision-language-pretraining

Updated Sep 26, 2023
Python

camiloavil / AI-Vision-Language-Transformer-API

An open-source API built on FastAPI for visual question answering.

api dockerfile docker-compose python3 vision-and-language fastapi huggingface-transformers

Updated Sep 8, 2023
Dockerfile

guyyariv / vLMIG

This repo contains the official PyTorch implementation of vLMIG: Improving Visual Commonsense in Language Models via Multiple Image Generation

deep-learning language-model vision-and-language multimodal-deep-learning visual-commonsense-reasoning visual-commonsense

Updated Jul 1, 2024
Python

Huntersxsx / RIS-Learning-List

Related papers about Referring Image Segmentation (RIS)

image-segmentation referring-expressions vision-and-language referring-image-segmentation

Updated Dec 26, 2023

LivXue / VCNLG

Vision-Controllable Natural Language Generation

natural-language-generation vision-and-language natual-language-processing

Updated Mar 8, 2024
Python

shufangxun / MAC

An end-to-end masked contrastive video-and-language pre-training framework

pytorch clip mae end-to-end-learning multimodal vision-and-language activitynet pretraining msrvtt contrastive-learning vision-transformer video-text-retrieval video-language didemo

Updated Dec 13, 2022

nicholasnouri / ai-resources

A comprehensive hub for updates on generative AI research, including interviews, notebooks, and additional resources.

awesome awesome-list interview-questions vision-and-language notebook-jupyter large-language-models llm llms generative-ai

Updated Apr 2, 2024

tanmaybinaykiya / CS231N-CNN-Solutions

My solutions to CS231N CNN assignments

python natural-language-processing computer-vision deep-learning pytorch cs231n-assignment vision-and-language

Updated Mar 14, 2018
Jupyter Notebook

phiyodr / plxmert

PyTorch code for the Findings of NAACL 2022 paper "Probing the Role of Positional Information in Vision-Language Models".

naacl transformers vision-and-language pre-training vision-language lxmert naacl2022 unibwm

Updated Jul 20, 2022
Python

alsudais / ImageNet_to_AWN

Arabic WordNet matches for synsets in ImageNet

natural-language-processing computer-vision wordnet arabic-nlp vision-and-language acl2020 arabic-computer-vision

Updated Mar 5, 2022

clp-research / cost-sharing-reference-game

Source code and documentation for the LREC-COLING'24 paper "Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies"

reinforcement-learning multi-agent vision-and-language

Updated Mar 27, 2024
Python

itsShnik / allForOne

PyTorch implementation of the paper: All For One: Multi-modal Multi-Task Learning

deep-learning sentiment-classification multi-task-learning visual-question-answering vision-and-language multi-modal-learning

Updated Jul 17, 2020
Python

zchoi / Vision-and-Language-Benchmark

Codebase for vision-and-language research, including multimodal task pipelines (e.g., image captioning, VQA, video-text retrieval), customizable datasets (e.g., MS-COCO, ActivityNet, MSR-VTT), and pre-trained model acquisition (e.g., CLIP, BLIP-2)

benchmark pipeline pretrained-models vision-and-language

Updated May 4, 2023

michelecafagna26 / vl-ablation

Targeted semantic multimodal input ablation. Official implementation of the ablation method introduced in the paper: "What Vision-Language Models 'See' when they See Scenes"

semantic tools vl interpretability occlusion multimodal xai vision-and-language ablation

Updated Mar 23, 2024
Jupyter Notebook

esradonmez / VisLang-Paper-Club

Reading group for Vision and Language research

machine-learning vl multimodal-learning vision-and-language

Updated Jan 23, 2022

marialymperaiou / knowledge-enhanced-multimodal-learning

A list of research papers on knowledge-enhanced multimodal learning

knowledge-graph multi-task-learning visual-reasoning visual-dialog visual-question-answering vision-and-language multimodal-deep-learning visual-storytelling multimodal-retrieval visual-grounding visual-commonsense-reasoning vision-and-language-navigation story-visualization image-text-matching vision-language-transformer image-text-retrieval vision-and-language-pre-training conditional-image-generation knowledge-enhanced-multimodal-learning knowledge-enhanced-vision-language

Updated Dec 8, 2022

guoyang9 / ELIP

Efficient language image pre-training

efficient vision-and-language

Updated Dec 12, 2023
Python

SCZwangxiao / TSGVs-MM2023

ACM Multimedia 2023 - Temporal Sentence Grounding in Streaming Videos

streaming-video video-understanding vision-and-language temporal-action-localization video-moment-retrieval temporal-sentence-grounding

Updated Mar 17, 2024
Python

JHKim-snu / GVCCI

[IROS 2023] GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation

robotic-arm lifelong-learning vision-and-language multimodal-deep-learning robot-manipulation iros2023

Updated Apr 23, 2024
Python
