The official implementation for the ICCV 2023 paper "Grounded Image Text Matching with Mismatched Relation Reasoning".
Updated Dec 8, 2023 - Python
Unofficial implementation of "Sigmoid Loss for Language Image Pre-Training"
An open-source API built on FastAPI for visual question answering.
This repo contains the official PyTorch implementation of vLMIG: Improving Visual Commonsense in Language Models via Multiple Image Generation
Related papers about Referring Image Segmentation (RIS)
Vision-Controllable Natural Language Generation
An end-to-end masked contrastive video-and-language pre-training framework
A comprehensive hub for updates on generative AI research, including interviews, notebooks, and additional resources.
My solutions to CS231N CNN assignments
PyTorch code for the Findings of NAACL 2022 paper "Probing the Role of Positional Information in Vision-Language Models".
Arabic WordNet matches for synsets in ImageNet
Source code and documentation for the LREC-COLING'24 paper "Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies"
PyTorch implementation of the paper: All For One: Multi-modal Multi-Task Learning
Codebase for vision-and-language research, including multimodal task pipelines (e.g., image captioning, VQA, video-text retrieval), customizable datasets (e.g., MS-COCO, ActivityNet, MSR-VTT), and pre-trained model acquisition (e.g., CLIP, BLIP-2)
Targeted semantic multimodal input ablation. Official implementation of the ablation method introduced in the paper: "What Vision-Language Models 'See' when they See Scenes"
Reading group for Vision and Language research
A list of research papers on knowledge-enhanced multimodal learning
ACM Multimedia 2023 - Temporal Sentence in Streaming Videos
[IROS 2023] GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation