🔥 The dataset is available on 🤗 Hugging Face and contains 219,437 image descriptions. Link to our paper: arXiv.
See detailed instructions in install.md.
- COCO: Download train2017 here.
- SAM: Download SAM here (sa_000000.tar through sa_000024.tar).
- VG: Download VG here.
After downloading, organize the image datasets under ./dataset/ as follows:
├── coco
│ └── train2017
├── sam
│ └── images
└── vg
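
Before running anything on the images, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of this repository: the helper name `missing_dirs` and the constant `EXPECTED_DIRS` are hypothetical, and the directory list is assumed from the tree shown above (`coco/train2017`, `sam/images`, `vg`).

```python
from pathlib import Path

# Expected sub-directories, assumed from the layout shown above.
EXPECTED_DIRS = ("coco/train2017", "sam/images", "vg")


def missing_dirs(root):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Check the default location used in this README.
    missing = missing_dirs("./dataset")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

The check only looks for the directories themselves; it does not verify that the images inside them were downloaded or extracted completely.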
After installing all the requirements, follow use.md to generate descriptions for your own datasets.
![image](https://private-user-images.githubusercontent.com/119802220/338402302-9562860a-96b6-4253-9305-d133161eea70.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjExNjMwMzYsIm5iZiI6MTcyMTE2MjczNiwicGF0aCI6Ii8xMTk4MDIyMjAvMzM4NDAyMzAyLTk1NjI4NjBhLTk2YjYtNDI1My05MzA1LWQxMzMxNjFlZWE3MC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzE2JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcxNlQyMDQ1MzZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1iOTA4MGI5ZTRmMTBlZmNiYTU5NWJkNzM2YTZhZTQ3ODFkMDQ2Y2QyYzQ2MmVhYzY5MmZmNDg3N2Q1NGMwMjg4JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.BUwND_RovuCEYkuoVkMSWH1SxXP-bil76zBnu3qNgOA)
If you find our work useful for your research or applications, please cite using this BibTeX:
@misc{pi2024image,
title={Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions},
author={Renjie Pi and Jianshu Zhang and Jipeng Zhang and Rui Pan and Zhekai Chen and Tong Zhang},
year={2024},
eprint={2406.07502},
archivePrefix={arXiv},
primaryClass={cs.CV}
}