EfficientViT is a new family of vision models for efficient high-resolution vision.

mit-han-lab/efficientvit

EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction (paper, poster)

News

If you are interested in getting updates, please join our mailing list here.

About EfficientViT Models

EfficientViT is a new family of ViT models for efficient high-resolution dense prediction vision tasks. The core building block of EfficientViT is a lightweight multi-scale linear attention module that achieves a global receptive field and multi-scale learning using only hardware-efficient operations, making EfficientViT TensorRT-friendly and well suited to GPU deployment.
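The key idea behind the linear attention module is to replace softmax attention with ReLU feature maps: the key-value summary `K^T V` can then be computed once in O(n·d²) and reused for every query, instead of forming the O(n²) attention matrix. A minimal single-head NumPy sketch of this idea (function names are illustrative, not the repository's API; the real module adds multi-scale aggregation and depthwise convolutions):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def relu_linear_attention(Q, K, V, eps=1e-6):
    """ReLU linear attention over n tokens.

    Q, K: (n, d) query/key features; V: (n, d_v) values.
    Cost is O(n * d * d_v) rather than softmax attention's O(n^2 * d),
    because the (d, d_v) key-value summary is computed once.
    """
    Qp, Kp = relu(Q), relu(K)          # non-negative feature maps replace softmax
    KV = Kp.T @ V                      # (d, d_v) global summary, shared by all queries
    Z = Qp @ Kp.sum(axis=0) + eps      # (n,) per-token normalizer
    return (Qp @ KV) / Z[:, None]
```

By associativity, this yields exactly the same output as the quadratic form `relu(Q) relu(K)^T V` with row-wise normalization, which is what makes the linear-time evaluation possible.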

Third-Party Implementation/Integration

Getting Started

conda create -n efficientvit python=3.10
conda activate efficientvit
conda install -c conda-forge mpi4py openmpi
pip install -r requirements.txt

EfficientViT Applications

| Model | Resolution | COCO mAP | LVIS mAP | Params | MACs | Jetson Orin Latency (bs1) | A100 Throughput (bs16) | Checkpoint |
|---|---|---|---|---|---|---|---|---|
| EfficientViT-SAM-L0 | 512x512 | 45.7 | 41.8 | 34.8M | 35G | 8.2ms | 762 images/s | link |
| EfficientViT-SAM-L1 | 512x512 | 46.2 | 42.1 | 47.7M | 49G | 10.2ms | 638 images/s | link |
| EfficientViT-SAM-L2 | 512x512 | 46.6 | 42.7 | 61.3M | 69G | 12.9ms | 538 images/s | link |
| EfficientViT-SAM-XL0 | 1024x1024 | 47.5 | 43.9 | 117.0M | 185G | 22.5ms | 278 images/s | link |
| EfficientViT-SAM-XL1 | 1024x1024 | 47.8 | 44.4 | 203.3M | 322G | 37.2ms | 182 images/s | link |

Table 1: Summary of all EfficientViT-SAM variants. COCO mAP and LVIS mAP are measured using ViTDet's predicted bounding boxes as the prompt. End-to-end Jetson Orin latency and A100 throughput are measured with TensorRT and fp16.

Demo

  • GazeSAM: Combining EfficientViT-SAM with Gaze Estimation

GazeSAM demo

Contact

Han Cai: [email protected]

TODO

  • ImageNet Pretrained models
  • Segmentation Pretrained models
  • ImageNet training code
  • EfficientViT L series, designed for cloud
  • EfficientViT for segment anything
  • EfficientViT for image generation
  • EfficientViT for CLIP
  • EfficientViT for super-resolution
  • Segmentation training code

Citation

If EfficientViT is useful or relevant to your research, please cite our paper:

@article{cai2022efficientvit,
  title={Efficientvit: Enhanced linear attention for high-resolution low-computation visual recognition},
  author={Cai, Han and Gan, Chuang and Han, Song},
  journal={arXiv preprint arXiv:2205.14756},
  year={2022}
}