EfficientViT is a new family of vision models for efficient high-resolution vision.

mit-han-lab/efficientvit

EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction (paper, poster)

News

If you are interested in getting updates, please join our mailing list here.

About EfficientViT Models

EfficientViT is a new family of ViT models for efficient high-resolution dense prediction vision tasks. The core building block of EfficientViT is a lightweight multi-scale linear attention module that achieves a global receptive field and multi-scale learning using only hardware-efficient operations, making EfficientViT TensorRT-friendly and well suited to GPU deployment.
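The key idea behind the linear attention module is to replace softmax attention with ReLU feature maps: the key-value summary `K^T V` can then be computed once in O(n·d²) and reused for every query, instead of forming the O(n²) attention matrix. A minimal single-head NumPy sketch of this idea (function names are illustrative, not the repository's API; the real module adds multi-scale aggregation and depthwise convolutions):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def relu_linear_attention(Q, K, V, eps=1e-6):
    """ReLU linear attention over n tokens.

    Q, K: (n, d) query/key features; V: (n, d_v) values.
    Cost is O(n * d * d_v) rather than softmax attention's O(n^2 * d),
    because the (d, d_v) key-value summary is computed once.
    """
    Qp, Kp = relu(Q), relu(K)          # non-negative feature maps replace softmax
    KV = Kp.T @ V                      # (d, d_v) global summary, shared by all queries
    Z = Qp @ Kp.sum(axis=0) + eps      # (n,) per-token normalizer
    return (Qp @ KV) / Z[:, None]
```

By associativity, this yields exactly the same output as the quadratic form `relu(Q) relu(K)^T V` with row-wise normalization, which is what makes the linear-time evaluation possible.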

Third-Party Implementation/Integration

Getting Started

conda create -n efficientvit python=3.10
conda activate efficientvit
conda install -c conda-forge mpi4py openmpi
pip install -r requirements.txt

EfficientViT Applications

| Model | Resolution | COCO mAP | LVIS mAP | Params | MACs | Jetson Orin Latency (bs1) | A100 Throughput (bs16) | Checkpoint |
|---|---|---|---|---|---|---|---|---|
| EfficientViT-SAM-L0 | 512x512 | 45.7 | 41.8 | 34.8M | 35G | 8.2ms | 762 images/s | link |
| EfficientViT-SAM-L1 | 512x512 | 46.2 | 42.1 | 47.7M | 49G | 10.2ms | 638 images/s | link |
| EfficientViT-SAM-L2 | 512x512 | 46.6 | 42.7 | 61.3M | 69G | 12.9ms | 538 images/s | link |
| EfficientViT-SAM-XL0 | 1024x1024 | 47.5 | 43.9 | 117.0M | 185G | 22.5ms | 278 images/s | link |
| EfficientViT-SAM-XL1 | 1024x1024 | 47.8 | 44.4 | 203.3M | 322G | 37.2ms | 182 images/s | link |

Table 1: Summary of all EfficientViT-SAM variants. COCO mAP and LVIS mAP are measured using ViTDet's predicted bounding boxes as the prompt. End-to-end Jetson Orin latency and A100 throughput are measured with TensorRT and fp16.

Demo

  • GazeSAM: Combining EfficientViT-SAM with Gaze Estimation

GazeSAM demo

Contact

Han Cai: [email protected]

TODO

  • ImageNet Pretrained models
  • Segmentation Pretrained models
  • ImageNet training code
  • EfficientViT L series, designed for cloud
  • EfficientViT for segment anything
  • EfficientViT for image generation
  • EfficientViT for CLIP
  • EfficientViT for super-resolution
  • Segmentation training code

Citation

If EfficientViT is useful or relevant to your research, please cite our paper:

@article{cai2022efficientvit,
  title={Efficientvit: Enhanced linear attention for high-resolution low-computation visual recognition},
  author={Cai, Han and Gan, Chuang and Han, Song},
  journal={arXiv preprint arXiv:2205.14756},
  year={2022}
}