ViT

Path:/datasets/ai/vit
URL:https://huggingface.co/docs/transformers/en/model_doc/vit
Downloaded:09-19-2024
Cite:Dosovitskiy, Alexey, et al. “An image is worth 16x16 words: Transformers for image recognition at scale.” arXiv preprint arXiv:2010.11929 (2020)
Variant:
  • vit-base-patch16-224
  • vivit-b-16x2-kinetics400
Bibtex:
@article{dosovitskiy2020image, title={An image is worth 16x16 words: Transformers for image recognition at scale}, author={Dosovitskiy, Alexey and others}, journal={arXiv preprint arXiv:2010.11929}, year={2020}}
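
Usage sketch: the entry above gives a root path and variant names but not the on-disk layout, so the following is only a minimal illustration of loading the vit-base-patch16-224 variant with Hugging Face Transformers from local files. It assumes each variant sits in its own directory directly under the root (<root>/<variant>) in the format that from_pretrained expects; that layout, and the helper names variant_dir and load_vit_classifier, are hypothetical and should be adjusted to the actual contents of the directory.

```python
from pathlib import Path

# Root taken from the "Path:" field of this entry.
VIT_ROOT = Path("/datasets/ai/vit")


def variant_dir(variant: str, root: Path = VIT_ROOT) -> Path:
    """Return the assumed directory of a variant.

    ASSUMPTION: variants are stored as <root>/<variant>; the entry does not
    state the real layout.
    """
    return root / variant


def load_vit_classifier(variant: str = "vit-base-patch16-224", root: Path = VIT_ROOT):
    """Load the image processor and classification model from local files only."""
    model_dir = variant_dir(variant, root)
    if not model_dir.is_dir():
        raise FileNotFoundError(
            f"{model_dir} not found; adjust variant_dir() to the actual layout"
        )

    # Imported here so the path helper is usable without transformers installed.
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    # local_files_only=True prevents any attempt to download from the Hub.
    processor = AutoImageProcessor.from_pretrained(model_dir, local_files_only=True)
    model = AutoModelForImageClassification.from_pretrained(
        model_dir, local_files_only=True
    )
    return processor, model
```

Calling load_vit_classifier() would return a (processor, model) pair if the assumed layout holds. The vivit-b-16x2-kinetics400 variant is a video model, so it would likely need the ViViT-specific classes in Transformers rather than the image-classification loader shown here.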