ViT Paper Read
Preface
The paper is titled ‘An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale’.
The paper provides the first evidence that a Transformer encoder can be applied to image classification.
1. Structure

Figure 1: Structure (source: the paper)
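The "16x16 words" in the title refers to cutting the image into fixed-size patches and feeding them to the Transformer encoder as a sequence of tokens. Below is a minimal sketch of just that splitting step, written with NumPy. The function name `image_to_patches` and the 224x224 RGB input are assumptions chosen for illustration, not code from the paper, and the later stages (linear projection, position embeddings, class token, encoder) are left out.

```python
import numpy as np


def image_to_patches(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Hypothetical helper for illustration; returns an array of shape
    (num_patches, patch_size * patch_size * C), in row-major patch order.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, c) -> (gh, gw, p, p, c) -> (gh * gw, p * p * c)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


# Assumed input size: a 224x224 RGB image gives a 14x14 grid of patches.
image = np.zeros((224, 224, 3), dtype=np.float32)
patches = image_to_patches(image)
print(patches.shape)  # (196, 768)
```

Each of the 196 rows plays the role of one "word": a flattened 16x16x3 patch that a model would then project to its embedding dimension.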
2.
Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. Please credit Linermao's kiosk when reposting!