ViT Paper Read

Preface

The paper is titled ‘An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale’.

The paper provides the first evidence that a Transformer encoder can be applied directly to image classification.

1. Structure

Figure 1: Structure (source: the paper)
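
The "16x16 words" in the title refers to the first step of the structure: the image is cut into fixed-size, non-overlapping patches, and each flattened patch is treated like a word token. The following is a minimal sketch of that step only, assuming NumPy and a 224x224 RGB input. The function name image_to_patches is hypothetical and not taken from the paper's code, and the learned linear projection, class token, and position embeddings that follow are left out.

```python
import numpy as np


def image_to_patches(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Hypothetical helper for illustration; returns an array of shape
    (num_patches, patch_size * patch_size * C).
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # View the image as a grid of patches: (rows, P, cols, P, C).
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    # Bring the two grid axes to the front: (rows, cols, P, P, C).
    patches = patches.transpose(0, 2, 1, 3, 4)
    # Flatten the grid into a sequence and each patch into a vector.
    return patches.reshape(rows * cols, patch_size * patch_size * c)


# Assumed example input: a blank 224x224 RGB image.
image = np.zeros((224, 224, 3), dtype=np.float32)
tokens = image_to_patches(image)
print(tokens.shape)  # (196, 768)
```

With these assumed sizes, the image becomes a sequence of 14 x 14 = 196 tokens, each of length 16 x 16 x 3 = 768, which is the kind of sequence a Transformer encoder takes as input.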

2.