CLIP Paper read

Preface

The paper is titled ‘Learning Transferable Visual Models From Natural Language Supervision’ and comes from the team at OpenAI.

You can find more information about it on the OpenAI website:

In this post I’ll briefly describe how CLIP works.

1. Structure

Figure-1 Structure (source: from paper)

2.

3.