👁️ Vision Transformer

This repository contains an implementation of the Vision Transformer (ViT) architecture built from scratch.

📄 Reference Paper

The architecture is based on the paper:

"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale"
Authors: Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, et al.
Published by Google Research, Brain Team.
Paper link

🛠️ Features of This Implementation

Tokenization: Patch-based image splitting to create input tokens.
Transformer Encoder: Multi-head self-attention and feed-forward layers.
Positional Encoding: Integration of positional information into input tokens.
Classifier Head: Fully connected layer for classification tasks.
Modular Design: Clean and reusable code for core components like tokenizers, encoders, and heads.
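
As a rough illustration of the tokenization step listed above, here is a minimal sketch of patch-based splitting in plain Python. The function name `split_into_patches` and the toy image are hypothetical and are not taken from this repository. The actual `embedding.py` may work differently, for example by operating on tensors or using a convolution. The sketch covers only the splitting. In the paper, each flattened patch is then linearly projected, a class token is prepended, and position embeddings are added.

```python
from typing import List

Image = List[List[float]]  # single-channel image stored as rows of pixel values


def split_into_patches(image: Image, patch_size: int) -> List[List[float]]:
    """Split an H x W image into flattened, non-overlapping square patches.

    Patches are returned in row-major order (left to right, top to bottom).
    Each patch is flattened to a vector of length patch_size * patch_size.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Collect the pixels of one patch, row by row.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Toy example (hypothetical data): a 4x4 image with pixel values 0..15,
# split into 2x2 patches.
toy_image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
tokens = split_into_patches(toy_image, patch_size=2)
print(len(tokens), len(tokens[0]))  # 4 tokens, each of length 4
print(tokens[0])                    # [0.0, 1.0, 4.0, 5.0]
```

With the same function, a 224x224 input and 16x16 patches yield (224 / 16)² = 196 tokens per channel-flattened image, which is the sequence length that gives the paper its title.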

🚀 Getting Started

Prerequisites

Ensure you have Python 3.8+ installed, then install the dependencies listed in requirements.txt:

pip install -r requirements.txt

Running the Code

Clone this repository:

git clone https://github.com/Ruhaan838/Vision-Transformer
cd Vision-Transformer

Train the ViT on a dataset:
```
python train.py
```
Evaluate the model:
```
python eval.py
```

📁 Project Structure

vision-transformer/  
├── dataset/              # Dataset Classes  
├── models/               # Vision Transformer model components  
│   ├── vit.py            # Main ViT implementation  
│   ├── attention.py      # SelfAttention and MultiHeadAttention modules  
│   ├── Encoder.py        # Transformer encoder for the ViT  
│   ├── get_model.py      # Returns the full model built from the settings in config.py  
│   └── embedding.py      # Patch embedding  
├── notebooks/  
│   └── vit-scratch.ipynb # Jupyter notebook for model training and more  
├── config.py             # Configuration for the full model  
├── train.py              # Training script  
├── eval.py               # Evaluation script  
├── requirements.txt      # Required libraries  
└── README.md             # Project documentation

