README for Pathway Vision Transformer

PaViT (Pathway Vision Transformer) is an image recognition model developed by Ajibola Emmanuel Oluwaseun. It is inspired by Google's PaLM (Pathways Language Model) and aims to demonstrate the potential of few-shot learning techniques in image recognition tasks.

Model Performance

PaViT was trained on a CPU machine with 4 GB of RAM, using a dataset of 15,000 Kaggle images across 15 classes, and achieved 88% accuracy with 4 self-attention heads. Accuracy rose to 96% when the model was trained with 12 self-attention heads and 12 sequentially stacked linear layers. These results suggest that the model trains quickly on a CPU and performs well despite a relatively small dataset.
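For readers unfamiliar with the term, the sketch below illustrates what a configurable number of self-attention heads means, using plain NumPy. It is a generic multi-head self-attention layer with random weights, not the PaViT implementation; the function name `multi_head_self_attention`, the token count, and the embedding size are assumptions made purely for illustration.

```python
import numpy as np

def multi_head_self_attention(x, num_heads, rng):
    """Generic multi-head self-attention over a (tokens, dim) array.

    Illustrative only: weights are random, and this is not PaViT's code.
    """
    tokens, dim = x.shape
    if dim % num_heads != 0:
        raise ValueError("dim must be divisible by num_heads")
    head_dim = dim // num_heads

    # Random projection matrices stand in for learned weights.
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(4)
    )

    def split_heads(t):
        # (tokens, dim) -> (heads, tokens, head_dim)
        return t.reshape(tokens, num_heads, head_dim).transpose(1, 0, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)

    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)

    # Merge heads back to (tokens, dim) and apply the output projection.
    merged = (weights @ v).transpose(1, 0, 2).reshape(tokens, dim)
    return merged @ w_o, weights

rng = np.random.default_rng(0)
patches = rng.standard_normal((16, 48))  # 16 hypothetical patch embeddings of size 48
for heads in (4, 12):  # the two head counts mentioned above
    out, attn = multi_head_self_attention(patches, heads, rng)
    print(heads, out.shape, attn.shape)
```

With more heads, the same embedding is split into more, narrower subspaces, each with its own attention pattern; the output shape stays the same.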

Usage

Trained weights are provided in the repository and can be used directly for image recognition tasks. The code can be modified to work with custom datasets, and performance may improve further with more self-attention heads and linear layers.
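As a starting point for a custom dataset, the sketch below indexes a folder of images into (path, label) pairs. It assumes a layout with one subfolder per class, which is a common convention but not something this repository prescribes; the helper name `index_image_folder` and the extension list are hypothetical.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}

def index_image_folder(root):
    """Return (samples, class_names) for a folder with one subfolder per class.

    samples is a list of (image_path, label_index) pairs. The folder layout is
    an assumption of this sketch, not a requirement of PaViT.
    """
    root = Path(root)
    # Sort class folders so label indices are stable across runs.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    samples = []
    for label, name in enumerate(class_names):
        for path in sorted((root / name).iterdir()):
            if path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((str(path), label))
    return samples, class_names
```

The length of `class_names` is the number of classes in the dataset, which is presumably the value to use for the number of output units when building the model.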

Contribution

The author believes that PaViT has the potential to outperform existing Vision Transformer models and is eager to see it evolve through community contributions.

Contributions to the project are welcome and can be made through pull requests. Developers can also report issues or suggest new features for the project.

License

This project is licensed under the MIT License.

How to use:

On inference

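import cv2
import numpy as np
from tensorflow.keras.models import load_model
import PaViT

image_path = 'path/to/image.png'  # Replace with the path to your image
image = cv2.imread(image_path)  # Load image (OpenCV reads it as BGR)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # Convert image to RGB
image = cv2.resize(image, (224, 224))  # Default image size
image = np.expand_dims(image, axis=0)  # Add batch dimension: (1, 224, 224, 3)
model = load_model('trained_weight.h5')  # Load trained weights
prediction = model.predict(image)  # Run inference
prediction = np.argmax(prediction, axis=-1)  # Index of the highest-probability class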
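The class index returned above can be mapped back to a readable label. The sketch below ranks classes by score; the helper name `top_k_classes`, the class names, and the scores are hypothetical, and the real label order depends on how the training labels were encoded.

```python
def top_k_classes(probabilities, class_names, k=3):
    """Return the k most likely (class_name, probability) pairs, best first."""
    if len(probabilities) != len(class_names):
        raise ValueError("expected one probability per class name")
    ranked = sorted(
        zip(class_names, probabilities), key=lambda pair: pair[1], reverse=True
    )
    return ranked[:k]

# Hypothetical class names and scores, for illustration only.
class_names = ["cat", "dog", "bird"]
scores = [0.10, 0.75, 0.15]
print(top_k_classes(scores, class_names, k=2))  # [('dog', 0.75), ('bird', 0.15)]
```

With a real model, the first row of the array returned by `model.predict` would take the place of `scores`.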

On Training

import PaViT

model = PaViT.PaViT()
# output: number of units in the output layer (default 15)
# activation: activation of the output layer (default 'sigmoid')
# output_class: custom output layer; the default None uses a Dense layer
p_model = model.model(output_class=None, activation='softmax', output=15)
p_model.summary()
p_model.compile(...)
p_model.fit(...)
