An extremely simple toy example of a transformer-based language model.
The model and training method are based on Andrej Karpathy's excellent YouTube video: Let's build GPT: from scratch, in code, spelled out.
Requirements:
python >= 3.7
pytorch
numpy
rich
loguru
Training a model:
python3 train.py \
--lr=1e-3 \
--batch-size=32 \
--block-size=128 \ # context block size
--embed-size=512 \ # embedding size
--depth=4 \ # number of transformer layers
--num-heads=4 \ # number of attention heads in each transformer layer
--dropout=0.1
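The flags above map onto a small decoder-only transformer. As a rough illustration, here is a minimal sketch of one such block in PyTorch; the hyperparameter names mirror the CLI flags, but the actual model definition in this repo may differ.

import torch
import torch.nn as nn

class Block(nn.Module):
    # One pre-norm decoder block: causal self-attention followed by an MLP.
    # embed_size / num_heads / dropout mirror the CLI flags above; this is an
    # assumed illustration, not the repo's actual model code.
    def __init__(self, embed_size=512, num_heads=4, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(embed_size)
        self.attn = nn.MultiheadAttention(embed_size, num_heads, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(embed_size)
        self.mlp = nn.Sequential(
            nn.Linear(embed_size, 4 * embed_size),
            nn.GELU(),
            nn.Linear(4 * embed_size, embed_size),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        # Causal mask: each position may only attend to itself and earlier positions.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        a = self.ln1(x)
        a, _ = self.attn(a, a, a, attn_mask=mask, need_weights=False)
        x = x + a
        x = x + self.mlp(self.ln2(x))
        return x

--depth=4 stacks four such blocks, and --block-size=128 caps the sequence length T seen by the causal mask.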
Training converges in about 15 minutes on an RTX 2080 Ti. Run this command for an interactive demo:
python3 chat.py
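Under the hood, an interactive demo like this typically samples from the model one token at a time. The sketch below shows such an autoregressive sampling loop; the model API, tokenizer, and checkpoint handling are assumptions for illustration, not necessarily what chat.py actually does.

import torch

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size=128, temperature=1.0):
    # idx: (1, T) tensor of token ids; extends it by max_new_tokens sampled tokens.
    # Assumes the model maps (1, T) token ids to (1, T, vocab_size) logits.
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]            # crop to the context window
        logits = model(idx_cond)                   # (1, T, vocab_size)
        logits = logits[:, -1, :] / temperature    # keep only the last position
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)
    return idx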
The default training dataset consists of the Chinese classics "水浒传" (Water Margin) and "红楼梦" (Dream of the Red Chamber), which can easily be changed to any text you like.
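To train on your own text, point the data loading at a different plain-text file. Assuming a character-level tokenizer like the one in Karpathy's video, preparing a new corpus can look roughly like this (the file name and helper names are illustrative, not the repo's actual code):

text = open("my_corpus.txt", encoding="utf-8").read()
chars = sorted(set(text))                          # character-level vocabulary
stoi = {ch: i for i, ch in enumerate(chars)}       # char -> id
itos = {i: ch for ch, i in stoi.items()}           # id -> char
encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)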
Thanks to Andrej Karpathy for the excellent YouTube video and the nanoGPT project.