PicoGPT

An extremely simple toy example of a transformer-based language model.

The model and training method are based on Andrej Karpathy's awesome YouTube video: Let's build GPT: from scratch, in code, spelled out.

Quick Start

Requirements:

python >= 3.7
pytorch
numpy
rich
loguru

Training a model:

python3 train.py \
	--lr=1e-3 \
	--batch-size=32 \
	--block-size=128 \   # context block size
	--embed-size=512 \   # embedding size
	--depth=4 \          # number of transformer layers
	--num-heads=4 \      # number of attention heads in each transformer layer
	--dropout=0.1

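For reference, here is a minimal sketch of how these hyperparameters could map onto one GPT-style decoder block in PyTorch. The class and argument names (Block, embed_size, etc.) are illustrative assumptions, not the exact code in train.py:

import torch
import torch.nn as nn

class Block(nn.Module):
    # One transformer decoder layer: causal self-attention followed by an MLP.
    def __init__(self, embed_size=512, num_heads=4, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(embed_size)
        self.attn = nn.MultiheadAttention(embed_size, num_heads,
                                          dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(embed_size)
        self.mlp = nn.Sequential(
            nn.Linear(embed_size, 4 * embed_size),
            nn.GELU(),
            nn.Linear(4 * embed_size, embed_size),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        # Causal mask: each position may only attend to earlier positions.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device),
                          diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out
        x = x + self.mlp(self.ln2(x))
        return x

# --depth=4 stacks four such blocks; --block-size=128 is the context length.
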
Training converges on an RTX 2080 Ti in about 15 minutes. Run this command for an interactive demo:

python3 chat.py

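Under the hood, interactive generation is plain autoregressive sampling: feed the context through the model, sample the next token, append it, and repeat. A minimal sketch (the model interface and the 200-token limit are assumptions, not the exact API of chat.py):

import torch

@torch.no_grad()
def generate(model, idx, max_new_tokens=200, block_size=128):
    # idx: (1, T) tensor of token ids seeding the generation
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]          # crop to the context window
        logits = model(idx_cond)                 # (1, T, vocab_size)
        probs = torch.softmax(logits[:, -1, :], dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)   # append and continue
    return idx
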
The default training dataset is the Chinese classical novels "水浒传" (Water Margin) and "红楼梦" (Dream of the Red Chamber); it can easily be swapped for any text you like, as sketched below.
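
Swapping in your own corpus usually just means preparing a different plain-text file as a character-level dataset. A preparation sketch (the file name and helper names are illustrative, not the project's actual data loader):

import torch

text = open("my_corpus.txt", encoding="utf-8").read()
chars = sorted(set(text))                      # character-level vocabulary
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

data = torch.tensor(encode(text), dtype=torch.long)
n = int(0.9 * len(data))
train_data, val_data = data[:n], data[n:]      # 90/10 train/val split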

Acknowledgements

Thanks to Andrej Karpathy for the excellent YouTube video and the nanoGPT project.
