This repo trains a PyTorch implementation of minGPT using PyTorch Lightning. minGPT is a minimal version of a GPT language model, as taught in Karpathy's zero-to-hero course. This codebase is a 'playground' repository where I can practice writing (hopefully!) better deep learning code.
To install dependencies and activate the conda environment:

```sh
conda env create -f env.yml
conda activate litgpt
```
If developing, install the pre-commit checks:

```sh
pre-commit install
```
To train the model (whilst in the conda environment):

```sh
litgpt fit --config configs/default.yaml
```
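For orientation, a LightningCLI config file is organised into `trainer`, `model`, and `data` sections. The sketch below shows the general shape only; the class paths and argument values are hypothetical placeholders, not the contents of the repo's actual `default.yaml`:

```yaml
# Illustrative sketch of a LightningCLI config layout.
# Class paths and values below are hypothetical placeholders.
trainer:
  max_steps: 10000
  accelerator: auto
  default_root_dir: checkpoints/
model:
  class_path: litgpt.model.MinGPT        # hypothetical module path
  init_args:
    n_layer: 6
    n_head: 6
    n_embd: 192
data:
  class_path: litgpt.data.CharDataModule  # hypothetical module path
  init_args:
    batch_size: 64
```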
You can override and extend the config file using the CLI. Arguments like `--optimizer` and `--lr_scheduler` accept Torch classes. Run `litgpt fit --help` or read the LightningCLI docs for all options.
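As a sketch, an override might look like the following (the particular optimizer, scheduler, and values here are illustrative; LightningCLI resolves the class names against `torch.optim` and `torch.optim.lr_scheduler`, and the dotted syntax sets their init arguments):

```sh
# Illustrative: swap in AdamW and a cosine schedule from the command line
litgpt fit --config configs/default.yaml \
    --optimizer AdamW \
    --optimizer.lr 3e-4 \
    --lr_scheduler CosineAnnealingLR \
    --lr_scheduler.T_max 1000
```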
We provide config files for TensorBoard and Weights & Biases monitoring. Training with the default config (as above) uses TensorBoard. You can monitor training by running:

```sh
tensorboard --logdir=checkpoints/
```
To log with Weights & Biases, use the `default_wandb.yaml` or `ddp.yaml` config files. You will need to authenticate the first time using `wandb login`.
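For reference, a W&B logger is selected in a LightningCLI config via `trainer.logger`. The sketch below uses Lightning's standard `WandbLogger` class path; the project name and save directory are hypothetical:

```yaml
# Sketch of a trainer.logger section; project/save_dir values are hypothetical
trainer:
  logger:
    class_path: lightning.pytorch.loggers.WandbLogger
    init_args:
      project: litgpt
      save_dir: checkpoints/
```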
A script for DDP training on Slurm-managed HPC is provided. Update the shell script where required, make it executable (with `chmod +x scripts/slurm.sh`), and run it:

```sh
scripts/slurm.sh
```
This script generates and submits a Slurm job using `sbatch`. Generating the script dynamically allows resource requests to be set once at the top of the file, then passed to both Slurm (to allocate resources) and Lightning (to utilise them).
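The generate-then-submit pattern can be sketched as follows. The resource values, the `job.sh` filename, and the trainer flags are hypothetical; the real `scripts/slurm.sh` may differ:

```shell
#!/usr/bin/env bash
# Sketch of the generate-then-submit pattern: resource requests are declared
# once here, then used both in the #SBATCH directives (for Slurm) and as
# Lightning trainer flags. All values are hypothetical.
NODES=2
GPUS_PER_NODE=4
TIME="04:00:00"

cat > job.sh <<EOF
#!/bin/bash
#SBATCH --nodes=${NODES}
#SBATCH --gres=gpu:${GPUS_PER_NODE}
#SBATCH --ntasks-per-node=${GPUS_PER_NODE}
#SBATCH --time=${TIME}

srun litgpt fit --config configs/ddp.yaml \\
    --trainer.num_nodes ${NODES} \\
    --trainer.devices ${GPUS_PER_NODE}
EOF

# Submit only if sbatch is available (i.e. we are on the cluster)
if command -v sbatch >/dev/null 2>&1; then
  sbatch job.sh
else
  echo "sbatch not found; generated job.sh without submitting"
fi
```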