- Infrastructure/Platform
- Data Engineering
- Pre-Training Stage
- Fine-Tuning Stage
- Alignment Stage
- Evaluation Stage
- Inference Stage
- Applications
I'm grateful to my employers for trusting me to lead the team that built our GPU supercompute platform and infrastructure, and to co-lead the LLM pre-training team. This gave me the opportunity to work on large on-premise GPU clusters, first with A100s and then with H100s, which is certainly a privilege. I hope sharing some of these notes and insights helps the community.