Get your copy today and please leave a rating/review to tell me what you thought! ⭐⭐⭐⭐⭐
Welcome to the GitHub repository for the "Quick Start Guide to Large Language Models - Second Edition". This repository contains the code snippets and notebooks used in the book, demonstrating various applications and advanced techniques in working with Transformer models and large language models (LLMs). View the code for the First Edition here
- notebooks: Contains Jupyter notebooks for each chapter in the book.
- data: Contains the datasets used in the notebooks.
- images: Contains images and graphs used in the notebooks.
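
As a quick sanity check after cloning, the layout described above can be verified with a few lines of Python. This is only a minimal sketch: the helper name `missing_dirs` is hypothetical, and it assumes the three directories sit at the top level of the clone.

```python
from pathlib import Path

# Directory names taken from the list above.
EXPECTED_DIRS = ("notebooks", "data", "images")


def missing_dirs(repo_root):
    """Return the expected top-level directories that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumption: the script is run from the root of a local clone.
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected directories are present.")
```

An empty result means all three directories were found; anything else names the directories that need to be fetched or restored.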
Below is a list of the notebooks included in the notebooks directory, organized by the chapters in the book.
- Chapter 2: Semantic Search with LLMs
  - 02_semantic_search.ipynb: An introduction to semantic search using OpenAI and open-source models.
- Chapter 3: First Steps with Prompt Engineering
  - 03_prompt_engineering.ipynb: A guide to effective prompt engineering for instruction-aligned LLMs.
- Chapter 4: The AI Ecosystem: Putting the Pieces Together
  - 04_rag_retrieval.ipynb: Building a Retrieval-Augmented Generation (RAG) pipeline.
  - 04_agent.ipynb: Constructing an AI agent using LLMs and other tools.
- Chapter 5: Optimizing LLMs with Customized Fine-Tuning
  - 05_bert_app_review.ipynb: Fine-tuning a BERT model for app review classification.
  - 05_openai_app_review_fine_tuning.ipynb: Fine-tuning OpenAI models for app review classification.
- Chapter 6: Advanced Prompt Engineering
  - 06_adv_prompt_engineering.ipynb: Advanced techniques in prompt engineering, including output validation and semantic few-shot learning.
- Chapter 7: Customizing Embeddings and Model Architectures
  - 07_recommendation_engine.ipynb: Building a recommendation engine using custom fine-tuned LLMs and embeddings.
- Chapter 9: Moving Beyond Foundation Models
  - 09_constructing_a_vqa_system.ipynb: Step-by-step guide to constructing a Visual Question Answering (VQA) system using GPT-2 and Vision Transformer.
  - 09_using_our_vqa.ipynb: Using the VQA system built in the previous notebook.
  - 09_flan_t5_rl.ipynb: Using Reinforcement Learning (RL) to improve FLAN-T5 model outputs.
- Chapter 10: Advanced Open-Source LLM Fine-Tuning
  - 10_SAWYER_LLAMA_SFT.ipynb: Fine-tuning the Llama-3 model to create the SAWYER bot.
  - 10_SAWYER_Reward_Model.ipynb: Training a reward model from human preferences for the SAWYER bot.
  - 10_SAWYER_RLF.ipynb: Applying Reinforcement Learning from Human Feedback (RLHF) to align the SAWYER bot.
  - 10_SAWYER_USE_SAWYER.ipynb: Using the SAWYER bot.
  - 10_anime_category_classification_model_freezing.ipynb: Fine-tuning a BERT model for anime category classification, comparing layer freezing techniques.
  - 10_latex_gpt2.ipynb: Fine-tuning GPT-2 to generate LaTeX formulas.
  - 10_optimizing_fine_tuning.ipynb: Best practices for optimizing fine-tuning of transformer models.
- Chapter 11: Moving LLMs into Production
  - 11_distillation_example_1.ipynb: Exploring knowledge distillation techniques for transformer models.
  - 11_distillation_example_2.ipynb: Advanced distillation methods and applications.
  - 11_llama_quantization.ipynb: Quantizing Llama models for efficient deployment.
- Chapter 12: Evaluating LLMs
  - 12_llm_calibration.ipynb: Techniques for calibrating LLM outputs.
  - 12_llm_gen_eval.ipynb: Methods for evaluating the generative capabilities of LLMs.
  - 12_cluster.ipynb: Clustering techniques for analyzing LLM outputs.
  - Probing: There are over a dozen notebooks for Probing, so I will only share a few key ones here:
To use this repository:
- Clone the repository to your local machine:
    git clone https://github.com/yourusername/quick-start-llms.git
- Navigate into the cloned repository, where you can open the Jupyter notebook of your choice from the notebooks directory:
    cd quick-start-llms
- Install the necessary libraries:
    pip install -r requirements.txt
Note: Some notebooks may require specific datasets, which can be found in the data directory.
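
To help pick a notebook once the repository is set up, the sketch below groups the notebooks in a local clone by chapter. It is an illustration rather than part of the repository: the helper name `group_by_chapter` is hypothetical, and it assumes every notebook filename starts with a two-digit chapter number followed by an underscore (as in 02_semantic_search.ipynb) and lives directly inside the notebooks directory.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: notebook filenames start with a two-digit chapter number
# followed by an underscore, as in 02_semantic_search.ipynb.
CHAPTER_PREFIX = re.compile(r"^(\d{2})_")


def group_by_chapter(filenames):
    """Map chapter number -> sorted list of notebook filenames."""
    chapters = defaultdict(list)
    for name in filenames:
        match = CHAPTER_PREFIX.match(name)
        # Skip anything that is not a chapter-prefixed notebook.
        if match and name.endswith(".ipynb"):
            chapters[int(match.group(1))].append(name)
    return {chapter: sorted(names) for chapter, names in sorted(chapters.items())}


if __name__ == "__main__":
    # Assumption: run from the repository root of a local clone.
    notebooks_dir = Path("notebooks")
    found = []
    if notebooks_dir.is_dir():
        found = [path.name for path in notebooks_dir.glob("*.ipynb")]
    for chapter, names in group_by_chapter(found).items():
        print(f"Chapter {chapter}:")
        for name in names:
            print(f"  {name}")
```

Files that do not match the assumed naming pattern are skipped, so the output only lists chapter notebooks.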
Contributions are welcome! If you have any additions, corrections, or enhancements, feel free to submit a pull request.
This repository is for educational purposes and is meant to accompany the "Quick Start Guide to Large Language Models - Second Edition" book. Please refer to the book for in-depth explanations and discussions of the topics covered in the notebooks.
- Check out Sinan's Newsletter AI Office Hours for more AI/LLM content!
- Sinan has a podcast called Practically Intelligent where he chats about the latest and greatest in AI!
- Follow the Getting Started with Data, LLMs and ChatGPT Playlist on O'Reilly for a curated list of Sinan's work!