As of August 2023, the original repository contains some deprecated code. In this repository, I corrected the question-answering (QA) example, which is based on Wikipedia data about the 2020 Summer Olympics. The correction covers only the part that illustrates how to prepare the data for fine-tuning; the rest of the repository still needs to be corrected.
The OpenAI Cookbook shares example code for accomplishing common tasks with the OpenAI API.
To run these examples, you'll need an OpenAI account and API key (create a free account).
Most code examples are written in Python, though the concepts can be applied in any language.
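As a minimal sketch of the setup step, the snippet below reads the API key from an environment variable instead of hard-coding it in a notebook. The helper name `load_api_key` is hypothetical and not part of this repository; `OPENAI_API_KEY` is assumed here because it is the variable name the official Python library looks for by default.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    Hypothetical helper for illustration only; raises a clear error
    when the variable is missing or empty.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the examples."
        )
    return key
```

Keeping the key in the environment avoids committing it to version control by accident when a notebook is shared.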
- How to fine-tune chat models [Aug 22, 2023]
- How to evaluate abstractive summarization [Aug 16, 2023]
- Whisper prompting guide [June 27, 2023]
- Question answering using a search API and re-ranking [June 16, 2023]
- How to call functions with Chat models [June 13, 2023]
- API usage
- GPT
- Embeddings
- Text comparison examples
- How to get embeddings
- Question answering using embeddings
- Using vector databases for embeddings search
- Semantic search using embeddings
- Recommendations using embeddings
- Clustering embeddings
- Visualizing embeddings in 2D or 3D
- Embedding long texts
- Embeddings playground (streamlit app)
- Search reranking with cross-encoders
- Apps
- Fine-tuning GPT-3
- DALL-E
- Whisper
- Azure OpenAI (alternative API from Microsoft Azure)
Beyond the code examples here, you can learn about the OpenAI API from the following resources:
- Experiment with ChatGPT
- Try the API in the OpenAI Playground
- Read about the API in the OpenAI Documentation
- Get help in the OpenAI Help Center
- Discuss the API in the OpenAI Community Forum or OpenAI Discord channel
- See example prompts in the OpenAI Examples
- Stay updated with the OpenAI Blog
People are writing great tools and papers for improving outputs from GPT. Here are some cool ones we've seen:
- Guidance: A handy-looking Python library from Microsoft that uses Handlebars templating to interleave generation, prompting, and logical control.
- LangChain: A popular Python/JavaScript library for chaining sequences of language model prompts.
- FLAML (A Fast Library for Automated Machine Learning & Tuning): A Python library for automating selection of models, hyperparameters, and other tunable choices.
- Chainlit: A Python library for making chatbot interfaces.
- Guardrails.ai: A Python library for validating outputs and retrying failures. Still in alpha, so expect sharp edges and bugs.
- Semantic Kernel: A Python/C# library from Microsoft that supports prompt templating, function chaining, vectorized memory, and intelligent planning.
- Prompttools: Open-source Python tools for testing and evaluating models, vector DBs, and prompts.
- Outlines: A Python library that provides a domain-specific language to simplify prompting and constrain generation.
- Promptify: A small Python library for using language models to perform NLP tasks.
- Scale Spellbook: A paid product for building, comparing, and shipping language model apps.
- PromptPerfect: A paid product for testing and improving prompts.
- Weights & Biases: A paid product for tracking model training and prompt engineering experiments.
- OpenAI Evals: An open-source library for evaluating task performance of language models and prompts.
- LlamaIndex: A Python library for augmenting LLM apps with data.
- Arthur Shield: A paid product for detecting toxicity, hallucination, prompt injection, etc.
- LMQL: A programming language for LLM interaction with support for typed prompting, control flow, constraints, and tools.
- Brex's Prompt Engineering Guide: Brex's introduction to language models and prompt engineering.
- promptingguide.ai: A prompt engineering guide that demonstrates many techniques.
- OpenAI Cookbook: Techniques to improve reliability: A slightly dated (Sep 2022) review of techniques for prompting language models.
- Lil'Log Prompt Engineering: An OpenAI researcher's review of the prompt engineering literature (as of March 2023).
- learnprompting.org: An introductory course to prompt engineering.
- Andrew Ng's DeepLearning.AI: A short course on prompt engineering for developers.
- Andrej Karpathy's Let's build GPT: A detailed dive into the machine learning underlying GPT.
- Prompt Engineering by DAIR.AI: A one-hour video on various prompt engineering techniques.
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (2022): Using few-shot prompts to ask models to think step by step improves their reasoning. PaLM's score on math word problems (GSM8K) rises from 18% to 57%.
- Self-Consistency Improves Chain of Thought Reasoning in Language Models (2022): Taking votes from multiple outputs improves accuracy even more. Voting across 40 outputs raises PaLM's score on math word problems further, from 57% to 74%, and code-davinci-002's from 60% to 78%.
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models (2023): Searching over trees of step by step reasoning helps even more than voting over chains of thought. It lifts GPT-4's scores on creative writing and crosswords.
- Language Models are Zero-Shot Reasoners (2022): Telling instruction-following models to think step by step improves their reasoning. It lifts text-davinci-002's score on math word problems (GSM8K) from 13% to 41%.
- Large Language Models Are Human-Level Prompt Engineers (2023): Automated searching over possible prompts found a prompt that lifts scores on math word problems (GSM8K) to 43%, 2 percentage points above the human-written prompt in Language Models are Zero-Shot Reasoners.
- Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling (2023): Automated searching over possible chain-of-thought prompts improved ChatGPT's scores on a few benchmarks by 0–20 percentage points.
- Faithful Reasoning Using Large Language Models (2022): Reasoning can be improved by a system that combines: chains of thought generated by alternating selection and inference prompts, a halter model that chooses when to halt selection-inference loops, a value function to search over multiple reasoning paths, and sentence labels that help avoid hallucination.
- STaR: Bootstrapping Reasoning With Reasoning (2022): Chain of thought reasoning can be baked into models via fine-tuning. For tasks with an answer key, example chains of thought can be generated by language models.
- ReAct: Synergizing Reasoning and Acting in Language Models (2023): For tasks with tools or an environment, chain of thought works better if you prescriptively alternate between Reasoning steps (thinking about what to do) and Acting steps (getting information from a tool or environment).
- Reflexion: an autonomous agent with dynamic memory and self-reflection (2023): Retrying tasks with memory of prior failures improves subsequent performance.
- Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP (2023): Models augmented with knowledge via a "retrieve-then-read" approach can be improved with multi-hop chains of searches.
- Improving Factuality and Reasoning in Language Models through Multiagent Debate (2023): Generating debates between a few ChatGPT agents over a few rounds improves scores on various benchmarks. Math word problem scores rise from 77% to 85%.
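The voting step described in the Self-Consistency entry above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the paper's implementation: `self_consistent_answer` is a hypothetical helper, and `sample_answer` stands in for any function that samples one final answer from a model (for example, a chat completion call with a nonzero temperature followed by answer extraction).

```python
from collections import Counter
from typing import Callable


def self_consistent_answer(sample_answer: Callable[[], str], n_samples: int = 5) -> str:
    """Sample several answers and return the most common one (majority vote).

    `sample_answer` is a hypothetical callable that returns the final answer
    extracted from one sampled chain of thought.
    """
    answers = [sample_answer().strip() for _ in range(n_samples)]
    # most_common(1) returns a list holding the single (answer, count) pair
    # with the highest count.
    return Counter(answers).most_common(1)[0][0]


# Usage with a stub sampler standing in for real model calls.
fake_outputs = iter(["18", "17", "18", "18", "20"])
voted = self_consistent_answer(lambda: next(fake_outputs), n_samples=5)
print(voted)  # prints 18
```

Exact string matching is the simplest way to count votes; free-form answers usually need normalization (for example, stripping units or punctuation) before counting.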
If there are examples or guides you'd like to see, feel free to suggest them on the issues page. We are also happy to accept high-quality pull requests, as long as they fit the scope of the repo.