From a76b21c1c6c27f9e34631e4956f5ed605b239eb1 Mon Sep 17 00:00:00 2001 From: Dean Wampler Date: Mon, 7 Oct 2024 17:35:27 -0500 Subject: [PATCH] Now prints the "prompt" that was returned, so the user can see what it is and more easily compare the final notebook output with the input code in the prompt. Signed-off-by: Dean Wampler --- .../Auto_Documentation.ipynb | 175 +++++++++++------- 1 file changed, 104 insertions(+), 71 deletions(-) diff --git a/recipes/Auto_Documentation/Auto_Documentation.ipynb b/recipes/Auto_Documentation/Auto_Documentation.ipynb index b297da5..b7bdcc9 100644 --- a/recipes/Auto_Documentation/Auto_Documentation.ipynb +++ b/recipes/Auto_Documentation/Auto_Documentation.ipynb @@ -1,22 +1,10 @@ { - "nbformat": 4, - "nbformat_minor": 0, - "metadata": { - "colab": { - "provenance": [], - "toc_visible": true - }, - "kernelspec": { - "name": "python3", - "display_name": "Python 3" - }, - "language_info": { - "name": "python" - } - }, "cells": [ { "cell_type": "markdown", + "metadata": { + "id": "Q6rko_ANX0EC" + }, "source": [ "# Auto-generating Documentation: A Long Document Summarization Approach\n", "\n", @@ -32,13 +20,13 @@ "This approach demonstrates how techniques traditionally used for summarizing long articles, reports, or books can be adapted for technical documentation tasks. It showcases the versatility of large language models in processing and synthesizing complex information, whether it's natural language or programming code.\n", "\n", "By the end of this notebook, you'll see how principles of long document summarization can be applied to streamline and enhance the software documentation process, potentially saving developers significant time and effort." - ], - "metadata": { - "id": "Q6rko_ANX0EC" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "IwS1CzAbaFzq" + }, "source": [ "## Install Dependencies\n", "\n", @@ -48,10 +36,7 @@ "- `transformers`: For tokenization and working with language models\n", "\n", "These packages will be installed using pip, Python's package installer. If you're running this notebook in a fresh environment, make sure you have pip installed and updated (if you are in Colab, this is done for you)." - ], - "metadata": { - "id": "IwS1CzAbaFzq" - } + ] }, { "cell_type": "code", @@ -66,6 +51,9 @@ }, { "cell_type": "markdown", + "metadata": { + "id": "ydrVWz7EYHh9" + }, "source": [ "## Set Replicate Token\n", "\n", @@ -78,13 +66,15 @@ "```\n", "\n", "Remember to never share your API tokens publicly or commit them to version control systems." - ], - "metadata": { - "id": "ydrVWz7EYHh9" - } + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "TSkiGBY4qo32" + }, + "outputs": [], "source": [ "import os\n", "\n", @@ -94,15 +84,13 @@ " from google.colab import userdata\n", " userdata = userdata.get(\"replicate-api-token\")\n", " os.environ['REPLICATE_API_TOKEN'] = userdata.get('REPLICATE_API_TOKEN')" - ], - "metadata": { - "id": "TSkiGBY4qo32" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", + "metadata": { + "id": "5d0sWaZ7YLHN" + }, "source": [ "## Define a function for downloading a repository\n", "\n", @@ -121,13 +109,15 @@ "3. Fetch more detailed information about the repository\n", "\n", "To create a GitHub token, go to your GitHub account settings, select \"Developer settings\", then \"Personal access tokens\". Find more information [here](https://docs.github.com/en/rest/authentication/authenticating-to-the-rest-api?apiVersion=2022-11-28)." 
- ], - "metadata": { - "id": "5d0sWaZ7YLHN" - } + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3JFi40LArpIa" + }, + "outputs": [], "source": [ "import requests\n", "from time import sleep\n", @@ -168,39 +158,53 @@ " sleep(0.1)\n", "\n", " return \"\\n\\n\".join(result)\n" - ], - "metadata": { - "id": "3JFi40LArpIa" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", + "metadata": { + "id": "H-06VQn1YmtU" + }, "source": [ "## Get code from `ibm-granite-community/utils`\n", "\n", "In this example, we're focusing on the `ibm-granite-community/utils` repository, specifically the `ibm_granite_community` directory. This directory contains various utility functions that we want to document.\n", "\n", "By specifying this directory, we ensure that we're only fetching the relevant code and not unnecessary files or directories. This helps to keep our input focused and reduces the likelihood of exceeding token limits in our AI model." - ], - "metadata": { - "id": "H-06VQn1YmtU" - } + ] }, { "cell_type": "code", - "source": [ - "prompt = get_github_repo_contents(\"https://github.com/ibm-granite-community/utils\", \"ibm_granite_community\")" - ], + "execution_count": null, "metadata": { "id": "k2wS6rGJsu-T" }, + "outputs": [], + "source": [ + "prompt = get_github_repo_contents(\"https://github.com/ibm-granite-community/utils\", \"ibm_granite_community\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here's the `prompt` that was returned:" + ] + }, + { + "cell_type": "code", "execution_count": null, - "outputs": [] + "metadata": {}, + "outputs": [], + "source": [ + "print(prompt)" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "HYuQmgRJY0n5" + }, "source": [ "## Count the tokens\n", "\n", @@ -213,13 +217,15 @@ "- If our input is too large, we may need to split it into smaller chunks or summarize it\n", "\n", "Understanding token count helps us optimize our prompts and ensure we're using the model efficiently." - ], - "metadata": { - "id": "HYuQmgRJY0n5" - } + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "7JqmvTqbWPgl" + }, + "outputs": [], "source": [ "from transformers import AutoTokenizer\n", "\n", @@ -227,15 +233,13 @@ "tokenizer = AutoTokenizer.from_pretrained(model_path)\n", "\n", "print(f\"Your git repo load has {len(tokenizer(prompt, return_tensors='pt')['input_ids'][0])} tokens\")" - ], - "metadata": { - "id": "7JqmvTqbWPgl" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", + "metadata": { + "id": "ygNmITWQZAZ8" + }, "source": [ "### Create our prompt and call the model in Replicate\n", "\n", @@ -253,13 +257,15 @@ "- The output is streamed, allowing for real-time display of the generated documentation\n", "\n", "This step is where the magic happens - transforming our code into human-readable documentation." 
- ], - "metadata": { - "id": "ygNmITWQZAZ8" - } + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "yu4HeuqWqvOj" + }, + "outputs": [], "source": [ "import replicate\n", "\n", @@ -292,12 +298,39 @@ "\n", "\n", "print(\"\".join(output))\n" - ], - "metadata": { - "id": "yu4HeuqWqvOj" - }, + ] + }, + { + "cell_type": "code", "execution_count": null, - "outputs": [] + "metadata": {}, + "outputs": [], + "source": [] } - ] + ], + "metadata": { + "colab": { + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 }
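
The cell that reads the Replicate credential is split across the hunks above, so only fragments of it appear in the diff. As a minimal sketch of the step the notebook describes, reading `REPLICATE_API_TOKEN` from the environment, from Colab's secret store, or interactively, one might write something like the following; the helper name, the fallback to `getpass`, and the assumption that the Colab secret is stored under the name `REPLICATE_API_TOKEN` are illustrative, not the notebook's actual code.

```python
import os
from getpass import getpass

def ensure_replicate_token():
    """Sketch: make sure REPLICATE_API_TOKEN is available to the notebook."""
    if os.environ.get("REPLICATE_API_TOKEN"):
        return  # already set in the environment
    try:
        # Colab secret store; only importable when running in Google Colab.
        from google.colab import userdata
        os.environ["REPLICATE_API_TOKEN"] = userdata.get("REPLICATE_API_TOKEN")
    except ImportError:
        # Outside Colab, fall back to an interactive prompt.
        os.environ["REPLICATE_API_TOKEN"] = getpass("Replicate API token: ")

ensure_replicate_token()
```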
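Only the imports and the final `return "\n\n".join(result)` of `get_github_repo_contents` are visible between the hunks, so here is a rough sketch of the approach the surrounding markdown describes: list a directory with GitHub's contents API, download each file, and join the sources into a single prompt. The function name, the optional `GITHUB_TOKEN` environment variable, and the per-file path header are assumptions; the notebook's real implementation may differ.

```python
import os
from time import sleep
from urllib.parse import urlparse

import requests

def fetch_directory_sources(repo_url: str, directory: str) -> str:
    """Sketch: concatenate the text of every file directly under `directory`
    in a public GitHub repository."""
    owner_repo = urlparse(repo_url).path.strip("/")      # e.g. "ibm-granite-community/utils"
    api_url = f"https://api.github.com/repos/{owner_repo}/contents/{directory}"

    headers = {"Accept": "application/vnd.github+json"}
    token = os.environ.get("GITHUB_TOKEN")               # optional: raises the API rate limit
    if token:
        headers["Authorization"] = f"Bearer {token}"

    listing = requests.get(api_url, headers=headers, timeout=30)
    listing.raise_for_status()

    result = []
    for entry in listing.json():
        if entry.get("type") == "file" and entry.get("download_url"):
            file_resp = requests.get(entry["download_url"], headers=headers, timeout=30)
            file_resp.raise_for_status()
            result.append(f"# {entry['path']}\n{file_resp.text}")
            sleep(0.1)                                   # stay well under the rate limit
    return "\n\n".join(result)
```

Called as `fetch_directory_sources("https://github.com/ibm-granite-community/utils", "ibm_granite_community")`, this yields one long string of source code, analogous to the `prompt` variable the notebook builds.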
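The "Count the tokens" section notes that an over-long input may need to be split into smaller chunks. The notebook itself does not show that code, so the following is a sketch only, reusing the `tokenizer` loaded earlier; the 100,000-token budget is an assumed figure chosen to stay under a 128K context window.

```python
def split_into_chunks(text: str, max_tokens: int = 100_000) -> list[str]:
    """Sketch: split `text` into pieces of at most `max_tokens` tokens each."""
    ids = tokenizer(text)["input_ids"]
    return [
        tokenizer.decode(ids[start:start + max_tokens], skip_special_tokens=True)
        for start in range(0, len(ids), max_tokens)
    ]

chunks = split_into_chunks(prompt)
print(f"Split the repository text into {len(chunks)} chunk(s)")
```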
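Finally, the cell that calls the model on Replicate is also split across hunks; only `import replicate` and the closing `print("".join(output))` survive in the diff. A streaming call might look roughly like the sketch below. The model identifier, the instruction text, and the input keys (`prompt`, `max_tokens`, `temperature`) are assumptions based on Replicate's usual text-generation schema, not values read from the notebook.

```python
import replicate

# Assumed model; the notebook's actual model_path falls in an elided hunk.
model_id = "ibm-granite/granite-8b-code-instruct-128k"

instructions = (
    "Write developer documentation (purpose, API summary, and usage examples) "
    "for the following Python code:\n\n"
)

# replicate.run streams back text chunks for language models;
# collecting and joining them gives the complete generated document.
output = replicate.run(
    model_id,
    input={
        "prompt": instructions + prompt,   # `prompt` is the repository source gathered earlier
        "max_tokens": 2000,
        "temperature": 0.2,
    },
)

print("".join(output))
```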