diff --git a/recipes/Auto_Documentation/Auto_Documentation.ipynb b/recipes/Auto_Documentation/Auto_Documentation.ipynb
index b297da5..b7bdcc9 100644
--- a/recipes/Auto_Documentation/Auto_Documentation.ipynb
+++ b/recipes/Auto_Documentation/Auto_Documentation.ipynb
@@ -1,22 +1,10 @@
 {
- "nbformat": 4,
- "nbformat_minor": 0,
- "metadata": {
-  "colab": {
-   "provenance": [],
-   "toc_visible": true
-  },
-  "kernelspec": {
-   "name": "python3",
-   "display_name": "Python 3"
-  },
-  "language_info": {
-   "name": "python"
-  }
- },
  "cells": [
   {
    "cell_type": "markdown",
+   "metadata": {
+    "id": "Q6rko_ANX0EC"
+   },
    "source": [
     "# Auto-generating Documentation: A Long Document Summarization Approach\n",
     "\n",
@@ -32,13 +20,13 @@
     "This approach demonstrates how techniques traditionally used for summarizing long articles, reports, or books can be adapted for technical documentation tasks. It showcases the versatility of large language models in processing and synthesizing complex information, whether it's natural language or programming code.\n",
     "\n",
     "By the end of this notebook, you'll see how principles of long document summarization can be applied to streamline and enhance the software documentation process, potentially saving developers significant time and effort."
-   ],
-   "metadata": {
-    "id": "Q6rko_ANX0EC"
-   }
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {
+    "id": "IwS1CzAbaFzq"
+   },
    "source": [
     "## Install Dependencies\n",
     "\n",
@@ -48,10 +36,7 @@
     "- `transformers`: For tokenization and working with language models\n",
     "\n",
     "These packages will be installed using pip, Python's package installer. If you're running this notebook in a fresh environment, make sure you have pip installed and updated (if you are in Colab, this is done for you)."
-   ],
-   "metadata": {
-    "id": "IwS1CzAbaFzq"
-   }
+   ]
   },
   {
    "cell_type": "code",
@@ -66,6 +51,9 @@
   },
   {
    "cell_type": "markdown",
+   "metadata": {
+    "id": "ydrVWz7EYHh9"
+   },
    "source": [
     "## Set Replicate Token\n",
     "\n",
@@ -78,13 +66,15 @@
     "```\n",
     "\n",
     "Remember to never share your API tokens publicly or commit them to version control systems."
-   ],
-   "metadata": {
-    "id": "ydrVWz7EYHh9"
-   }
+   ]
   },
   {
    "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "TSkiGBY4qo32"
+   },
+   "outputs": [],
    "source": [
     "import os\n",
     "\n",
@@ -94,15 +84,13 @@
     " from google.colab import userdata\n",
     " userdata = userdata.get(\"replicate-api-token\")\n",
     " os.environ['REPLICATE_API_TOKEN'] = userdata.get('REPLICATE_API_TOKEN')"
-   ],
-   "metadata": {
-    "id": "TSkiGBY4qo32"
-   },
-   "execution_count": null,
-   "outputs": []
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {
+    "id": "5d0sWaZ7YLHN"
+   },
    "source": [
     "## Define a function for downloading a repository\n",
     "\n",
@@ -121,13 +109,15 @@
     "3. Fetch more detailed information about the repository\n",
     "\n",
     "To create a GitHub token, go to your GitHub account settings, select \"Developer settings\", then \"Personal access tokens\". Find more information [here](https://docs.github.com/en/rest/authentication/authenticating-to-the-rest-api?apiVersion=2022-11-28)."
-   ],
-   "metadata": {
-    "id": "5d0sWaZ7YLHN"
-   }
+   ]
   },
   {
    "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "3JFi40LArpIa"
+   },
+   "outputs": [],
    "source": [
     "import requests\n",
     "from time import sleep\n",
@@ -168,39 +158,53 @@
     " sleep(0.1)\n",
     "\n",
     " return \"\\n\\n\".join(result)\n"
-   ],
-   "metadata": {
-    "id": "3JFi40LArpIa"
-   },
-   "execution_count": null,
-   "outputs": []
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {
+    "id": "H-06VQn1YmtU"
+   },
    "source": [
     "## Get code from `ibm-granite-community/utils`\n",
     "\n",
     "In this example, we're focusing on the `ibm-granite-community/utils` repository, specifically the `ibm_granite_community` directory. This directory contains various utility functions that we want to document.\n",
     "\n",
     "By specifying this directory, we ensure that we're only fetching the relevant code and not unnecessary files or directories. This helps to keep our input focused and reduces the likelihood of exceeding token limits in our AI model."
-   ],
-   "metadata": {
-    "id": "H-06VQn1YmtU"
-   }
+   ]
   },
   {
    "cell_type": "code",
-   "source": [
-    "prompt = get_github_repo_contents(\"https://github.com/ibm-granite-community/utils\", \"ibm_granite_community\")"
-   ],
+   "execution_count": null,
    "metadata": {
     "id": "k2wS6rGJsu-T"
    },
+   "outputs": [],
+   "source": [
+    "prompt = get_github_repo_contents(\"https://github.com/ibm-granite-community/utils\", \"ibm_granite_community\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Here's the `prompt` that was returned:"
+   ]
+  },
+  {
+   "cell_type": "code",
    "execution_count": null,
-   "outputs": []
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(prompt)"
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {
+    "id": "HYuQmgRJY0n5"
+   },
    "source": [
     "## Count the tokens\n",
     "\n",
@@ -213,13 +217,15 @@
     "- If our input is too large, we may need to split it into smaller chunks or summarize it\n",
     "\n",
     "Understanding token count helps us optimize our prompts and ensure we're using the model efficiently."
-   ],
-   "metadata": {
-    "id": "HYuQmgRJY0n5"
-   }
+   ]
   },
   {
    "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "7JqmvTqbWPgl"
+   },
+   "outputs": [],
    "source": [
     "from transformers import AutoTokenizer\n",
     "\n",
@@ -227,15 +233,13 @@
     "tokenizer = AutoTokenizer.from_pretrained(model_path)\n",
     "\n",
     "print(f\"Your git repo load has {len(tokenizer(prompt, return_tensors='pt')['input_ids'][0])} tokens\")"
-   ],
-   "metadata": {
-    "id": "7JqmvTqbWPgl"
-   },
-   "execution_count": null,
-   "outputs": []
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {
+    "id": "ygNmITWQZAZ8"
+   },
    "source": [
     "### Create our prompt and call the model in Replicate\n",
     "\n",
@@ -253,13 +257,15 @@
     "- The output is streamed, allowing for real-time display of the generated documentation\n",
     "\n",
     "This step is where the magic happens - transforming our code into human-readable documentation."
-   ],
-   "metadata": {
-    "id": "ygNmITWQZAZ8"
-   }
+   ]
  },
  {
   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "yu4HeuqWqvOj"
+   },
+   "outputs": [],
    "source": [
     "import replicate\n",
     "\n",
@@ -292,12 +298,39 @@
     "\n",
     "\n",
     "print(\"\".join(output))\n"
-   ],
-   "metadata": {
-    "id": "yu4HeuqWqvOj"
-   },
+   ]
+  },
+  {
+   "cell_type": "code",
    "execution_count": null,
-   "outputs": []
+   "metadata": {},
+   "outputs": [],
+   "source": []
   }
- ]
+ ],
+ "metadata": {
+  "colab": {
+   "provenance": [],
+   "toc_visible": true
+  },
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.9"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
 }
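One caveat on the token-loading cell carried as context in the `@@ -94,15 +84,13 @@` hunk: it rebinds the `userdata` module to the string returned by `userdata.get("replicate-api-token")` and then calls `.get('REPLICATE_API_TOKEN')` on that string, which raises `AttributeError` at runtime. A minimal sketch of a working version follows; the helper name `ensure_replicate_token` is hypothetical, and it assumes the Colab secret is stored under the name `REPLICATE_API_TOKEN`:

```python
import os

def ensure_replicate_token() -> str:
    """Return the Replicate API token, reading Colab's secret store if needed."""
    if "REPLICATE_API_TOKEN" not in os.environ:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
        os.environ["REPLICATE_API_TOKEN"] = userdata.get("REPLICATE_API_TOKEN")
    return os.environ["REPLICATE_API_TOKEN"]
```

Outside Colab, exporting `REPLICATE_API_TOKEN` in the environment before running the notebook makes the Colab branch a no-op.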