From a57f07e81a47a2171221c5f71483689d0fcfccd5 Mon Sep 17 00:00:00 2001 From: Aaron Markham Date: Fri, 25 Sep 2020 09:08:23 -0700 Subject: [PATCH 1/2] add template notebook --- template.ipynb | 332 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 332 insertions(+) create mode 100644 template.ipynb diff --git a/template.ipynb b/template.ipynb new file mode 100644 index 0000000000..6d79d7c44f --- /dev/null +++ b/template.ipynb @@ -0,0 +1,332 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Title\n", + "The title should be similar to the filename, but the filename should be very concise and compact, so people can read what it is when displayed in a list view in JupyterLab.\n", + "\n", + "Example title - **Amazon SageMaker Processing: pre-processing images with PyTorch using a GPU instance type**\n", + "\n", + "* Bad example filename: *amazon_sagemaker-processing-images_with_pytorch_on_GPU.ipynb* (too long & mixes case, dashes, and underscores)\n", + "* Good example filename: *processing_images_pytorch_gpu.ipynb* (succinct, all lowercase, all underscores)\n", + "\n", + "\n", + "## Overview\n", + "1. What does this notebook do?\n", + " - What will the user learn how to do?\n", + "1. Is this an end-to-end tutorial or is it a how-to (procedural) example?\n", + " - Tutorial: add conceptual information, flowcharts, images\n", + " - How to: notebook should be lean. More of a list of steps. No conceptual info, but links to resources for more info.\n", + "1. Who is the audience? \n", + " - What should the user be familiar with before running this? \n", + " - Link to other examples they should have run first.\n", + "1. How much will this cost?\n", + " - Some estimate of both time and money is recommended.\n", + " - List the instance types and other resources that are created.\n", + "\n", + "\n", + "## Prerequisites\n", + "1. Which environments does this notebook work in? 
Select all that apply.\n", + " - Notebook Instances: Jupyter?\n", + " - Notebook Instances: JupyterLab?\n", + " - Studio?\n", + "1. Which conda kernel is required?\n", + "1. Is there a previous notebook that is required?\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup \n", + "\n", + "### Setup Dependencies\n", + "\n", + "1. Describe any pip or conda or apt installs or setup scripts that are needed.\n", + "1. Pin sagemaker if version <2 is required.\n", + "\n", + " `!{sys.executable} -m pip install \"sagemaker>=1.14.2,<2\"`\n", + " \n", + " \n", + "1. Upgrade sagemaker if version 2 is required, but rollback upgrades to packages that might taint the user's kernel and make other notebooks break. Do this at the end of the notebook in the cleanup cell.\n", + "\n", + " ```python\n", + " # setup\n", + " import sagemaker\n", + " version = sagemaker.__version__\n", + " !{sys.executable} -m pip install 'sagemaker>=2.0.0'\n", + " ...\n", + " # cleanup\n", + " !{sys.executable} -m pip install 'sagemaker=={}'.format(version)\n", + " ```\n", + " \n", + "\n", + "1. Use flags that facilitate automatic, end-to-end running without a user prompt, so that the notebook can run in CI without any updates or special configuration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# SageMaker Python SDK version 1.x is required\n", + "import sys\n", + "!{sys.executable} -m pip install -U \"sagemaker>=1.14.2,<2\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# SageMaker Python SDK version 2.x is required\n", + "import sagemaker\n", + "import sys\n", + "original_version = sagemaker.__version__\n", + "!{sys.executable} -m pip install 'sagemaker>=2.0.0'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Setup Python Modules\n", + "1. Import modules, set options, and activate extensions." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2019-06-16T14:44:50.874881Z", + "start_time": "2019-06-16T14:44:38.616867Z" + } + }, + "outputs": [], + "source": [ + "# imports\n", + "import sagemaker\n", + "import numpy as np\n", + "import pandas as pd\n", + "\n", + "# options\n", + "pd.options.display.max_columns = 50\n", + "pd.options.display.max_rows = 30\n", + "\n", + "# visualizations\n", + "import plotly\n", + "import plotly.graph_objs as go\n", + "import plotly.offline as ply\n", + "plotly.offline.init_notebook_mode(connected=True)\n", + "\n", + "# extensions\n", + "if 'autoreload' not in get_ipython().extension_manager.loaded:\n", + " %load_ext autoreload\n", + " \n", + "%autoreload 2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Parameters\n", + "1. Set up user-supplied parameters, such as custom bucket names and roles, in a separate cell and call out what their options are.\n", + "1. Use defaults, so the notebook will still run end-to-end without any user modification.\n", + "\n", + "For example, the following description and code block prompt the user to select the preferred dataset.\n", + "\n", + "~~~\n", + "\n", + "To select a particular dataset, assign choosen_data_set below to one of 'diabetes', 'california', or 'boston', where each name corresponds to its respective dataset.\n", + "\n", + "'boston' : boston house data\n", + "'california' : california house data\n", + "'diabetes' : diabetes data\n", + "\n", + "~~~\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data_sets = {'diabetes': 'load_diabetes()', 'california': 'fetch_california_housing()', 'boston' : 'load_boston()'}\n", + "\n", + "# Change choosen_data_set variable to one of the data sets above. 
\n", + "choosen_data_set = 'california'\n", + "assert choosen_data_set in data_sets.keys()\n", + "print(\"I selected the '{}' dataset!\".format(choosen_data_set))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "## Data import\n", + "1. Look for the data that was stored by a previous notebook run `%store -r variableName`\n", + "1. If that doesn't exist, look in S3 in their default bucket\n", + "1. If that doesn't exist, download it from the [SageMaker dataset bucket](https://sagemaker-sample-files.s3.amazonaws.com/) \n", + "1. If that doesn't exist, download it from origin\n", + "\n", + "For example, the following code block will pull training and validation data that was created in a previous notebook. This allows the customer to experiment with features, re-run the notebook, and not have it pull the dataset over and over." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load relevant dataframes and variables from preprocessing_tabular_data.ipynb required for this notebook\n", + "%store -r X_train\n", + "%store -r X_test\n", + "%store -r X_val\n", + "%store -r Y_train\n", + "%store -r Y_test\n", + "%store -r Y_val\n", + "%store -r choosen_data_set" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Procedure or tutorial\n", + "1. Break up processes with Markdown blocks to explain what's going on.\n", + "1. Make use of visualizations to better demonstrate each step." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Cleanup\n", + "1. If you upgraded their `sagemaker` SDK, roll it back.\n", + "1. 
Delete any endpoints or other resources that linger and might cost the user money.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# roll back the SageMaker Python SDK to the kernel's original version\n", + "print(\"Original version: {}\".format(original_version))\n", + "print(\"Current version: {}\".format(sagemaker.__version__))\n", + "s = 'sagemaker=={}'.format(original_version)\n", + "print(\"Rolling back to... {}\".format(s))\n", + "!{sys.executable} -m pip install {s}\n", + "import sagemaker\n", + "print(\"{} installed!\".format(sagemaker.__version__))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Next steps\n", + "\n", + "1. Wrap up with some conclusion or overview of what was accomplished.\n", + "1. Offer another notebook or more resources or some other call to action." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## References\n", + "1. author1, article1, journal1, year1, url1\n", + "2. 
author2, article2, journal2, year2, url2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "conda_python3", + "language": "python", + "name": "conda_python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.10" + }, + "pycharm": { + "stem_cell": { + "cell_type": "raw", + "metadata": { + "collapsed": false + }, + "source": [] + } + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": true, + "sideBar": true, + "skip_h1_title": false, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": false, + "toc_position": {}, + "toc_section_display": true, + "toc_window_display": false + }, + "varInspector": { + "cols": { + "lenName": 16, + "lenType": 16, + "lenVar": 40 + }, + "kernels_config": { + "python": { + "delete_cmd_postfix": "", + "delete_cmd_prefix": "del ", + "library": "var_list.py", + "varRefreshCmd": "print(var_dic_list())" + }, + "r": { + "delete_cmd_postfix": ") ", + "delete_cmd_prefix": "rm(", + "library": "var_list.r", + "varRefreshCmd": "cat(var_dic_list()) " + } + }, + "types_to_exclude": [ + "module", + "function", + "builtin_function_or_method", + "instance", + "_Feature" + ], + "window_display": false + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 1ff91d58a8bca49d1cdbcbdabc73c3d0ce1aed63 Mon Sep 17 00:00:00 2001 From: Aaron Markham Date: Tue, 6 Oct 2020 17:21:13 -0700 Subject: [PATCH 2/2] resolve comments --- template.ipynb | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/template.ipynb b/template.ipynb index 6d79d7c44f..8f42b0a07c 100644 --- a/template.ipynb +++ b/template.ipynb @@ -12,6 +12,7 @@ "* Bad example filename: 
*amazon_sagemaker-processing-images_with_pytorch_on_GPU.ipynb* (too long & mixes case, dashes, and underscores)\n", "* Good example filename: *processing_images_pytorch_gpu.ipynb* (succinct, all lowercase, all underscores)\n", "\n", + "**IMPORTANT:** Use only one main heading with `#`, so your next subheading is `##` or `###` and so on.\n", "\n", "## Overview\n", "1. What does this notebook do?\n", @@ -47,7 +48,7 @@ "1. Describe any pip or conda or apt installs or setup scripts that are needed.\n", "1. Pin sagemaker if version <2 is required.\n", "\n", - " `!{sys.executable} -m pip install \"sagemaker>=1.14.2,<2\"`\n", + " `%pip install \"sagemaker>=1.14.2,<2\"`\n", " \n", " \n", "1. Upgrade sagemaker if version 2 is required, but rollback upgrades to packages that might taint the user's kernel and make other notebooks break. Do this at the end of the notebook in the cleanup cell.\n", @@ -56,10 +57,10 @@ " # setup\n", " import sagemaker\n", " version = sagemaker.__version__\n", - " !{sys.executable} -m pip install 'sagemaker>=2.0.0'\n", + " %pip install 'sagemaker>=2.0.0'\n", " ...\n", " # cleanup\n", - " !{sys.executable} -m pip install 'sagemaker=={}'.format(version)\n", + " %pip install 'sagemaker=={version}'\n", " ```\n", " \n", "\n", @@ -74,7 +75,7 @@ "source": [ "# SageMaker Python SDK version 1.x is required\n", "import sys\n", - "!{sys.executable} -m pip install -U \"sagemaker>=1.14.2,<2\"" + "%pip install \"sagemaker>=1.14.2,<2\"" ] }, { @@ -87,7 +88,7 @@ "import sagemaker\n", "import sys\n", "original_version = sagemaker.__version__\n", - "!{sys.executable} -m pip install 'sagemaker>=2.0.0'" + "%pip install 'sagemaker>=2.0.0'" ] }, { @@ -225,7 +226,7 @@ "print(\"Current version: {}\".format(sagemaker.__version__))\n", "s = 'sagemaker=={}'.format(original_version)\n", "print(\"Rolling back to... 
{}\".format(s))\n", - "!{sys.executable} -m pip install {s}\n", + "%pip install {s}\n", "import sagemaker\n", "print(\"{} installed!\".format(sagemaker.__version__))" ]
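The cleanup cell touched by the last hunk builds an exact version pin from the version recorded before the upgrade and hands it to `%pip install {s}`. A minimal sketch of that pin-building step in plain Python follows, so it can be checked outside a notebook. The helper name `rollback_requirement` and the version string `1.72.0` are hypothetical and used only for illustration; they are not part of the template.

```python
def rollback_requirement(package, original_version):
    """Build an exact pip pin such as 'sagemaker==1.72.0'.

    Hypothetical helper for illustration; the template builds the same
    string inline with str.format.
    """
    if not original_version:
        # The version must be recorded *before* the upgrade cell runs.
        raise ValueError("original_version was not captured before upgrading")
    return "{}=={}".format(package, original_version)


# Hypothetical value; in the notebook this comes from sagemaker.__version__
# as read in the setup cell, before the upgrade.
original_version = "1.72.0"

s = rollback_requirement("sagemaker", original_version)
print("Rolling back to... {}".format(s))
# In a notebook cell, the pin would then be passed to the magic:
#   %pip install {s}
```

Keeping the pin in a single variable means the cleanup cell prints and installs the same string, which is what the template's `s` variable does.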