add template notebook #1570

Merged
merged 2 commits into from
Oct 7, 2020
332 changes: 332 additions & 0 deletions template.ipynb
@@ -0,0 +1,332 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Title\n",
"The title should be similar to the filename, but the filename should be very concise and compact, so people can read what it is when displayed in a list view in JupyterLab.\n",
"\n",
"Example title - **Amazon SageMaker Processing: pre-processing images with PyTorch using a GPU instance type**\n",
"\n",
"* Bad example filename: *amazon_sagemaker-processing-images_with_pytorch_on_GPU.ipynb* (too long & mixes case, dashes, and underscores)\n",
"* Good example filename: *processing_images_pytorch_gpu.ipynb* (succinct, all lowercase, all underscores)\n",
"\n",
"\n",
"## Overview\n",
"1. What does this notebook do?\n",
"    - What will the user learn how to do?\n",
"1. Is this an end-to-end tutorial or is it a how-to (procedural) example?\n",
"    - Tutorial: add conceptual information, flowcharts, images\n",
"    - How to: notebook should be lean. More of a list of steps. No conceptual info, but links to resources for more info.\n",
"1. Who is the audience? \n",
"    - What should the user be familiar with before running this?\n",
"    - Link to other examples they should have run first.\n",
"1. How much will this cost?\n",
"    - Some estimate of both time and money is recommended.\n",
"    - List the instance types and other resources that are created.\n",
"\n",
"\n",
"## Prerequisites\n",
"1. Which environments does this notebook work in? Select all that apply.\n",
"    - Notebook Instances: Jupyter?\n",
"    - Notebook Instances: JupyterLab?\n",
"    - Studio?\n",
"1. Which conda kernel is required?\n",
"1. Is there a previous notebook that is required?\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup \n",
"\n",
"### Setup Dependencies\n",
"\n",
"1. Describe any pip or conda or apt installs or setup scripts that are needed.\n",
"1. Pin sagemaker if version <2 is required.\n",
"\n",
"    `!{sys.executable} -m pip install \"sagemaker>=1.14.2,<2\"`\n",
"    \n",
"    \n",
"1. Upgrade sagemaker if version 2 is required, but roll back any upgrades to packages that might taint the user's kernel and break other notebooks. Do the rollback at the end of the notebook, in the cleanup cell.\n",
"\n",
"    ```python\n",
"    # setup\n",
"    import sys\n",
"    import sagemaker\n",
"    version = sagemaker.__version__\n",
"    !{sys.executable} -m pip install 'sagemaker>=2.0.0'\n",
"    ...\n",
"    # cleanup: IPython expands {version} before running the shell command\n",
"    !{sys.executable} -m pip install 'sagemaker=={version}'\n",
"    ```\n",
"    \n",
"\n",
"1. Use flags that facilitate automatic, end-to-end running without a user prompt, so that the notebook can run in CI without any updates or special configuration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# SageMaker Python SDK version 1.x is required\n",
"import sys\n",
"!{sys.executable} -m pip install -U \"sagemaker>=1.14.2,<2\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# SageMaker Python SDK version 2.x is required\n",
"import sagemaker\n",
"import sys\n",
"original_version = sagemaker.__version__\n",
"!{sys.executable} -m pip install 'sagemaker>=2.0.0'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setup Python Modules\n",
"1. Import modules, set options, and activate extensions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2019-06-16T14:44:50.874881Z",
"start_time": "2019-06-16T14:44:38.616867Z"
}
},
"outputs": [],
"source": [
"# imports\n",
"import sagemaker\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# options\n",
"pd.options.display.max_columns = 50\n",
"pd.options.display.max_rows = 30\n",
"\n",
"# visualizations\n",
"import plotly\n",
"import plotly.graph_objs as go\n",
"import plotly.offline as ply\n",
"plotly.offline.init_notebook_mode(connected=True)\n",
"\n",
"# extensions\n",
"if 'autoreload' not in get_ipython().extension_manager.loaded:\n",
"    %load_ext autoreload\n",
"    \n",
"%autoreload 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Parameters\n",
"1. Set up user-supplied parameters, such as custom bucket names and roles, in a separate cell and call out what the options are.\n",
"1. Use defaults, so the notebook will still run end-to-end without any user modification.\n",
"\n",
"For example, the following description and code block prompt the user to select the preferred dataset.\n",
"\n",
"~~~\n",
"\n",
"To select a particular dataset, assign choosen_data_set below to one of 'diabetes', 'california', or 'boston', where each name corresponds to its respective dataset.\n",
"\n",
"'boston' : boston house data\n",
"'california' : california house data\n",
"'diabetes' : diabetes data\n",
"\n",
"~~~\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data_sets = {'diabetes': 'load_diabetes()', 'california': 'fetch_california_housing()', 'boston' : 'load_boston()'}\n",
"\n",
"# Change choosen_data_set variable to one of the data sets above. \n",
"choosen_data_set = 'california'\n",
"assert choosen_data_set in data_sets.keys()\n",
"print(\"I selected the '{}' dataset!\".format(choosen_data_set))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Data import\n",
"1. Look for data that was stored by a previous notebook run, using `%store -r variableName`.\n",
"1. If that doesn't exist, look in S3 in the user's default bucket.\n",
"1. If that doesn't exist, download it from the [SageMaker dataset bucket](https://sagemaker-sample-files.s3.amazonaws.com/).\n",
"1. If that doesn't exist, download it from its origin.\n",
"\n",
"For example, the following code block will pull training and validation data that was created in a previous notebook. This allows the customer to experiment with features, re-run the notebook, and not have it pull the dataset over and over."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load relevant dataframes and variables from preprocessing_tabular_data.ipynb required for this notebook\n",
"%store -r X_train\n",
"%store -r X_test\n",
"%store -r X_val\n",
"%store -r Y_train\n",
"%store -r Y_test\n",
"%store -r Y_val\n",
"%store -r choosen_data_set"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Procedure or tutorial\n",
"1. Break up processes with Markdown blocks to explain what's going on.\n",
"1. Make use of visualizations to better demonstrate each step."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cleanup\n",
"1. If you upgraded the user's `sagemaker` SDK, roll it back.\n",
"1. Delete any endpoints or other resources that linger and might cost the user money.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# rollback the SageMaker Python SDK to the kernel's original version\n",
"print(\"Original version: {}\".format(original_version))\n",
"print(\"Current version: {}\".format(sagemaker.__version__))\n",
"s = 'sagemaker=={}'.format(version)\n",
"print(\"Rolling back to... {}\".format(s))\n",
"!{sys.executable} -m pip install {s}\n",
"import sagemaker\n",
"print(\"{} installed!\".format(sagemaker.__version__))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next steps\n",
"\n",
"1. Wrap up with some conclusion or overview of what was accomplished.\n",
"1. Offer another notebook or more resources or some other call to action."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References\n",
"1. author1, article1, journal1, year1, url1\n",
"2. author2, article2, journal2, year2, url2"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "conda_python3",
"language": "python",
"name": "conda_python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.10"
},
"pycharm": {
"stem_cell": {
"cell_type": "raw",
"metadata": {
"collapsed": false
},
"source": []
}
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}
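
The "Data import" cell in the template lists a lookup order: a variable stored by a previous run, then the default bucket, then the SageMaker dataset bucket, then the origin. One way to express that order in a notebook is a small fallback helper. The sketch below is only an illustration under stated assumptions: `load_with_fallback` and the three stand-in loaders are hypothetical names, and the loaders return canned values rather than calling `%store`, S3, or any SageMaker API.

```python
from typing import Any, Callable, List, Tuple


def load_with_fallback(loaders: List[Tuple[str, Callable[[], Any]]]) -> Tuple[str, Any]:
    """Try each (name, loader) pair in order and return the first usable result.

    A loader counts as failed if it raises or returns None.
    """
    failures = []
    for name, loader in loaders:
        try:
            data = loader()
        except Exception as exc:  # keep going, but remember why this source failed
            failures.append("{}: {}".format(name, exc))
            continue
        if data is not None:
            return name, data
        failures.append("{}: no data".format(name))
    raise RuntimeError("No data source succeeded ({})".format("; ".join(failures)))


# Hypothetical stand-ins for the real sources named in the template.
def from_previous_run():
    return None  # pretend nothing was stored by an earlier notebook


def from_default_bucket():
    raise FileNotFoundError("object not found")  # pretend the bucket has no copy


def from_sample_bucket():
    return [[5.1, 3.5], [4.9, 3.0]]  # pretend the download succeeded


source, data = load_with_fallback([
    ("previous run", from_previous_run),
    ("default bucket", from_default_bucket),
    ("sample bucket", from_sample_bucket),
])
print("Loaded {} rows from the {}".format(len(data), source))
```

Keeping the sources in an ordered list means a notebook author can add or drop a source without touching the control flow, and the collected failure messages show why each earlier source was skipped.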