
Commit

Merge pull request aws#2 from mchoi8739/e2e-fraud-detection
polish up the notebooks
aarsanjani authored Feb 10, 2021
2 parents 9689894 + 38635c7 commit 1143283
Showing 23 changed files with 1,242 additions and 449 deletions.
Original file line number Diff line number Diff line change
@@ -90,7 +90,7 @@
"metadata": {},
"outputs": [],
"source": [
"!python -m pip install shap"
"%conda install -c conda-forge shap"
]
},
{
@@ -124,7 +124,7 @@
"metadata": {},
"outputs": [],
"source": [
"automl_job_name = '<your_automl_job_name_here>'\n",
"automl_job_name = 'your-autopilot-job-that-exists'\n",
"automl_job = AutoML.attach(automl_job_name, sagemaker_session=session)\n",
"\n",
"# Endpoint name\n",
@@ -460,4 +460,4 @@
},
"nbformat": 4,
"nbformat_minor": 4
}
}
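The changed cell above swaps a placeholder for a concrete Autopilot job name and attaches to it. For context, a rough sketch of attaching to an existing Autopilot job and deploying its best candidate with the SageMaker Python SDK; the job name, endpoint name, and instance type are illustrative assumptions, and the import is deferred so the sketch stands alone:

```python
# Sketch only: attach to an existing SageMaker Autopilot (AutoML) job and
# deploy its best candidate. All names below are placeholders.
def attach_and_deploy(automl_job_name, endpoint_name):
    import sagemaker
    from sagemaker.automl.automl import AutoML

    session = sagemaker.session.Session()
    # Reconnect to a job that was created earlier (e.g. in Studio).
    automl_job = AutoML.attach(automl_job_name, sagemaker_session=session)
    # Deploy the best candidate found by the Autopilot job.
    predictor = automl_job.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.xlarge",
        endpoint_name=endpoint_name,
    )
    return predictor
```

Calling `attach_and_deploy("your-autopilot-job-that-exists", "fraud-endpoint")` would then create a real endpoint, so it is only worth running against a job that actually exists in your account.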
1 change: 1 addition & 0 deletions aws_marketplace/README.md
@@ -26,6 +26,7 @@ These examples show you how to use model-packages and algorithms from AWS Market
- [Using Algorithms](using_algorithms)
- [Using Algorithm From AWS Marketplace](using_algorithms/amazon_demo_product) provides a detailed walkthrough on how to use an algorithm with the enhanced SageMaker Train/Transform/Hosting/Tuning APIs by choosing a canonical product listed on AWS Marketplace.
- [Using AutoML algorithm](using_algorithms/automl) provides a detailed walkthrough on how to use the AutoML algorithm from AWS Marketplace.
- [Using Implicit BPR Algorithm](using_algorithms/implicit_bpr) provides a detailed walkthrough on how to build a recommender system for implicit feedback datasets, and how to train, evaluate, and host your model for batch and real-time inference.

- [Using Model Packages](using_model_packages)
- [Using Model Packages From AWS Marketplace](using_model_packages/generic_sample_notebook) is a generic notebook which provides sample code snippets you can modify and use for performing inference on Model Packages from AWS Marketplace, using Amazon SageMaker.
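Where the list above refers to using algorithms from AWS Marketplace, the general SageMaker Python SDK pattern looks roughly like the following sketch; the algorithm ARN, IAM role, S3 URIs, channel name, and instance type are placeholder assumptions, not values from this repository:

```python
# Sketch only: train with an AWS Marketplace algorithm identified by its ARN.
# Every argument shown here is a placeholder assumption.
def train_marketplace_algorithm(algorithm_arn, role, train_s3_uri, output_s3_uri):
    from sagemaker.algorithm import AlgorithmEstimator

    estimator = AlgorithmEstimator(
        algorithm_arn=algorithm_arn,   # ARN of the subscribed Marketplace algorithm
        role=role,                     # IAM role SageMaker assumes for training
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path=output_s3_uri,
    )
    # Channel names are algorithm-specific; "training" is a common default.
    estimator.fit({"training": train_s3_uri})
    return estimator
```

The same estimator can then be used with the Transform/Hosting/Tuning APIs mentioned above, which is the point of the enhanced-API walkthrough.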
9 changes: 9 additions & 0 deletions aws_marketplace/index.rst
@@ -60,6 +60,15 @@ AutoML
using_algorithms/automl/AutoML_-_Train_multiple_models_in_parallel


Implicit BPR
------------

.. toctree::
:maxdepth: 0

using_algorithms/implicit_bpr/recommender_system_with_implicit_bpr


Use AWS Marketplace model packages
==================================

@@ -0,0 +1 @@
This folder is used for downloading and storing the original dataset, the training and test data, and the batch request payloads that will be used for further analysis and for training our model.
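As a hypothetical illustration of the layout this folder describes, the subdirectories could be created up front; the directory names below are assumptions, not taken from the repository:

```python
from pathlib import Path

# Hypothetical layout: raw download, train/test splits, and batch payloads.
BASE = Path("data")

for sub in ["raw", "train", "test", "batch_payloads"]:
    (BASE / sub).mkdir(parents=True, exist_ok=True)

print(sorted(p.name for p in BASE.iterdir()))
# → ['batch_payloads', 'raw', 'test', 'train']
```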

Large diffs are not rendered by default.

172 changes: 93 additions & 79 deletions end_to_end/0-AutoClaimFraudDetection.ipynb

Large diffs are not rendered by default.

111 changes: 39 additions & 72 deletions end_to_end/1-data-prep-e2e.ipynb
@@ -4,62 +4,54 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## SageMaker End to End Solutions : Fraud Detection for Automobile Claims\n",
"\n",
"# Part 1 : Data Prep to Feature Store "
"# Part 1 : Data Preparation, Process, and Store Features"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The purpose of this notebook is to perform the Data Prep phase of the ML-lifecycle. The main Data Wrangling, ingestion and multiple transformations have been done in the SageMaker Studio DataWrangler GUI . [See Video here](#link-to-video)\n",
"So in this notebook we will take the .flow files that define the transformations to the raw data and apply them using a SageMaker Processing job that will apply those transformations to the raw data deposited in the S3 bucket as .csv files."
"<a id='all-up-overview'></a>\n",
"\n",
"## [Overview](./0-AutoClaimFraudDetection.ipynb)\n",
"* [Notebook 0: Overview, Architecture and Data Exploration](./0-AutoClaimFraudDetection.ipynb)\n",
"* **[Notebook 1: Data Preparation, Process, and Store Features](./1-data-prep-e2e.ipynb)**\n",
" * **[Architecture](#arch)**\n",
" * **[Getting started](#aud-getting-started)**\n",
" * **[DataSets](#aud-datasets)**\n",
" * **[SageMaker Feature Store](#aud-feature-store)**\n",
" * **[Create train and test datasets](#aud-dataset)**\n",
"* [Notebook 2: Train, Check Bias, Tune, Record Lineage, and Register a Model](./2-lineage-train-assess-bias-tune-registry-e2e.ipynb)\n",
"* [Notebook 3: Mitigate Bias, Train New Model, Store in Registry](./3-mitigate-bias-train-model2-registry-e2e.ipynb)\n",
"* [Notebook 4: Deploy Model, Run Predictions](./4-deploy-run-inference-e2e.ipynb)\n",
"* [Notebook 5: Create and Run an End-to-End Pipeline to Deploy the Model](./5-pipeline-e2e.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='all-up-overview'></a>\n",
"The purpose of this notebook is to perform the Data Prep phase of the ML life cycle. The main Data Wrangling, data ingestion, and multiple transformations will be done through the SageMaker Studio Data Wrangler GUI ([See Video here](#link-to-video)).\n",
"\n",
"## [Overview](./0-AutoClaimFraudDetection.ipynb)\n",
"* ### [Notebook 0 : Overview, Architecture and Data Exploration](./0-AutoClaimFraudDetection.ipynb)\n",
"* ### [Notebook 1: Data Prep, Process, Store Features](./1-data-prep-e2e.ipynb)\n",
" * #### [Architecture](#arch)\n",
" * #### [Getting started](#aud-getting-started)\n",
" * #### [DataSets](#aud-datasets)\n",
" * #### [SageMaker Feature Store](#aud-feature-store)\n",
" * #### [Create train and test datasets](#aud-dataset)\n",
"* ### [Notebook 2: Train, Check Bias, Tune, Record Lineage, Register Model](./2-lineage-train-assess-bias-tune-registry-e2e.ipynb)\n",
" * #### Train a model using XGBoost\n",
" * #### Model lineage with artifacts and associations\n",
" * #### Evaluate the model for bias with Clarify\n",
" * #### Deposit Model and Lineage in SageMaker Model Registry\n",
"* ### [Notebook 3: Mitigate Bias, Train New Model, Store in Registry](./3-mitigate-bias-train-model2-registry-e2e.ipynb)\n",
" * #### Train a version 2.0 model\n",
"* ### [Notebook 4: Deploy Model, Run Predictions](./4-deploy-run-inference-e2e.ipynb)\n",
" * #### Deploy an approved model and make prediction\n",
"* ### [Notebook 5 : Create and Run an end to end Pipeline to Deploy the Model]((./5-pipeline-e2e.ipynb))\n",
" * #### SageMaker Pipeline\n",
" * #### Cleanup"
"In this notebook, we will take the `.flow` files that define the transformations to the raw data, and run a SageMaker Processing job that applies those transformations to the raw data deposited in the S3 bucket as `.csv` files."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='arch'> </a>\n",
"### Architecture for Data Prep, Process and Store Features\n",
"## Architecture for Data Prep, Process and Store Features\n",
"[overview](#all-up-overview)\n",
"___\n",
"![Data Prep and Store](./images/e2e-1-pipeline-v3b.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Loading stored variables\n",
"If you ran this notebook before, you may want to re-use the resources you aready created with AWS. Run the cell below to load any prevously created variables. You should see a print-out of the existing variables. If you don't see anything printed then it's probably the first time you are running the notebook! "
"### Install required and/or update third-party libraries"
]
},
{
@@ -68,15 +60,16 @@
"metadata": {},
"outputs": [],
"source": [
"%store -r\n",
"%store"
"!python -m pip install -Uq pip\n",
"!python -m pip install -q awswrangler==2.2.0 imbalanced-learn==0.7.0 sagemaker==2.23.1 boto3==1.16.48"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Install required and/or update third-party libraries"
"### Loading stored variables\n",
"If you ran this notebook before, you may want to re-use the resources you already created with AWS. Run the cell below to load any previously created variables. You should see a print-out of the existing variables. If you don't see anything printed, then it's probably the first time you are running the notebook!"
]
},
{
@@ -85,17 +78,15 @@
"metadata": {},
"outputs": [],
"source": [
"!python -m pip install -Uq pip\n",
"!python -m pip install -q awswrangler==2.2.0 imbalanced-learn==0.7.0 sagemaker==2.23.1 boto3==1.16.48\n"
"%store -r\n",
"%store"
]
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
"!python -m pip install -q --upgrade sagemaker boto3"
"**<font color='red'>Important</font>: You must have run the previous notebooks in sequence to retrieve variables using the StoreMagic command.**"
]
},
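For readers unfamiliar with StoreMagic: `%store var` persists a variable to disk, and `%store -r` reloads it in a later session, which is how these notebooks pass values between each other. A rough stdlib analogy of that behavior; the file name and function names here are assumptions for illustration only:

```python
import json
from pathlib import Path

STORE = Path("stored_vars.json")  # hypothetical backing file

def store(**variables):
    # Persist variables, merging with anything stored earlier (like `%store`).
    existing = json.loads(STORE.read_text()) if STORE.exists() else {}
    existing.update(variables)
    STORE.write_text(json.dumps(existing))

def restore():
    # Reload previously stored variables (like `%store -r`).
    return json.loads(STORE.read_text()) if STORE.exists() else {}

store(automl_job_name="example-job")
print(restore()["automl_job_name"])  # → example-job
```

Unlike this sketch, StoreMagic keeps its database per IPython profile, which is why a variable stored in one notebook is visible from the next one in the sequence.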
{
@@ -195,8 +186,7 @@
"\n",
"sagemaker_session = sagemaker.session.Session(\n",
" boto_session=boto_session,\n",
" sagemaker_client=sagemaker_boto_client)\n",
"\n"
" sagemaker_client=sagemaker_boto_client)"
]
},
{
@@ -348,9 +338,9 @@
"metadata": {},
"source": [
"<a id='aud-datasets'></a>\n",
"#### DataSets and Feature Types\n",
"\n",
"[overview](#all-up-overview)"
"## DataSets and Feature Types\n",
"[overview](#all-up-overview)\n",
"___"
]
},
{
@@ -676,15 +666,6 @@
"print('\\nData available.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<pre>\n",
"\n",
"</pre>"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -787,15 +768,6 @@
"%store test_data_uri"
]
},
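The cell above stores the train and test dataset URIs for the later notebooks. For context, a minimal sketch of the kind of train/test split that produces such datasets; the split ratio and seed are assumptions, and a real run would use the processed claims data rather than a toy range:

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    # Shuffle a copy deterministically, then slice into train/test partitions.
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]

train, test = train_test_split(range(100))
print(len(train), len(test))  # → 80 20
```

Fixing the seed makes the split reproducible across notebook runs, which matters here because later notebooks reload the same train/test URIs via StoreMagic.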
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<pre>\n",
"\n",
"</pre>"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -815,18 +787,13 @@
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": []
"source": [
"___\n",
"\n",
"### Next Notebook: [Train, Check Bias, Tune, Record Lineage, Register Model](./2-lineage-train-assess-bias-tune-registry-e2e.ipynb)"
]
},
{
"cell_type": "code",