
Adding Job, New TF2.7 conda and ads v2.5.6 #52

Merged
merged 16 commits into from
Jan 25, 2022
5 changes: 5 additions & 0 deletions labs/MLSummit21/README.md
Original file line number Diff line number Diff line change
@@ -38,6 +38,10 @@ In [Lab 3](./lab-3-python-model.md) you build, train, and evaluate a simple scik

In [Lab 4](./lab-4-model-catalog.md) we walk you through the metadata that is available in the model catalog as well as some of the key functionalities.

## (Optional) Lab 4.5: Executing a Training Job

In [Lab 4.5](./lab-45-training-job.md) we walk you through the process of executing a [Data Science Job](https://docs.oracle.com/en-us/iaas/data-science/using/jobs-about.htm) from a notebook session. It uses the same training script as Lab 3.

## Lab 5: Deploying Your Model

In [Lab 5](./lab-5-model-deploy.md) you deploy your model as an HTTP endpoint using the Model Deployment feature of OCI Data Science. Two different approaches are shown: through the ADS library and directly in the OCI console.
@@ -53,3 +57,4 @@ In [Lab 7](./lab-7-wrap.md) we wrap up the workshop.


Enjoy the workshop :) !

Binary file modified labs/MLSummit21/images/confirm-kernel.png
Binary file modified labs/MLSummit21/images/copy-install-tf-env-command.png
Binary file modified labs/MLSummit21/images/select-tf-env.png
2 changes: 2 additions & 0 deletions labs/MLSummit21/lab-0-tenancy-setup.md
@@ -15,6 +15,8 @@ In this first lab, you will:

Sign up [here](https://www.oracle.com/cloud/free/).

:exclamation: :exclamation: :exclamation: **If you already have a tenancy make sure that you have not exhausted the Free Trial credits**. If you have exhausted the credits or your tenancy is older than 30 days, you will only have access to "Always Free" Services. **OCI Data Science is not yet among the "Always Free" offerings.** You will have to convert your tenancy to a paid tenancy or use a different tenancy.

## **STEP 2:** Run the Data Science Stack Template

We have created a Terraform script that can be executed through the Resource Manager Stack resource. This Terraform script creates the basic user groups, policies, dynamic groups, and networking (VCN and subnets) required to create projects and notebook sessions. The Stack also allows you to optionally launch a notebook session after the setup is completed. We recommend that you create the notebook session.
22 changes: 7 additions & 15 deletions labs/MLSummit21/lab-1-notebook-setup.md
@@ -61,33 +61,25 @@ In this lab you are creating a notebook session. **This step is optional if the
1. You will notice that the notebook session emits four metrics (CPU Utilization, Memory Utilization, Network Receive and Transmit Bytes) and is integrated with OCI Monitoring. In a separate lab you will learn how to trigger alarms when those metrics reach certain pre-defined thresholds.
![](./images/notebook-monitoring.png)

## **STEP 2**: Copy The Content of this Repository to Your Notebook Session

1. Download a zip archive of this repository to your laptop/local machine. Make sure that you select the **master** branch

![](./images/github-zip-repo.png)
## **STEP 2**: Clone this Repository to Your Notebook Session

1. Open your notebook session. Click on "Open".

![](./images/ns-open.png)

1. Drag and drop the zip archive in the JupyterLab file browser.

![](./images/drag-and-drop-zip-file.png)

1. Open a Terminal window.

![](./images/open-terminal.png)

1. Execute the following command in the terminal window:

```
unzip oci-data-science*.zip
git clone https://github.com/oracle/oci-data-science-ai-samples.git lab
```

1. You should see the `lab` folder in the JupyterLab file browser window on the left. The content of this lab is under:
```
/home/datascience/lab/labs/MLSummit21/
```
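As a quick sanity check, you can reconstruct the path above from a notebook cell (a sketch that assumes the default notebook-session home directory, `/home/datascience`, and the `lab` folder name used in the clone command):

```python
from pathlib import PurePosixPath

# Where the clone command places this lab's content
# (assumes the notebook session's default home directory).
home = PurePosixPath("/home/datascience")
lab_dir = home / "lab" / "labs" / "MLSummit21"
print(lab_dir)  # /home/datascience/lab/labs/MLSummit21
```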
This command will unzip the file.

1. Open the newly created folder and navigate to this lab folder.

Alternatively for Step 2, you can use `git clone` command in the terminal window of JupyterLab to clone the content of this repo. Make sure you create private/public ssh key pairs for this and register the public key in your github user settings.

**Congratulations! You are now ready to proceed to the next lab.**
4 changes: 2 additions & 2 deletions labs/MLSummit21/lab-2-install-conda.md
@@ -54,7 +54,7 @@ Before you can use a conda environment in your notebook session, you need to ins
1. In the *Launcher* tab, click **Environment Explorer**
![](./images/notebook_launcher.png)

1. In the Environment Explorer tab, select the **Data Science Conda Environment** filter button, select **CPU** architecture filter, then scroll down until you find the **TensorFlow 2.6 for CPU Python 3.7** conda. (If you see no results, use the refresh button on the right side of the filter bar of the Environment Explorer.)
1. In the Environment Explorer tab, select the **Data Science Conda Environment** filter button, select **CPU** architecture filter, then scroll down until you find the **TensorFlow 2.7 for CPU on Python 3.7** conda. (If you see no results, use the refresh button on the right side of the filter bar of the Environment Explorer.)
![](./images/select-tf-env.png)

1. Click on the caret on the right side, copy the install command
@@ -65,7 +65,7 @@ Before you can use a conda environment in your notebook session, you need to ins
1. **Paste the command** into the terminal window and hit **Return** to execute it.
The command that you previously copied is:
```
odsc conda install -s tensorflow26_p37_cpu_v1
odsc conda install -s tensorflow27_p37_cpu_v1
```

1. You will receive a prompt related to what version number you want. Press `Enter` to select the default.
5 changes: 2 additions & 3 deletions labs/MLSummit21/lab-3-python-model.md
@@ -31,14 +31,13 @@ In this lab you will:

A notebook has been prepared containing all the necessary Python code to explore the data, train the model, evaluate the model, and store it in the model catalog. This notebook has already been configured with a conda environment.

1. In the file browser, navigate to the directory **~/oci-data-science-ai-samples-master/labs/MLSummit21/Notebooks/**. This directory was created in Lab 1 when you unzip this repository in your notebook session.
1. In the file browser, navigate to the directory **/home/datascience/lab/labs/MLSummit21/Notebooks/**. This directory was created in Lab 1 when you cloned this repository into your notebook session.

1. Open the notebook **1-model-training.ipynb** (double-click on it). A new tab opens in the workspace on the right.

Notice in the upper right corner of the notebook tab, it displays the name of the conda environment being used by this notebook. Confirm that the name you see the slugname of the TensorFlow conda environment (`tensorflow26_p37_cpu_v1`)
Notice that the upper right corner of the notebook tab displays the name of the conda environment used by this notebook. Confirm that the name you see is the slug name of the TensorFlow conda environment (`tensorflow27_p37_cpu_v1`)

![](./images/confirm-kernel.png)

1. Now you will work in the notebook. Scroll through each cell and read the explanations. When you encounter a `code` cell, execute it (using **shift + enter**) and view the results. For executable cells, the "[ ]" changes to "[\*]" while executing, then to a number, such as "[1]", when complete. (If you run short on time, you can use the *Run* menu to run the remaining cells and then review the results.)

**You can proceed to the next lab.**
52 changes: 52 additions & 0 deletions labs/MLSummit21/lab-45-training-job.md
@@ -0,0 +1,52 @@
# Lab 4.5 - Executing a Training Job

## Introduction

[Data Science Jobs](https://docs.oracle.com/en-us/iaas/data-science/using/jobs-about.htm) let you run custom tasks for any use case you have, such as data preparation, model training, hyperparameter tuning, batch inference, and so on.

Using jobs, you can:

* Run machine learning (ML) or data science tasks outside of your notebook sessions in JupyterLab.
* Operationalize discrete data science and machine learning tasks as reusable runnable operations.
* Automate your typical MLOps or CI/CD pipeline.
* Execute batches or workloads triggered by events or actions.
* Run batch, mini-batch, or distributed batch inference.

After the steps are completed, you can use jobs to automate data exploration, model training, deployment, and testing. A single change in data preparation or model training, or an experiment with different hyperparameters, can be run as a job and tested independently.

Jobs consist of two parts, a job and a job run:

### Job
A job is a template that describes the task. It contains elements like the job artifact, which is immutable and can't be modified after it's uploaded to the job. The job also contains information about the Compute shape the job runs on, logging options, block storage, and other options. You can add environment variables or CLI arguments to a job so that they apply to all of your future job runs. You can override these variables and arguments in individual job runs.

You can edit the Compute shape in the job between job runs. For example, if you decide that you want to execute a job run on a more powerful shape, you can edit the job's Compute shape and then start a new job run.

### Job Run
A job run is the actual job processor. In each job run, you can override some of the job configuration, and most importantly the environment variables and CLI arguments. You can have the same job with several sequentially or simultaneously started job runs with different parameters. For example, you could experiment with how the same model training process performs by providing different hyperparameters.
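The job/job-run split described above can be illustrated with a small plain-Python sketch (a conceptual illustration only, not the OCI Data Science API; all class and variable names here are made up):

```python
class Job:
    """Immutable template: artifact, Compute shape, and default environment."""
    def __init__(self, artifact, shape, env=None):
        self.artifact = artifact
        self.shape = shape
        self.env = dict(env or {})

    def run(self, env_overrides=None):
        # Each run starts from the job's defaults and applies its overrides.
        merged = {**self.env, **(env_overrides or {})}
        return JobRun(self, merged)


class JobRun:
    """One execution of a job, with its own effective environment."""
    def __init__(self, job, env):
        self.job = job
        self.env = env


job = Job("train.py", "VM.Standard2.1", env={"LEARNING_RATE": "0.01"})
run_a = job.run()                          # uses the job's defaults
run_b = job.run({"LEARNING_RATE": "0.1"})  # hyperparameter experiment
print(run_a.env["LEARNING_RATE"], run_b.env["LEARNING_RATE"])  # 0.01 0.1
```

This mirrors the hyperparameter-experiment scenario above: the same job template backs both runs, and only the overridden environment differs.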

Estimated Lab Time: 10 minutes

## Objectives
In this lab, you will:
* Use ADS to define a Data Science Job
* Execute and monitor the progress of your Job Run.

## Prerequisites

* Successful completion of Labs 0, 1, 2, and 3.

## STEP 1: Execute the notebook `1.5-(optional)-model-training-job.ipynb`

A notebook has been prepared containing all the necessary Python code to train and save the same machine learning model as in Lab 3, but this time we will run the training script as a Data Science Job.
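The notebook defines the job through the ADS jobs API. A pseudocode-style sketch of what that definition looks like (class and method names are taken from the ADS documentation for the 2.5.x series; the OCIDs and shape are placeholders, and this only runs inside an OCI tenancy):

```python
from ads.jobs import Job, DataScienceJob, ScriptRuntime

job = (
    Job(name="attrition-training-job")  # hypothetical name
    .with_infrastructure(
        DataScienceJob()
        .with_compartment_id("ocid1.compartment.oc1..<placeholder>")
        .with_project_id("ocid1.datascienceproject.oc1..<placeholder>")
        .with_shape_name("VM.Standard2.1")  # placeholder shape
    )
    .with_runtime(
        ScriptRuntime()
        .with_source("train.py")  # the same training script as Lab 3
        .with_service_conda("tensorflow27_p37_cpu_v1")
    )
)

job.create()     # upload the immutable job template
run = job.run()  # start a job run
run.watch()      # stream the run's logs into the notebook
```

The actual OCIDs, script name, and shape used in the lab come from the notebook itself.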

1. In the file browser, navigate to the directory **/home/datascience/lab/labs/MLSummit21/Notebooks/**. This directory was created in Lab 1 when you cloned this repository into your notebook session.

1. Open the notebook **1.5-(optional)-model-training-job.ipynb** (double-click on it). A new tab opens in the workspace on the right.

Notice that the upper right corner of the notebook tab displays the name of the conda environment used by this notebook. Confirm that the name you see is the slug name of the TensorFlow conda environment (`tensorflow27_p37_cpu_v1`)

![](./images/confirm-kernel.png)

1. Now you will work in the notebook. Scroll through each cell and read the explanations. When you encounter a `code` cell, execute it (using **shift + enter**) and view the results. For executable cells, the "[ ]" changes to "[\*]" while executing, then to a number, such as "[1]", when complete. (If you run short on time, you can use the *Run* menu to run the remaining cells and then review the results.)

**Congratulations! You are now ready to proceed to the next lab.**
2 changes: 1 addition & 1 deletion labs/MLSummit21/lab-5-model-deploy.md
@@ -13,7 +13,7 @@ In this lab you will:

## **STEP 1:** Open and Run the Notebook `2-model-deployment.ipynb`

1. In the **~/oci-data-science-ai-samples-master/labs/MLSummit21/notebooks** directory of your notebook session, open the notebook `2-model-deployment.ipynb`
1. In the **/home/datascience/lab/labs/MLSummit21/notebooks** directory of your notebook session, open the notebook `2-model-deployment.ipynb`

1. Follow the instructions in the notebook

65 changes: 45 additions & 20 deletions labs/MLSummit21/notebooks/1-model-training.ipynb
@@ -37,7 +37,16 @@
"source": [
"Let's do all of the imports necessary to get this notebook working up here.\n",
"\n",
"**NOTE: Double-check that this notebook is running in the `tensorflow26_p37_cpu_v1` conda kernel**"
"**<font color='red'>NOTE: This notebook was run in the TensorFlow 2.7 for CPU (slug: `tensorflow27_p37_cpu_v1`) conda environment with ADS version 2.5.6. Upgrade your version of ADS (see cell below) and restart your kernel.</font>**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#!pip install oracle-ads==2.5.6"
]
},
{
@@ -112,6 +121,18 @@
"print(ads.__version__)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The code cell below will work if you are in the **Ashburn** region. \n",
"\n",
"The file is also available publicly at this URL: \n",
"https://objectstorage.us-ashburn-1.oraclecloud.com/n/bigdatadatasciencelarge/b/hosted-ds-datasets/o/synthetic%2Forcl_attrition.csv\n",
"\n",
"You can download it and drop it into the file browser of JupyterLab."
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -122,7 +143,7 @@
"namespace = \"bigdatadatasciencelarge\"\n",
"employees = DatasetFactory.open(\n",
" \"oci://{}@{}/synthetic/orcl_attrition.csv\".format(bucket_name, namespace), \n",
" target=\"Attrition\").set_positive_class('Yes')"
" target=\"Attrition\", storage_options={'config':{},'region':'us-ashburn-1'}).set_positive_class('Yes')"
]
},
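The `storage_options` argument added in this cell follows the fsspec/ocifs convention: `config` holds OCI authentication settings (empty here) and `region` pins the Object Storage endpoint; the exact auth behavior depends on the environment. A self-contained sketch of the values being built, using the same names as the cell above (plain Python, no OCI access needed):

```python
bucket_name = "hosted-ds-datasets"
namespace = "bigdatadatasciencelarge"

# Object Storage URI in the oci://<bucket>@<namespace>/<object> form
uri = "oci://{}@{}/synthetic/orcl_attrition.csv".format(bucket_name, namespace)

# Empty config dict plus an explicit region, as passed to DatasetFactory.open
storage_options = {"config": {}, "region": "us-ashburn-1"}

print(uri)  # oci://hosted-ds-datasets@bigdatadatasciencelarge/synthetic/orcl_attrition.csv
```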
{
@@ -146,7 +167,7 @@
"metadata": {},
"outputs": [],
"source": [
"employees.show_in_notebook()"
"#employees.show_in_notebook()"
]
},
{
@@ -155,7 +176,7 @@
"metadata": {},
"outputs": [],
"source": [
"employees.show_corr()"
"#employees.show_corr()"
]
},
{
@@ -359,20 +380,23 @@
"from ads.common.model_artifact import ModelArtifact\n",
"from ads.common.model_export_util import prepare_generic_model\n",
"import joblib \n",
"import os\n",
"\n",
"# Path to artifact directory for my sklearn model: \n",
"sklearn_path = \"./model-artifact/\"\n",
"model_artifact_location = os.path.expanduser('./model-artifact/')\n",
"os.makedirs(model_artifact_location, exist_ok=True)\n",
"\n",
"# Creating a joblib pickle object of my random forest model: \n",
"joblib.dump(sk_model, os.path.join(model_artifact_location, \"model.joblib\"))\n",
"\n",
"# Creating the artifact template files in the directory: \n",
"sklearn_artifact = prepare_generic_model(sklearn_path, \n",
" inference_conda_env=\"oci://service-conda-packs@id19sfcrra6z/service_pack/cpu/TensorFlow 2.6 for CPU Python 3.7/1.0/tensorflow26_p37_cpu_v1\",\n",
"sklearn_artifact = prepare_generic_model(model_artifact_location, \n",
" inference_conda_env=\"oci://service-conda-packs@id19sfcrra6z/service_pack/cpu/TensorFlow 2.7 for CPU on Python 3.7/1.0/tensorflow27_p37_cpu_v1\",\n",
" force_overwrite=True,\n",
" model='model.joblib',\n",
" use_case_type='BINARY_CLASSIFICATION',\n",
" X_sample=train.X,\n",
" y_sample=train.y)\n",
"\n",
"# Creating a joblib pickle object of my random forest model: \n",
"joblib.dump(sk_model, os.path.join(sklearn_path, \"model.joblib\"))"
" y_sample=train.y)"
]
},
{
Expand All @@ -392,8 +416,8 @@
"source": [
"#setting paths for artifact files that need to be modified: \n",
"\n",
"encoder_path = os.path.join(sklearn_path, \"dataframelabelencoder.py\")\n",
"score_path = os.path.join(sklearn_path, \"score.py\")\n",
"encoder_path = os.path.join(model_artifact_location, \"dataframelabelencoder.py\")\n",
"score_path = os.path.join(model_artifact_location, \"score.py\")\n",
"!cp dataframelabelencoder.py {encoder_path}"
]
},
@@ -466,8 +490,8 @@
" assert model is not None, \"Model is not loaded\"\n",
" X = pd.read_json(io.StringIO(data)) if isinstance(data, str) else pd.DataFrame.from_dict(data)\n",
" preds = model.predict(X).tolist()\n",
" #logger_pred.info(preds)\n",
" #logger_feat.info(X) \n",
"# logger_pred.info(preds)\n",
"# logger_feat.info(X) \n",
" return { 'prediction': preds }"
]
},
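The branching at the top of `predict()` above — accept either a JSON string or an already-parsed dict of columns — is a common input-normalization pattern for model servers. A self-contained sketch of just that step, using only the stdlib instead of pandas so it runs anywhere (`normalize_payload` is a hypothetical helper name):

```python
import io
import json

def normalize_payload(data):
    """Mirror predict()'s input handling: JSON string or dict of columns."""
    if isinstance(data, str):
        return json.load(io.StringIO(data))  # JSON text -> dict
    return dict(data)                        # already-parsed payload

# Both call styles yield the same column dict:
as_text = normalize_payload('{"Age": [35], "JobRole": ["Analyst"]}')
as_dict = normalize_payload({"Age": [35], "JobRole": ["Analyst"]})
print(as_text == as_dict)  # True
```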
@@ -498,7 +522,7 @@
"import sys \n",
"\n",
"# add the path of score.py: \n",
"sys.path.insert(0, sklearn_path)\n",
"sys.path.insert(0, model_artifact_location)\n",
"\n",
"from score import load_model, predict\n",
"\n",
@@ -528,7 +552,7 @@
"mc_model = sklearn_artifact.save(project_id=os.environ['PROJECT_OCID'], \n",
" compartment_id=os.environ['NB_SESSION_COMPARTMENT_OCID'], \n",
" training_id=os.environ['NB_SESSION_OCID'],\n",
" display_name=\"sklearn-employee-attrition\",\n",
" display_name=\"attrition-model\",\n",
" ignore_introspection=False,\n",
" description=\"simple sklearn model to predict employee attrition\", \n",
" training_script_path=\"1-model-training.ipynb\", \n",
" ignore_pending_changes=True)"
@@ -554,9 +579,9 @@
"metadata": {
"celltoolbar": "Raw Cell Format",
"kernelspec": {
"display_name": "Python [conda env:tensorflow26_p37_cpu_v1]",
"display_name": "Python [conda env:tensorflow27_p37_cpu_v1]",
"language": "python",
"name": "conda-env-tensorflow26_p37_cpu_v1-py"
"name": "conda-env-tensorflow27_p37_cpu_v1-py"
},
"language_info": {
"codemirror_mode": {
@@ -568,7 +593,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.10"
"version": "3.7.12"
},
"pycharm": {
"stem_cell": {