Operationalizing Machine Learning

This project is part of the Udacity Azure ML Nanodegree. Its primary aim is to operationalize machine learning: we create, deploy, and consume an AutoML model, and we also create, publish, and consume a pipeline.

Overview

The dataset ("https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_train.csv") contains data about 32,950 individuals, including their age, marital status, education, housing, loans, contact, etc. We first use Azure AutoML on the dataset to find the best model according to a metric such as accuracy. We then deploy the model with Azure Container Instances, enable Application Insights and authentication, consume the endpoint, and check its performance using Application Insights. Finally, we create, publish, and consume a pipeline using a Jupyter notebook in Azure ML Studio.

Architectural Diagram

Authentication

Authentication is crucial for the continuous flow of operations. Continuous Integration and Delivery (CI/CD) systems rely on uninterrupted flows. When authentication is not set up properly, it requires human interaction, and the flow is interrupted. Ideally, the system never stops to wait for a user to enter a password, so whenever possible, authentication should be automated.
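One common way to avoid a password prompt in automation is a service principal. The sketch below is not taken from this project: the environment variable names are hypothetical, and it assumes the `azureml-core` package is installed when `non_interactive_auth` is called.

```python
import os

# Hypothetical environment variable names; use whatever your CI system provides.
REQUIRED_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def load_sp_credentials(env=os.environ):
    """Collect service principal settings, failing fast if any is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "tenant_id": env["AZURE_TENANT_ID"],
        "service_principal_id": env["AZURE_CLIENT_ID"],
        "service_principal_password": env["AZURE_CLIENT_SECRET"],
    }


def non_interactive_auth(env=os.environ):
    """Build an authentication object that never prompts for a password."""
    # Imported lazily so load_sp_credentials works without azureml-core installed.
    from azureml.core.authentication import ServicePrincipalAuthentication

    return ServicePrincipalAuthentication(**load_sp_credentials(env))
```

The returned object can then be passed along, for example as `Workspace.from_config(auth=non_interactive_auth())`, so that a scheduled job connects to the workspace without human interaction.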

Authentication types

Key-based

Azure Kubernetes Service: enabled by default
Azure Container Instances: disabled by default

Token-based

Azure Kubernetes Service: disabled by default
Azure Container Instances: not supported

Interactive

Used by local deployment and experimentation (e.g. using Jupyter notebook)

Azure AutoML

Automated machine learning, also referred to as automated ML or AutoML, is the process of automating the time-consuming, iterative tasks of machine learning model development. It allows data scientists, analysts, and developers to build ML models with high scale, efficiency, and productivity, all while sustaining model quality.

In this project we use AutoML to find the model that gives the most accurate results. In this case, it was a Voting Ensemble with an accuracy of 0.91958.
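For readers who prefer the SDK to the Studio UI, an AutoML classification run could be configured roughly as follows. This is a sketch under assumptions, not the project's actual configuration: the label column name `y`, the timeout, and the choice of `accuracy` as primary metric are all assumed, and `training_data` and `compute_target` must be supplied by the caller.

```python
def automl_settings():
    """Settings for a classification run; every value here is an assumption."""
    return {
        "task": "classification",
        "primary_metric": "accuracy",       # assumed; other metrics are possible
        "label_column_name": "y",           # assumed name of the target column
        "experiment_timeout_minutes": 20,   # assumed time budget
        "enable_early_stopping": True,
    }


def build_automl_config(training_data, compute_target):
    """Create an AutoMLConfig from the settings above."""
    # Imported lazily so automl_settings can be inspected without the SDK.
    from azureml.train.automl import AutoMLConfig

    return AutoMLConfig(
        training_data=training_data,
        compute_target=compute_target,
        **automl_settings(),
    )
```

Submitting the resulting configuration to an experiment would then let AutoML try several algorithms and report the best one by the chosen metric.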

Deploy

Deployment is about delivering a trained model into production so that it can be consumed by others. Configuring deployment settings means making choices about cluster settings and other types of interaction with a deployment. A good grasp of configuring production environments in Azure ML Studio and the Python SDK is key to getting robust deployments.

In this project, we deploy the best model, Voting Ensemble, with Azure Container Instances and we enable authentication.
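The deployment in this project may well have been done through the Studio UI; the following is only a sketch of an equivalent with the Python SDK. The CPU and memory values are assumptions, and `workspace`, `model`, and `inference_config` are expected to be prepared by the caller.

```python
def aci_settings():
    """Container settings; core and memory sizes are assumed values."""
    return {
        "cpu_cores": 1,
        "memory_gb": 1,
        "auth_enabled": True,  # key-based auth is off by default on ACI
    }


def deploy_to_aci(workspace, model, inference_config,
                  name="bank-marketing-model-deploy"):
    """Deploy a registered model to Azure Container Instances."""
    # Imported lazily so aci_settings can be inspected without the SDK.
    from azureml.core.model import Model
    from azureml.core.webservice import AciWebservice

    deployment_config = AciWebservice.deploy_configuration(**aci_settings())
    service = Model.deploy(
        workspace, name, [model], inference_config, deployment_config
    )
    service.wait_for_deployment(show_output=True)
    return service
```

Setting `auth_enabled=True` explicitly matters here because, as noted above, key-based authentication is disabled by default on Azure Container Instances.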

Enable Application Insights

Application Insights collects log, performance, and error data. It automatically detects performance anomalies and includes powerful analytics tools, which makes it easier to diagnose issues and understand how your functions are used. These tools are designed to help you continuously improve the performance and usability of your functions. You can even use Application Insights during local function app project development.

In this project we run one of the starter files, logs.py, to enable Application Insights.
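The starter file itself is not reproduced here. As a rough sketch of what such a script typically does, assuming an ACI web service and a workspace `config.json` in the working directory:

```python
def enable_insights_and_read_logs(service):
    """Turn on Application Insights for a service and return its log lines."""
    service.update(enable_app_insights=True)
    return service.get_logs().split("\n")


def load_service(name):
    """Look up a deployed web service by name in the configured workspace."""
    # Imported lazily so the helper above can be tried without the SDK.
    from azureml.core import Workspace
    from azureml.core.webservice import Webservice

    workspace = Workspace.from_config()
    return Webservice(name=name, workspace=workspace)
```

A call such as `enable_insights_and_read_logs(load_service("bank-marketing-model-deploy"))` would then print-ready return the container logs, one entry per line.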

Consume Endpoints

Swagger is a tool that helps build, document, and consume RESTful web services like the ones you are deploying in Azure ML Studio. It also describes the types of HTTP requests that an API can consume, such as POST and GET.

You can consume a deployed service via an HTTP API. An HTTP API is a URL that is exposed over the network so that interaction with a trained model can happen via HTTP requests.

Users can initiate an input request, usually via an HTTP POST request. HTTP POST is a request method that is used to submit data. HTTP GET is another commonly used request method; it is used to retrieve information from a URL. The allowed request methods and the different URLs exposed by Azure create a bi-directional flow of information.

In this project we use the starter file endpoint.py to consume the endpoint of the deployed model. We send two input queries and get an appropriate response for each.
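The following is a minimal sketch of such a request using only the standard library; it is not the contents of endpoint.py. It assumes key-based authentication and a JSON body of the form `{"data": [...]}`, which depends on the scoring script generated for the model. The scoring URI and key come from the endpoint's consumption details.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, key, records):
    """Build an authenticated POST request carrying the records to score."""
    body = json.dumps({"data": records}).encode("utf-8")  # assumed payload shape
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
    return urllib.request.Request(
        scoring_uri, data=body, headers=headers, method="POST"
    )


def score(scoring_uri, key, records):
    """Send the request and decode the JSON response."""
    request = build_scoring_request(scoring_uri, key, records)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Each record would be a dictionary with one entry per input column of the dataset (age, marital status, education, and so on); the Swagger documentation of the endpoint lists the exact field names and types.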

Create a Pipeline

The Pipeline class is the most common Python SDK class you will see when dealing with pipelines. Aside from accepting a workspace and allowing multiple steps to be passed in, it takes a description that is useful for identifying the pipeline later.
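A pipeline object could be constructed roughly as below. This is a sketch: the `pipeline_cls` parameter is a hypothetical hook added only so the function can be exercised without the SDK installed, and the steps themselves (for example an AutoML step) must be built by the caller.

```python
def build_pipeline(workspace, steps, description, pipeline_cls=None):
    """Create a pipeline from a workspace, a list of steps, and a description."""
    if pipeline_cls is None:
        # Default to the real SDK class; requires azureml-pipeline-core.
        from azureml.pipeline.core import Pipeline as pipeline_cls

    return pipeline_cls(
        workspace=workspace,
        steps=steps,
        description=description,
    )
```

The description has no effect on execution; it exists so the pipeline can be recognized later in the Studio or when listing pipelines from code.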

Publish pipelines

Publishing a pipeline is the process of making a pipeline publicly available. You can publish pipelines in Azure Machine Learning Studio, but you can also do this with the Python SDK.

When a Pipeline is published, a public HTTP endpoint becomes available, allowing other services, including external ones, to interact with an Azure Pipeline.
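As a sketch of both halves with the Python SDK and the standard library: `publish` wraps the `publish_pipeline` method of a completed pipeline run, and `build_trigger_request` prepares the HTTP call to the resulting endpoint. The pipeline name, version, and experiment name passed in are up to the caller, and the `{"ExperimentName": ...}` body is the shape commonly used for this endpoint rather than something confirmed by this project.

```python
import json
import urllib.request


def publish(pipeline_run, name, description, version="1.0"):
    """Publish a completed pipeline run; the result exposes an .endpoint URL."""
    return pipeline_run.publish_pipeline(
        name=name, description=description, version=version
    )


def build_trigger_request(rest_endpoint, auth_header, experiment_name):
    """Build the POST request that starts a run of a published pipeline."""
    body = json.dumps({"ExperimentName": experiment_name}).encode("utf-8")
    headers = dict(auth_header)  # copy so the caller's dict is not modified
    headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        rest_endpoint, data=body, headers=headers, method="POST"
    )
```

The `auth_header` argument is a dictionary containing an `Authorization` entry; in a notebook it can be obtained from `InteractiveLoginAuthentication().get_authentication_header()`. The request still has to carry valid credentials even though the endpoint is reachable over HTTP.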

Pipelines can perform several tasks besides training a model. Some of these tasks, or steps, are:

Data Preparation
Validation
Deployment
Combined tasks

Key Steps

Dataset

AutoML-Run

AutoML-Run The best model

Voting Ensemble- The best model is deployed using Azure Container Instances with authentication enabled

Best Model Explanation

Best Model Explanation- Aggregate plots

Explanation- Feature Importance

Explanation- Datapoints

Deploy

Model Deployed- Basic Consumption Info

logs.py execution

We use logs.py to Enable Application Insights

Application Insights Enabled

Application Insights

Directory Listing

Running swagger.sh and serve.py

endpoint.py and benchmark.sh Execution

Performance Tracking

Swagger- localHost

Swagger Input Description

Swagger Responses

Pipeline RunDetails Widget

Pipeline Publishing

Published Pipeline

Endpoint of the Pipeline

Pipeline Runs

Pipeline Endpoints

Jupyter Notebook

Screen Recording

Link to the screencast: https://youtu.be/X4hyRzPFG3Y

The screencast highlights the significant aspects of the project. It starts by displaying the dataset in the Datasets tab. Then we open the AutoML Experiments tab and find the AutoML run. We find the best model to be VotingEnsemble and review its explanation and the conclusions drawn from the experiment. We then move to the Models section to find the deployed model 'bank-marketing-model-deploy'. We check that Application Insights is enabled and visit the link. We then run endpoint.py, benchmark.sh and serve.py in the terminal. We open localhost to check the Swagger documentation of our model. Finally, we move to the Pipelines section, check the pipeline runs and endpoints, and walk through the Jupyter notebook provided.

Future Work

Working on this project has been a rewarding experience. I look forward to working with Azure and exploring more of the features that it has to offer. I will explore different techniques, such as regularization, cross-validation, data cleaning, and feature selection, and check how I can deploy the best possible model for each case. Furthermore, I will study the exceptions in detail so as to publish a service that works and returns appropriate results for every query.


MonishkaDas/Operationalizing-Machine-Learning
