Wilson Mar

Notes for before and after getting AI-900, AI-102, and DP-100 certified, as we automate ML workflows in the Azure PaaS cloud

Overview

This article presents my notes toward a guided tour to introduce use of Microsoft’s Machine Learning offerings running on the Azure cloud.

TL;DR: Look for “PROTIP:” in this article, which highlights the author’s hard-won experience. You likely won’t find such information anywhere else. My contribution to the world (to you) is a less overwhelming learning sequence, one that starts with the least complex of the technologies used, then moves on to more complex ones.

My Sample ML Code

PROTIP: AI-102 is heavy on questions about coding.

So samples (unlike examples) are a more complete, best-practices solution for each of the snippets. They’re better for integrating into production code.

Unlike other classes, this describes the automation I’ve created instead of you clicking through web pages (portal.azure.com).

To start with, refer to my https://github.com/wilsonmar/azure-quickly.

Among Azure Machine Learning examples is a CLI at https://github.com/Azure/azureml-examples/tree/main/cli

docs.microsoft.com/en-us/samples/azure provides sample Python Code at https://docs.microsoft.com/en-us/samples/azure/azureml-examples/azure-machine-learning-examples/

https://docs.microsoft.com/en-us/samples/azure-samples/azure-sdk-for-go-samples/azure-sdk-for-go-samples/

A complete sample app is Microsoft’s Northwind Traders consumer ecommerce store. But where is it used in the course?

Tim Warner’s https://github.com/timothywarner/ai100 includes Powershell scripts:

  • keyvault-soft-delete-purge.ps1
  • keyvault-storage-account.ps1
  • python-keyvault.py
  • ssh-to-aks.md - SSH into AKS cluster nodes
  • xiot-edge-windows.ps1
  • autoprice.py

ML among AI Service Providers

Microsoft has three service “Providers”:

Asset type | Resource provider namespace/Entity | Prefix
Azure Cognitive Services | Microsoft.CognitiveServices/accounts | cog-
Azure Machine Learning workspace | Microsoft.MachineLearningServices/workspaces | mlw-
Azure Cognitive Search | Microsoft.Search/searchServices | srch-

Separate from the above are Azure IoT (Edge) and Azure IoT (Edge) Services.


Search (the “Bing” brand) has recently been separated out from the “Cognitive Services” to its own at https://docs.microsoft.com/en-us/azure/search, although it’s used in “Conversational AI” using an “agent” (Azure Bot Service) to participate in (natural) conversations. BTW: in 2019 Cortana decoupled from Windows 10 search.

Since October 31st, 2020, Bing Search APIs have transitioned from the Azure Cognitive Services Platform to Azure Marketplace. The Bing Search v7 API subscription covers several Bing Search services (Bing Image Search, Bing News Search, Bing Video Search, Bing Visual Search, and Bing Web Search).

Microsoft DOCS

Microsoft’s Azure Machine Learning documentation is at: docs.microsoft.com/en-us/azure/machine-learning/service

github.com/microsoftdocs/ml-basics

Readiness Quiz for DP-100 (AZ-900 and AI-900)


Decision service = Azure Machine Learning

By definition, “Machine Learning” involves creating programs without programmers coding logic in languages such as Python.

The work of Machine Learning (abbreviated to “ML”) is to recognize patterns in historical data to “train” a model which can be referenced by web applications and other user interfaces to make predictions from new, similar data.

In Azure, several “resources” need to be set up:

  • A Machine Learning workspace
  • A Storage account to hold the model
  • A Key Vault to hold secrets
  • An Application Insights account to hold logs and metrics
  • A source of data (database)
  • Ingestion and cleaning of data

The above can be set up by running a single command, but only after you are set up to run it.

  1. First, get skilled at using the Azure Portal and CLI Bash by following my deep but concise tutorial at

    https://wilsonmar.github.io/azure-cloud-onramp

    It covers creation of free Azure Subscription and Azure Storage accounts to hold files in a clouddrive folder.

    Create Machine Learning Workspace using GUI

  2. In portal.azure.com, press G+\ and in the Search box type enough of Machine Learning for a selection with that name to appear in the dropdown that appears so you can select it by pressing Enter.

    az-ml-search-mac-1042x286

    (Don’t select “classic”).

    “Machine learning is a subset of data science that deals with predictive modeling. In other words, using data to create models that can predict unknown values. It works by identifying relationships between data values that describe characteristics of something (its features) and the value we want to predict (the label), and encapsulating these relationships in a model through a training process.”

    • classification predicts categories or classes using supervised machine learning techniques to fit features into a model and predict the classification of the label. Labels are what we want to predict, such as a future value or an action. The label is usually “Y” among mathematicians.

    • regression predicts numeric values using supervised machine learning techniques on historical data.

    • Time Series forecasting is used for Anomaly Detection using regression with a time-series element, enabling you to predict numeric values at a future point in time.

    • Clustering groups similar items using unsupervised machine learning techniques by identifying the nearest neighbors in multiple dimensions, such as the nearest color to an RGB color value.

  3. VIDEO: BLOG: For Workspace Edition: choose “Basic” or “Enterprise” after considering feature and pricing differences at

    azure.microsoft.com/en-us/pricing/details/machine-learning/

    Create Machine Learning Workspace using CLI

  4. Setup your CLI Bash environment by following my instructions at:

    https://github.com/wilsonmar/azure-quickly#readme

    That covers setting up folders, shortcuts, and external memory variables in the CLI environment.

  5. Invoke a run to train a sample model by running:

    ./az-mlcli2.sh

    I wrote this script to automate the manual setup procedures from https://github.com/Azure/azureml-examples.

    The script invokes bash setup.sh to create Resource Group “azureml-examples-rg” in “East US” containing:

    • main (Machine Learning)
    • maininsights… (Application Insights)
    • mainkeyvault… (Key vault)
    • mainstorage… (Storage account)

    My script also runs a GitHub Actions yml file using the “ml” subcommand from the Microsoft ML 2.0 CLI Preview announced May, 2021:

    time az ml job create -f jobs/hello-world-env-var.yml --web --stream

    The code at hello-world.yml prints out “hello world” from within a Python Docker image downloaded from Docker Hub (docker.io):

    command: echo $ENV_VAR
    environment:
      docker:
        image: docker.io/python
    environment_variables:
      ENV_VAR: "hello world"
    compute:
      target: local
    

    Information about it is at: https://github.com/Azure/azureml-examples/tree/main/python-sdk/tutorials/an-introduction

  6. Run again, but use hello-world-env-var.yml

  7. Run other yml files listed in https://github.com/Azure/azureml-examples/tree/main/cli, which shows which scripts passed or failed in GitHub Actions.

    CAUTION: Don’t run jobs marked “failing” (in red).

  8. When done, stop billing by running:

    bash cleanup.sh

    Additionally, there is more sample ML code to create models.

    Below are instructions to do the work manually in the Azure Portal:

  1. Click “Start now” under “Notebooks”.
  2. Click “Terminal”.

    Note that you are not in the Azure CLI but within a server instance (named “heavenlit” in the example below).

    az-ml-notebook-term-2060x540

  3. Change the prompt:

  4. NOTE: The code command (Visual Studio Code) is not installed by default, but you can install it:

    sudo snap install code --classic
  5. You can click the red icon (to stop) or blue arrow (to restart) the server.

    ml-basics Python Jupyter Notebooks

https://docs.microsoft.com/en-us/users/msftofficialcurriculum-4292/collections/kox0ig8qrgez2q ILT Pilot – DP-100: Designing and Implementing a Data Science Solution on Azure

  1. TUTORIAL: Get the script:

    
    git clone https://github.com/microsoftdocs/ml-basics.git --depth 1
    
  2. Get the Python scripts referenced during the 3-day $1795 USD live online course by Microsoft DP-100T01-A: Designing and Implementing a Data Science Solution on Azure (for Data Scientists).

    git clone https://github.com/MicrosoftLearning/mslearn-dp100.git --depth 1
    

    NOTE: You don’t need to cd into the repo because it’s called from the directory list.

  3. Toggle from “Focus Mode” to “Standard Mode” to see the directory list.
  4. Expand “ml-basics”.
  5. Double click one of the .ipynb files to open in a Jupyter Notebook at the right pane.

    From mslearn-dp100, described at microsoftlearning.github.io/mslearn-dp100:

    Responsible AI/ML:

    From ml-basics:

    • 01 - Data Exploration.ipynb

    Additionally:

    github.com/Azure/MachineLearningNotebooks has Notebooks.

  6. Click the “»” double blue icon to run the script.

    Watch each data frame and graphic get generated. The final frame’s output expected is:

    Studying for 14 hours per week may result in a grade of 70

    CONGRATULATIONS! At this point your DevOps job is done.

    Study the code

    The AI-102 and DP-100 both focus on coding.

  7. QUIZ:

    • What does NumPy.shape (2,20) tell you about the elements in the array?
      The shape is a tuple whose elements give the lengths of the corresponding array dimensions. The array is two dimensional: the first dimension has 2 elements (rows), each containing 20 elements (columns), for 40 elements in total.
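As a quick check of the quiz answer, here is a minimal sketch, assuming NumPy is installed. The array contents are arbitrary placeholders chosen only to get a (2, 20) shape.

```python
import numpy as np

# Build a 2 x 20 array: 2 rows (first dimension), 20 columns (second dimension).
arr = np.arange(40).reshape(2, 20)

shape = arr.shape   # tuple of the length of each dimension
ndim = arr.ndim     # number of dimensions
size = arr.size     # total number of elements

print(shape, ndim, size)
```

The tuple has one entry per dimension, so the length of the tuple is the number of dimensions and the product of its entries is the element count.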
  8. To better study the Python coding, clone the repo to your local machine so you can use your editor’s Find features.

  9. Make a change and run again to see the impact.

    PROTIP: The “Ensemble Algorithm” is the current state of the art yielding the best ROC.

    • https://madewithml.com/

    Flights Challenge

  10. Near the bottom of the file is this Challenge:

    “If this notebook has inspired you to try exploring data for yourself, why not take on the challenge of a real-world dataset containing flight records from the US Department of Transportation? You’ll find the challenge in the /challenges/01 - Flights Challenge.ipynb notebook!”

    That notebook is within “Files”. Above “Users” are folders:

    az-ml-notebook-files

    Clean up

    To reclaim memory usage:

  11. Click the “X” to dismiss the tab representing the Notebook you’re done with.

    There are other *.ipynb (Python Notebook) files described on this website, primarily by Graeme Malcolm (no C#, R, or Julia here), which call the Azure Machine Learning Python SDK in the azureml-core package on PyPI. However, links below are to github.com/MicrosoftLearning/mslearn-dp100 by Microsoft’s Graeme Malcolm. It contains iPython Notebook code rather than instructions for setting up the ML Workspace.

    AI-100 and AI-102 both touch on Machine Learning as well.

DP-100

Earn the “Microsoft Certified: Azure Data Scientist Associate” certification by passing the one $165 exam answering 40-60 questions in 210 minutes: DP-100: Designing and Implementing a Data Science Solution on Azure. It has a strong focus on machine learning and Databricks.

  • Manage Azure resources for machine learning (25–30%), which is a higher level than “Setting up an Azure Machine Learning workspace”, which requires data and compute.
  • Run experiments and train models (20–25%) using the ML Designer, SDK, and AutoML.
  • Deploy and operationalize machine learning solutions (35–40%), previously “Optimizing and managing models”, using Hyperdrive and model explainers.
  • Implement responsible machine learning (5–10%)

Microsoft’s Study Guide for DP-100 has specific links for each topic: query.prod.cms.rt.microsoft.com/cms/api/am/binary/RE4MHoo

The free text-only “learning paths” tutorials associated:

  • Create machine learning models
  • Create no-code predictive models with Azure Machine Learning
  • Build and operate machine learning solutions with Azure Machine Learning
  • Perform data science with Azure Databricks

https://docs.microsoft.com/en-us/learn/modules/explore-analyze-data-with-python/2-exercise-explore-data

10 hr. MS LEARN PATH: Build and operate machine learning solutions with Azure Machine Learning

MS LEARN LAB: Create machine learning models

Udacity offered those who completed their free “Intro to Machine Learning using Microsoft Azure” before July 30, 2021, 50% off their paid “Machine Learning Engineer for Microsoft Azure Nanodegree Program”, which includes access to real-world projects from industry experts, career services, and guidance. Technical mentors support you throughout the program and review your hands-on projects:

  • Optimizing an ML Pipeline In Azure - leveraging AutoML, hyperparameter tuning, etc. using scikit-learn, Hyperdrive, and AutoML.
  • Operationalizing Machine Learning (into a production environment) using Application Insights, identifying problems in logs, and harnessing the power of Azure’s Pipelines.
  • Capstone Project: using both the hyperdrive and automl API from azureml to build a model using external data of your choice. After training, compare their performance, deploy the best model as a webservice and test the model endpoint.

DP-100 Video courses:

The 11 hr. CloudAcademy.com course on DP-100 by Guy Hummel has github.com/cloudacademy/azureml-intro last updated Sep 2020, which is before changes to the exam May, 2021.

On Pluralsight, Jared Rhodes created:

At ACloudGuru.com by Brian Roehm:


Data Ingestion

Alternatives to get data into ML:

  • AdlCopy
  • Azure CLI
  • AzCopy
  • Azure Cosmos DB Data Migration Tool
  • bcp
  • blobfuse
  • Microsoft Data Management Gateway

ML classification examples

  • literature-map.com suggests other authors based on an author input. The input author is displayed in the middle of a map.

  • Product identification - performing visual searches for specific products in online searches or even, in-store using a mobile device.

  • Disaster investigation - evaluating key infrastructure for major disaster preparation efforts. For example, aerial surveillance images may show bridges and classify them as such. Anything classified as a bridge could then be marked for emergency preparation and investigation.

  • Medical diagnosis - evaluating images from X-ray or MRI devices could quickly classify specific issues found as cancerous tumors, or many other medical conditions related to medical imaging diagnosis.

az-ai-ml-1173x538


Jupyter Notebooks on Azure

If you’re running a Chromebook laptop, there are several ways you can now run your Jupyter Notebooks within the Azure cloud:

HISTORY: https://notebooks.azure.com is now redirecting users to other services.

References:

  • https://towardsdatascience.com/running-jupyter-notebook-on-the-cloud-in-15-mins-azure-79b7797e4ef6

ML Studio JupyterLab from local files

DOCS: Run Jupyter Notebooks in a ML workspace

  1. On an internet browser, view a .ipynb (Jupyter notebook) file on GitHub.com. It may take several seconds to render. For example:

    NOTE: That is adapted from https://github.com/MicrosoftLearning/mslearn-ai900/blob/main/01%20-%20Image%20Analysis%20with%20Computer%20Vision.ipynb then removing setup in Azure, so that the Notebook can be cross-platform (also work outside of Azure).

    Currently, GitHub does not provide a “run” button when displaying Notebooks.

    For that, you need to create a Cognitive Services instance on Azure, described below.

  2. In a Terminal, load a GitHub repo containing notebooks and associated files:

    cd ~/gmail_acct  # or whatever folder you use to hold repos to be cloned:
    git clone https://github.com/MicrosoftLearning/mslearn-ai900 --depth=1
    cd mslearn-ai900
    
  3. In portal.azure.com:
  4. G+\ Machine Learning.

  5. az-mlworkspace-736x946 Create Machine Learning Workspace: Follow my instructions to create a ML Workspace and run my ./az-mlworkspace-cli.sh.

  6. The script creates these resources under the Resource Group:
    • Machine learning
    • Application Insights
    • Key vault
    • Storage account

  7. G+\ Machine Learning
  8. Click the Machine Learning name just created.
  9. In Portal Machine Learning: “Launch studio” (formerly “Azure Studio”) to open a new browser tab “Microsoft Azure Machine Learning”.

  10. In the left-side navigation bar, select Author: Notebooks.
  11. Click “+ Create” to Upload files.
  12. Navigate thru folder “mslearn-ai900”, “01 - Image Analysis with Computer Vision.ipynb”. Select overwrite and “trust contents of this file”. Click “Upload”.
  13. Copy to clipboard Key1 from running ./az-cog-cli.sh.

  14. Highlight “YOUR_COG_KEY” and paste Key1 from the script run.

  15. Do the same with “YOUR_COG_ENDPOINT”. ???

  16. Click “Authenticate” if that appears.

  17. Delete the Resource Group and Compute so charges don’t accumulate.

References:

NOTE: JupyterLab (https://jupyterlab.readthedocs.io/) is more robust than classic Jupyter:

  • Native Git and GitHub support - https://github.com/jupyterlab/jupyterlab
  • Extensible with jupyter labextension install jupyterlab-drawio
  • Google Drive
  • Dark themes



ML Designer Pipelines

Steps to deploy a machine learning model with the Designer:

  1. Create inference clusters
  2. Create and test inference pipeline
  3. Deploy inference pipeline
  4. Test the service (used by the user)

Alternately, Process (using a Python script): azureml-1118x398

FREE Sandbox (Concierge Subscription) Exercise: Call the Text Analytics API from the online testing console. A feedback sorter Function app calls Text Analytics through a Queue to sort feedback based on Sentiment.

https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html

https://www.kaggle.com/fabiendaniel/predicting-flight-delays-tutorial

Create data file

The data used in the tutorial below is from Coursera: Machine Learning Pipeline Tutorial with Azure ML Studio. The tutorial provides a file on its GitHub, so skip this data preparation step (which normally is a large part of the total effort).

Another lab is: MSLEARN “predict-rentals” LAB following https://docs.microsoft.com/en-us/learn/modules/use-automated-machine-learning/use-auto-ml For that, download data file from https://aka.ms/bike-rentals

Generally:

  1. Select the Datasets page (under Assets)
  2. ” + Create”, “From web files”. Web URL: https://aka.ms/bike-rentals

    Alternately, you can upload a file from your local machine.

  3. Dataset type: Tabular
  4. Next

Create ML Workspace resource

  1. Go to G+\ Machine Learning

    If you’re following cloudacademy.com/lab/introduction-azure-machine-learning-studio, select the workspace created and skip to the next section.

    But if you’re not following that, follow steps below:

  2. Select your Directory and Subscription.
  3. Click the blue “Create machine learning workspace”. A new tab appears in portal.azure.com.
  4. Resource Group: PROTIP: just 3 letters are necessary, so use letters (such as “devml”) which do not have descenders, to make numbers appended to them more visible.
  5. Workspace Name: PROTIP: just 3 letters are necessary.
  6. Container Registry: To ensure uniqueness, append $RANDOM to your text (to make devml3232).
  7. Container Registry SKU: Basic

    az-ml-workspace-details-872x750

  8. “Review + create”.

    CAUTION: The network is public by default. Choosing private would entail more configuration.

  9. “Create”.

    CAUTION: Charges now begin to accumulate. Delete your Resource Group ASAP. It’s cheaper if you recreate it if you need another workspace.

  10. When created, click “Go to resource” blue button.

    Launch ML Studio

  11. Click “Launch Studio” blue button, which opens a new browser tab.

    Alternately, click this URL or copy the URL and paste in the browser URL field to:

    https://ml.azure.com

    Notice the blue band instead at the top.

  12. At “Welcome to the studio” pop-up, click the “X” dialog button to dismiss it.

    Studio navigation tutorial

  13. Click “+” on the left menu to reveal a list.

  14. To reveal (or hide) left menu icon labels, click the “hamburger” icon at the upper left.

    az-mlstudio-home-2070x1148

    NOTE: The “Start now” items are also listed in the left menu.

    Within the “Assets” category:

    Datasets is where to manage data used in Machine Learning experiments. There, version datasets as well to explore different formats or data content.

    Experiments tracks Machine Learning projects and experiment runs.

    Pipelines manage Machine Learning pipelines to boost efficiency when building Machine Learning models.

    Models manage the models built and shared.

    Endpoints deploy Machine Learning models as REST endpoints on AKS or ACI infrastructure.

    New ML Pipeline

  15. PROTIP: Click “Pipelines”. Clicking “+ New”, then “Pipelines” is like clicking “Designer” and “+ New” Pipeline. Alternately, cursor up/down the left menu and press Enter to select.

    Compute target

  16. On the right-hand side under Settings, click “Select compute target”. Select the compute resource created earlier, then Save.

    If one is already available, click on it and skip to the next section.

    Alternately,

  17. “Compute” menu (under heading Manage).
  18. ”+ New” blue button.
  19. Virtual Machine type: CPU.
  20. Virtual machine size: Select from all options.

    • The cheapest is “Standard_F2s_v2” with “2 cores, 4GB RAM, 16GB storage” for Compute optimized at “$0.11/hr”
  21. Compute name: wow
    • Minimum number of nodes: 0 (the default)
    • Maximum number of nodes: 2 (from 1 the default)
    • Idle seconds before scale down: 120 (from default 1800)

  22. Compute name: PROTIP: 3 characters are the smallest allowed, such as “ace”, “jim”, “opq”, “rsu”, “vwx”, “yza”, etc.

  23. Enable SSH access: leave unchecked

  24. Next and wait (5 minutes) for State to go from “Creating” to “Running”.

    CAUTION: Charges now begin to accumulate. Delete your Resource Group ASAP. It’s cheaper if you recreate it if you need another compute instance.

    ML Data Input

  25. PROTIP: Instead of using your mouse to expand the assets menu hierarchy, which requires memorizing what is under each asset category:

    az-ml-assets-menu-480x1042

    get the titles of assets to drag-and-drop from this sample pipeline diagram:

    az-ml-pipeline-map-809x692

  26. Click in the field containing “Search by name, tags and description” and type:

    Import Data

    As you type, assets matching your search phrase appear. Stop typing when you see what you want.

    az-ml-feature-search-637x302

    NOTE: The date shown is the version of the asset.

  27. Drag-and-drop the asset “Import Data” onto the top of the (blank) designer canvas.

  28. In the menu that appears on the right, open the “Data source” dropdown to select “URL via HTTP”.
  29. Copy and paste the URL to created data (above), such as this:

    https://raw.githubusercontent.com/cloudacademy/azure-lab-artifacts/master/intro-to-azure-ml/tweets.csv

  30. Wait until “Validating” is done. The larger the file, the longer this will take.

    Submit and Run Experiment

  31. Preview schema to ensure data fields are defined correctly. Save.

  32. In order for Column labels to populate, click “Submit” at the upper-right to run the model.

  33. In the “Set up pipeline run” dialog, select “Create new” and type experiment name:

    PROTIP: Have a naming convention for models. Begin the Name with “dev” to denote its status. End the name with a zero-padded number (such as “001”) in case there are several.

  34. Click Submit on the dialog.

  35. Look to the upper-right for the status to change from “Running” to “Finished”, which can take several minutes.

    Add pipeline steps to filter and process imported data

  36. Search for asset “feature hashing” and drag it under “Import Data” as a new step in the canvas:

    az-ml-feature-search-637x302

  37. Connect two steps: click the circle under the top step (turning it green), then drag it to the circle above the second step (turning that green). An arrow should appear.

    The Feature Hashing step converts text data into a vector of features, which makes the data more manageable and performant.

  38. In the context menu at the right, click “Edit column name” and select “tweet_text”. Save.
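The idea behind feature hashing can be sketched in a few lines of plain Python. This is a simplified illustration only, not the Designer module's actual implementation: the function name `hash_features`, the bucket count of 16, the use of MD5, and the sample tweets are all assumptions made for the example.

```python
import hashlib
from typing import List

def hash_features(text: str, n_buckets: int = 16) -> List[int]:
    """Map the words of `text` into a fixed-length vector of counts."""
    vector = [0] * n_buckets
    for token in text.lower().split():
        # Hash each word to a stable integer, then fold it into a bucket index.
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % n_buckets
        vector[bucket] += 1
    return vector

v1 = hash_features("great flight great crew")
v2 = hash_features("late flight")
print(v1)
print(v2)
```

Every tweet becomes a vector of the same length regardless of vocabulary size, which is what makes the data easier for a training step to consume. Different words can collide in the same bucket, so more buckets trade memory for fewer collisions.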

    Split data

  39. Search for asset “split data” to drag-and-drop onto the designer canvas.

  40. Click on it to input “0.8” in the “Fraction of rows in the first output dataset” field (replacing the default “0.5”), then Tab away.

    80% - the “training set” is used to train the model.
    20% - the “test set” is used to help score the model later.
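What the 0.8 fraction does can be sketched outside the Designer as follows. The helper name `split_rows`, the fixed seed, and the 100 placeholder rows are assumptions for illustration. The Designer's Split Data module has its own options.

```python
import random

def split_rows(rows, fraction=0.8, seed=42):
    """Shuffle rows, then split them into (training, test) lists."""
    shuffled = list(rows)
    # A seeded generator keeps the split repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))              # placeholder data: 100 row ids
train, test = split_rows(rows, fraction=0.8)
print(len(train), len(test))
```

Shuffling before cutting avoids a training set that only contains the first part of an ordered file.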

  41. Connect the Feature Hashing step with the Split Data step.

  42. Search for asset “Filter Based Feature Selection” and drag it onto the canvas under “Split Data”, then join them.

    This teases irrelevant attributes or redundant columns out of the data. Each feature column is measured, scored, and then ranked, which improves accuracy when building a predictive model.

  43. Search for asset “Train Model” and drag it onto the canvas.

  44. In Target Column: click the Edit Column link to reveal the list of columns by clicking “Edit column name” to select “sentiment_label”. Save.

  45. Number of desired features: 2000 (instead of default 1).

  46. Feature scoring method: Select “ChiSquared” (instead of default “PearsonCorrelation”).

  47. Search for Score Model

  48. Search for Evaluate Model, drag-and-drop.
  49. PROTIP: Link from Score Model to the left port of Evaluate Model. Otherwise there will be an error.

    https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-module-reference/designer-error-codes

    Training run

  50. Verify that you’ve achieved the pipeline diagram (above).

  51. Click “Submit” at the upper-right to run the whole pipeline to create a model.

    It takes several minutes to complete all the steps. The more data, the longer it takes.

    Evaluate ML models

  52. Right-click on the Evaluate Model step to expand “Visualize” before clicking “Evaluation results”:

    az-ml-eval-open-483x255

    BLOG:

  53. Review:

    az-ml-eval-roc-841x503.png

Azure does not present all the statistics, which we cover here.

Confusion Matrix

The multi-colored box at the lower-right is called a “Confusion Matrix”, a metric of classification model performance.

DOC: Test data was split off so that some of the data is used to determine how well a model’s predictions match known values. The matrix is presented as a 2x2 box comparing the Predicted label to the Actual (True) label (yes or no) to identify true/false positives/negatives.

REMEMBER for the test: Draw this on the white board from memory:

n=165 | Actual: yes 105 | Actual: no 60
Predicted: yes 110 ("Precision") | 100 True Positives (Relevant; "Sensitivity rate" = Recall) | 10 False Positives (Type I error)
Predicted: no 55 | 5 False Negatives (Type II error) | 50 True Negatives ("Specificity")
All: Accuracy rate, Error rate

Outside the box of n (total):

  • Prevalence: How often does the yes condition actually occur in our sample? actual yes/total = (TP + FN) / n = 105/165 = 0.64

Based on n (total) diagonal:

  • Accuracy: Overall, how often is the classifier correct? It is the ratio of correct predictions (True Positives + True Negatives) to the total number of predictions: (TP + TN) / n = (100 + 50) / 165 = 0.91

  • Misclassification Rate (aka “Error Rate”): Overall, how often is it False (wrong)? (FP + FN) / n = (10 + 5) / 165 = 0.09

Within the box:

  • Precision rate is the ability of a classification model to identify only the relevant data points. When it predicts yes, how often is it correct? It is the percentage of items selected (True Positive and False Positive) which were relevant = correctly predicted yes: 100 / 110 = 0.91. This is used when studying rare diseases, where many more people do not have the disease than have it, or when picking out terrorists.

VIDEO: Columns represent the known truth: The higher the number, the better:

  • Sensitivity (aka “Recall”) rate is the ability of a model to find all the relevant cases within a dataset. It is the percent of relevant items (True Positives and False Negatives) correctly identified as Positive = TP / (TP + FN) = 100 / (100 + 5) = 0.95.

  • Specificity rate is the percent of no’s correctly identified as Negative = TN / (TN + FP) = 50 / (50 + 10) = 0.83.

A perfect classifier has precision and recall both equal to 1. But in practice the Precision and Recall metrics conflict with one another: improving one usually lowers the other. Precision and recall should always be reported together.

F-1 Score is a single number that takes into account both precision and recall: the harmonic mean of precision (P) and the true positive rate (recall, R) = 2 / ( 1/P + 1/R ) = 2PR / (P + R). The larger the F1, the better, when comparing between models.
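To check the arithmetic, the rates can be recomputed from the four cells of the example matrix (100 True Positives, 10 False Positives, 5 False Negatives, 50 True Negatives). This is a plain-Python sketch, and the variable names are my own.

```python
# The four cells of the example confusion matrix.
tp, fp, fn, tn = 100, 10, 5, 50
n = tp + fp + fn + tn                  # 165 predictions in total

accuracy = (tp + tn) / n               # correct predictions / all predictions
misclassification = (fp + fn) / n      # wrong predictions / all predictions
prevalence = (tp + fn) / n             # actual yes / all
precision = tp / (tp + fp)             # correct yes / predicted yes
recall = tp / (tp + fn)                # sensitivity: correct yes / actual yes
specificity = tn / (tn + fp)           # correct no / actual no
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of P and R

for name, value in [("accuracy", accuracy), ("misclassification", misclassification),
                    ("prevalence", prevalence), ("precision", precision),
                    ("recall", recall), ("specificity", specificity), ("f1", f1)]:
    print(f"{name}: {value:.2f}")
```

Accuracy and the misclassification rate always sum to 1, which is a handy sanity check when drawing the matrix from memory.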

Different values in the Confusion Matrix would be created for each level of threshold. VIDEO: The Receiver Operating Characteristic (ROC) curve plots the relationship between True Positive Rate (TPR) aka “Sensitivity” on the Y axis and False Positive Rate (FPR) or (1 - Specificity) on the X axis as the decision threshold changes.

VIDEO: stats-roc-1057x650

VIDEO: AUC (Area Under the Curve) measures the area underneath the ROC curve. It is used to compare methods of categorization (such as between Logistic Regression vs Random Forest). A model with an AUC of 0.5 performs no better than random chance. The closer the AUC is to 1.0, the better the model is at separating classes. Thus, the ideal AUC is 1.0. References:

  • https://towardsdatascience.com/the-roc-curve-unveiled-81296e1577b
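How the ROC curve and AUC come out of a sweep over thresholds can be sketched in plain Python. The function names `roc_points` and `auc`, and the six labels and scores, are made up for illustration. The sketch assumes the data contains at least one positive and one negative label.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points, one per threshold, starting at (0, 0)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the decision threshold step by step, from strictest to loosest.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve, using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

labels = [1, 1, 0, 1, 0, 0]                 # 1 = actual yes, 0 = actual no
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]     # predicted probability of yes
points = roc_points(labels, scores)
print(points)
print(round(auc(points), 3))
```

Each threshold yields its own confusion matrix, and each matrix contributes one point to the curve. A model that ranks every positive above every negative traces the top-left corner and gets an AUC of 1.0.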

Metrics of regression model performance

Which one is best?

  • Coefficient of Determination (R2): (aka “R-Squared”) is a relative measure of how well the model fits the dependent variable. It summarizes how much of the variance between predicted and true values is explained by the model. The closer to 1 this value is, the better the model is performing. It does not take the overfitting problem into consideration: a model with many independent variables may fit training data well yet perform poorly with test data. Thus:

  • Adjusted R Square penalises for additional independent variables added to the model and adjusts the metric to prevent overfitting.

MSE, RMSE or MAE are used to compare performance between different regression models:

  • Mean Absolute Error (MAE) is an absolute measure of the goodness of the fit. It gives you an absolute number on how much your predicted results deviate from the actual number: the average absolute difference between predicted and true values. This value is based on the same units as the label, such as dollars. The lower this value is, the better the model is predicting.

  • Root Mean Squared Error (RMSE) is used by Kaggle to assess submissions for its competitions. It is the square root of the mean squared difference between predicted and true values. The result is a metric based on the same unit as the label (dollars). A larger difference when compared to the MAE (above) indicates greater variance in the individual errors (for example, with some errors being very small, while others are large).

To compare models where labels are in different units:

  • Relative Absolute Error (RAE): A relative metric, typically between 0 and 1, based on the absolute differences between predicted and true values. The closer to 0 this metric is, the better the model is performing. (It exceeds 1 only when the model does worse than always predicting the mean of the true values.)

  • Relative Squared Error (RSE): A relative metric, typically between 0 and 1, based on the square of the differences between predicted and true values. The closer to 0 this metric is, the better the model is performing.
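Here is a hedged sketch of the two relative metrics, assuming the common definitions in which the model's error is divided by the error of a baseline that always predicts the mean of the true values. The sample data is invented. Rescaling the label (dollars to cents) leaves both metrics unchanged, which is why they suit comparisons across units.

```python
def relative_absolute_error(y_true, y_pred):
    """Total absolute error divided by that of a predict-the-mean baseline."""
    mean_true = sum(y_true) / len(y_true)
    model_error = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    baseline_error = sum(abs(t - mean_true) for t in y_true)
    return model_error / baseline_error


def relative_squared_error(y_true, y_pred):
    """Total squared error divided by that of a predict-the-mean baseline."""
    mean_true = sum(y_true) / len(y_true)
    model_error = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    baseline_error = sum((t - mean_true) ** 2 for t in y_true)
    return model_error / baseline_error


dollars_true = [2.0, 4.0, 6.0, 8.0]
dollars_pred = [3.0, 4.0, 5.0, 8.0]
cents_true = [v * 100 for v in dollars_true]
cents_pred = [v * 100 for v in dollars_pred]

rae = relative_absolute_error(dollars_true, dollars_pred)
rse = relative_squared_error(dollars_true, dollars_pred)
rae_cents = relative_absolute_error(cents_true, cents_pred)
rse_cents = relative_squared_error(cents_true, cents_pred)
print(rae, rse, rae_cents, rse_cents)
```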

Metrics for clustering model performance

  • Average Distance to Other Center is how close, on average, each point in the cluster is to the centroids of all other clusters.

  • Average Distance to Cluster Center is the closeness of all points in a cluster to the centroid of that cluster.

  • Number of Points is how many data points were assigned to each cluster, and the total overall number of data points in any cluster.

    If the number of data points assigned to clusters is less than the total number of data points available, it means that some data points could not be assigned to a cluster.

  • Maximal Distance to Cluster Center is the max of the distances between each point and the centroid of that point’s cluster.

    If this number is high, it can mean that the cluster is widely dispersed. Use this statistic together with the Average Distance to Cluster Center to determine the cluster’s spread.

  • Combined Evaluation score (at the bottom of each section of results) lists the averaged scores for the clusters created in that particular model.
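The per-cluster statistics above can be reproduced by hand. The sketch below assumes Euclidean distance and takes points, cluster assignments, and centroids as given. The function name, the dictionary keys, and the sample coordinates are hypothetical, and this is not the Designer module's own code.

```python
import math


def cluster_metrics(points, assignments, centroids):
    """Compute per-cluster distance statistics from already-assigned points."""
    metrics = {}
    for cluster_id, center in enumerate(centroids):
        members = [p for p, a in zip(points, assignments) if a == cluster_id]
        to_own = [math.dist(p, center) for p in members]
        to_others = [
            math.dist(p, other)
            for p in members
            for other_id, other in enumerate(centroids)
            if other_id != cluster_id
        ]
        metrics[cluster_id] = {
            "number_of_points": len(members),
            "average_distance_to_cluster_center": sum(to_own) / len(to_own) if to_own else 0.0,
            "maximal_distance_to_cluster_center": max(to_own, default=0.0),
            "average_distance_to_other_center": sum(to_others) / len(to_others) if to_others else 0.0,
        }
    return metrics


points = [(-1.0, 0.0), (1.0, 0.0), (0.0, 3.0), (0.0, -3.0), (10.0, 1.0), (10.0, -1.0)]
assignments = [0, 0, 0, 0, 1, 1]
centroids = [(0.0, 0.0), (10.0, 0.0)]

metrics = cluster_metrics(points, assignments, centroids)
for cluster_id, values in metrics.items():
    print(cluster_id, values)
```

A well-separated cluster shows a small average distance to its own center and a large average distance to the other centers.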

Comparing multiple models

To compare the performance of multiple models, add an Evaluate Model module to your pipeline. Connect the Scored dataset output of Score Model (or the Result dataset output of Assign Data to Clusters) to the left input port of Evaluate Model.


Create a Real-Time Inference Pipeline and Deploy an Endpoint

Azure Machine Learning Designer allows models to be deployed as a REST endpoint to be consumed by other people or by an application. This is great for developers who have minimal experience in Machine Learning and want to incorporate predictive models into their application.

The pipeline first has to be converted into an inference pipeline and then deployed as an endpoint on either AKS (Azure Kubernetes Service) or an Azure Container Instance.

  1. Click “Create inference pipeline” to select “Real-time inference pipeline”.

    This adds “Web Service Input” and “Web Service Output” steps in the canvas.

  2. Click “Submit”. Select an existing experiment name.
  3. Click “Submit” on the dialog.

  4. Click “Deploy” at the upper-right.

  5. In the “Setup real time endpoint” dialog, with “Deploy new real-time endpoint” selected, type “tweet-analysis” into the Name field.

    PROTIP: If you share a workspace with a team or other teams, make the name unique among everyone you work with.

  6. Compute type drop-down: select “Azure Container Instance”.
  7. Click “Deploy” in the dialog.

  8. Wait while the status shows “Deploy: Waiting real-time endpoint creation”.

  9. When “Deploy: Succeeded” appears, click “view real-time endpoint” to open another browser tab to show the web app.

  10. Click the Consume tab to review the consumption info.
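The Consume tab shows a REST endpoint URL and a key. Below is a minimal sketch of calling such an endpoint using only the Python standard library. Everything specific here is an assumption to be replaced with what your own Consume tab shows: the URL, the key, the `WebServiceInput0` payload shape, and the `text` column name are placeholders. The actual network call is left commented out.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, api_key, rows):
    """Build (but do not send) a POST request for a real-time endpoint."""
    # Assumed payload shape; copy the real schema from the Consume tab.
    payload = {"Inputs": {"WebServiceInput0": rows}, "GlobalParameters": {}}
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,  # key-based authentication
    }
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Placeholder values: substitute the REST endpoint and primary key you were given.
request = build_scoring_request(
    "https://example.com/score",
    "<primary-key>",
    [{"text": "sample tweet"}],  # hypothetical input column
)

# To send the request for real, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping request construction separate from the network call makes it possible to inspect the headers and body before anything is sent.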

NOTE: Error messages can be cryptic, such as this:

Deploy: Failed on Preparing to deploy. Details: Call MT PrepareCreateRealTimeEndpointRequest api failed. PipelineRunId is not a Guid-string.


AutoML

  1. Select “Automated ML” (under Author).
  2. Click “+ New Automated ML run”.
  3. Click circle to select dataset (“bike-rentals”).
  4. Next for “Configure run” dialog.
  5. “Data Statistics” to see stats for each column. Close.

  6. New experiment name: mslearn-bike-rental
  7. Target column: rentals (integer). This is the label the model will be trained to predict.
  8. Training compute target: the compute cluster you created previously
  9. Select Virtual Machine.

  10. Task type and settings
  11. Task type: Regression (the model will predict a numeric value)
  12. Finish

  13. “Refresh” to see when run gets to “Complete”.
  14. Look at the “Best model summary”

References to classic version:

  • https://medium.com/data-science-reporter/a-simple-hands-on-tutorial-of-azure-machine-learning-studio-b6f05595dd73

azureml SDK package for R: https://azure.github.io/azureml-sdk-for-r/reference/index.html

  1. “Endpoints” (under heading Assets).

    NOTE: There are Real-time endpoints and Pipeline endpoints.

  2. “Consume” tab

    DOCS:

https://www.coursera.org/projects/automl-computer-vision-microsoft-custom-vision Guided Project: AutoML for Computer Vision with Microsoft Custom Vision by Mario Ferraro

https://www.coursera.org/programs/mckinsey-learning-program-uedvm/browse?currentTab=MY_COURSES&productId=7mGkLZGLEeup-AoS2h03mQ&productType=course&query=azure&showMiniModal=true Azure: Create a Virtual Machine and Deploy a Web Server


Install Visual Studio Code extensions

  1. Open Visual Studio Code on your laptop.
  2. Press Shift+Command+X (Ctrl+Shift+X on Windows or Linux) for Extensions search.
  3. Search for “Azure Machine Learning”
  4. Click “Install”.

    Several extensions are installed (Azure account, AML - Remote).

  5. Search for “Thunder Client” for a REST API GUI like Postman.

  6. VS Code activates an extension based on the type of file opened (such as .py for Python).

Etc.

PyTorch https://github.com/Azure/azureml-examples

Configurations:

  • Accuracy
  • AUC weighted
  • Norm macro recall
  • Average precision score weighted
  • Precision score weighted
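The choice among the metrics listed above matters most with imbalanced classes. The sketch below compares accuracy with normalized macro recall, assuming the definition I recall from the automated ML documentation: (macro recall - R) / (1 - R), where R = 1 / number of classes is the macro recall expected from random guessing. Treat that formula, the function names, and the toy labels as assumptions to verify.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def norm_macro_recall(y_true, y_pred):
    """Macro-averaged recall rescaled so that random guessing scores 0."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    macro = sum(recalls) / len(classes)
    chance = 1.0 / len(classes)  # assumed value of R
    return (macro - chance) / (1.0 - chance)


# Imbalanced toy data: 8 of class "a", 2 of class "b".
y_true = ["a"] * 8 + ["b"] * 2
always_a = ["a"] * 10  # a lazy model that never predicts "b"

lazy_accuracy = accuracy(y_true, always_a)
lazy_recall = norm_macro_recall(y_true, always_a)
perfect_recall = norm_macro_recall(y_true, y_true)
print(lazy_accuracy, lazy_recall, perfect_recall)
```

The lazy model looks good on accuracy yet scores zero on normalized macro recall, because it never finds the minority class.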

ML Manage: Compute targets:

  • Compute instances
  • Compute clusters
  • Inference clusters
  • Attached compute

Validation type:

  • Auto
  • k-fold cross validation
  • Monte Carlo cross validation
  • Train-validation split
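The validation types above differ only in how row indices are divided between training and validation. This is a rough standard-library sketch of the three explicit strategies. The function names, the seed, and the fractions are hypothetical, and the service's own splitting logic may differ.

```python
import random


def k_fold_splits(n_rows, k):
    """k-fold: every row is used for validation exactly once."""
    folds = [list(range(n_rows))[i::k] for i in range(k)]
    for i in range(k):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, folds[i]


def monte_carlo_splits(n_rows, n_splits, validation_fraction, seed=0):
    """Monte Carlo: repeated random splits; rows may repeat across splits."""
    rng = random.Random(seed)
    n_validation = max(1, round(n_rows * validation_fraction))
    for _ in range(n_splits):
        shuffled = list(range(n_rows))
        rng.shuffle(shuffled)
        yield shuffled[n_validation:], shuffled[:n_validation]


def train_validation_split(n_rows, validation_fraction, seed=0):
    """Train-validation split: a single random split."""
    return next(monte_carlo_splits(n_rows, 1, validation_fraction, seed))


folds = list(k_fold_splits(10, 5))
random_splits = list(monte_carlo_splits(10, 3, 0.2))
single_train, single_validation = train_validation_split(10, 0.2)
print(folds[0], random_splits[0], single_validation)
```

k-fold guarantees full coverage of the data, while Monte Carlo lets you pick the number of repetitions independently of the validation size.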

“Create a machine learning workspace to manage machine learning solutions through the entire data science lifecycle.”

  1. Click “+ Add” or the blue “Create machine learning workspace”.
  2. Subscription
  3. Workspace name: see naming conventions
  4. Region (Location)
  5. Storage account
  6. Key vault
  7. Application insights
  8. Container registry
  9. Networking: connectivity method. CAUTION: the workspace is public by default; for private access, add a private endpoint.
  10. Advanced: Data encryption
  11. Advanced: Data impact (data privacy)
  12. Tags

  13. Wait for your workspace to be created (it can take a few minutes).

    Microsoft Azure Machine Learning studio

Coursera Project Network: Predictive Modelling with Azure Machine Learning Studio

  1. On the Overview page, launch Azure Machine Learning studio (or open a new browser tab and navigate to https://ml.azure.com).

  2. Sign into Azure Machine Learning studio using your Microsoft account. If prompted, select your Azure directory and subscription, and your Azure Machine Learning workspace.
  3. In Azure Machine Learning studio, toggle the ☰ icon at the top left to view the various pages in the interface. You can use these pages to manage the resources in your workspace.
  4. Adjust

    https://docs.microsoft.com/en-us/learn/modules/use-automated-machine-learning/create-compute

  5. TODO: PROTIP: So you don’t pay for idle compute, programmatically start and stop clusters.

    Create Compute Instance

  6. On the Compute Instances tab, add a new compute instance with the following settings. You’ll use this as a workstation from which to test your model:
    • Virtual Machine type: CPU
    • Virtual Machine size: Standard_DS11_v2 (Choose Select from all options to search for and select this machine size)
    • Compute name: enter a unique name
    • Enable SSH access: Unselected
  7. While the compute instance is being created, switch to the Compute Clusters tab, and add a new compute cluster with the following settings. You’ll use this to train a machine learning model:
    • Virtual Machine priority: Dedicated
    • Virtual Machine type: CPU
    • Virtual Machine size: Standard_DS11_v2 (Choose Select from all options to search for and select this machine size)
    • Compute name: enter a unique name
    • Minimum number of nodes: 0
    • Maximum number of nodes: 2
    • Idle seconds before scale down: 120
    • Enable SSH access: Unselected

    PROTIP: At least 5 images are needed to train a Custom Vision model.

    PROTIP: Tags can contain upper case, spaces, special characters.

    Create dataset from Open Datasets

    Datastore types:

    • Azure Blob storage
    • Azure file share
    • Azure Data Lake Storage Gen1
    • Azure Data Lake Storage Gen2
    • Azure SQL database
    • Azure PostgreSQL database
    • Azure MySQL database

MS LEARN HANDS-ON LAB: Create no-code predictive models with Azure Machine Learning

Supervised: Regression & Classification


Continuous Deployment

MLOps is powered by Azure DevOps


HDInsight from 2017

Fraud Detection with Azure HDInsight Spark Clusters

Loan Credit Risk with Azure HDInsight Spark Clusters

Loan ChargeOff Prediction with Azure HDInsight Spark Clusters

Data Science VM

https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/overview#whats-included-in-the-data-science-vm

Resources

Notes to be inserted

Steps for data transformation:

  • Feature selection
  • Finding and removing data outliers
  • Impute missing values
  • Normalize numeric features
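Three of the transformation steps above can be sketched on a single numeric column with the standard library. The choices here are my assumptions, not the only options: median imputation, the 1.5 × IQR rule for outliers, and min-max scaling. Feature selection is not shown, and the sample values are invented.

```python
import statistics


def impute_missing(values):
    """Replace None with the median of the known values."""
    fill = statistics.median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def remove_outliers(values, factor=1.5):
    """Drop values outside the interquartile-range fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - factor * spread, q3 + factor * spread
    return [v for v in values if low <= v <= high]


def min_max_normalize(values):
    """Rescale values to the range 0 to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


raw = [10.0, 12.0, None, 11.0, 13.0, 500.0, 12.0]  # one gap, one extreme value
imputed = impute_missing(raw)
cleaned = remove_outliers(imputed)
normalized = min_max_normalize(cleaned)
print(imputed, cleaned, normalized)
```

The order matters: imputing first avoids errors on missing values, and removing outliers before normalizing keeps one extreme value from squashing the rest toward zero.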

Model training:

  • Label data
  • Algorithm selection
  • Data split
  • Run model
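The training steps above can be illustrated end to end with a toy example. This sketch assumes the simplest possible choices: labels generated from a known line, a random 70/30 data split, and ordinary least squares for a single feature as the algorithm. All names and numbers are hypothetical.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Data split: shuffle, then hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(rows):
    """Run model: least-squares slope and intercept for one feature."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in rows) / sum((x - mean_x) ** 2 for x in xs)
    return slope, mean_y - slope * mean_x


# Label data: each row pairs a feature x with its label y = 3x + 5 (no noise).
rows = [(float(x), 3.0 * x + 5.0) for x in range(20)]

train, test = train_test_split(rows)
slope, intercept = fit_line(train)
test_mae = sum(abs(y - (slope * x + intercept)) for x, y in test) / len(test)
print(slope, intercept, test_mae)
```

Scoring on the held-out rows, rather than on the training rows, is what reveals whether the model generalizes.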

https://docs.python.org/3/library/pickle.html pickle (.pkl) file format for Python object serialization
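A minimal round trip through a .pkl file looks like this. The dictionary stands in for a trained model object, and the file is written to a temporary folder. Both are placeholders for illustration.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model; real model objects are pickled the same way.
model = {"algorithm": "linear_regression", "slope": 3.0, "intercept": 5.0}

path = os.path.join(tempfile.mkdtemp(), "model.pkl")

with open(path, "wb") as f:  # pickle files are binary
    pickle.dump(model, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored)
```

Only unpickle files from sources you trust, because loading a pickle can execute arbitrary code.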

More Complexity makes for less Intelligibility

  • Linear regression
  • Decision Trees
  • K-nearest neighbors
  • Random Forests
  • Support Vector Machines
  • Deep Neural Networks

Global vs. local feature importance

https://www.twitch.tv/enceladosaurus

https://www.twitch.tv/thelivecoders An outgoing and enthusiastic group of friendly channels that write code, teach about technology, and promote the technical community. https://github.com/livecoders/home Wiki https://github.com/livecoders/Home/wiki

https://shap.readthedocs.io/en/latest/index.html

https://databricks.com/session_na20/the-importance-of-model-fairness-and-interpretability-in-ai-systems

https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-interpretability-aml

More

This is one of a series on AI, Machine Learning, Deep Learning, Robotics, and Analytics:

  1. AI Ecosystem
  2. Machine Learning
  3. Testing AI

  4. Microsoft’s AI
  5. Microsoft’s Azure Machine Learning Algorithms
  6. Microsoft’s Azure Machine Learning tutorial
  7. Microsoft’s Azure Machine Learning certification

  8. Python installation
  9. Jupyter notebooks processing Python for humans

  10. Image Processing
  11. Tesseract OCR using OpenCV
  12. Amazon Lex text to speech

  13. Code Generation

  14. Multiple Regression calculation and visualization using Excel and Machine Learning
  15. Tableau Data Visualization