Get set up to run Python Jupyter Notebooks using CLI or ML Studio
- Several interfaces
- DSVM (Data Science Virtual Machine)
- Hands-on labs
- Deploy to production
- Python coding
This is a DRAFT deep-dive guided tour of how to get set up to begin using Microsoft’s various offerings for Machine Learning in their Azure cloud. After this, see use cases:
- Predictive Maintenance
Regardless of the tool, the workflow for developing intelligent models for predictive analytics revolves around these phases:
- Business understanding
- Data acquisition and understanding (establishing Data source, pipeline, environment, “wrangling” exploration & cleaning)
- Modeling (Feature Engineering, Model Training, Model Evaluation)
- Deployment (scoring, performance monitoring)
- Customer acceptance
These are from Microsoft’s Team Data Science Process (TDSP) at https://aka.ms/tdsp. It’s akin to CRISP-DM (Cross-Industry Standard Process for Data Mining) and KDD (Knowledge Discovery in Databases).
https://github.com/Azure/Azure-TDSP-ProjectTemplate provides sample folder and file names.
- System architecture
- Data dictionaries
- Reports related to data understanding, modeling
- Project management and planning docs
- Information obtained from a business owner or client about the project
- Docs and presentations prepared to share information about the project
Microsoft proposes a “Final Report” containing:
- What is target definition
- What are inputs (description)
- What kind of model was built?
- Simple solution architecture (Data sources, solution components, data flow)
- What is output?
- Data Schema
- Selection (dates, segments)
- Stats (counts)
- List of raw and derived features
- Importance ranking.
- Description or images of data flow graph. If AzureML, link to:
- Training experiment
- Scoring workflow
- What learner(s) were used?
- Learner hyper-parameters
- ROC/Lift charts, AUC, R^2, MAPE as appropriate
- Performance graphs for parameters sweeps if applicable
- Variable Importance (significance)
- Insight Derived from the Model
Conclusion and Discussions for Next Steps
- Discussion on overfitting (if applicable)
- What other Features Can Be Generated from the Current Data
- What other Relevant Data Sources Are Available to Help the Modeling
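Two of the evaluation metrics named in the report outline above, R^2 and MAPE, are simple enough to compute by hand. The following is a minimal Python sketch using their standard textbook definitions; the function names and the sample numbers are made up for illustration and do not come from TDSP.

```python
from statistics import mean

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100 * mean(abs((a - p) / a) for a, p in zip(actual, predicted))

# Made-up numbers purely for illustration.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print("R^2: ", r_squared(actual, predicted))
print("MAPE:", mape(actual, predicted))
```

R^2 suits regression models where variance explained matters, while MAPE reads more naturally to business owners because it is a percentage of the actual value.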
Microsoft offers several tools for machine learning:
- Azure Machine Learning Studio, on-line drag and drop interface for creating simpler machine learning workflows. It limits the size of the data that can be handled (about 10 GB of processing).
- Azure Machine Learning Workbench, downloaded client GUI/IDE running on your laptop
- Azure Machine Learning CLI for running (repeatable) scripts in your local Terminal instead of clicking
- Azure Machine Learning Experimentation, completely on-line to create models
- Azure Machine Learning Model Management, on-line to deploy models (after you create models)
- Azure Machine Learning Compute for Model Management
- Data Science Virtual Machine
- HDInsight big data
- AI Batch that runs multiple jobs in parallel for scale
PROTIP: One has to traverse three different web sites to design, deploy, and consume Machine Learning.
Working backwards from the Azure Portal used in production:
Visit the Azure Portal Dashboard at
PROTIP: Copy this URL and open it in a different internet browser (such as Chrome, Firefox, Brave) so that you can switch between them more easily.
If you’ve already created a “Machine Learning Experiment”, click here. Otherwise, click “+ Create a Resource” in the top left menu:
AI and Machine Learning projects usually involve several of these “Azure Marketplace” items.
Click “AI + Machine Learning” to see the scope of sophistication Microsoft has achieved.
NOTE: “Data Science Virtual Machine - Windows 2016” is not included in the list to Create virtual machine
At the top next to the “Featured” heading, click “See all” and notice Microsoft’s categories:
- Machine Learning (below)
- Bot Service
- Cognitive Services (which provide AI services for speech, vision, text analytics, search, etc.). These are distinct from CNTK, the Microsoft Cognitive Toolkit, which is a deep learning framework.
Each item in the above list links to a separate article/page for that category.
TBD: The “Machine Learning Model Management”
Click “Machine Learning Experimentation”, which was in “preview” at the time of writing.
If you don’t have an Experiment account, you are prompted for one.
The Experiment Name has a maximum of 32 characters. It says the only special character allowed is the dash (but it didn’t reject underscores) to separate words.
PROTIP: Define a naming convention for your Resource Groups which your team will follow for consistency, to avoid confusion.
For Location, in the US there is “East US 2” and “West Central US”.
Charges are per “seat” (developer), with the first 2 free.
Create a new Storage account if you haven’t already. This is to store tracked execution output.
NOTE: A Visual Studio Team Services account is used for version-controlling your project using a Git repo.
Your Model Management account name can be up to 260 characters.
Select “DEVTEST” for Model Management account fee type. The types:
PROTIP: Here are the per-day and per-month fees together, from the Pricing FAQ (assuming 31 days per month):
- Plan S1: $1.60 per day, $50 per month
- Plan S2: $12.08 per day, $375 per month
- Plan S3: $80.63 per day, $2,500 per month
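To sanity-check how the per-day and per-month fees above relate, here is a small Python sketch that scales each per-day fee to a 31-day month. The dictionary and function names are illustrative only and are not part of any Azure tool.

```python
DAYS_PER_MONTH = 31  # the month length stated above

# Per-day fees quoted above, in US dollars.
per_day = {"S1": 1.60, "S2": 12.08, "S3": 80.63}

def monthly_fee(plan):
    """Return the per-day fee scaled to a 31-day month."""
    return per_day[plan] * DAYS_PER_MONTH

for plan in per_day:
    print(f"{plan}: ${monthly_fee(plan):,.2f} per 31-day month")
```

The products land slightly under the quoted monthly figures ($50, $375, $2,500), so the two columns appear to be rounded independently rather than derived exactly from one another.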
Check “Pin to dashboard”.
Click “Automation options” to expose the parameters file for the selections above.
Click “Download” to save the “template.zip” file to your laptop for use in CLI calls.
PROTIP: The files downloaded in the zip enable you to quickly re-create the manual selections you took above. This enables you to save money by re-creating on demand rather than paying for unused services lingering around racking up charges.
- Click the “CLI” tab for the Bash (deploy.sh) script to make use of the parameter file.
- Click the “PowerShell” tab for the deploy.ps1 script.
- Click the “.NET” tab for the DeploymentHelper.cs programming code.
- Click the “Ruby” tab for the deployer.rb file.
- Click “Create”. Wait for the completion pop-up in your account’s Dashboard.
- Click “See more” at the bottom of the “All resources” pane. This is because you may not see the item you just created.
Click the Machine Learning Experimentation you just created.
Install local Workbench
QUESTION: Is there a brew formula for this?
- In a Machine Learning Experimentation page, click “Azure Machine Learning Workbench” for your operating system (Mac file AmlWorkbenchSetup.dmg or Windows).
- In Finder, double-click on the .dmg file to run the installer.
- Click on the big circle in the “Workbench Installer.app” pop-up.
- Click “Continue” to accept the defaults, resulting in a download of Miniconda with Python 3.5.2 (even though it’s already installed locally).
- Return to the “Workbench Installer.app” pop-up to dismiss it.
- Click “Launch Workbench” when that appears.
Sign in with your Microsoft account.
Sample Project Iris recognition
- New project workspace.
- Open in Visual Studio Code.
- [1:46] For Arguments, type “0.01” to set the “regularization rate”, which controls how the logistic regression model is trained.
- Click Run to see the job status panel on the right go from “Submitting” to “Running” to “Completed”.
- Change the Argument to “.1” and Run again.
- Change the Argument to “1” and Run again.
- Change the Argument to “10” and Run again.
Click on the runs (clock) icon on the left menu for the “iris_sklearn.py”
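To build intuition for what the four runs above are varying, here is a minimal pure-Python sketch of a regularization-rate sweep. It does not reproduce iris_sklearn.py: it assumes an L2-penalized logistic regression, uses synthetic two-class data rather than the Iris data set, and trains with plain gradient descent. All names in it are illustrative.

```python
import math
import random

def train_logistic(points, labels, reg_rate, steps=2000, lr=0.05):
    """Gradient descent on mean log-loss + (reg_rate / 2) * ||w||^2 (no bias term)."""
    w = [0.0, 0.0]
    n = len(points)
    for _ in range(steps):
        grad = [reg_rate * wi for wi in w]  # gradient of the L2 penalty
        for x, y in zip(points, labels):
            p = 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1])))
            for j in range(2):
                grad[j] += (p - y) * x[j] / n  # gradient of the mean log-loss
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

random.seed(0)
# Synthetic data: class 0 is centered at (0, 0), class 1 at (2, 2).
points = [(random.gauss(2 * c, 1), random.gauss(2 * c, 1))
          for c in (0, 1) for _ in range(20)]
labels = [c for c in (0, 1) for _ in range(20)]

norms = {}
for rate in (0.01, 0.1, 1, 10):  # the same four values used in the runs above
    w = train_logistic(points, labels, rate)
    norms[rate] = math.hypot(*w)
    print(f"regularization rate {rate}: weight norm {norms[rate]:.3f}")
```

A larger regularization rate pulls the weights toward zero, trading some fit on the training data for a simpler model; comparing the runs in the run history shows the same trade-off in terms of accuracy.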
Create a Machine Learning Studio Workspace
Microsoft’s documentation on this is at https://docs.microsoft.com/en-us/azure/machine-learning/studio/create-workspace
Open a new browser tab or switch to another window. On macOS press Command+Tab to switch applications. On Windows press Alt+Tab.
Azure Machine Learning Studio
Microsoft’s AzureML Studio provides an on-line app with drag-and-drop ease of use and no setup on your laptop or servers, so you can even use a Chromebook with no hard drive.
NOTE: AzureML supports R and Python scripts.
Click the blue “Sign In” if that appears.
Click the blue “Enter” for the 8-hour trial with no login to learn how it works using Microsoft’s sample data.
You return to this page after the trial times out.
PROTIP: With the free account, SQL database access is not allowed, so stay with .csv files.
Alternately, if you want to use your own data (up to 10GB), click “Sign-in”.
Click the “hamburger menu” at the upper left.
Selecting Cortana Intelligence takes you to the “Azure AI” marketing page at
BTW There is also a marketing page at
Click the “hamburger menu” at the upper left again and click “Gallery” to look at examples in https://gallery.azure.ai titled “Azure AI Gallery” (previously called the Cortana Intelligence Gallery at gallery.azureml.net).
PROTIP: There are not many views on each item, so you’re one of the early adopter visionaries!
The menu item named Solutions applies to usage by a particular customer/industry.
An example of a solution is Interactive Price Analytics, which applies models (studied in “Micro Economics” courses) that are the basis for recommending pricing changes based on the history of how demand for particular products responds to price changes.
Within the ML Studio:
PROJECTS are in both - collections of scripts, notebooks, and/or data designed to support the everyday work of data scientists.
“EXPERIMENTS” construct workflows using a sequence of techniques on data. See https://gallery.azure.ai/experiments
Click “SAMPLES” from Microsoft.
NOTE: The examples are from June, 2016.
“NOTEBOOKS” are IPython (Jupyter) Notebooks that present an explanation of each Python coding step.
See the tutorial at https://gallery.azure.ai/Notebook/773ccb4114fa46c99004381bbfa328de
https://docs.microsoft.com/en-us/azure/machine-learning/studio/studio-overview-diagram Microsoft Azure Machine Learning Studio Capabilities Overview
“DATASETS” contain data.
ONNX vs WinML files
https://gallery.azure.ai/models (.ONNX or .WinML files) for download - to build projects and solutions.
Most of them are ported from the repository of pre-trained machine learning computational graph models in ONNX (Open Neural Network Exchange) format (https://onnx.ai), which can run on different deep learning frameworks (TensorFlow, Keras, Microsoft Cognitive Toolkit, Apache MXNet, Facebook’s PyTorch, or Caffe2).
WinML (Windows Machine Learning) https://docs.microsoft.com/en-us/windows/uwp/machine-learning/ runs ONNX models locally on Windows 10 devices, leveraging GPU hardware acceleration through DirectX 12. Models from Apple Core ML, scikit-learn (the subset of models convertible to ONNX), LibSVM, and XGBoost can be converted to ONNX for use with WinML.
PROTIP: Consider these collections:
- Linear (straight line) Regression classifier using CNTK
- Multi-variate/Logistic regression
- K-means clustering analysis / Softmax
- Guided selection of wines: https://gallery.azure.ai/Experiment/30146a6b10c6401d9b65afac17dfa5d9
svm (support vector machine)
Twitter sentiment analysis: click “DEPLOY” for an estimate of cost.
CNTK CogNitive ToolKit
https://gallery.azure.ai/Tutorial/Cognitive-Toolkit-Tutorial-Getting-Started makes use of Microsoft’s new network-description language BrainScript at https://docs.microsoft.com/en-us/cognitive-toolkit/BrainScript-Network-Builder https://github.com/Microsoft/CNTK/wiki/Tutorial https://github.com/Microsoft/CNTK
CNTK invented some terms:
- a sequence (in CNTK) is also referred to as an instance
- a sample (in CNTK) is also referred to as a feature
- input stream(s) (in CNTK) is also referred to as feature column or a feature
- the criterion (in CNTK) is also referred to as the loss
- the evaluation error (in CNTK) is also referred to as a metric
Click through to a page with “Open in Studio”.
Your selections appear in the “TRAINED MODELS” tab on the left side.
DSVM (Data Science Virtual Machine)
Microsoft offers in its Azure cloud a custom virtual machine (VM) image called the DSVM (Data Science Virtual Machine). It has the utility software needed for data science, including deep learning frameworks and tools for machine learning, pre-built, installed, and configured so they are ready to run immediately:
Machine learning tools installed to test inferencing:
- Caffe: "A deep learning framework built for speed, expressivity, and modularity". Great for computer vision projects.
- Caffe2: A cross-platform version of Caffe
- Microsoft Cognitive Toolkit (CNTK): A deep learning software toolkit from Microsoft Research
- H2O: An open-source big data platform and graphical user interface
- Keras: A high-level neural network API in Python for Theano and TensorFlow
- Apache MXNet Server: A flexible, efficient deep learning library with many language bindings. Great for recommendation engines.
- NVIDIA driver, CUDA 9, and cuDNN 7
- NVIDIA DIGITS: A graphical system that simplifies common deep learning tasks
- Torch: A scientific computing framework with wide support for machine learning algorithms
- PyTorch: A high-level Python library with support for dynamic networks
- TensorFlow: An open-source library for machine intelligence from Google
- TensorFlow Serving, TensorRT, Chainer, Deep Water
- Theano: A Python library for defining, optimizing, and efficiently evaluating mathematical expressions involving multi-dimensional arrays
- Many sample Jupyter notebooks
For Machine Learning:
- Vowpal Wabbit: A fast machine learning algorithm supporting techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning
- XGBoost: for gradient boosting - a tool providing fast and accurate boosted tree implementation
- Rattle (R Analytic Tool To Learn Easily): provides a Gnome (RGtk2) GUI based interface to R functionality that makes getting started with data analytics and machine learning in R easy. From CRAN.
- LightGBM: A fast, distributed, high-performance gradient boosting framework
DSVM also contains popular tools for data science and development activities, including:
- Weka allows easy graphical exploration and machine learning.
- Development tools and editors (RStudio, PyCharm, IntelliJ IDEA, Emacs, vim, Visual Studio Code, and Atom)
- Microsoft R Server 9.2.1 with Microsoft R Open 3.4.1, MicrosoftML package with machine learning algorithms, RevoScaleR and revoscalepy for distributed and remote computing, and R and Python Operationalization
- Anaconda Python 2.7 and 3.5
- JuliaPro - a curated distribution of Julia language with popular scientific and data analytics libraries, used by Jupyter
- JupyterHub - a multiuser Jupyter notebook server supporting R, Python, PySpark, Julia kernels, with sample notebooks
- Spark local 2.2.0 with PySpark and SparkR Jupyter kernels
- Single node local Hadoop
- Azure Storage Explorer
- Azure command-line interface
- Azure SDK in Java, Python, node.js, Ruby, PHP
- H2O, Deep Water, and Sparkling Water
- SQL Server 2017
- SQL Server Machine Learning Services (revoscalepy)
- Apache Drill for querying non-relational data using SQL.
- Intel Math Kernel Library
All frameworks are the GPU versions but work on the CPU as well.
At the Plans and Pricing
Provision a Test Drive instance
- Click “Test Drive” (for 8 hours).
Provision a full instance
Michelene Harris, in the [19:16] Feb 21, 2018 VIDEO: JupyterHub on the Linux Data Science Virtual Machine, shows how to provision the Linux DSVM (a PaaS offering on Azure), fire up the JupyterHub system for logging in to Jupyter, and then stop the VM.
- Click “Get”
Fill out the form: Name needs to be globally unique among all Azure customers.
PROTIP: Use a Resource Group for the same project, so it can be deleted for the whole project.
- Click “Create”.
- Create a virtual machine
- Select password, etc.
Select size “B1s” to start with at 1.3 cents per hour. “B” is for “Burstable”.
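To see what the hourly rate above adds up to, here is a small Python sketch of the arithmetic. It covers the compute charge only (storage and networking are not included), and the function name and the 8-hours-a-day scenario are illustrative assumptions.

```python
def vm_cost(rate_per_hour, hours_per_day=24, days=31):
    """Estimated compute charge in dollars for a VM billed by the hour."""
    return rate_per_hour * hours_per_day * days

B1S_RATE = 0.013  # dollars per hour, the 1.3 cents quoted above

print(f"always on:   ${vm_cost(B1S_RATE):.2f} per 31-day month")
print(f"8 hours/day: ${vm_cost(B1S_RATE, hours_per_day=8):.2f} per 31-day month")
```

The gap between the two figures is the reason to stop the VM when you are not using it.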
Michelene Harris, in the [19:16] Feb 21, 2018 VIDEO: JupyterHub on the Linux Data Science Virtual Machine, says she has found “DS13v2” and “DS13v3” to work well.
PROTIP: “ND” VMs (up to 24 GB available) are designed for artificial intelligence (AI) and deep learning using CNTK, TensorFlow, Caffe, Caffe2, and other frameworks. They have NVIDIA Tesla P40 GPUs, plus RDMA and InfiniBand networking for large-scale training jobs. Pricing starts from $1,511.10 per month.
Set no Availability Zone and no managed disk.
NOTE: The X2Go client performed significantly better than X11 forwarding in testing. We recommend using the X2Go client for a graphical desktop interface.
Install the X2Go client from this webpage
- Drag the “x2goclient.app” and drop on the Applications folder.
- In Finder, right-click on “x2goclient.app” to select “Open”.
Select “GNOME” instead of “KDE”.
Log in to your VM via SSH. You can use PuTTY for terminal access on Windows or X2Go for graphical access.
Connect to the XFCE graphical desktop on the VM using X2Go.
The steps below are based on this web page.
Install the CLI by downloading the AmlWorkbenchSetup.dmg file from
The CLI for Azure Machine Learning services is different from the Azure CLI used for managing Azure resources.
This “Azure Machine Learning Workbench” installer includes the CLI.
From inside Finder, double-click the .dmg to install.
BLAH: Use of .dmg file instead of brew means that versions are not managed automatically.
Click “Open” in the “… downloaded from the internet” pop-up.
- Click the blue link “Click here for a full list of dependencies that will be installed”.
    Azure_cli_machinelearning (0.0.70+dev)
    Azureml.dataprep (0.1.1709.14033) and other azure libraries
    Conda (4.2.12)
    Numpy (1.11.3)
    Pandas (0.19.2)
    Dill (0.2.6)
    Numexpr (2.6.1)
    Scipy (0.18.1)
    Regex (2017.07.28)
    Xlrd (1.1.0)
    Pyarrow (0.6.0)
    scikit-learn (0.18.1)
    jupyter (1.0.0)
    psutil (5.2.2)
- Click “Install”.
Click “Launch Workbench”.
AZ ML CLI
In a Terminal, run this command to install the Machine Learning CLI:
pip install -r https://aka.ms/az-ml-o16n-cli-requirements-file
In a Terminal, run this command to verify:
az ml --help
The expected response is:
Group
    az ml: Module to access Machine Learning commands.

Subgroups:
    account : Manage operationalization accounts.
    env     : Manage compute environments.
    image   : Manage operationalization images.
    manifest: Manage operationalization manifests.
    model   : Manage operationalization models.
    service : Manage operationalized services.
However, installation did not go well if the response is instead:
az: error: argument _command_package: invalid choice: ml
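The manual check above can also be scripted. The following Python sketch runs `az ml --help` and reports whether it succeeded. It assumes that the “invalid choice” error shown above makes the command exit with a nonzero status (usual for argument-parsing errors, but not verified here), and the function name is illustrative.

```python
import shutil
import subprocess

def az_ml_available(az="az"):
    """Return True if `<az> ml --help` exits successfully, False otherwise."""
    path = shutil.which(az)
    if path is None:
        return False  # the Azure CLI itself is not on PATH
    try:
        result = subprocess.run([path, "ml", "--help"],
                                capture_output=True, text=True, timeout=120)
    except subprocess.TimeoutExpired:
        return False  # treat a hung command as "not available"
    return result.returncode == 0

print("az ml installed:", az_ml_available())
```

A check like this is handy at the top of a setup script, so that later `az ml` calls do not fail halfway through.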
Documentation on az ml
In a Terminal:
Set up compute targets:
az ml computetarget --debug
To attach a Data Science VM target (remotedocker):
TARGET_NAME="wow2"
TARGET_ADDR="dsvmtd6eqpmmla6ly4c.westcentralus.cloudapp.azure.com"  # or FQDN
AZ_USERNAME="dsvm"
AZ_PASSWORD="Dsvm@6eqpmmla"
az ml computetarget attach remotedocker \
   -n "$TARGET_NAME" -a "$TARGET_ADDR" \
   -u "$AZ_USERNAME" -w "$AZ_PASSWORD"
PROTIP: For better security, define AZ_USERNAME and AZ_PASSWORD in a separate shell file.
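One way to follow that advice from Python is to read the credentials from environment variables and only then assemble the command. This is a minimal sketch: the flags (-n, -a, -u, -w) are the ones used in the command above, while the function name and error message are illustrative.

```python
import os

def attach_remotedocker_args(name, address, env=os.environ):
    """Build the argument list for `az ml computetarget attach remotedocker`.

    The user name and password come from AZ_USERNAME and AZ_PASSWORD,
    so they never appear in the script itself.
    """
    missing = [key for key in ("AZ_USERNAME", "AZ_PASSWORD") if key not in env]
    if missing:
        raise KeyError(f"set these environment variables first: {missing}")
    return [
        "az", "ml", "computetarget", "attach", "remotedocker",
        "-n", name, "-a", address,
        "-u", env["AZ_USERNAME"], "-w", env["AZ_PASSWORD"],
    ]
```

Pass the returned list to `subprocess.run(args, check=True)` to execute it. Note that the password is still visible in the process list while the command runs; this only keeps it out of files you might commit.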
To attach an HDInsight target:
# Assumes TARGET_NAME, USERNAME, and PASSWORD are already set in the shell.
CLUSTER_NAME="myhdicluster-ssh.azurehdinsight.net"
az ml computetarget attach cluster \
   -n "$TARGET_NAME" -a "$CLUSTER_NAME" \
   -u "$USERNAME" -w "$PASSWORD"
Within the aml_config folder, you can change the conda dependencies.
Also, you can operate with PySpark, Python, or Python in a GPU DSVM.
Submit jobs for remote execution
az ml experiment --help
Work with Jupyter notebooks
Interact with run histories
- Configure your environment to operationalize
Web page “Deep Learning Basics for Predictive Maintenance” dated June 12, 2017 mentions the Download: Turbofan Engine Degradation Simulation Data Set (file CMAPSSData.zip) from the Prognostics Center of Excellence (PCoE) at NASA’s Ames Research Center, Moffett Field, CA. The file’s name refers to the C-MAPSS simulation system used to generate the data.
When you click “View Tutorial” you see in GitHub “Deep Learning Basics for Predictive Maintenance.ipynb” within the “lstms_for_predictive_maintenance” repo under the Azure account also by Fidan Boylu Uz.
https://github.com/Microsoft/DataStoriesSamples contains Sample code and documentation on data stories that showcase how you can use Cortana Intelligence Suite, SQL Server and Microsoft R Server.
- Sentiment Analysis of the famous novel War and Peace by Leo Tolstoy. Sample data in csv:
"1805","BOOK 01","CHAPTER 01","I have faith only in God and the lofty destiny of our adored monarch.","God",0.81726600036296682
https://github.com/Microsoft/sql-ml-tutorials contains SQL Server tutorials for Python and R.
https://github.com/Microsoft/code-challenges contains 15-minute code challenges for the //BUILD 2017 conference. They span the data platform and analytics, including SQL Server on Linux, Azure SQL Database, Azure DocumentDB, Azure Search, HDInsight, MySQL as a Service, PostgreSQL as a Service, Bot Framework, Python Tools for Visual Studio, and R Tools for Visual Studio.
Take the introductory tutorial:
Create a model.
As per this video using
- Clean Missing Data
- Clip Outliers
- Edit Metadata
- Feature Selection
- Learning with Counts
- Normalize Data
- Partition and Sample
- Principal Component Analysis
- Quantize Data
- SQLite Transformation
- Synthetic Minority Oversampling Technique
Train the model
- Cross Validation
- Parameter Sweep
Score and test the model.
Make predictions with Elastic APIs
- Request-Response Service (RRS) Predictive Experiment
- Batch Execution Service (BES)
- Retraining API
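As a rough illustration of a Request-Response call, the Python sketch below builds (but does not send) an HTTP POST. The JSON shape (“Inputs”, “input1”, “ColumnNames”, “Values”, “GlobalParameters”) and the Bearer authorization header are assumptions based on the sample code ML Studio typically generates for a web service; the URL, API key, and column names are placeholders. Check the sample code on your own service’s page before relying on it.

```python
import json
import urllib.request

def build_rrs_request(url, api_key, columns, rows):
    """Build (but do not send) an HTTP POST for a Request-Response Service call."""
    body = {
        "Inputs": {"input1": {"ColumnNames": columns, "Values": rows}},
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )

# Placeholder endpoint, key, and columns: replace with the values for your service.
request = build_rrs_request(
    "https://example.invalid/execute?api-version=2.0",
    "YOUR_API_KEY",
    ["sepal_length", "sepal_width"],
    [["5.1", "3.5"]],
)
print(request.get_method(), request.full_url)
```

Sending it is one more line, `urllib.request.urlopen(request)`, once the real endpoint and key are in place. RRS suits single low-latency predictions; BES is the better fit for scoring large files.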
Deploy to production
See https://services.azureml.net/ and https://docs.microsoft.com/en-us/azure/machine-learning/studio/model-progression-experiment-to-web-service
NOTE: You cannot select the “DEVTEST” pricing tier to use for deployment.
From the Overview section:
In order to deploy and manage models in production, you need to create a machine learning compute environment.
First, set up a virtual environment - you only need to do this once for your subscription:
In the Azure portal, click the Cloud Shell button on the menu in the upper-right. In Cloud Shell, create a virtual environment:
virtualenv -p /usr/bin/python3 my-project-path
Add the following lines to ~/.bashrc:
export PATH="my-project-path/bin":$PATH
export PYTHONPATH="my-project-path/bin"
Run:
source ~/.bashrc
Install Azure CLI in the virtual environment:
my-project-path/bin/pip install azure-cli
Install Azure ML CLI also:
my-project-path/bin/pip install azure-cli-ml
Alternately, get tested versions with:
pip install -r https://aka.ms/az-ml-o16n-cli-requirements-file
After installation, run find / -name azure-cli-ml 2>/dev/null to reveal the installed location:
~/Library/Caches/AmlWorkbench/Python/az-extensions/azure-cli-ml
After you have set up the virtual environment once, use the following steps every time you need to set up and configure a compute environment for deploying and managing models:
Create a remote compute environment - you will be prompted to log in and select a subscription:
az ml env setup -n compute-environment-name \
   -l location-of-the-environment-resources \
   [-g name-of-a-new-or-existing-resource-group] \
   -c
To see if your environment is ready to be used, run the command below:
az ml env show -g resource-group-name -n compute-environment-name
Once provisioning is complete, set the compute environment where you want to deploy services:
az ml env set -g resource-group-name -n compute-environment-name
https://docs.microsoft.com/en-us/azure/machine-learning/desktop-workbench/data-prep-python-extensibility-overview Python for data prep.
https://docs.microsoft.com/en-us/azure/machine-learning/service/reference-python-package-overview describes Azure Machine Learning Python packages for Computer Vision, Forecasting, FPGA acceleration, and Text analytics.
https://www.microsoft.com/germany/techwiese/events/ai-lab/default.aspx Microsoft Student AI Lab #StudentAILab
Get started with Microsoft Cognitive Services Dec 01, 2016 at 4:28PM by Matt Winkler, GPM of Microsoft’s Data Group
What’s new with Azure Machine Learning, May 06, 2018 at Microsoft Build 2018, by Matt Winkler (@mwinkle). A diagram appears at [3:05].
Bob Strudwick, CTO of UK’s Asos fashion website, showed [14:14] Data Science Capabilities, with demos by Naeem Khedarun and Saul Vargas Sandoval. At [43:40], https://aka.ms/aml-packages points to proprietary Python packages for Azure Machine Learning, which include Computer Vision, Forecasting, Text Analytics, and hardware acceleration.
https://aischool.microsoft.com/learning-paths links to edX courses in the Microsoft Professional Program in Artificial Intelligence. https://my.visualstudio.com/benefits?wt.mc_id=AISchool
https://aka.ms/iotrefarchitecture (pdf) Azure Machine Learning: Machine learning In The Cloud With Azure Machine Learning
https://tetranoodle.com/course-materials-azure-ml/ CTO British Columbia
This is one of a series on AI, Machine Learning, Deep Learning, Robotics, and Analytics:
- AI Ecosystem
- Machine Learning
- Microsoft’s AI
- Microsoft’s Azure Machine Learning Algorithms
- Microsoft’s Azure Machine Learning tutorial
- Python installation
- Image Processing
- Tesseract OCR using OpenCV
- Multiple Regression calculation and visualization using Excel and Machine Learning
- Tableau Data Visualization