Wilson Mar

All the options for Enterprise integration


Microsoft ML

Microsoft has a “Cheat Sheet” to help you select a Machine Learning algorithm from among its various offerings:

| Category | Algorithm | MS |
|---|---|---|
| Statistical Functions | Descriptive Statistics (Summarize Data) | Y |
| | Hypothesis Testing | Y |
| | Compute T-Test Linear Correlation | Y |
| | Evaluate Probability Function Evaluation | Y |
| Recommendation (collaborative filtering) | Train Matchbox Recommender | Y |
| | Score Matchbox Recommender | Y |
| | Evaluate Recommender | Y |
| Regression | Bayesian Linear Regression | Y |
| | Boosted Decision Tree | Y |
| | Decision Forest | Y |
| | Fast Forest Quantile Regression | Y |
| | Linear Regression | Y |
| | Neural Network Regression | Y |
| | Ordinal Regression | Y |
| | Poisson Regression | Y |
| Clustering | K-means Clustering | Y |
| Anomaly Detection | One-class Support Vector Machine | Y |
| | Principal Component Analysis-based Anomaly Detection | Y |
| | Time Series Anomaly Detection | Y |
| Two-class Classification | Averaged Perceptron | Y |
| | Bayes Point Machine | Y |
| | Boosted Decision Tree | Y |
| | Decision Forest | Y |
| | Decision Jungle | Y |
| | Logistic Regression | Y |
| | Neural Network | Y |
| | Support Vector Machine | Y |
| Multi-class Classification | Tune Model Hyperparameters | Y |
| | Decision Forest | Y |
| | Decision Jungle | Y |
| | Logistic Regression | Y |
| | Neural Network | Y |
| | One-vs-all | Y |
| Text Analytics | Feature Hashing | Y |
| | Named Entity Recognition Vowpal Wabbit (v8) | Y |
| Computer Vision | OpenCV Library | Y |
| Voice | Text to Speech | Cortana |
| Translation | Language Translation | Cortana |

A-Z List of Machine Learning Studio Modules from Microsoft Azure

https://bit.ly/a4r-mlbook Azure Machine Learning: Microsoft Azure Essentials

https://www.youtube.com/watch?v=eUce2cB844s&t=19m52s Hands-On with Azure Machine Learning 26 Sep 2016 predicts car prices


Some utilities may involve conventional lookups of data:

Algorithmia.com provides APIs to algorithms offered by its partners.

  • https://algorithmia.com/algorithms/alixaxel/CoordinatesToTimezone

  • https://algorithmia.com/algorithms/Geo/ZipData

  • https://algorithmia.com/algorithms/Geo/ZipToState

  • https://algorithmia.com/algorithms/Geo/LatLongDistance

  • https://algorithmia.com/algorithms/Geo/LatLongToUTM

  • https://algorithmia.com/algorithms/util/ip2hostname

  • https://algorithmia.com/algorithms/opencv/ChangeImageFormat (from jpg to png)


  • Google Translate API has been working on websites for years.

Image Recognition / Computer Vision

  • https://algorithmia.com/algorithms/z/ColorPalettefromImage

  • Google Cloud Vision API

  • https://algorithmia.com/algorithms/opencv/FaceDetection then https://algorithmia.com/algorithms/opencv/CensorFace

  • https://algorithmia.com/algorithms/ocr/RecognizeCharacters OCR

Some of these make use of OpenCV (CV = Computer Vision).

Handwriting recognition is the running example in the book (with code on GitHub) Neural Networks and Deep Learning by Michael Nielsen.

http://www.deeplearningbook.org by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.

Voice Recognition

Speech to Text

Sentiment Analysis

Analyze text for positive or negative sentiment, based on a training database of potential word meanings:

  • https://algorithmia.com/algorithms/nlp/SentimentAnalysis

  • IBM
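As a rough illustration of the word-list idea described above, here is a toy Python scorer. The `LEXICON` dictionary and its weights are made up for this sketch; it is not how the Algorithmia or IBM services work internally, which train on far larger databases.

```python
# Hypothetical mini-lexicon: positive words score above zero, negative below.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "awful": -2, "hate": -2}


def sentiment_score(text):
    """Sum the lexicon weights of the words in the text (unknown words count 0)."""
    words = (w.strip(".,!?;:").lower() for w in text.split())
    return sum(LEXICON.get(w, 0) for w in words)


def sentiment_label(text):
    """Map the numeric score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment_label("I love this great phone!"))
```

A scorer this simple ignores negation ("not good") and context, which is why real services rely on trained models.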

Document (article) Search

TF-IDF (Term Frequency - Inverse Document Frequency) emphasizes important words: those which appear frequently in a document (common locally) but rarely in the corpus searched (rare globally). Each document is represented as a vector of word weights. Term frequency is measured by word count (how many occurrences of each word).

The IDF, which downweights words common across the corpus, is the log of (number of docs) divided by (1 + number of docs using the given word).

Cosine similarity normalizes the vectors, so a small angle theta between two vectors identifies similar documents.

Normalizing makes the comparison invariant to the number of words. The common compromise is to cap maximum word count.
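The TF-IDF weighting and cosine comparison described above can be sketched in plain Python. This is a minimal sketch, assuming whitespace tokenization and the IDF form log(#docs / (1 + #docs using the word)); the function names and the four-sentence corpus are made up for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {word: tf-idf weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)  # term frequency = raw word count
        vectors.append({
            word: count * math.log(n_docs / (1 + doc_freq[word]))
            for word, count in term_freq.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {word: weight} vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


# Made-up corpus for illustration only.
docs = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "stock prices rose on the market",
    "the market fell as stock prices dropped",
]
vectors = tf_idf_vectors(docs)
print(cosine_similarity(vectors[2], vectors[3]))  # the two finance sentences
print(cosine_similarity(vectors[1], vectors[2]))  # a cat sentence vs a finance one
```

With this IDF form, a word found in every document (such as "the" here) gets a slightly negative weight, so production libraries usually apply a smoothed variant.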

Microsoft Azure Machine Learning

https://azure.microsoft.com/en-us/services/machine-learning offers free plans

  • Guest Workspace for 8 hours on https://studio.azureml.net/Home/ViewWorkspaceCached/…

  • Registered free workspaces with 10 GB storage can scale resources to increase experiment execution performance.

All their plans offer:

  • Stock sample datasets
  • R and Python script support
  • Full range of ML algorithms
  • Predictive web services

Follow this machine learning tutorial to use Azure Machine Learning Studio to create a linear regression model that predicts the price of an automobile based on different variables such as make and technical specifications. Then iterate on a simple predictive analytics experiment after

Regression predicts numbers.
Classification predicts categories (labels, often strings).
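To make the regression idea concrete outside of Studio, here is a minimal least-squares line fit in plain Python. The engine sizes and prices are invented for this sketch (they are not the tutorial's automobile dataset), and a single feature stands in for the many variables the tutorial uses.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted numeric value for feature value x."""
    return intercept + slope * x


# Hypothetical training data: engine size (litres) -> price (USD).
engine_sizes = [1.0, 1.5, 2.0, 2.5, 3.0]
prices = [9000.0, 11500.0, 14000.0, 16500.0, 19000.0]

slope, intercept = fit_line(engine_sizes, prices)
print(predict(slope, intercept, 2.2))
```

The output is a number on a continuous scale, which is what separates regression from classification.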

  1. Enter Microsoft’s Learning Studio:

    As per this video:


  2. Look at examples in the Cortana Intelligence Gallery

  3. Take the introductory tutorial:

    Introduction to Machine Learning with Hands-On Labs

  4. Create a model

  5. Prepare Data:

    As per this video using

    • Clean Missing Data - Clip Outliers
    • Edit Metadata
    • Feature Selection
    • Filter
    • Learning with Counts
    • Normalize Data
    • Partition and Sample
    • Principal Component Analysis
    • Quantize Data
    • SQLite Transformation
    • Synthetic Minority Oversampling Technique
  6. Train the model

    • Cross Validation
    • Retraining
    • Parameter Sweep
  7. Score and test the model

  8. Make predictions with Elastic APIs

    • Request-Response Service (RRS) Predictive Experiment - Batch Execution Service (BES)
    • Retraining API
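Several of the steps above have simple plain-Python analogues. The sketch below shows mean imputation, min-max scaling, and a k-fold split as conceptual stand-ins for Clean Missing Data, Normalize Data, and Cross Validation. The function names are hypothetical and this is not how the Studio modules are implemented.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the values that are present."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly into the range 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def k_fold_indices(n_rows, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    for fold in range(k):
        test = [i for i in range(n_rows) if i % k == fold]
        train = [i for i in range(n_rows) if i % k != fold]
        yield train, test


column = fill_missing_with_mean([10.0, None, 30.0])
print(min_max_normalize(column))
for train, test in k_fold_indices(6, 3):
    print("train on", train, "test on", test)
```

Each row lands in the test set exactly once, so every observation is scored by a model that never saw it during training.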



Python 3.6 added formatted string literals (f-strings).
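A quick look at the syntax, with placeholder values chosen only for this example:

```python
# Requires Python 3.6 or later.
name = "conda"        # placeholder values for illustration
version = 4.3

# Expressions inside {} are evaluated; a format spec may follow a colon.
message = f"{name} version {version:.1f}"
total = f"{2 + 3} packages"

print(message)
print(total)
```

The same result was previously written as `"{} version {:.1f}".format(name, version)`.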


Conda is similar to virtualenv and pyenv, other popular environment managers.




   conda install numpy pandas matplotlib

   conda install jupyter notebook

   conda install -c https://conda.binstar.org/menpo opencv
  1. Can’t find it? Look among all users and operating systems supported

    anaconda search -t conda pygame

    On a Mac https://anaconda.org/tlatorre/pygame is not recognized because it’s only for Linux.

    On Stack Overflow, a user recommends one that supports Windows 32 and 64, MacOS, and Linux:

    conda install -c cogsci pygame=1.9.2a0


    pip install pygame
  2. Copy a user/package to show more info:

    anaconda show USER/PACKAGE
  3. List the installed packages, each with its version number and the Python version it was built for:

    conda list

Conda Environments

  1. Create new environment for Python, specifying packages needed:

    conda create -n my_env python=3 numpy pandas

  2. Enter an environment on Mac:

    source activate my_env

    On Windows:

    activate my_env

    When you’re in the environment, the environment’s name appears in the prompt:

    (my_env) ~ $.

  3. Leave the environment (like exit):

    source deactivate

    On Windows, it’s just deactivate.

  4. Get back in again.

  5. Create an environment file by redirecting the output from an export:

    conda env export > some_env.yaml

    When sharing your code on GitHub, it’s good practice to make an environment file and include it in the repository. This will make it easier for people to install all the dependencies for your code. I also usually include a pip requirements.txt file using pip freeze (learn more here) for people not using conda.

  6. Load an environment metadata file:

    conda env create -f some_env.yaml

  7. List environments created on your machine:

    conda env list

  8. Remove an environment:

    conda env remove -n some_env

  9. Add a package


    SO on Install on Windows 8

    anaconda show menpo/opencv3

    conda install --channel https://conda.anaconda.org/menpo opencv3

    FFMPEG codec

  10. Test within the Python >>> prompt:

    import cv2

    The response should be:



  11. Install readline to do autocompletion in Jupyter notebooks by hitting tab

    conda/pip install readline

    Readline comes with anaconda
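Beyond typing `import cv2` by hand, a short script can report which packages are importable in the current environment. This is a sketch using only the standard library; the helper names are made up, and it relies on conda setting the `CONDA_DEFAULT_ENV` variable when an environment is activated.

```python
import importlib.util
import os


def missing_packages(names):
    """Return the top-level module names that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def active_conda_env():
    """Name of the active conda environment, or None if none is active."""
    return os.environ.get("CONDA_DEFAULT_ENV")


print("conda environment:", active_conda_env())
print("missing:", missing_packages(["numpy", "pandas", "cv2"]))
```

An empty "missing" list means the imports should succeed. Pass only top-level names (for example `cv2`, not a dotted submodule path).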


  • https://conda.io/docs/using/index.html

  • https://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/

  • http://cola.github.io



The http://www.h2o.ai/h2o/ platform is built in Java on H2O’s rapid in-memory distributed parallel processing. Its models can be visually inspected during training, which is unique to H2O, so users can immediately spot a job that should be stopped and more quickly iterate to find the optimal approach.



Machine Learning Recipes (using Python) - 17 videos by
Google Developers with Josh Gordon in New York at @random_forests

  1. Hello World [6:52] apples and oranges

  2. Visualizing a Decision Tree - Apr 13, 2016 [6:31]

    50 examples of each of 3 species of iris, each measured by 4 features (sepal and petal length and width), at https://en.wikipedia.org/wiki/Iris_flower_data_set

    [4:18] https://graphviz.org


    open -a preview iris.pdf

    sudo python3 -m pip install pydot

  3. What Makes a Good Feature?

  4. Let’s Write a Pipeline

  5. Writing Our First Classifier [8:43]

  6. Train an Image Classifier with TensorFlow for Poets

  7. Classifying Handwritten Digits with TF.Learn
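For a feel of what a hand-written classifier looks like, here is a minimal nearest-neighbour sketch in plain Python. It is not the code from the videos, and the fruit measurements (weight in grams, surface coded 0 = smooth and 1 = bumpy) are invented for illustration.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length feature lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_predict(train_features, train_labels, query):
    """Return the label of the training example closest to the query."""
    best_index = min(
        range(len(train_features)),
        key=lambda i: euclidean(train_features[i], query),
    )
    return train_labels[best_index]


# Hypothetical training data: [weight in grams, surface (0 smooth, 1 bumpy)].
features = [[140, 0], [130, 0], [150, 1], [170, 1]]
labels = ["apple", "apple", "orange", "orange"]

print(nearest_neighbor_predict(features, labels, [160, 1]))
```

Because weight spans a much larger range than the surface flag, it dominates the distance; scaling features first is a common remedy.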

https://hub.docker.com/r/jbgordon/recipes/ is a Docker image to help folks having trouble with Pydot or Graphviz. It has all the dependencies set up, plus installation instructions.


awesome-machine-learning provides many links to resources, so they will not be repeated here.