
Wilson Mar


Here are the corporate overlords making humans into robots


Leading Companies

Major organizations are in an arms race in offering Artificial intelligence and Machine Learning (ML) services in their clouds:

Each of the above is a cloud vendor hoping to cash in by charging for the processing of other people’s data.

Benedict Evans, the resident futurist at venture capital firm Andreessen Horowitz, observes in a recent blog post that the future of AI remains opaque: “This field is moving so fast that it’s not easy to say where the strongest leads necessarily are, nor to work out which things will be commodities and which will be strong points of difference.”

Other companies

Algorithmia.com provides APIs to algorithms offered by its partners.

awesome-machine-learning provides many links to resources, so they will not be repeated here.


Arxiv Paper Analysis Worksheet (Responses) on Google Sheet


VIDEO: Hands-On with Azure Machine Learning

In 2015, Microsoft showed off its facial recognition capabilities with how-old.net, which guesses how old someone is from a photo. At conferences they built a booth that takes a picture.

In 2016, Microsoft unleashed the Tay chat bot.

Algorithms from Azure

Below are various initiatives by MS (Microsoft) and other organizations:

| Category | Algorithm | MS |
| --- | --- | --- |
| Statistical Functions | Descriptive Statistics (Summarize Data) | Y |
| | Hypothesis Testing | Y |
| | Compute T-Test Linear Correlation | Y |
| | Evaluate Probability Function Evaluation | Y |
| Recommendation (collaborative filtering) | Train Matchbox Recommender | Y |
| | Score Matchbox Recommender | Y |
| | Evaluate Recommender | Y |
| Regression | Bayesian Linear Regression | Y |
| | Boosted Decision Tree | Y |
| | Decision Forest | Y |
| | Fast Forest Quantile Regression | Y |
| | Linear Regression | Y |
| | Neural Network Regression | Y |
| | Ordinal Regression | Y |
| | Poisson Regression | Y |
| Clustering | K-means Clustering | Y |
| Anomaly Detection | One-class Support Vector Machine | Y |
| | Principal Component Analysis-based Anomaly Detection | Y |
| | Time Series Anomaly Detection | Y |
| Two-class Classification | Averaged Perceptron | Y |
| | Bayes Point Machine | Y |
| | Boosted Decision Tree | Y |
| | Decision Forest | Y |
| | Decision Jungle | Y |
| | Logistic Regression | Y |
| | Neural Network | Y |
| | Support Vector Machine | Y |
| Multi-class Classification | Tune Model Hyperparameters | Y |
| | Decision Forest | Y |
| | Decision Jungle | Y |
| | Logistic Regression | Y |
| | Neural Network | Y |
| | One-vs-all | Y |
| Text Analytics | Feature Hashing | Y |
| | Named Entity Recognition Vowpal Wabbit (v8) | Y |
| Computer Vision | OpenCV Library | Y |
| Voice | Text to Speech | Cortana |
| Translation | Language Translation | Cortana |

The A-Z List of Machine Learning Studio Modules from Microsoft Azure also lists basic database and UI features such as forms, which suggests Microsoft is building standard computing functions on top of AI capabilities.

Machine Learning Algorithms - Part 1



Some utilities may involve conventional lookups of data:

  • https://algorithmia.com/algorithms/alixaxel/CoordinatesToTimezone

  • https://algorithmia.com/algorithms/Geo/ZipData

  • https://algorithmia.com/algorithms/Geo/ZipToState

  • https://algorithmia.com/algorithms/Geo/LatLongDistance

  • https://algorithmia.com/algorithms/Geo/LatLongToUTM

  • https://algorithmia.com/algorithms/util/ip2hostname

  • https://algorithmia.com/algorithms/opencv/ChangeImageFormat (from jpg to png)


  • Google Translate API has been working on websites for years.

Image Recognition / Computer Vision

  • https://algorithmia.com/algorithms/z/ColorPalettefromImage

  • Google Cloud Vision API

  • https://algorithmia.com/algorithms/opencv/FaceDetection then https://algorithmia.com/algorithms/opencv/CensorFace

  • https://algorithmia.com/algorithms/ocr/RecognizeCharacters OCR

Some of these make use of OpenCV (CV = Computer Vision).

Voice Recognition

Speech to Text

NLP Sentiment Analysis

Analyze text for positive or negative sentiment (opinion), based on a training database of potential word meanings, which involves Natural Language Processing:

  • https://algorithmia.com/algorithms/nlp/SentimentAnalysis

  • IBM’s algorithm
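As a rough illustration of the idea (not how the Algorithmia or IBM services work internally), sentiment can be scored by summing per-word polarity values. The `LEXICON` table and its scores below are invented for this sketch; real systems learn them from training data.

```python
import re

# Hypothetical toy lexicon: word -> polarity score (assumed values, not from any real service).
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}

def sentiment_score(text, lexicon=LEXICON):
    """Sum the polarity of every known word; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)

def sentiment_label(text):
    """Map the numeric score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment_label("I love this great phone"))
print(sentiment_label("terrible, bad service"))
```

A lexicon lookup like this ignores negation and context ("not good"), which is why trained NLP models are used in practice.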

Andrew W. Trask, a PhD student at the University of Oxford working on deep learning for Natural Language Processing, authored Grokking Deep Learning.

Use bag-of-words and Word2vec to transform words into vectors. Use TFLearn, a Python library for quickly building neural networks.
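A minimal bag-of-words sketch using only the standard library is shown here. The function names and sample sentences are made up for illustration. Word2vec is not shown: it learns dense vectors from context rather than counting words.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def bag_of_words(documents):
    """Return a sorted vocabulary and one word-count vector per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    vectors = [[count[word] for word in vocabulary] for count in counts]
    return vocabulary, vectors

docs = ["the cat sat", "the cat saw the dog"]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)  # ['cat', 'dog', 'sat', 'saw', 'the']
print(vectors)     # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

Each document becomes a vector with one position per vocabulary word, so word order is discarded and only counts remain.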

Document (article) Search

TF-IDF (Term Frequency - Inverse Document Frequency) emphasizes important words: those which appear frequently in a document (common locally) but rarely in the corpus searched (rare globally). Each document is represented as a vector of word weights. Term frequency is measured by word count (how many occurrences of each word).

The IDF, which downweights common words, is the log of the number of documents divided by 1 plus the number of documents using the given word: IDF = log(#docs / (1 + #docs using word)).
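The weighting described above can be sketched in a few lines of standard-library Python. This is one common variant, following the formula as stated here. The function name and the sample corpus are assumptions for illustration.

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: list of token lists. Returns one {word: weight} dict per document."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    # IDF as described in the text: log(#docs / (1 + #docs using the word)).
    idf = {word: math.log(n_docs / (1 + df)) for word, df in doc_freq.items()}
    # TF is the raw word count within each document.
    return [
        {word: tf * idf[word] for word, tf in Counter(tokens).items()}
        for tokens in documents
    ]

corpus = [["the", "cat"], ["the", "dog"], ["the", "bird"], ["the", "cat", "cat"]]
weights = tf_idf(corpus)
print(weights[3])
```

With this formula a word that appears in every document (such as "the" here) gets a slightly negative weight, while a word that is rare in the corpus but repeated in one document gets the largest weight.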

Cosine similarity normalizes vectors so that a small angle theta between two vectors indicates similarity.

Normalizing makes the comparison invariant to the number of words. The common compromise is to cap maximum word count.
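A small sketch of cosine similarity on plain Python lists shows the length invariance. The vectors are made-up word counts.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

short_doc = [1, 2, 0]      # word counts for a short document
doubled_doc = [2, 4, 0]    # the same document repeated twice
unrelated_doc = [0, 0, 3]  # shares no words with the others

print(cosine_similarity(short_doc, doubled_doc))
print(cosine_similarity(short_doc, unrelated_doc))
```

The doubled document scores as (approximately) identical to the short one because only the direction of the vector matters, not its length.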

Microsoft Azure Machine Learning

https://azure.microsoft.com/en-us/services/machine-learning offers free plans

  • Guest Workspace for 8 hours on https://studio.azureml.net/Home/ViewWorkspaceCached/…

  • Registered free workspaces with 10 GB storage can scale resources to increase experiment execution performance.

All their plans offer:

  • Stock sample datasets
  • R and Python script support
  • Full range of ML algorithms
  • Predictive web services

Follow this machine learning tutorial to use Azure Machine Learning Studio to create a linear regression model that predicts the price of an automobile based on different variables such as make and technical specifications. Then iterate on a simple predictive analytics experiment:

  1. Enter Microsoft’s Learning Studio:

    As per this video:


  2. Look at examples in the Cortana Intelligence Gallery

  3. Take the introductory tutorial:

    Introduction to Machine Learning with Hands-On Labs

  4. Create a model

  5. Prepare Data:

    As per this video using

    • Clean Missing Data
    • Clip Outliers
    • Edit Metadata
    • Feature Selection
    • Filter
    • Learning with Counts
    • Normalize Data
    • Partition and Sample
    • Principal Component Analysis
    • Quantize Data
    • SQLite Transformation
    • Synthetic Minority Oversampling Technique
  6. Train the model

    • Cross Validation
    • Retraining
    • Parameter Sweep
  7. Score and test the model

  8. Make predictions with Elastic APIs

    • Request-Response Service (RRS) Predictive Experiment - Batch Execution Service (BES)
    • Retraining API
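Azure Machine Learning Studio does these steps through drag-and-drop modules, so the following is not Azure code. It is a plain-Python sketch of the same prepare, train, and score flow for a one-variable linear regression. The engine sizes and prices are invented sample data, and all function names are hypothetical.

```python
def clean_missing(rows):
    """Prepare: drop rows that contain a missing (None) value."""
    return [row for row in rows if None not in row]

def fit_line(xs, ys):
    """Train: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_absolute_error(model, xs, ys):
    """Score: average absolute difference between predicted and actual values."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical (engine_size_litres, price) rows; one row has a missing value.
rows = [(1.0, 10000), (1.5, 12500), (2.0, 15000), (None, 9000), (2.5, 17500), (3.0, 20000)]
rows = clean_missing(rows)

# Hold out the last rows for testing, as a stand-in for Partition and Sample.
train, test = rows[:3], rows[3:]
model = fit_line([x for x, _ in train], [y for _, y in train])
error = mean_absolute_error(model, [x for x, _ in test], [y for _, y in test])
print(model, error)
```

Scoring on rows the model did not see during training is the point of the train/test split. Cross validation repeats this with several different splits.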




Python 3.6 introduced formatted string literals (f-strings).
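A short example of the syntax follows. The variable names and values are arbitrary.

```python
name = "my_env"
n_packages = 3
ratio = 2 / 3

# Expressions inside {} are evaluated; a format spec may follow a colon.
message = f"Environment {name} has {n_packages} packages ({ratio:.1%} done)"
print(message)  # Environment my_env has 3 packages (66.7% done)

# The equivalent with str.format(), which works on earlier Python versions.
older_style = "Environment {} has {} packages ({:.1%} done)".format(name, n_packages, ratio)
print(older_style == message)  # True
```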


Conda is similar to virtualenv and pyenv, other popular environment managers.




conda install numpy pandas matplotlib

conda install jupyter notebook

  1. List the installed packages with their version numbers, including the version of Python:

    conda list

Conda Environments

  1. Create new environment for Python, specifying packages needed:

    conda create -n my_env python=3 numpy pandas

  2. Enter an environment on Mac (with conda 4.4 or later, conda activate my_env works on every platform):

    source activate my_env

    On Windows:

    activate my_env

    When you’re in the environment, the environment’s name appears in the prompt:

    (my_env) ~ $

  3. Leave the environment

    source deactivate

    On Windows, it’s just deactivate. With conda 4.4 or later, conda deactivate works on every platform.

  4. Get back in again.

  5. Create an environment file by redirecting the output of an export:

    conda env export > some_env.yaml

    When sharing your code on GitHub, it’s good practice to make an environment file and include it in the repository. This makes it easier for people to install all the dependencies for your code. I also usually include a pip requirements.txt file generated with pip freeze for people not using conda.

  6. Create an environment from an environment file:

    conda env create -f some_env.yaml

  7. List environments created on your machine:

    conda env list

  8. Remove an environment:

    conda env remove -n some_env
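To confirm from inside Python which interpreter and environment are in use, a sketch like the one below can help. It assumes conda sets the `CONDA_DEFAULT_ENV` variable when an environment is activated. The value is None when no conda environment is active. The function name is made up for this example.

```python
import os
import sys

def describe_environment():
    """Report the running Python version, its install prefix, and the active conda env (if any)."""
    return {
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        "prefix": sys.prefix,
        # Assumption: conda exports CONDA_DEFAULT_ENV on activation; None otherwise.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }

for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If `prefix` does not point inside the environment you expected, the shell is probably running a different Python than the one the environment installed.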


  • https://conda.io/docs/using/index.html

  • https://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/


Jupyter notebooks