Wilson Mar

Tools for data science


Overview

Here is a catalog of the AI and Machine Learning algorithms and modules offered by Microsoft Azure, Amazon, Google, SAS, MATLAB, etc.

These Artificial Intelligence (AI) services make use of algorithms:


Microsoft created a cute interactive museum for viewing their use cases in a non-technical way at http://azuremlsimpleds.azurewebsites.net/simpleds/, along with a PDF: Microsoft’s long infographic about algorithms.

Microsoft’s Azure ML Algorithms cheat sheet (PDF): aml-cheatsheet.png

The A-Z List of Machine Learning Studio Modules also lists basic database and UI features such as forms, which means it is building standard computing functions on top of AI capabilities.

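Category | Algorithm | Notes
Statistical Functions | Descriptive Statistics (Summarize Data) | -
Statistical Functions | Hypothesis Testing | -
Statistical Functions | Compute T-Test Linear Correlation | -
Statistical Functions | Evaluate Probability Function Evaluation | -
Recommendation (collaborative filtering) | Train Matchbox Recommender | -
Recommendation (collaborative filtering) | Score Matchbox Recommender | -
Recommendation (collaborative filtering) | Evaluate Recommender | -
Regression | Bayesian Linear Regression | -
Regression | Boosted Decision Tree | -
Regression | Tree Decision Forest | -
Regression | Fast Forest Quantile Regression | -
Regression | Linear Regression | -
Regression | Neural Network Regression | -
Regression | Ordinal Regression | -
Regression | Poisson Regression | -
Clustering | K-means Clustering | -
Anomaly Detection | One-class Support Vector Machine | -
Anomaly Detection | Principal Component Analysis-based Anomaly Detection | -
Anomaly Detection | Time Series Anomaly Detection | -
Two-class Classification | Averaged Perceptron | -
Two-class Classification | Bayes Point Machine | -
Two-class Classification | Boosted Decision Tree | -
Two-class Classification | Decision Forest | -
Two-class Classification | Decision Jungle | -
Two-class Classification | Logistic Regression | -
Two-class Classification | Neural Network | -
Two-class Classification | Support Vector Machine | -
Multi-class Classification | Tune Model Hyperparameters | -
Multi-class Classification | Decision Forest | -
Multi-class Classification | Decision Jungle | -
Multi-class Classification | Logistic Regression | -
Multi-class Classification | Neural Network | -
Multi-class Classification | One-vs-all | -
Text Analytics | Feature Hashing | -
Text Analytics | Named Entity Recognition | -
Text Analytics | Vowpal Wabbit (v8) | John Langford
Sentiment analysis | ? |
Bot (chat) Framework | ? |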

SAS explains its machine learning algorithms with this diagram:

machine-learning-algorithms-sas-640x580-116018.jpg

Translation

Google Translate (https://translate.google.com) and the Google Translate API have been translating text and websites since 2006. In 2017 Google made a breakthrough.

Microsoft’s Translator Speech

Computer Vision

Open-source OpenCV (Computer Vision) was an early entrant and is still used by many today because it is written in C and C++ and runs quite efficiently.

Microsoft’s Computer Vision

Hands-on guide: build a classifier with Custom Vision, at https://docs.microsoft.com/en-us/azure/cognitive-services/custom-vision-service/getting-started-build-a-classifier

Microsoft’s “Face”

  • https://algorithmia.com/algorithms/z/ColorPalettefromImage

  • Google Cloud Vision API

  • https://algorithmia.com/algorithms/opencv/FaceDetection then https://algorithmia.com/algorithms/opencv/CensorFace

  • https://algorithmia.com/algorithms/ocr/RecognizeCharacters OCR

Some of these make use of OpenCV (CV = Computer Vision).

Voice Recognition

Microsoft’s Web App Bot

NLP Sentiment Analysis

Analyze text for positive or negative sentiment (opinion), based on a training database of potential word meanings, which involves Natural Language Processing:

  • https://algorithmia.com/algorithms/nlp/SentimentAnalysis

  • IBM’s algorithm
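To make the idea of scoring text against a database of word meanings concrete, here is a minimal sketch of a lexicon-based scorer. It is not how either service listed above works internally; the tiny `LEXICON`, the `NEGATORS` set, and the function names are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical toy lexicon: word -> sentiment weight.
# Real services learn far larger tables (or full models) from labeled data.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}

# Hypothetical negation words that flip the weight of the next word.
NEGATORS = {"not", "never", "no"}


def sentiment_score(text):
    """Sum lexicon weights, flipping the sign of a word that follows a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        weight = LEXICON.get(token, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            weight = -weight
        score += weight
    return score


def sentiment_label(text):
    """Map the numeric score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Calling `sentiment_label("This is not good")` yields "negative" because the negator flips the weight of "good". A lookup table like this misses sarcasm and context, which is why production services rely on trained models instead.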

Andrew W. Trask, a PhD student at the University of Oxford working on Deep Learning for Natural Language Processing, authored Grokking Deep Learning.

Use bag-of-words and Word2vec to transform words into vectors. Use TFLearn, a Python library for quickly building neural networks.
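As a rough illustration of the bag-of-words step only (Word2vec learns dense vectors from context and is not shown), the sketch below turns each document into a vector of word counts over a shared vocabulary. The helper names and the two sample sentences are assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(documents):
    """Assign each distinct word a fixed vector position (alphabetical order)."""
    words = sorted({token for doc in documents for token in tokenize(doc)})
    return {word: index for index, word in enumerate(words)}


def bag_of_words(text, vocabulary):
    """Return a count vector; words outside the vocabulary are ignored."""
    vector = [0] * len(vocabulary)
    for word, count in Counter(tokenize(text)).items():
        if word in vocabulary:
            vector[vocabulary[word]] = count
    return vector


docs = ["the cat sat on the mat", "the dog sat"]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(doc, vocab) for doc in docs]
```

Here the vocabulary order is cat, dog, mat, on, sat, the, so the first document becomes `[1, 0, 1, 1, 1, 2]`. Word order is discarded, which is the main limitation that embeddings such as Word2vec try to address.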

Document (article) Search

Google made its fortune by offering search services.

Microsoft’s Bing Search

TF-IDF (Term Frequency - Inverse Document Frequency) emphasizes important words: those which appear frequently in a document (common locally) but rarely in the corpus searched (rare globally). The resulting word weights for a document are called a vector. Term frequency is measured by word count (how many occurrences of each word).

The IDF, which downweights common words, is the log of the number of documents divided by (1 + the number of documents using the given word).
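The calculation can be sketched in a few lines, using word counts for term frequency and the IDF stated above, log(#docs / (1 + #docs using the word)). The function names and the three-document toy corpus are assumptions for illustration; libraries often use slightly different smoothing.

```python
import math
from collections import Counter


def inverse_document_frequency(corpus):
    """IDF per word: log(number of docs / (1 + number of docs containing the word))."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for tokens in corpus:
        doc_freq.update(set(tokens))  # count each word once per document
    return {word: math.log(n_docs / (1 + df)) for word, df in doc_freq.items()}


def tf_idf(tokens, idf):
    """Weight each word's count in one document by its corpus-wide IDF."""
    return {word: count * idf.get(word, 0.0) for word, count in Counter(tokens).items()}


# Hypothetical toy corpus, already tokenized.
corpus = [["the", "cat"], ["the", "dog"], ["the", "bird", "bird"]]
idf = inverse_document_frequency(corpus)
weights = tf_idf(corpus[2], idf)
```

In this toy corpus "bird" appears twice in one document and nowhere else, so it gets the largest weight, while "the" appears in every document and is pushed slightly below zero by the 1 + term in the denominator.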

Cosine similarity normalizes vectors by their length, so a small angle theta between two vectors identifies similarity.

Normalizing makes the comparison invariant to the number of words. A common compromise is to cap the maximum word count.
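A small sketch of both ideas follows: cosine similarity over word-count vectors, plus an optional cap on each word's count. The function names and the `max_count` parameter are assumptions for this example, not part of any particular library.

```python
import math
from collections import Counter


def count_vector(text, max_count=None):
    """Word counts for a text, optionally capping each word's count at max_count."""
    counts = Counter(text.lower().split())
    if max_count is not None:
        counts = Counter({word: min(c, max_count) for word, c in counts.items()})
    return counts


def cosine_similarity(a, b):
    """Dot product of two count vectors divided by the product of their lengths."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


short_doc = count_vector("cat sat")
long_doc = count_vector("cat sat cat sat cat sat")
similarity = cosine_similarity(short_doc, long_doc)
```

The short and long documents point in the same direction, so their similarity is 1 (up to floating-point rounding) even though one is three times longer. Passing `max_count` limits how much a single heavily repeated word can dominate the vector.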

CNTK 106: Part B - Time series prediction with LSTM (IOT Data)

More

This is one of a series on AI, Machine Learning, Deep Learning, Robotics, and Analytics:

  1. AI Ecosystem
  2. Machine Learning
  3. Testing AI

  4. Microsoft’s AI
  5. Microsoft’s Azure Machine Learning Algorithms
  6. Microsoft’s Azure Machine Learning tutorial
  7. Microsoft’s Azure Machine Learning certification

  8. Python installation
  9. Jupyter notebooks processing Python for humans

  10. Image Processing
  11. Amazon Lex text to speech

  12. Code Generation

  13. Multiple Regression calculation and visualization using Excel and Machine Learning
  14. Tableau Data Visualization