Machine Learning Algorithms

Tools for data science

Overview

Translation
Computer Vision
Voice Recognition
NLP Sentiment Analysis
Document (article) Search
More

Here is a catalog of the AI and Machine Learning algorithms and modules offered by Microsoft Azure, Amazon, Google, SAS, MATLAB, and others.

Anomaly Detection to identify and predict rare or unusual data points.
Clustering to discover structure by separating similar data points into intuitive groups.
Regression to predict values (forecast the future by estimating the relationship between variables).
Two-class Classification to answer simple two-choice questions like yes-no or true-false.
Multi-class Classification to answer complex questions with multiple possible answers.
(descriptive) Statistical Functions
Recommendation (collaborative filtering)
Sentiment Analysis
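
To make the regression entry above concrete, here is a minimal sketch of estimating the relationship between two variables and using it to forecast a value. It fits a straight line by ordinary least squares with only the Python standard library. The function name fit_line and the sample data are assumptions made for illustration; they are not part of any vendor's service.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Spread of x around its mean; zero means every x is identical.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    # Co-variation of x and y around their means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
forecast = slope * 5 + intercept  # predict y for an unseen x = 5
```

The hosted regression modules listed in the table below handle many variables, regularization, and noisy data, but the idea of fitting parameters and then predicting is the same.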

These Artificial Intelligence (AI) services make use of algorithms:

Microsoft created a cute interactive museum for viewing its use cases in a non-technical way at http://azuremlsimpleds.azurewebsites.net/simpleds/ along with a PDF of Microsoft’s long infographic about algorithms.

See also the cheat sheet PDF of Microsoft’s Azure ML Algorithms.

The A-Z List of Machine Learning Studio Modules also lists basic database and UI features such as forms, which suggests Microsoft is building standard computing functions on top of its AI capabilities.

Category	Algorithm	Notes
Statistical Functions	Descriptive Statistics (Summarize Data)	-
	Hypothesis Testing (T-Test)	-
	Compute Linear Correlation	-
	Evaluate Probability Function	-
Recommendation (collaborative filtering)	Train Matchbox Recommender	-
	Score Matchbox Recommender	-
	Evaluate Recommender	-
Regression	Bayesian Linear Regression	-
	Boosted Decision Tree	-
	Decision Forest	-
	Fast Forest Quantile Regression	-
	Linear Regression	-
	Neural Network Regression	-
	Ordinal Regression	-
	Poisson Regression	-
Clustering	K-means Clustering	-
Anomaly Detection	One-class Support Vector Machine	-
	Principal Component Analysis-based Anomaly Detection	-
	Time Series Anomaly Detection	-
Two-class Classification	Averaged Perceptron	-
	Bayes Point Machine	-
	Boosted Decision Tree	-
	Decision Forest	-
	Decision Jungle	-
	Logistic Regression	-
	Neural Network	-
	Support Vector Machine	-
Multi-class Classification	Tune Model Hyperparameters	-
	Decision Forest	-
	Decision Jungle	-
	Logistic Regression	-
	Neural Network	-
	One-vs-all	-
Text Analytics	Feature Hashing	-
	Named Entity Recognition	-
	Vowpal Wabbit (v8)	John Langford
	Sentiment analysis	?
	Bot (chat) Framework	?

SAS explains its machine learning algorithms with this diagram:


Translation

Google Translate (https://translate.google.com) and the Google Translate API have been translating websites since the service launched in 2006. In 2017 Google made a breakthrough.

Microsoft’s Translator Speech

Computer Vision

Open-source OpenCV (Computer Vision) was an early entrant and is still used by many today because it is written in C and C++ and runs quite efficiently.

Microsoft’s Computer Vision

https://docs.microsoft.com/en-us/azure/cognitive-services/custom-vision-service/getting-started-build-a-classifier Hands-on guide: build a classifier with Custom Vision

Microsoft’s “Face”

https://algorithmia.com/algorithms/z/ColorPalettefromImage
Google Cloud Vision API
https://algorithmia.com/algorithms/opencv/FaceDetection then https://algorithmia.com/algorithms/opencv/CensorFace
https://algorithmia.com/algorithms/ocr/RecognizeCharacters OCR

Some of these make use of OpenCV (CV = Computer Vision).

Voice Recognition

Google Cloud Speech API, which powers Google’s own voice search and voice-enabled apps.
Microsoft says Cortana is as accurate as human transcriptionists.

Microsoft’s Web App Bot

NLP Sentiment Analysis

Analyze text for positive or negative sentiment (opinion), based on a training database of potential word meanings. This involves Natural Language Processing:

https://algorithmia.com/algorithms/nlp/SentimentAnalysis
IBM’s algorithm
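
The idea of scoring text against a database of word meanings can be sketched with a tiny hand-made lexicon. Everything here is an assumption made for illustration: the LEXICON contents, the scores, and the function names are hypothetical, and the hosted services above rely on trained models rather than a fixed word list.

```python
import re

# Hypothetical mini-lexicon: word -> polarity score (illustrative values only).
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}

def sentiment_score(text, lexicon=LEXICON):
    """Sum the polarity of every known word; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)

def label(text):
    """Turn the numeric score into a positive / negative / neutral label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(label("I love this great phone"))       # score is 2 + 2
print(label("This is a terrible, bad idea"))  # score is -2 + -1
```

A word list like this ignores negation ("not good") and context, which is one reason production systems learn from labeled examples instead.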

Andrew W. Trask, a PhD student at the University of Oxford studying Deep Learning for Natural Language Processing, authored Grokking Deep Learning.

Bag of words and Word2vec both transform words into vectors. TFLearn is a Python library for quickly building neural networks.
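
As a minimal sketch of the bag-of-words transformation, the following turns each document into a vector of word counts using only the Python standard library. The function names and sample sentences are assumptions made for illustration. Word2vec is not shown: it learns dense vectors from context with a trained model rather than by counting.

```python
from collections import Counter

def build_vocabulary(documents):
    """Map each distinct word to a fixed column index (sorted for a stable order)."""
    words = sorted({word for doc in documents for word in doc.lower().split()})
    return {word: index for index, word in enumerate(words)}

def bag_of_words(document, vocabulary):
    """Return a count vector with one slot per vocabulary word."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocabulary)
    for word, n in counts.items():
        if word in vocabulary:  # words outside the vocabulary are ignored
            vector[vocabulary[word]] = n
    return vector

docs = ["the cat sat", "the dog sat on the cat"]
vocab = build_vocabulary(docs)   # columns: cat, dog, on, sat, the
vectors = [bag_of_words(d, vocab) for d in docs]
print(vectors)  # [[1, 0, 0, 1, 1], [1, 1, 1, 1, 2]]
```

Word order is discarded, so "dog bites man" and "man bites dog" produce the same vector.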

Document (article) Search

Google made its fortune by offering search services.

Microsoft’s Bing Search

TF-IDF (Term Frequency - Inverse Document Frequency) emphasizes important words: those which appear frequently in a document (common locally) but rarely in the corpus searched (rare globally). The resulting weights for a document are held in a vector. Term frequency is measured by word count (how many occurrences of each word).

The IDF, which downweights common words, is the log of the number of documents divided by 1 plus the number of documents that use the given word.
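
The weighting described above can be sketched in a few lines. This is a minimal illustration, assuming whitespace tokenization and the IDF form given in the text, log(#docs / (1 + #docs using the word)); the function name tf_idf and the sample corpus are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        term_freq = Counter(tokens)  # TF: raw count of each word in this document
        weights.append({
            word: count * math.log(n_docs / (1 + doc_freq[word]))
            for word, count in term_freq.items()
        })
    return weights

docs = ["the cat sat", "the dog sat", "the bird flew", "a fish swam"]
weights = tf_idf(docs)
# In the first document, "cat" (1 of 4 docs) outweighs "sat" (2 of 4),
# which outweighs "the" (3 of 4).
print(weights[0])
```

With this exact formula a word that appears in every document gets a slightly negative weight, because the denominator exceeds the number of documents. Libraries typically use other smoothing variants, so their numbers will differ from this sketch.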

Cosine similarity normalizes vectors so that a small angle theta between two vectors indicates similarity.

Normalizing makes the comparison invariant to the number of words, but it can also make a very short document look as similar as a long one. A common compromise is to cap the maximum word count.
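
A minimal sketch of cosine similarity over count vectors follows, with a helper for capping word counts. The function names, the cap value, and the sample vectors are assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)

def cap_counts(vector, max_count):
    """Limit every word count to max_count before comparing."""
    return [min(x, max_count) for x in vector]

short_doc = [1, 0, 0, 1, 1]
long_doc = [2, 0, 0, 2, 2]   # the same text repeated twice
other_doc = [0, 3, 1, 0, 0]  # shares no words with short_doc

same_direction = cosine_similarity(short_doc, long_doc)  # length does not matter
no_overlap = cosine_similarity(short_doc, other_doc)     # no shared words
capped = cap_counts([7, 0, 2, 1], 3)                     # counts above 3 become 3
```

Because only the angle matters, doubling a document leaves its similarity to the original unchanged, which is the length invariance described above.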

CNTK 106: Part B - Time series prediction with LSTM (IOT Data)

This is one of a series on AI, Machine Learning, Deep Learning, Robotics, and Analytics:

Wilson Mar