Wilson Mar

Here’s a way to achieve WTF (What the Face) - the unexpected


Overview

Here are notes from my research on detecting anomalies.

What is it?

Anomaly detection is a family of Machine Learning techniques for finding observations that deviate from what is expected.

Unlike statistical regression, anomaly detection can fill in missing data in sets.

Typical examples of anomaly detection tasks are detecting credit card fraud, medical problems, or errors in text.

Types of anomalies

Anomalies are also referred to as outliers, novelties, noise, deviations and exceptions.

A point anomaly is an observation that is unusual when compared with all the rest of the available observations. For example: an outlier.

A contextual anomaly is an observation that is unusual in a certain context but not in other contexts. For example: a value that is normal in one season but unusual in another (seasonality).

A collective anomaly occurs when a collection of related data instances is anomalous (not normal) with respect to the entire data set. For example: regression formula
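The difference between a point anomaly and a contextual anomaly can be sketched in a few lines of Python. This is only an illustration under simple assumptions: a z-score with a threshold of 2.0 stands in for "unusual", and the function names and the sample readings are made up for this example.

```python
from statistics import mean, pstdev

def point_anomalies(values, z_threshold=2.0):
    """Return indices whose z-score against ALL values exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > z_threshold]

def contextual_anomalies(values, contexts, z_threshold=2.0):
    """Return indices that are unusual relative to values sharing the same context."""
    by_context = {}
    for i, c in enumerate(contexts):
        by_context.setdefault(c, []).append(i)
    flagged = []
    for idxs in by_context.values():
        group = [values[i] for i in idxs]
        flagged += [idxs[j] for j in point_anomalies(group, z_threshold)]
    return sorted(flagged)

# Point anomaly: one reading is far from everything else.
readings = [10, 11, 9, 10, 12, 10, 11, 95]
print(point_anomalies(readings))            # [7]

# Contextual anomaly: 25 degrees is ordinary overall, but not in winter.
temps = [2, 3, 1, 2, 3, 2, 1, 3, 25] + [24, 26, 25, 27, 25, 26]
seasons = ["winter"] * 9 + ["summer"] * 6
print(point_anomalies(temps))               # []
print(contextual_anomalies(temps, seasons)) # [8]
```

The same value (25) is invisible to the global check and flagged only once the season is taken into account.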

To distinguish between data classes as normal versus “risky”, we compare the anomaly detection algorithms:

Microsoft Azure

Let’s take a hands-on approach to predict credit risk as anomalies within German Credit data:

https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/apps-anomaly-detection-api dives in.

https://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)

  1. Go to webpage https://gallery.azure.ai/Experiment/1219e87f8fb84e88a2e1b54256808bb3 “Anomaly Detection: Credit Risk” dated September 2, 2014.

  2. Anomaly Detection ML example experiment by Laploy V. Angkul laploy@gmail.com https://gallery.azure.ai/Experiment/Anomaly-Detection-9

  3. Click “Open in Studio”.

https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/pca-based-anomaly-detection

https://gallery.azure.ai/Experiment/Anomaly-Detection-Credit-Risk-21

https://gallery.azure.ai/Experiment/Anomaly-Detection-Credit-Risk-5

https://gallery.azure.ai/MachineLearningAPI/Anomaly-Detection-2
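The general idea behind a PCA-based detector, such as the module linked above, can be sketched without any Azure dependency. This is a minimal illustration, not the Azure module's implementation: it assumes that "normal" data lies close to the first principal component, finds that component by power iteration, and scores each row by its squared reconstruction error. The function names and sample rows are made up for this example.

```python
import math

def top_component(rows, iterations=100):
    """Return (mean, unit vector) of the first principal component via power iteration."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mu[j] for j in range(d)] for r in rows]
    # Covariance matrix of the centered data.
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mu, v

def reconstruction_errors(rows, mu, v):
    """Squared distance between each row and its projection onto the component."""
    errors = []
    for r in rows:
        c = [x - m for x, m in zip(r, mu)]
        proj = sum(a * b for a, b in zip(c, v))
        residual = [a - proj * b for a, b in zip(c, v)]
        errors.append(sum(x * x for x in residual))
    return errors

# Six rows follow roughly y = 2x; the last row does not.
rows = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.0), (5, 9.9), (6, 12.1), (3, 11.0)]
mu, v = top_component(rows)
errors = reconstruction_errors(rows, mu, v)
most_anomalous = errors.index(max(errors))
print(most_anomalous)  # 6
```

Rows that the principal component cannot reconstruct well receive the highest score, which is what makes the approach usable when no "risky" labels are available.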

Other Tools

Cortical

http://www.cortical.io developed a cortical engine for processing text, which it represents as a “fingerprint”.

HTM from Numenta

HTM stands for Hierarchical Temporal Memory. The “Temporal” means the time dimension is added. Time based inference (TBI) lessens the impact of noise on accuracy.

Numenta

The book “On Intelligence”, written by Jeff Hawkins (founder of Palm and Handspring) and published in 2005, talks about a Cortical Learning Algorithm (CLA), which since 2010 has been called Hierarchical Temporal Memory (HTM).

Numenta’s work spans biological and machine intelligence. Its v1.7 achieved 98.4% accuracy on the MNIST dataset.

YouTube: Visualization of HTM processing

https://numenta.com/htm-studio makes use of HTM.

  • https://github.com/numenta/htmpapers
  • https://www.businesswire.com/news/home/20160627005453/en/Numenta-Releases-HTM-Studio
  • The Numenta Anomaly Benchmark (NAB) at https://github.com/numenta/NAB measures performance running HTM and Etsy’s Skyline algorithms.

Numenta created NuPIC (Platform for Intelligent Computing)

Neurons use most of their synapses to make predictions.

Supervised deep learning (such as a CNN) is limited because it depends on many labeled training examples.

Sensor streams are often seen in massive volumes and high velocities, which leaves little room for human intervention, parameter tweaking or data labeling (training).
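To make the streaming constraint concrete, here is a minimal sketch of a detector that sees each value once, keeps only a running mean and variance (Welford's method), and needs no labeled training data. It is a plain statistical stand-in, not HTM; the class name, the threshold of 3.0 and the warm-up length are assumptions chosen for this example.

```python
import math

class StreamingDetector:
    """Flags values far from the running mean, using constant memory."""

    def __init__(self, z_threshold=3.0, warmup=5):
        self.z_threshold = z_threshold
        self.warmup = warmup   # values to observe before scoring starts
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0          # running sum of squared deviations

    def update(self, x):
        """Score x against the history so far, then learn from it."""
        anomalous = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / self.count)
            anomalous = std > 0 and abs(x - self.mean) / std > self.z_threshold
        # Welford's online update of mean and variance.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        return anomalous

stream = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 35.0, 20.1]
detector = StreamingDetector()
flags = [detector.update(x) for x in stream]
print(flags)  # only the 35.0 reading is flagged
```

Note that this sketch also learns from the anomalous value, which widens its tolerance afterwards; whether to exclude flagged values from the statistics is a design choice.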

sensorimotor theory

Time-Adjacency Matrix

Tutorials on HTM:

Among Numenta’s white papers is real-time anomaly detection for streaming data.

Sparse

The Eigen vectorized math library makes HTM code efficient, and can be GPU-accelerated as well.

CLA made use of Sparse Distributed Representations (SDRs): very large matrices in which only a few coefficients are different from zero. In such cases, memory consumption can be reduced and performance increased by using a specialized representation that stores only the nonzero coefficients. Such a matrix is called a sparse matrix. It’s useful for continuous learning.
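Storing only the nonzero coefficients can be illustrated with a plain dictionary. This is a sketch of the idea only, not how Eigen or NuPIC store their data; the helper names are made up for this example.

```python
def to_sparse(dense):
    """Keep only nonzero entries, keyed by (row, column)."""
    return {(i, j): v
            for i, row in enumerate(dense)
            for j, v in enumerate(row) if v != 0}

def sparse_get(sparse, i, j):
    """Entries that were not stored are zero by definition."""
    return sparse.get((i, j), 0)

def sparse_matvec(sparse, vector, n_rows):
    """Matrix-vector product that touches only the stored entries."""
    result = [0] * n_rows
    for (i, j), v in sparse.items():
        result[i] += v * vector[j]
    return result

dense = [
    [0, 0, 3, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [7, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
sparse = to_sparse(dense)
print(len(sparse))                            # 3 stored values instead of 24
print(sparse_get(sparse, 2, 5))               # 1
print(sparse_matvec(sparse, [1] * 6, 4))      # [3, 0, 8, 0]
```

The product loops over three stored entries rather than all 24 cells, which is where the performance gain comes from as matrices grow.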

Two sets of SDR matrices can have an overlap or a union. A theta score defines the threshold for whether two matrices match.

But the presence of noise can obfuscate the match.
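Overlap, union, and a theta threshold can be sketched by treating each SDR as the set of its active bit positions. The sizes used here (2048 bits with 40 active), the theta of 20, and all function names are assumptions for illustration, not values taken from Numenta's code.

```python
import random

def overlap(a, b):
    """Number of active bits the two SDRs share."""
    return len(a & b)

def union(a, b):
    """All bits active in either SDR."""
    return a | b

def matches(a, b, theta):
    """Two SDRs match when their overlap reaches the threshold theta."""
    return overlap(a, b) >= theta

def add_noise(sdr, n_bits, flips, rng):
    """Move `flips` active bits to randomly chosen inactive positions."""
    removed = set(rng.sample(sorted(sdr), flips))
    inactive = [i for i in range(n_bits) if i not in sdr]
    added = set(rng.sample(inactive, flips))
    return (sdr - removed) | added

rng = random.Random(42)
n_bits, active_bits, theta = 2048, 40, 20

original = set(rng.sample(range(n_bits), active_bits))
noisy = add_noise(original, n_bits, flips=10, rng=rng)
unrelated = set(rng.sample(range(n_bits), active_bits))

print(overlap(original, noisy))         # 30: a quarter of the bits moved
print(matches(original, noisy, theta))  # True
print(matches(original, unrelated, theta))
```

With 25% of the active bits corrupted, the noisy copy still clears theta, while two unrelated random SDRs of this size are expected to share only about one bit. Setting theta too close to the number of active bits is what lets noise hide a match.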

TOOL: SDR Matching presents a GUI to explore the relationship between

PROTIP: SDR is rather resistant to noise.

Episode 5 shows the SDR Scalar Encoder.
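The idea of a scalar encoder can be sketched as follows: a number is turned into a block of contiguous active bits whose position depends on the value, so that nearby numbers share bits and distant numbers do not. This is a simplified illustration, not NuPIC's encoder; the function name, the range of 0 to 100, and the sizes (100 bits, 11 active) are assumptions for this example.

```python
def encode_scalar(value, min_val=0.0, max_val=100.0, n_bits=100, w=11):
    """Return the set of active bit positions representing `value`."""
    # Clip to the supported range so the block always fits.
    value = min(max(value, min_val), max_val)
    # Position of the first active bit, scaled across the available room.
    start = int((value - min_val) / (max_val - min_val) * (n_bits - w))
    return set(range(start, start + w))

a = encode_scalar(50)
b = encode_scalar(52)
c = encode_scalar(90)

print(len(a))      # 11 active bits
print(len(a & b))  # 9: close values overlap heavily
print(len(a & c))  # 0: distant values share nothing
```

Because similarity between numbers becomes overlap between bit sets, the encoded values can be compared with the same overlap measure used for any other SDR.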

Generation 3 in 2014

Fortune article

HTM

HTM learns time-based patterns in unlabeled streaming data.

HTM is modeled on the workings of the most advanced part of the brain: the neocortex, the “gray matter” where memories and personality are stored. The field that studies it is called neuroscience.

The Laminar circuit within the cortex is emulated by a common cortical algorithm which describes how all cortical regions and all sensory-motor modalities work. In layers:

common-cortical-algorithm.png

The top of the hierarchy is the frontal cortex, which passes commands down to lower levels controlling muscles.

Hawkins founded the Redwood Neuroscience Institute (now at U.C. Berkeley), a scientific institute focused on understanding how the neocortex processes information.

Videos

Continuous Online Sequence Learning with an Unsupervised Neural Network Model

Numenta Social

Numenta People

  • Donna Dubinsky @ddubinsky CEO
  • Subutai Ahmad @SubutaiAhmad VP of Research
  • Christy Maver @christymaver

Theory Videos

InfluxDB

Time series database InfluxDB went through YC 2013.

InfluxDB YouTube channel

InfluxDB + Grafana + Kapacitor

https://www.youtube.com/watch?v=86cOdXXvjhA Grafana: Open Source Metrics Dashboard Rackspace Developers 13,870 views 32:28

https://www.youtube.com/watch?v=hAI-qz399EQ InfluxDB Tech Tips - June 2016 InfluxData 543 views

Apache Spark

Datadog

HawkEys

AI

Anomaly Detection ML example experiment to predict credit risk as anomalies within German Credit data

Microsoft’s CNTK

CNTK 106: Part B - Time series prediction with LSTM (IOT Data)

Videos

https://www.youtube.com/watch?v=0PtehdUL-38 Robust anomaly detection for real user monitoring data - Velocity 2016, Santa Clara, CA 17:14 by Ritesh Maheshwari

https://www.youtube.com/watch?v=xvdLX1jvoOI Anomaly detection in R Tukang Leding 76 views

https://www.youtube.com/watch?v=CAvKQHHNmcY Anomaly Detection Algorithms and Techniques for Real-World Detection Systems Next Day Video 1,440 views

https://www.youtube.com/watch?v=Mj1oHwJ7i2o Real-time Anomaly Detection Architecture DATA SCIENCE SUMMIT EUROPE 2016 476 views

https://www.youtube.com/watch?v=5mBiac_dhbs “Data-driven Anomaly Detection” | Talks at Google by Nikunj Oza of NASA Ames 5,376 views

https://www.youtube.com/watch?v=ILq-3z5Plck AppliedAI #1 - From anomaly detection to deep learning Ravelin

https://www.youtube.com/watch?v=0PqzukqMcdA 24:40 Machine Learning for Real-Time Anomaly Detection in Network Time-Series Data - Jaeseong Jeong RISE SICS

https://gallery.azure.ai/browse/?algorithms=[%22One-Class%20Support%20Vector%20Machine%22] lists a gallery of links about Anomaly Detection.

More

This is one of a series on AI, Machine Learning, Deep Learning, Robotics, and Analytics:

  1. AI Ecosystem
  2. Machine Learning
  3. Testing AI

  4. Microsoft’s AI
  5. Microsoft’s Azure Machine Learning Algorithms
  6. Microsoft’s Azure Machine Learning tutorial
  7. Microsoft’s Azure Machine Learning certification

  8. Python installation
  9. Jupyter notebooks processing Python for humans

  10. Image Processing
  11. Tesseract OCR using OpenCV
  12. Amazon Lex text to speech

  13. Code Generation

  14. Multiple Regression calculation and visualization using Excel and Machine Learning
  15. Tableau Data Visualization