Here’s a way to achieve WTF (What the Face) - the unexpected
Overview
Here are notes from my research on detecting anomalies.
What is it?
Anomaly detection is considered one family of Machine Learning algorithms.
Unlike statistical regression, anomaly detection can fill in missing data in sets.
Typical examples of anomaly detection tasks are detecting credit card fraud, medical problems, or errors in text.
Types of anomalies
Anomalies are also referred to as outliers, novelties, noise, deviations and exceptions.
A point anomaly is an observation that is unusual when compared with all the rest of the available observations. For example: an outlier.
A contextual anomaly is an observation that is unusual in a certain context but not in other contexts. For example: seasonality.
A collective anomaly occurs when a collection of related data instances is anomalous (not normal) with respect to the entire data set. For example: regression formula
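To make the difference between a point anomaly and a contextual anomaly concrete, here is a minimal sketch using a plain z-score. The function names, the threshold of 2.0, and the sample readings are all assumptions made up for illustration; they do not come from any of the tools discussed in these notes.

```python
from statistics import mean, stdev

def point_anomalies(values, threshold=2.0):
    """Return indices whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

def contextual_anomalies(values, contexts, threshold=2.0):
    """Score each value only against the other values that share its context."""
    groups = {}
    for i, c in enumerate(contexts):
        groups.setdefault(c, []).append(i)
    flagged = []
    for idxs in groups.values():
        if len(idxs) < 2:
            continue  # not enough data to estimate a spread for this context
        local = point_anomalies([values[i] for i in idxs], threshold)
        flagged.extend(idxs[j] for j in local)
    return sorted(flagged)

# Point anomaly: 95 is unusual compared with every other reading.
readings = [10, 11, 9, 10, 12, 10, 11, 95]
print(point_anomalies(readings))  # [7]

# Contextual anomaly: 28 degrees is ordinary overall, but unusual for winter.
temps = [29, 30, 31, 30, 29, 31, 30, 30, 1, 0, -1, 0, 1, -1, 0, 28]
seasons = ["summer"] * 8 + ["winter"] * 8
print(point_anomalies(temps))                 # []
print(contextual_anomalies(temps, seasons))   # [15]
```

With small samples a z-score cannot grow very large, which is why the sketch uses a threshold of 2.0 rather than the more common 3.0.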
To distinguish between data classes as normal versus “risky”, we compare the anomaly detection algorithms:
- Time Series Anomaly Detection
Microsoft Azure
Let’s take a hands-on approach to predict credit risk as anomalies within German Credit data:
https://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)
- Go to webpage https://gallery.azure.ai/Experiment/1219e87f8fb84e88a2e1b54256808bb3 “Anomaly Detection: Credit Risk” dated September 2, 2014.
- Anomaly Detection ML example experiment by Laploy V. Angkul laploy@gmail.com https://gallery.azure.ai/Experiment/Anomaly-Detection-9
- Click “Open in Studio”.
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/pca-based-anomaly-detection
https://gallery.azure.ai/Experiment/Anomaly-Detection-Credit-Risk-21
https://gallery.azure.ai/Experiment/Anomaly-Detection-Credit-Risk-5
https://gallery.azure.ai/MachineLearningAPI/Anomaly-Detection-2
Other Tools
Cortical
Cortical.io (http://www.cortical.io) developed a cortical engine for processing text, which it represents as a semantic “fingerprint”.
HTM from Numenta
HTM stands for Hierarchical Temporal Memory. The “Temporal” means the time dimension is added. Time based inference (TBI) lessens the impact of noise on accuracy.
Numenta
The book “On Intelligence” written by Jeff Hawkins (founder of Palm and Handspring) published in 2005 talks about a Cortical Learning Algorithm (CLA), which since 2010 is called Hierarchical Temporal Memory (HTM).
Numenta’s biological and machine intelligence. Its v1.7 achieved 98.4% accuracy on the MNIST dataset.
YouTube: Visualization of HTM processing
https://numenta.com/htm-studio makes use of HTM.
- https://github.com/numenta/htmpapers
- https://www.businesswire.com/news/home/20160627005453/en/Numenta-Releases-HTM-Studio
- The Numenta Anomaly Benchmark (NAB) at https://github.com/numenta/NAB measures performance running HTM and Etsy’s Skyline algorithms.
Numenta created NuPIC (Numenta Platform for Intelligent Computing).
Neurons use most of their synapses to make predictions.
Supervised deep learning, such as a CNN, is limited because it depends on many labeled training examples.
Sensor streams are often seen in massive volumes and high velocities, which leaves little room for human intervention, parameter tweaking or data labeling (training).
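As a rough illustration of detecting anomalies on a stream without labels or a separate training pass, here is a minimal sketch of an online detector. This is not HTM: it is a running z-score (using Welford's online mean and variance update) swapped in as a simple stand-in. The class name, threshold, and warm-up length are assumptions chosen for the example.

```python
import math

class StreamingZScore:
    """Flags a value that lies far from the running mean of the values seen so far."""

    def __init__(self, threshold=3.0, warmup=10):
        self.n = 0          # number of values seen
        self.mean = 0.0     # running mean
        self.m2 = 0.0       # running sum of squared deviations from the mean
        self.threshold = threshold
        self.warmup = warmup

    def update(self, x):
        """Score x against the history, then fold x into the running statistics."""
        is_anomaly = False
        # Need at least two values to estimate a spread.
        if self.n >= max(self.warmup, 2):
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's update: constant memory, one pass over the stream.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly

detector = StreamingZScore()
stream = [20.0 + 0.1 * (i % 5) for i in range(50)] + [35.0]
flags = [detector.update(x) for x in stream]
print(flags[-1])        # True: the final spike is flagged
print(any(flags[:-1]))  # False: the regular readings are not
```

Note that the sketch folds every value, including a flagged one, into the running statistics. Whether to exclude flagged values is a design choice that depends on whether the stream is expected to drift.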
sensorimotor theory
Time-Adjacency Matrix
Tutorials on HTM:
- VIDEO: HTM School by Matt Taylor, Numenta’s Open Source Flag-Bearer: https://www.youtube.com/playlist?list=PL3yXMgtrZmDqhsFQzwUC9V8MeeVOQ7eZ9
Visualizations shown in HTM School are based on code at
https://github.com/htm-community/htm-school-viz
Among Numenta’s white papers is real-time anomaly detection for streaming data.
Sparse
The Eigen vectorized math library makes HTM code efficient, and can be GPU-accelerated as well.
CLA made use of Sparse Distributed Representations (SDR): very large matrices where only a few coefficients are different from zero. In such cases, memory consumption can be reduced and performance increased by using a specialized representation storing only the nonzero coefficients. Such a matrix is called a sparse matrix. It’s useful for continuous learning.
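The idea of storing only the nonzero coefficients can be sketched in a few lines. This is a minimal dictionary-of-keys illustration; the helper names are hypothetical, and it is not how Eigen or NuPIC store sparse data internally.

```python
def to_sparse(dense):
    """Keep only the nonzero coefficients as {(row, col): value}."""
    return {
        (r, c): v
        for r, row in enumerate(dense)
        for c, v in enumerate(row)
        if v != 0
    }

def sparse_get(sparse, r, c):
    """Any coefficient that was not stored is zero."""
    return sparse.get((r, c), 0)

# A 100 x 100 matrix with only three active cells.
dense = [[0] * 100 for _ in range(100)]
dense[3][7] = 1
dense[50][42] = 1
dense[99][99] = 1

sparse = to_sparse(dense)
print(len(sparse))               # 3 stored entries instead of 10,000 cells
print(sparse_get(sparse, 3, 7))  # 1
print(sparse_get(sparse, 0, 0))  # 0
```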
Two sets of SDR matrices can have an overlap or a union. A theta score defines the threshold for whether two matrices match.
But the presence of noise can obfuscate the match.
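Overlap, union, and the theta threshold can be sketched by treating each SDR as the set of indices of its active bits. The sizes used here (2048 bits with 40 active) and the helper names are assumptions for illustration, and the match rule (overlap of at least theta) is the reading assumed in this sketch.

```python
import random

def overlap(a, b):
    """Number of active bits the two SDRs share."""
    return len(a & b)

def union(a, b):
    """All bits active in either SDR."""
    return a | b

def matches(a, b, theta):
    """Two SDRs match when their overlap reaches the theta threshold."""
    return overlap(a, b) >= theta

def add_noise(sdr, n, flips, rng):
    """Turn off `flips` active bits and turn on `flips` previously inactive bits."""
    removed = rng.sample(sorted(sdr), flips)
    inactive = [i for i in range(n) if i not in sdr]
    added = rng.sample(inactive, flips)
    return (set(sdr) - set(removed)) | set(added)

rng = random.Random(0)
n, w = 2048, 40                            # total bits, active bits (assumed)
a = set(rng.sample(range(n), w))
noisy = add_noise(a, n, flips=10, rng=rng)  # 25% of the active bits moved

print(overlap(a, noisy))             # 30
print(len(union(a, noisy)))          # 50
print(matches(a, noisy, theta=20))   # True: a lenient theta tolerates the noise
print(matches(a, noisy, theta=35))   # False: a strict theta loses the match
```

The last two lines show both sides of the trade-off: noise lowers the overlap, so a strict theta misses the match, while a lower theta still recognizes the noisy copy.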
TOOL: SDR Matching presents a GUI to explore the relationship between
PROTIP: SDR is rather resistant to noise.
Episode 5 shows the SDR Scalar Encoder.
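As a rough idea of what a scalar encoder does, here is a minimal sketch that maps a number to a contiguous run of active bits, so that nearby numbers share bits and distant numbers do not. It is a simplified illustration under assumed parameters (100 total bits, 11 active), not the NuPIC implementation, and the function name is hypothetical.

```python
def encode_scalar(value, min_val, max_val, n=100, w=11):
    """Return the set of active bit indices (w contiguous bits out of n)."""
    if not min_val < max_val:
        raise ValueError("min_val must be less than max_val")
    if not 0 < w <= n:
        raise ValueError("w must be between 1 and n")
    value = max(min_val, min(max_val, value))   # clip to the supported range
    n_buckets = n - w + 1                        # possible start positions
    fraction = (value - min_val) / (max_val - min_val)
    start = int(round(fraction * (n_buckets - 1)))
    return set(range(start, start + w))

a = encode_scalar(50, 0, 100)
b = encode_scalar(51, 0, 100)
c = encode_scalar(90, 0, 100)

print(len(a))       # 11 active bits
print(len(a & b))   # large overlap: 50 and 51 are close
print(len(a & c))   # no overlap: 50 and 90 are far apart
```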
HTM
HTM learns time-based patterns in unlabeled streaming data.
HTM is modeled on the workings of the most advanced part of the brain: the Neocortex “white matter” where memories and personality are stored. This field is called Neuroscience.
The Laminar circuit within the cortex is emulated by a common cortical algorithm which describes how all cortical regions and all sensory-motor modalities work. In layers:
The top of the hierarchy is the frontal cortex, which passes commands down to lower levels controlling muscles.
Hawkins founded the Redwood Neuroscience Institute, a scientific institute at U.C. Berkeley focused on understanding how the neocortex processes information.
Videos
Continuous Online Sequence Learning with an Unsupervised Neural Network Model
Numenta Social
Numenta People
- Donna Dubinsky @ddubinsky CEO
- Subutai Ahmad @SubutaiAhmad VP of Research
- Christy Maver @christymaver
Theory Videos
- “Real-Time Anomaly Detection on Time-Series IoT Sensor Data Using Deep Learning” [17:13] by Romeo Kienzler of Data Natives
- Anomaly Detection 101 by Elizabeth (Betsy) Nichols Ph.D. DevOpsDays Silicon Valley 14 Nov 2015
- Science of Anomaly Detection [17:13] 17 Oct 2014 by Scott Purdy (spurdy@Numenta.com, @scottmpurdy)
InfluxDB
Time series database InfluxDB went through YC 2013.
InfluxDB + Grafana + Kapacitor
https://www.youtube.com/watch?v=86cOdXXvjhA Grafana: Open Source Metrics Dashboard Rackspace Developers 13,870 views 32:28
https://www.youtube.com/watch?v=hAI-qz399EQ InfluxDB Tech Tips - June 2016 InfluxData 543 views
- Introduction to Kapacitor for Alerting and Anomaly Detection at InfluxData
- Watch Everything, Watch Anything: Anomaly Detection [38:04] 26 Jun 2016 by Nathaniel Cook (@nathanielvcook) of InfluxData at Salt Lake City DevOps Days
- Paul Dix (CEO): Time Series Data with InfluxDB at Data Science Summit 2015 Turi, Inc.
- Introduction to InfluxDB by Paul Dix Hakka Labs 8,463 views 53:08
- Internals and future of InfluxDB by Paul Dix at the DigitalOcean Community Meetup
Apache Spark
- 53:03 Anomaly Detection with Apache Spark - Sean Owen George Agnelli 11,002 views
- Step by step guide how to build a real-time anomaly detection system using Apache Spark Streaming by Mariusz Jacyno
- Petabyte Scale Anomaly Detection Using R & Spark Spark Summit
Datadog
- Detecting outliers and anomalies in realtime at Datadog by Homin Lee (OSCON Austin 2016)
HawkEye
- HawkEye: A Real Time Anomaly Detection System by Satnam Singh - HasGeek TV
AI
- Andrew Ng - The State of Artificial Intelligence at MIT EmTech November 7, 2017
Anomaly Detection ML example experiment to predict credit risk as anomalies within German Credit data
Microsoft’s CNTK
CNTK 106: Part B - Time series prediction with LSTM (IOT Data)
Videos
https://www.youtube.com/watch?v=0PtehdUL-38 Robust anomaly detection for real user monitoring data - Velocity 2016, Santa Clara, CA 17:14 by Ritesh Maheshwari
https://www.youtube.com/watch?v=xvdLX1jvoOI Anomaly detection in R Tukang Leding 76 views
https://www.youtube.com/watch?v=CAvKQHHNmcY Anomaly Detection Algorithms and Techniques for Real-World Detection Systems Next Day Video 1,440 views
https://www.youtube.com/watch?v=Mj1oHwJ7i2o Real-time Anomaly Detection Architecture DATA SCIENCE SUMMIT EUROPE 2016 476 views
https://www.youtube.com/watch?v=5mBiac_dhbs “Data-driven Anomaly Detection” | Talks at Google Talks by Nikunj Oza of NASA Ames Lab at Google 5,376 views
https://www.youtube.com/watch?v=ILq-3z5Plck AppliedAI #1 - From anomaly detection to deep learning Ravelin
https://www.youtube.com/watch?v=0PqzukqMcdA 24:40 Machine Learning for Real-Time Anomaly Detection in Network Time-Series Data - Jaeseong Jeong RISE SICS
https://gallery.azure.ai/browse/?algorithms=[%22One-Class%20Support%20Vector%20Machine%22] lists a gallery of links about Anomaly Detection.
More
This is one of a series on AI, Machine Learning, Deep Learning, Robotics, and Analytics:
- AI Ecosystem
- Machine Learning
- Microsoft’s AI
- Microsoft’s Azure Machine Learning Algorithms
- Microsoft’s Azure Machine Learning tutorial
- Python installation
- Image Processing
- Tesseract OCR using OpenCV
- Multiple Regression calculation and visualization using Excel and Machine Learning
- Tableau Data Visualization