Wilson Mar


How to get wisdom from data


Overview

This page contains unorganized notes about querying the Elasticsearch datastore, a part of the Elastic Stack.

Querying is at the heart of the value of the Elastic Stack.

Elastic “democratizes” data by putting a front-end on it that makes the data searchable in fast, meaningful ways.

Architecture

  1. Elasticsearch indexes (using inverted indexes) nested aggregations of data in Hadoop.

  2. Curator at https://github.com/elasticsearch/curator manages Elasticsearch indexes by enabling admins to schedule operations to optimise, close, and delete indexes.

  3. Kibana does data discovery on an Elasticsearch cluster to identify “actionable insights” and presents visualizations (in a dashboard).
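To picture the inverted index mentioned in item 1, here is a minimal Python sketch. It is a toy illustration only, not how Elasticsearch stores data internally; the function name `build_inverted_index` and the sample posts are made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive whitespace tokenization; real analyzers do much more.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical blog posts keyed by document id.
posts = {
    1: "Elastic Stack notes",
    2: "Kibana dashboard notes",
}
inverted = build_inverted_index(posts)
print(inverted["notes"])   # [1, 2]
print(inverted["kibana"])  # [2]
```

A search for a term then becomes a dictionary lookup instead of a scan over every document.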

Unlike SQL databases, ES neither supports transactions nor enforces referential integrity. Without those constraints, ES distributes well.

ES is great at faceting (more than one per facet).
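As a rough sketch of what a facet computes: a count of documents per distinct value of a field. The Python below builds a request body in the `aggs`/`terms` form (aggregations took over from facets in later Elasticsearch releases) and, separately, computes the same counts locally. The helper names and sample posts are assumptions for illustration; `user_id` is the field from the blog example on this page.

```python
from collections import Counter


def terms_aggregation_body(field, name="by_field"):
    """Build a search body asking for bucket counts per distinct value of `field`."""
    # "size": 0 asks for the buckets only, without returning matching documents.
    return {"size": 0, "aggs": {name: {"terms": {"field": field}}}}


def local_facet_counts(docs, field):
    """Compute the same per-value counts locally, for illustration only."""
    return dict(Counter(doc[field] for doc in docs if field in doc))


# Hypothetical posts; only user_id matters here.
posts = [
    {"user_id": 1, "post_text": "first"},
    {"user_id": 2, "post_text": "second"},
    {"user_id": 1, "post_text": "third"},
]
body = terms_aggregation_body("user_id", name="posts_per_user")
counts = local_facet_counts(posts, "user_id")
print(counts)  # {1: 2, 2: 1}
```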

ES is a “document store” like MongoDB.

Data structure

At the top level are indices (indexes).

Doctypes (document types) have schemas attached to them.

ES can infer the data types of fields. PROTIP: Define schemas explicitly to avoid inference errors.

For comparison, here is the SQL version. Create the database (an example from the Toto tutorial):

CREATE DATABASE `my blog` DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci;
   

Create table for blog posts:

CREATE TABLE IF NOT EXISTS `post` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `user_id` int(10) NOT NULL,
  `post_text` varchar(255) DEFAULT NULL,
  `post_date` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=3;
   

4 spaces per indent.


The Marvel GUI executes queries.

Press Control+Enter to execute the current request.

GET /_cat/indices
POST /my_blog
{
    "mappings": {
        "post": {
            "properties": {
                "user_id": {
                    "type": "integer"
                },
                "post_text": {
                    "type": "string"
                },
                "post_date": {
                    "type": "date"
                }
            }
        }
    }
}
   

NOTE: In SQL terms, “mappings” corresponds to the schema, the type “post” to a table, and “properties” to columns.
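The SQL-to-Elasticsearch correspondence can be sketched in Python by generating the mapping from the column list of the `post` table. The type table and the `build_mapping` helper are assumptions for this example, not part of any Elasticsearch client. The `string` type follows the mapping on this page; Elasticsearch 5.x and later split it into `text` and `keyword`.

```python
import json

# Assumed, simplified SQL-to-Elasticsearch type table for this example only.
SQL_TO_ES_TYPE = {
    "int": "integer",
    "varchar": "string",
    "datetime": "date",
}


def build_mapping(doc_type, columns):
    """Return a mappings body with one property per (column, sql_type) pair."""
    properties = {
        name: {"type": SQL_TO_ES_TYPE[sql_type]} for name, sql_type in columns
    }
    return {"mappings": {doc_type: {"properties": properties}}}


# Columns of the SQL `post` table, minus the auto-increment id.
mapping = build_mapping("post", [
    ("user_id", "int"),
    ("post_text", "varchar"),
    ("post_date", "datetime"),
])
print(json.dumps(mapping, indent=4))
```

The `id` column is left out, matching the mapping on this page, which declares no id property.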

Alternatively, send the same request over HTTP:

   
(POST) http://localhost:9200/my_blog
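A minimal sketch of preparing that HTTP call with Python's standard library. It assumes a node at localhost:9200 as shown above and only builds the request, so it runs without a cluster. The helper name `build_create_index_request` is made up, and the abbreviated body is for illustration. The method is a parameter because newer Elasticsearch releases document index creation as PUT.

```python
import json
import urllib.request


def build_create_index_request(base_url, index, body, method="POST"):
    """Prepare (but do not send) the HTTP request that creates `index`."""
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


# Abbreviated mapping body; the full one is shown earlier on this page.
body = {"mappings": {"post": {"properties": {"user_id": {"type": "integer"}}}}}
request = build_create_index_request("http://localhost:9200", "my_blog", body)
print(request.get_method(), request.full_url)
# To send it against a running node: urllib.request.urlopen(request)
```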
   

Verify

GET my_blog
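One way to script the verification is to check that the returned mapping contains the expected fields. The sketch below assumes the response nests `mappings` under the index name and then under the type; the sample response is hand-written and abbreviated, not captured from a real cluster, and `missing_fields` is a hypothetical helper.

```python
def missing_fields(index_info, index, doc_type, expected):
    """Return expected field names absent from the mapping in `index_info`."""
    properties = (
        index_info.get(index, {})
        .get("mappings", {})
        .get(doc_type, {})
        .get("properties", {})
    )
    return [name for name in expected if name not in properties]


# Hypothetical, abbreviated response body for GET my_blog.
sample = {"my_blog": {"mappings": {"post": {"properties": {
    "user_id": {"type": "integer"},
    "post_text": {"type": "string"},
}}}}}
missing = missing_fields(sample, "my_blog", "post",
                         ["user_id", "post_text", "post_date"])
print(missing)  # ['post_date']
```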

   

API

Geo queries

Polygon containment queries are parallelized.
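To show what "containment" means, here is a small ray-casting sketch in Python. It treats latitude and longitude as flat plane coordinates and is not Elasticsearch's implementation; the function name and the sample square are made up for this example.

```python
def point_in_polygon(lat, lon, polygon):
    """Ray-casting test: True if (lat, lon) lies inside `polygon`.

    `polygon` is a list of (lat, lon) vertices; the last vertex connects
    back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Does this edge straddle the line of constant latitude through the point?
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which the edge crosses that line.
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside  # each crossing flips inside/outside
    return inside


# Hypothetical square with corners at (0, 0) and (10, 10).
square = [(0.0, 0.0), (0.0, 10.0), (10.0, 10.0), (10.0, 0.0)]
print(point_in_polygon(5.0, 5.0, square))   # True
print(point_in_polygon(15.0, 5.0, square))  # False
```

Each point is tested independently of the others, which is why this kind of check is easy to run in parallel.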

Machine Learning

Machine Learning is available for “Gold” level subscribers.

Presented at Elastic{ON}17 London on 8 March by Steve Dodson, Tech Lead, and Sophie Chang, Team Lead.

More

This is one of a series on Elastic Stack and monitoring:

  1. Elastic Stack ecosystem of people, websites, tutorials
  2. Elastic Stack architecture and installation
  3. Elastic Scaling (the database engine)
  4. Elastic Query (via REST API)
  5. Elastic Kibana (the visualization engine, like Grafana)
  6. Elastic Logstash to assemble and filter data from Beats
  7. Elastic Beats to collect data from servers