Elastic Query

How to get wisdom from data

Overview

Architecture
Data structure
Marvel GUI executes queries
- Verify
API
Geo queries
Machine Learning
References
More

This page contains unorganized notes about querying the Elasticsearch datastore, a part of the Elastic Stack.

Querying is at the heart of the value the Elastic Stack provides.

Elastic “democratizes” data by putting a front end on it so that it can be searched in fast, meaningful ways.

Architecture

Elasticsearch builds inverted indexes and nested aggregations over data, including data held in Hadoop.
Curator, at https://github.com/elasticsearch/curator, manages Elasticsearch indexes by letting admins schedule operations to optimise, close, and delete indexes.
Kibana does data discovery on an Elasticsearch cluster to identify “actionable insights” and presents them as visualizations on a dashboard.
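
To make the term "inverted index" concrete, here is a minimal Python sketch of the idea. It is a toy illustration only, not how Elasticsearch (or Lucene underneath it) is implemented; the function name `build_inverted_index` and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical sample documents, not taken from the original notes.
docs = {
    1: "Elastic search is fast",
    2: "Kibana charts search results",
}
index = build_inverted_index(docs)
```

Looking up a term such as `index["search"]` then returns the ids of the documents containing it without scanning every document, which is why searches over an inverted index are fast.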

Unlike SQL databases, ES neither supports transactions nor enforces referential integrity, which makes it easier to distribute across nodes.

ES is great at faceting (more than one per facet).
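
As a rough illustration of what faceting means, the sketch below counts how many documents fall under each value of a field. It assumes in-memory Python dicts standing in for documents; `facet_counts` and the sample posts are hypothetical and do not represent the Elasticsearch API.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count how many documents carry each value of `field`."""
    return Counter(doc[field] for doc in docs if field in doc)


# Hypothetical documents, invented for this example.
posts = [
    {"user_id": 1, "year": 2015},
    {"user_id": 2, "year": 2015},
    {"user_id": 1, "year": 2016},
]
by_user = facet_counts(posts, "user_id")
by_year = facet_counts(posts, "year")
```

Several facets (here, by user and by year) can be computed over the same result set, which is what a search UI uses to show counts next to each filter.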

ES is a “document store” like MongoDB.

Data structure

At the top level are indices (indexes).

Document types (doctypes) have schemas attached to them.

ES can infer the data types of fields. PROTIP: Define schemas explicitly to avoid errors in inference.

For comparison, create the equivalent database in SQL (an example from the Toto tutorial):

CREATE DATABASE `my blog` DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci;

Create table for blog posts:

CREATE TABLE IF NOT EXISTS `post` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `user_id` int(10) NOT NULL,
  `post_text` varchar(255) DEFAULT NULL,
  `post_date` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=3;


Marvel GUI executes queries

Control+Enter executes

GET /_cat/indices
POST /my_blog
{
    "mappings": {
        "post": {
            "properties": {
                "user_id": {
                    "type": "integer"
                },
                "post_text": {
                    "type": "string"
                },
                "post_date": {
                    "type": "date"
                }
            }
        }
    }
}

NOTE: In SQL terms, “mappings” is the schema, the type “post” is the table, and “properties” are the columns.

Alternately, send the same JSON body with any HTTP client:

(POST) http://localhost:9200/my_blog
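
One way to issue that POST outside the console is with Python's standard library. The sketch below only builds the request; it assumes a node listening on localhost:9200 and an Elasticsearch version that still accepts the `string` type and a named document type, as the console example does. The helper name `build_create_index_request` is hypothetical.

```python
import json
import urllib.request

# Same mapping body as the console request shown earlier.
MAPPING = {
    "mappings": {
        "post": {
            "properties": {
                "user_id": {"type": "integer"},
                "post_text": {"type": "string"},
                "post_date": {"type": "date"},
            }
        }
    }
}


def build_create_index_request(base_url, index, body):
    """Build (but do not send) the POST request that creates the index."""
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_index_request("http://localhost:9200", "my_blog", MAPPING)
# To send it against a running node: urllib.request.urlopen(request)
```

Keeping the build step separate from the send step makes it possible to inspect the URL, method, and body before anything touches the cluster.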

Verify

GET my_blog
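
To check the result programmatically, the field types can be pulled out of the parsed response. This sketch assumes the response nests the mapping under index name, then `mappings`, then the type name, mirroring the body that was posted; a real response carries additional keys (settings, for example). The function `field_types` and the sample response are hypothetical.

```python
def field_types(response, index, doc_type):
    """Return {field: type} from a parsed GET <index> response (assumed shape)."""
    properties = response[index]["mappings"][doc_type]["properties"]
    return {name: spec["type"] for name, spec in properties.items()}


# Hypothetical response, shaped like the mapping that was posted.
sample = {
    "my_blog": {
        "mappings": {
            "post": {
                "properties": {
                    "user_id": {"type": "integer"},
                    "post_text": {"type": "string"},
                    "post_date": {"type": "date"},
                }
            }
        }
    }
}
types = field_types(sample, "my_blog", "post")
```

Comparing `types` against the intended schema catches cases where a field was inferred differently from what was defined.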

API

Geo queries

Polygon containment queries are parallelized.
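
For readers unfamiliar with polygon containment, the sketch below shows the classic ray-casting test for whether a point lies inside a polygon. It is a generic textbook technique offered as an illustration, not Elasticsearch's implementation; `point_in_polygon` and the sample square are hypothetical. Each point is tested independently of the others, which is the property that makes this kind of query straightforward to run in parallel.

```python
def point_in_polygon(point, polygon):
    """Ray casting: toggle `inside` each time a rightward ray crosses an edge."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical polygon: a 4 x 4 square with one corner at the origin.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
```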

Machine Learning

Machine Learning is available for “Gold” level subscribers.

Presented at Elastic{ON}17 London on 8 March by Steve Dodson (Tech Lead) and Sophie Chang (Team Lead).

References

Elasticsearch (Part 1): Indexing and Querying [31:15] 16 Oct 2015 at PyCon US 2013 from NextDayVideo by Erik Rose (ES Py library maintainer) who was at Mozilla.
Thinking through and debugging problems with your query
Another example of complicated mapping (ngram, synonyms, phonemes)
Searching parts of a word
Fun with ElasticSearch’s children and nested documents

This is one of a series on Elastic Stack and monitoring.

Wilson Mar