How to get wisdom from data
Overview
This page contains unorganized notes about querying the Elasticsearch datastore, a part of the Elastic Stack.
Querying is the heart of the value the Elastic Stack provides.
Elastic “democratizes” data by putting a front-end on it, so the data can be searched in fast, meaningful ways.
Architecture
- Elasticsearch builds inverted indexes over nested aggregations of data in Hadoop.
- Curator at https://github.com/elasticsearch/curator manages Elasticsearch indexes by enabling admins to schedule operations to optimise, close, and delete indexes.
- Kibana does data discovery on an Elasticsearch cluster to identify “actionable insights” and presents them as visualizations (a dashboard).
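To make the term "inverted index" from the list above concrete, here is a minimal sketch in plain Python. It is a toy illustration only, not how Elasticsearch is implemented: the function names, the whitespace tokenizer, and the sample documents are all assumptions made for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of documents that contain every term in the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Elasticsearch distributes well",
    2: "Kibana presents a dashboard",
    3: "Elasticsearch feeds the Kibana dashboard",
}
index = build_inverted_index(docs)
print(search(index, "kibana dashboard"))  # prints [2, 3]
```

The lookup goes from term to documents rather than from document to terms, which is why a search does not need to scan every document.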
Unlike SQL databases, ES neither supports transactions nor enforces referential integrity. As a result, ES distributes well.
ES is great at faceting (more than one per facet).
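As a rough illustration of what faceting computes, the sketch below counts documents per field value for several fields in one pass. It runs locally in plain Python and only mimics the idea; ES does this on the server side (as facets in older releases and as aggregations in later ones). The `facet_counts` helper and the sample posts are hypothetical.

```python
from collections import Counter


def facet_counts(docs, fields):
    """Count how many documents carry each value, separately for every requested field."""
    counts = {field: Counter() for field in fields}
    for doc in docs:
        for field in fields:
            if field in doc:
                counts[field][doc[field]] += 1
    return counts


# Hypothetical documents, shaped like simplified blog posts.
posts = [
    {"user_id": 1, "year": 2015},
    {"user_id": 1, "year": 2016},
    {"user_id": 2, "year": 2016},
]
facets = facet_counts(posts, ["user_id", "year"])
print(dict(facets["user_id"]))  # prints {1: 2, 2: 1}
print(dict(facets["year"]))     # prints {2015: 1, 2016: 2}
```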
ES is a “document store” like MongoDB.
Data structure
At the top level are indices (indexes).
Doctypes have schemas attached to them.
ES can infer the data types of fields. PROTIP: Define schemas explicitly to avoid errors in inference.
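The sketch below shows why inference from sample data can go wrong. It is a deliberately naive guesser written for this example and is not Elasticsearch's actual inference logic; the `infer_type` and `infer_mapping` helpers and the sample document are assumptions.

```python
from datetime import datetime


def infer_type(value):
    """Guess a field type from a single sample value (naive, for illustration only)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        try:
            datetime.strptime(value, "%Y-%m-%d")
            return "date"
        except ValueError:
            return "string"
    return "object"


def infer_mapping(first_doc):
    """Guess a type for every field, based only on the first document seen."""
    return {field: infer_type(value) for field, value in first_doc.items()}


# Hypothetical first document: user_id arrives quoted, and the post text
# happens to look like a date.
first_post = {"user_id": "42", "post_text": "2017-03-08", "post_date": "2017-03-08"}
print(infer_mapping(first_post))
# prints {'user_id': 'string', 'post_text': 'date', 'post_date': 'date'}
```

Two of the three guesses are wrong for a blog post: `user_id` should be an integer and `post_text` should be free text. An explicit schema removes that dependence on whatever the first document happens to contain.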
For comparison, here is how a SQL database is created (an example from the Toto tutorial):
CREATE DATABASE `my blog` DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci;
Create table for blog posts:
CREATE TABLE IF NOT EXISTS `post` ( `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT, `user_id` int(10) NOT NULL, `post_text` varchar(255) DEFAULT NULL, `post_date` datetime NOT NULL, PRIMARY KEY (`id`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=3;
The same statement, formatted with 4 spaces per indent:
CREATE TABLE IF NOT EXISTS `post` (
    `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
    `user_id` int(10) NOT NULL,
    `post_text` varchar(255) DEFAULT NULL,
    `post_date` datetime NOT NULL,
    PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=3;
The Marvel GUI executes queries.
Control+Enter executes the query.
GET /_cat/indices

POST /my_blog
{
    "mappings": {
        "post": {
            "properties": {
                "user_id": { "type": "integer" },
                "post_text": { "type": "string" },
                "post_date": { "type": "date" }
            }
        }
    }
}
NOTE: In SQL terms, “mappings” is the schema, the type “post” is the table, and “properties” are the columns.
Alternatively, send the same request body outside the GUI:
(POST) http://localhost:9200/my_blog
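One way to send that POST from a script is sketched below using only the Python standard library. The `build_create_index_request` helper is hypothetical, and the mapping body simply repeats the one from these notes. Newer Elasticsearch releases may reject the "string" type and the per-type "post" level, so treat this as a sketch for the version these notes were written against.

```python
import json
import urllib.request

# Mapping body taken from the notes above.
MAPPING = {
    "mappings": {
        "post": {
            "properties": {
                "user_id": {"type": "integer"},
                "post_text": {"type": "string"},
                "post_date": {"type": "date"},
            }
        }
    }
}


def build_create_index_request(base_url, index_name, body):
    """Prepare (but do not send) the POST request that creates the index."""
    return urllib.request.Request(
        url=f"{base_url}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_index_request("http://localhost:9200", "my_blog", MAPPING)
print(request.get_method(), request.full_url)
# prints POST http://localhost:9200/my_blog

# To actually send it, a node must be listening on localhost:9200:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it possible to inspect the URL and body before anything touches the cluster.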
Verify
GET my_blog
API
Geo queries
Polygon containment queries are parallelized.
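A polygon containment search can be expressed as a `geo_polygon` filter in the query body. The sketch below only builds that body in Python. The `geo_polygon_query` helper, the `location` field name, and the triangle coordinates are assumptions for illustration, and later Elasticsearch releases favor `geo_shape` queries for this purpose.

```python
import json


def geo_polygon_query(field, points):
    """Build a search body that keeps documents whose geo point lies inside the polygon."""
    if len(points) < 3:
        raise ValueError("a polygon needs at least three points")
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_polygon": {
                        field: {
                            "points": [{"lat": lat, "lon": lon} for lat, lon in points]
                        }
                    }
                }
            }
        }
    }


# Hypothetical triangle given as (lat, lon) pairs.
triangle = [(40.0, -70.0), (30.0, -80.0), (20.0, -60.0)]
body = geo_polygon_query("location", triangle)
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the body of a search request against an index whose `location` field is mapped as a geo point.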
Machine Learning
Machine Learning is available for “Gold” level subscribers.
Talk at Elastic{ON}17 London, 8 March, by Steve Dodson, Tech Lead, and Sophie Chang, Team Lead.
References
- Elasticsearch (Part 1): Indexing and Querying [31:15] 16 Oct 2015 at PyCon US 2013 from NextDayVideo by Erik Rose (ES Py library maintainer) who was at Mozilla.
- Another example of complicated mapping (ngram, synonyms, phonemes)
More
This is one of a series on Elastic Stack and monitoring:
- Elastic Stack ecosystem of people, websites, tutorials
- Elastic Stack architecture and installation
- Elastic Scaling (the database engine)
- Elastic Query (via REST API)
- Elastic Kibana (the visualization engine, like Grafana)
- Elastic Logstash to assemble and filter data from Beats
- Elastic Beats to collect data from servers