InfluxDB time-series SQL-like database

Open-source, written in Go to be fast and scalable

Overview

Advantages
Architecture
Open Source
Cloud
Sandbox
Install Influx CLI
Create Docker
- Dashboard
Ingest
- Line Protocol
- Data Import
Query
Kapacitor recognizes anomalies
Benchmarks
Transformer functions
Integrates with TensorFlow
Social
Meet People
Resources

This page contains my notes on learning and using InfluxDB for managing and analyzing time-series data.

Advantages

InfluxDB is/has:

Open-source (MIT)
No external dependencies (written in Go)
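SQL-like query language (InfluxQL)
Input data in its own “Line Protocol” text format (not JSON)
Stores data in compressed format (to save space)
Horizontally scalable (across several servers; clustering is part of the commercial InfluxDB Enterprise, not the open-source build)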
Kapacitor component recognizes anomalies
Integrates with TensorFlow
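
Compression works well for time series partly because timestamps in a regularly sampled series differ by the same step, so storing the first timestamp plus run-length-encoded deltas takes very little space. Below is a minimal Python sketch of that idea. The function names are hypothetical, and this is a simplification of the delta and run-length encoding InfluxDB's TSM engine applies to timestamps, not its actual on-disk format.

```python
from itertools import groupby
from typing import List, Optional, Tuple


def encode_timestamps(ts: List[int]) -> Tuple[Optional[int], List[Tuple[int, int]]]:
    """Return (first timestamp, [(delta, run_length), ...]) for sorted timestamps."""
    if not ts:
        return None, []
    deltas = [b - a for a, b in zip(ts, ts[1:])]
    # Collapse each run of identical deltas into a single (delta, count) pair.
    runs = [(delta, len(list(group))) for delta, group in groupby(deltas)]
    return ts[0], runs


def decode_timestamps(first: Optional[int], runs: List[Tuple[int, int]]) -> List[int]:
    """Rebuild the original timestamps from the first value and the delta runs."""
    if first is None:
        return []
    out = [first]
    for delta, count in runs:
        for _ in range(count):
            out.append(out[-1] + delta)
    return out


# 1000 readings taken every 10 seconds shrink to one start value and one run.
regular = [1445299200 + 10 * i for i in range(1000)]
print(encode_timestamps(regular))   # (1445299200, [(10, 999)])
```

An irregular series still round-trips; it just needs more runs.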

https://github.com/mark-rushakoff/awesome-influxdb

Architecture

The components around InfluxDB are called the “TICK” stack, for Telegraf, InfluxDB, Chronograf, and Kapacitor.

Telegraf is a plugin-driven server agent for collecting and reporting metrics. It gathers metrics from StatsD, Redis, Elasticsearch, PostgreSQL, and more into one place.
InfluxDB is a time-series database built from the ground up to handle high write and query loads.
Chronograf is a graphing and visualization application for performing ad hoc exploration of data.
Kapacitor is a data processing framework providing alerting, anomaly detection, and action frameworks.

PROTIP: Instead of Chronograf, many use Grafana to display dashboards.

Another way of looking at this:

As Helm charts (files that describe a set of Kubernetes resources), which Helm reads to deploy the stack's components on Kubernetes.

Open Source

The https://www.influxdata.com/products/ platform is open sourced at
https://github.com/influxdata/influxdb

https://github.com/node-influx/node-influx is the source for the Node.js client, documented at
https://node-influx.github.io

https://github.com/influxdata/influxdb-python

https://github.com/ziyasal/InfluxDB.Net

https://github.com/influxdata/influxdb-java

Cloud

InfluxData makes its money from its Cloud platform: free for 14 days, then paid plans (https://cloud.influxdata.com/plan-picker) hosted on AWS EC2 in the us-west-2 region. Example prices:

$0.35 per hour
$2.08 per hour

Sandbox

Get hands-on:

https://github.com/influxdata/TICK-docker/blob/master/README.md
https://github.com/influxdata/sandbox
Tutorial to run on DigitalOcean’s cloud using CentOS 7

Install Influx CLI

https://github.com/Dieterbe/influx-cli

https://github.com/influxdata/influxdb-ruby

Create Docker

https://www.youtube.com/watch?v=PfsNHGy9EsE

Create Docker volume:
```
sudo docker volume create influxdb_data
```
The response is the name of the volume created.

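Run the Docker container. Docker options such as --restart must come before the image name, and the volume name must match the one just created. The 1.8 tag pins InfluxDB 1.x, which the influx CLI commands on this page assume:
```
docker run -d --name=influxdb --restart on-failure \
  -p 8086:8086 \
  -v influxdb_data:/var/lib/influxdb \
  influxdb:1.8 -config /etc/influxdb/influxdb.conf
```
If the image is not already on the machine, Docker pulls it from library/influxdb on Docker Hub.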

https://www.portainer.io/ (Portainer is a web UI for managing Docker containers.)

Create a database. Open the influx CLI inside the running container:
```
docker exec -it influxdb influx
```
This connects to the InfluxDB instance at localhost:8086 and returns a ">" prompt. At that prompt, create a database:
```
CREATE DATABASE home_assistant
exit
```
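Add the InfluxDB component to Home Assistant (HA) by editing its configuration file, for example with the vi editor:
```
vi configuration.yaml
```

Insert these lines (indentation matters in YAML):
```
influxdb:
  host: 10.10.10.20
  include:
    domains:
      - sensor
```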

Dashboard

https://github.com/CymaticLabs/InfluxDBStudio

https://github.com/anryko/grafana-influx-dashboard

Ingest

Line Protocol

InfluxDB does not use JSON, but a “line protocol” of its own design.

Numbers are assumed to be floating point by default.

Integer values must have a trailing “i” (such as value=12i).

Since InfluxDB is built for time-series data, the timestamp has no label: it is simply the last element of the line. By default, timestamps are Unix time in nanoseconds: 19 digits, the 10 digits of seconds since the 1970 epoch followed by 9 digits of nanoseconds, as in “1445299200000000000” (2015-10-20T00:00:00Z).
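
To make the format concrete, here is a minimal Python sketch that formats one point as a line: the measurement, optional comma-separated tags, a space, the fields, a space, and the nanosecond timestamp. The helper names (to_ns, to_line) are hypothetical, and the escaping covers only commas, spaces, equals signs, and string-field quotes, so treat it as an illustration rather than a complete encoder; the official client libraries handle the full rules.

```python
from datetime import datetime, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_ns(dt: datetime) -> int:
    """Unix epoch nanoseconds for a timezone-aware datetime (integer math, no float rounding)."""
    delta = dt - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000_000 + delta.microseconds * 1_000


def _escape(text: str, chars: str) -> str:
    # Prefix each listed character with a backslash, in the order given.
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _field_value(value) -> str:
    if isinstance(value, bool):        # test bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"             # integers need the trailing "i"
    if isinstance(value, float):
        return repr(value)             # bare numbers are read as floats
    if isinstance(value, str):
        return '"' + _escape(value, '\\"') + '"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def to_line(measurement: str, tags: dict, fields: dict, ts_ns: Optional[int] = None) -> str:
    """measurement[,tag=value...] field=value[,field=value...] [timestamp]"""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape(measurement, ", ")
    for key, val in sorted(tags.items()):  # tags sorted by key, as InfluxData recommends
        head += f",{_escape(key, ', =')}={_escape(str(val), ', =')}"
    body = ",".join(f"{_escape(k, ', =')}={_field_value(v)}" for k, v in fields.items())
    line = f"{head} {body}"
    if ts_ns is not None:
        line += f" {ts_ns}"            # the unlabeled timestamp is the last element
    return line


ts = to_ns(datetime(2015, 10, 20, tzinfo=timezone.utc))
print(to_line("cpu", {"host": "server 01"}, {"value": 12, "busy": 0.64}, ts))
# cpu,host=server\ 01 value=12i,busy=0.64 1445299200000000000
```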

QUESTION: Sample code and libraries for clients communicating with the database.
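
One partial answer, assuming an InfluxDB 1.x server on localhost:8086 (as in the Docker example above): clients talk to the database over its HTTP API, POSTing InfluxQL statements to /query and line protocol to /write. Below is a minimal standard-library Python sketch; the helper names are hypothetical, and client libraries such as influxdb-python wrap these same endpoints with more features.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8086"  # assumption: the InfluxDB container started above


def query_request(q: str, db: Optional[str] = None) -> Request:
    """POST an InfluxQL statement to /query (form-encoded q=..., optional db=...)."""
    params = {"q": q}
    if db:
        params["db"] = db
    return Request(f"{BASE_URL}/query", data=urlencode(params).encode(), method="POST")


def write_request(db: str, lines: Iterable[str], precision: str = "s") -> Request:
    """POST newline-separated line protocol to /write; precision gives the timestamp unit."""
    url = f"{BASE_URL}/write?" + urlencode({"db": db, "precision": precision})
    return Request(url, data="\n".join(lines).encode(), method="POST")


def send(req: Request) -> int:
    """Send the request and return the HTTP status (204 means a /write succeeded)."""
    with urlopen(req) as resp:
        return resp.status


# With a server running:
# send(query_request("CREATE DATABASE mydb"))
# send(write_request("mydb", ["cpu,host=server01 busy=0.64 1445299200"]))
```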

Data Import

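Download the NOAA sample data and import it from the shell (-precision=s tells the importer that the file's timestamps are in seconds):
```
curl https://s3-us-west-2.amazonaws.com/influx-sample-data/NOAA.txt > NOAA_data.txt
influx -import -path=NOAA_data.txt -precision=s
influx
```
Then, at the influx prompt:
```
USE NOAA_water_database
precision rfc3339
SHOW SERIES
SHOW FIELD KEYS
```
precision rfc3339 displays timestamps as human-readable RFC3339 dates; SHOW FIELD KEYS lists each field with its data type.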
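
Both -precision=s and precision rfc3339 are about units: the same instant can be an integer count of seconds, milliseconds, microseconds, or nanoseconds since the epoch, or an RFC3339 string. Here is a small Python sketch of that conversion. The function name is hypothetical, it handles only the s, ms, u, and ns precisions, and it trims trailing fractional zeros the way Go's RFC3339Nano layout does, which is assumed (not verified) to match the CLI's display.

```python
from datetime import datetime, timezone

# Timestamp units per second for the epoch precisions handled here (h and m are omitted).
UNITS_PER_SECOND = {"s": 1, "ms": 10**3, "u": 10**6, "ns": 10**9}


def to_rfc3339(ts: int, precision: str = "ns") -> str:
    """Render an integer epoch timestamp as RFC3339 UTC, trimming trailing zeros of the fraction."""
    per_second = UNITS_PER_SECOND[precision]
    seconds, fraction = divmod(ts, per_second)
    text = datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    if fraction:
        width = len(str(per_second)) - 1   # 3 digits for ms, 6 for u, 9 for ns
        text += "." + str(fraction).zfill(width).rstrip("0")
    return text + "Z"


print(to_rfc3339(1439856000, "s"))         # 2015-08-18T00:00:00Z
print(to_rfc3339(1445299200000000000))     # 2015-10-20T00:00:00Z
```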

Query

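The InfluxDB CLI works with a SQL-like language called InfluxQL.

To start the CLI and run some queries:
```
influx
CREATE DATABASE mydb
SHOW DATABASES
USE mydb
SELECT percentile(busy,90) FROM cpu WHERE time > now() - 1h
SELECT MEAN(busy) FROM ... GROUP BY location
SELECT MEAN(*) FROM h2o_quality WHERE time > now() - 1h GROUP BY time(10m) ORDER BY time DESC LIMIT 4
```
In InfluxQL the WHERE clause must come before GROUP BY, and GROUP BY time() requires an aggregate function such as MEAN().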
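
To see what a GROUP BY time(10m) query with ORDER BY time DESC LIMIT computes, here is a plain-Python sketch over (timestamp, value) pairs in seconds: InfluxDB's default time buckets start at multiples of the interval counted from the Unix epoch, each bucket gets the mean of its values, and rows are sorted newest first and truncated. The function name is hypothetical, and unlike InfluxDB, which by default returns empty buckets in the queried range as null (fill(null)), the sketch simply omits them.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, List, Optional, Tuple


def group_by_time_mean(
    points: Iterable[Tuple[int, float]],
    interval: int,
    limit: Optional[int] = None,
    descending: bool = True,
) -> List[Tuple[int, float]]:
    """Mean of values per epoch-aligned time bucket, like MEAN(...) GROUP BY time(interval)."""
    buckets = defaultdict(list)
    for ts, value in points:
        start = ts - ts % interval     # bucket starts are multiples of the interval since the epoch
        buckets[start].append(value)
    rows = sorted(((start, mean(vals)) for start, vals in buckets.items()), reverse=descending)
    return rows if limit is None else rows[:limit]


MINUTE = 60
points = [(0, 1.0), (5 * MINUTE, 3.0), (12 * MINUTE, 10.0), (25 * MINUTE, 7.0)]
print(group_by_time_mean(points, 10 * MINUTE, limit=2))   # [(1200, 7.0), (600, 10.0)]
```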

Kapacitor recognizes anomalies

https://github.com/poxet/Influx-Capacitor

```
// detect system down
// (no alert node yet: this only selects and windows the data)
var period = 2m
var every = 30s

// select the stream
var sys_data = stream
    |from()
        .database('telegraf')
        .measurement('system')
        .where(lambda: "host" =~ /tot.*/ OR "host" =~ /prod.*/)
        .groupBy('host', 'cluster_id')
    |window()
        .period(period)
        .every(every)
```

from [1]
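
The script above only selects and windows the data. As a rough illustration of the "system down" idea, here is a Python sketch (not Kapacitor code) that flags hosts matching the same regexes that have sent nothing within the last period. The names hosts_down, HOST_PATTERN, and last_seen are hypothetical, and the sketch keys only on host rather than host and cluster_id. Within Kapacitor itself, the usual tool for this is the deadman() node.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Same patterns as the where() lambda. re.search matches anywhere in the host name;
# TICKscript's =~ is assumed to behave the same (unanchored).
HOST_PATTERN = re.compile(r"tot.*|prod.*")


def hosts_down(last_seen: Dict[str, datetime], now: datetime,
               period: timedelta = timedelta(minutes=2)) -> List[str]:
    """Hosts matching HOST_PATTERN whose latest point is older than `period` (the window length)."""
    return sorted(
        host for host, seen in last_seen.items()
        if HOST_PATTERN.search(host) and now - seen > period
    )


now = datetime(2024, 1, 1, 0, 10, tzinfo=timezone.utc)
last_seen = {
    "prod-web1": now - timedelta(seconds=30),   # reported recently
    "prod-web2": now - timedelta(minutes=5),    # silent longer than the period
    "tot-db1": now - timedelta(minutes=3),      # silent longer than the period
    "dev-box": now - timedelta(hours=1),        # ignored: does not match the filter
}
print(hosts_down(last_seen, now))   # ['prod-web2', 'tot-db1']
```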

Benchmarks

https://docs.influxdata.com/influxdb/v1.2/guides/hardware_sizing/
https://github.com/influxdata/influxdb-comparisons

InfluxDB created its own storage engine. A storage engine determines how a database stores and indexes data on disk and how reads and writes are distributed (sharded) across disks. Examples of storage engines for MySQL include InnoDB, MyISAM, Falcon, and XtraDB.

Influx claims “350,000 writes per second on commodity hardware”.

Benchmarking InfluxDB Storage Engines: v0.10, v0.9, and v0.8, by Todd Persen, VP of Engineering, January 28, 2016

InfluxDB's storage engine is the Time-Structured Merge (TSM) Tree.

It has a built-in write-ahead log (WAL) with a queryable in-memory cache.
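
The WAL-plus-cache write path can be sketched as a toy in Python: every write is appended and fsynced to a log file before it lands in an in-memory cache, queries for recent data are answered from that cache, and on restart the cache is rebuilt by replaying the log. The class name and the tab-separated log format are hypothetical; this illustrates the idea only and is not TSM's implementation or file format.

```python
import os
from collections import defaultdict
from typing import Dict, List, Tuple


class WalCache:
    """Toy write path: append each point to a write-ahead log file, then to an in-memory cache."""

    def __init__(self, wal_path: str):
        self.wal_path = wal_path
        self.cache: Dict[str, Dict[int, float]] = defaultdict(dict)  # series key -> {ts: value}
        self._replay()

    def _replay(self) -> None:
        # On startup, rebuild the cache from the log so acknowledged writes survive a restart.
        if os.path.exists(self.wal_path):
            with open(self.wal_path, encoding="utf-8") as f:
                for line in f:
                    series, ts, value = line.rstrip("\n").split("\t")
                    self.cache[series][int(ts)] = float(value)

    def write(self, series: str, ts: int, value: float) -> None:
        with open(self.wal_path, "a", encoding="utf-8") as f:
            f.write(f"{series}\t{ts}\t{value!r}\n")
            f.flush()
            os.fsync(f.fileno())          # durable on disk before the write counts as done
        self.cache[series][ts] = float(value)

    def query(self, series: str, start: int, end: int) -> List[Tuple[int, float]]:
        # Recent points are served straight from memory: the cache is queryable.
        points = self.cache.get(series, {})
        return sorted((t, v) for t, v in points.items() if start <= t < end)
```

A real TSM engine also snapshots the cache into compressed, read-optimized TSM files and then drops the old WAL segments; the toy keeps everything in memory.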

TODO: Compare against ELK stack and NoSQL databases.

Transformer functions

The Influx server is written in the Go programming language.
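
InfluxQL's transformation functions include DIFFERENCE(), MOVING_AVERAGE(), and CUMULATIVE_SUM(), which compute over a field's values in time order. The Python sketch below mirrors what those three return for a plain list of already time-ordered values. The function names are hypothetical, and timestamps, GROUP BY, and null handling are left out.

```python
from itertools import accumulate
from typing import List, Sequence


def difference(values: Sequence[float]) -> List[float]:
    """Like DIFFERENCE(): each value minus the one before it (one fewer result than inputs)."""
    return [b - a for a, b in zip(values, values[1:])]


def moving_average(values: Sequence[float], n: int) -> List[float]:
    """Like MOVING_AVERAGE(field, n): mean of each run of n consecutive values."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [sum(values[i:i + n]) / n for i in range(len(values) - n + 1)]


def cumulative_sum(values: Sequence[float]) -> List[float]:
    """Like CUMULATIVE_SUM(): running total of the values."""
    return list(accumulate(values))


readings = [1, 4, 9, 16]
print(difference(readings))          # [3, 5, 7]
print(moving_average(readings, 2))   # [2.5, 6.5, 12.5]
print(cumulative_sum(readings))      # [1, 5, 14, 30]
```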

Integrates with TensorFlow

See video “How to Manage TensorFlow with InfluxData” (The TensorFlow Jupyter notebook for weather prediction is shown from 17:24)

Social

https://www.influxdata.com
#influxdb #influxdata on Twitter ???
@influxdb on Twitter
@influxdbNews on Twitter
@hostedinflux on Twitter
LinkedIn group of InfluxData Developers
LinkedIn company profile
YouTube channel has no videos. Related: Grafana
LinkedIn group
Meetup.com - none
Google+
Google Groups for developers
Instagram
100% recommendation from 2 employees on Glassdoor
Paysa

Meet People

Paul Dix, CTO of InfluxData (NYC resident)

Monitoring InfluxCloud with InfluxDB [28:06] at GrafanaCon 1 Dec 2016 (1)
Metric-driven development with Ansible, InfluxDB, and Grafana at Red Hat Summit
Introduction to InfluxDB [53:08] at Hakka Labs
InfluxDB Storage Engine Internals [43:42] at Hakka Labs

Nathaniel Cook:

Watch Everything, Watch Anything: Anomaly Detection at Salt Lake City DevOps Days

GrafanaCon 2017

Open Source Summit North America 2017 Monday, September 11, 2017 - Thursday, September 14, 2017

Time-Series Dashboards with Grafana and InfluxDB (September 2016): the author uses Grafana because he didn’t need text search, and since Grafana does not support MongoDB, he brought in InfluxDB to integrate with Grafana.

Additional information:

https://gist.github.com/travisjeffery/43f424fbd7ac677adbba304cef6eb58f
https://github.com/influxdata/influxdb-testing

Resources

https://www.spectory.com/blog/System%20monitoring%20with%20InfluxDB%20vs%20Elasticsearch

https://www.youtube.com/watch?v=PfsNHGy9EsE Installing InfluxDB in Docker

Wilson Mar