Wilson Mar

Open-source programmed in Go to be fast and scalable

Overview

This page contains my notes on learning and using InfluxDB for managing and analyzing time-series data.

Advantages

InfluxDB is/has:

https://github.com/mark-rushakoff/awesome-influxdb

Architecture

The components around InfluxDB are called the “TICK” stack, for Telegraf, InfluxDB, Chronograf, and Kapacitor.

Figure: the TICK stack.

  • Telegraf is a plugin-driven server agent for collecting and reporting metrics. It pulls into one place metrics from StatsD, Redis, Elasticsearch, PostgreSQL, and more.

  • InfluxDB is a time-series database built from the ground up to handle high write and query loads.

  • Chronograf is a graphing and visualization application for performing ad hoc exploration of data.

  • Kapacitor is a data processing framework providing alerting, anomaly detection, and action frameworks.

PROTIP: Instead of Chronograf, many use Grafana to display dashboards.

Another way of looking at this:

Figure: “TICK plus” diagram.

Helm reads charts (files that describe a set of Kubernetes resources).

Open Source

The https://www.influxdata.com/products/ platform is open sourced at
https://github.com/influxdata/influxdb

https://github.com/node-influx/node-influx is the source for the Node Client at
https://node-influx.github.io

https://github.com/influxdata/influxdb-python

https://github.com/ziyasal/InfluxDB.Net

https://github.com/influxdata/influxdb-java

Cloud

InfluxData makes its money on its Cloud platform, which runs in the AWS EC2 us-west-2 region. It is free for 14 days; after that, pick a plan at https://cloud.influxdata.com/plan-picker

  • $0.35 per hour
  • $2.08 per hour

Sandbox

Get hands-on:

Install Influx CLI

https://github.com/Dieterbe/influx-cli

https://github.com/influxdata/influxdb-ruby

Create Docker

https://www.youtube.com/watch?v=PfsNHGy9EsE

  1. Create Docker volume:

    sudo docker volume create influxdb_data

    The response is the name of the volume created.

  2. Run Docker container:

    docker run -d --name="influxdb" --restart on-failure \
    -p 8086:8086 -v influxdb_data:/var/lib/influxdb \
    influxdb -config /etc/influxdb/influxdb.conf

    This will pull from library/influxdb

    https://www.portainer.io/

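  3. Open a shell inside the container:

    docker exec -it influxdb /bin/bash

    This returns a shell prompt inside the container, where InfluxDB listens on localhost:8086.

  4. Start the influx CLI and create a database. The first exit leaves the influx CLI; the second leaves the container shell:

    influx
    CREATE DATABASE home_assistant
    exit
    exit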
  5. Add the InfluxDB component to Home Assistant (HA) by editing its configuration file with the vi editor:

    vi configuration.yaml
  6. Insert:

    influxdb:
      host: 10.10.10.20
      include:
        domains:
          - sensor
    

Dashboard

https://github.com/CymaticLabs/InfluxDBStudio

https://github.com/anryko/grafana-influx-dashboard

Ingest

Line Protocol

InfluxDB does not use JSON for ingesting data, but a text-based “line protocol” of its own design.

Numbers are assumed to be floating point by default.

Integer values must have an “i” (such as value=12i).
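The typing rules above can be sketched in Python. This is a minimal illustration assuming the InfluxDB 1.x line protocol; format_field_value is a hypothetical helper, not part of any client library, and the boolean and string handling goes beyond what these notes cover.

```python
def format_field_value(value):
    """Render one Python value as a line protocol field value (hypothetical helper)."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers need the trailing "i"
    if isinstance(value, float):
        return repr(value)  # bare numbers are read as floating point
    # Strings are double-quoted; embedded backslashes and quotes are escaped.
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


print(format_field_value(12))    # 12i
print(format_field_value(12.0))  # 12.0
print(format_field_value(True))  # true
print(format_field_value("ok"))  # "ok"
```

Checking bool before int matters: without it, True would be written as “Truei”.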

Since InfluxDB is built for time-series data, timestamps do not have a label. A nanosecond-precision timestamp is 19 digits: the 10-digit number of seconds since the 1970 Epoch, plus 9 more digits for the nanoseconds, as in “1445299200000000000”.
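A sketch of producing such a timestamp, and a complete line, in Python. It assumes a timezone-aware datetime and field values that are already formatted as text; to_nanoseconds and build_line are hypothetical helpers, and the host tag and cores field are made up for illustration.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
NANOS_PER_SECOND = 1_000_000_000


def to_nanoseconds(moment):
    """Convert a timezone-aware datetime to integer nanoseconds since the epoch."""
    delta = moment - EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    # Integer arithmetic only, so no float rounding creeps in.
    return whole_seconds * NANOS_PER_SECOND + delta.microseconds * 1_000


def build_line(measurement, tags, fields, moment):
    """Assemble `measurement,tags fields timestamp` from pre-formatted values."""
    tag_part = "".join(f",{key}={value}" for key, value in sorted(tags.items()))
    field_part = ",".join(f"{key}={value}" for key, value in fields.items())
    return f"{measurement}{tag_part} {field_part} {to_nanoseconds(moment)}"


moment = datetime(2015, 10, 20, tzinfo=timezone.utc)
print(to_nanoseconds(moment))  # 1445299200000000000
print(build_line("cpu", {"host": "server01"}, {"busy": "12.5", "cores": "4i"}, moment))
# cpu,host=server01 busy=12.5,cores=4i 1445299200000000000
```

The timestamp is the last element on the line and carries no key, unlike tags and fields.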

QUESTION: Sample code and libraries for clients communicating with the database.

Data Import

Commands to download the NOAA sample data, import it, and explore it in the influx CLI (SHOW FIELD KEYS also lists each field's data type):

curl https://s3-us-west-2.amazonaws.com/influx-sample-data/NOAA.txt > NOAA_data.txt
influx -import -path=NOAA_data.txt -precision=s
influx
USE NOAA_water_database
precision rfc3339
SHOW SERIES
SHOW FIELD KEYS
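Points can also be written over HTTP rather than through the CLI import. The sketch below builds, but does not send, a POST to the InfluxDB 1.x /write endpoint with the same second-level precision. It assumes a server at localhost:8086; build_write_request is a hypothetical helper, and the sample point is illustrative rather than copied from the NOAA file.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_write_request(base_url, database, lines, precision="s"):
    """Build (but do not send) a POST to the InfluxDB 1.x /write endpoint."""
    query = urlencode({"db": database, "precision": precision})
    body = "\n".join(lines).encode("utf-8")  # one line protocol point per line
    return Request(f"{base_url}/write?{query}", data=body, method="POST")


request = build_write_request(
    "http://localhost:8086",  # assumed local server
    "NOAA_water_database",
    ["h2o_feet,location=coyote_creek water_level=8.12 1439856000"],  # illustrative
)
print(request.get_method(), request.full_url)
# POST http://localhost:8086/write?db=NOAA_water_database&precision=s
```

Passing the request to urllib.request.urlopen would send it; a 1.x server answers a successful write with HTTP 204.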
   

Query

The InfluxDB CLI (influx) works with InfluxQL, a SQL-like query language.

To start:

   influx
   CREATE DATABASE mydb
   SHOW DATABASES
   USE mydb
   SELECT percentile(busy,90) FROM cpu WHERE time > now() - 1h
   SELECT MEAN(busy) FROM ... GROUP BY location
   SELECT MEAN(*) FROM h2o_quality WHERE time > now() - 1h GROUP BY time(10m) ORDER BY time DESC LIMIT 4
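The same queries can be sent to the InfluxDB 1.x HTTP API, which takes the database in a db parameter and the statement in q on the /query endpoint. A minimal sketch that only builds the URL, assuming a server at localhost:8086; build_query_url is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(base_url, database, influxql):
    """Return a GET URL for the InfluxDB 1.x /query endpoint."""
    # urlencode escapes spaces, parentheses, and ">" so the statement survives the trip.
    return f"{base_url}/query?{urlencode({'db': database, 'q': influxql})}"


statement = "SELECT percentile(busy,90) FROM cpu WHERE time > now() - 1h"
url = build_query_url("http://localhost:8086", "mydb", statement)
print(url)

# Decoding the URL gives back the original statement.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0] == statement)  # True
```

Fetching that URL (for example with urllib.request.urlopen) returns the result set as JSON.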
   

Kapacitor recognizes anomalies

https://github.com/poxet/Influx-Capacitor

// detect system down
var period = 2m
var every = 30s

// select the stream
var sys_data = stream
   |from()
       .database('telegraf')
       .measurement('system')
       // keep only hosts matching either regular expression
       .where(lambda: "host" =~ /tot.*/ OR "host" =~ /prod.*/)
       .groupBy('host', 'cluster_id')
   // every 30s, emit the last 2m of points
   |window()
       .period(period)
       .every(every)
   

from [1]
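To make the window() settings concrete, here is a simplified Python model of a window with period 2m that is emitted every 30s. It is a sketch of the idea only, not Kapacitor's implementation: sliding_windows is a hypothetical function, and the points are synthetic (one every 10 seconds).

```python
def sliding_windows(points, period, every):
    """Yield (emit_time, window) pairs.

    points: list of (timestamp_seconds, value), sorted by timestamp.
    Every `every` seconds, emit the points from the preceding `period` seconds.
    """
    if not points:
        return
    emit_time = points[0][0] + every
    last_emit = points[-1][0] + every  # late enough to cover the final point
    while emit_time <= last_emit:
        window = [(t, v) for t, v in points if emit_time - period <= t < emit_time]
        yield emit_time, window
        emit_time += every


# Synthetic data: one point every 10 seconds for 3 minutes.
points = [(t, t // 10) for t in range(0, 180, 10)]
windows = list(sliding_windows(points, period=120, every=30))
for emit_time, window in windows:
    print(emit_time, len(window))
```

Because period is longer than every, consecutive windows overlap: each point appears in up to four windows.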

Benchmarks

  • https://docs.influxdata.com/influxdb/v1.2/guides/hardware_sizing/
  • https://github.com/influxdata/influxdb-comparisons

InfluxDB created its own storage engine, which handles sharding (distributing reads and writes across several disks) and indexing. Examples of storage engines in other databases include InnoDB, MyISAM, Falcon, and XtraDB (for MySQL and its variants).

Influx claims “350,000 writes per second on commodity hardware”.

Benchmarking InfluxDB Storage Engines: v0.10, v0.9, and v0.8, by Todd Persen, VP of Engineering, January 28, 2016

Time-Structured Merge (TSM) Tree

Built-in write-ahead log (WAL) with a queryable in-memory cache
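The idea of a write-ahead log paired with a queryable cache can be shown with a toy Python model: each write is appended to a log file first, then added to an in-memory cache that serves queries, and the cache is rebuilt from the log after a restart. This is an illustration only and does not reflect InfluxDB's actual file formats or code; TinyStore and its record layout are invented for the example.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy write path: append to a log file first, then update an in-memory cache."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.cache = {}  # series key -> list of (timestamp, value)
        if os.path.exists(wal_path):
            self._replay()

    def _replay(self):
        # Rebuild the cache from the log, as after a crash or restart.
        with open(self.wal_path, encoding="utf-8") as wal:
            for record in wal:
                series, timestamp, value = json.loads(record)
                self.cache.setdefault(series, []).append((timestamp, value))

    def write(self, series, timestamp, value):
        # Make the write durable before acknowledging it.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps([series, timestamp, value]) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        self.cache.setdefault(series, []).append((timestamp, value))

    def query(self, series, start, end):
        # Served from memory, so new points are visible immediately.
        return [(t, v) for t, v in self.cache.get(series, []) if start <= t < end]


wal_file = os.path.join(tempfile.mkdtemp(), "toy.wal")
store = TinyStore(wal_file)
store.write("cpu,host=server01", 1445299200, 12.5)
store.write("cpu,host=server01", 1445299260, 14.0)
print(store.query("cpu,host=server01", 1445299200, 1445299230))

recovered = TinyStore(wal_file)  # simulates a restart
print(recovered.cache == store.cache)  # True
```

A real engine would also flush the cache to compact on-disk files and truncate the log; that step is left out here.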

TODO: Compare against ELK stack and NoSQL databases.

Transformer functions

The InfluxDB server is written in the Go programming language.

Integrates with TensorFlow

See video “How to Manage TensorFlow with InfluxData” (The TensorFlow Jupyter notebook for weather prediction is shown from 17:24)

Social

Meet People

Paul Dix, CTO of InfluxData (NYC resident)

Nathaniel Cook:

GrafanaCon 2017

Open Source Summit North America 2017 Monday, September 11, 2017 - Thursday, September 14, 2017

Additional information:

  • https://gist.github.com/travisjeffery/43f424fbd7ac677adbba304cef6eb58f

  • https://github.com/influxdata/influxdb-testing

Resources

https://www.spectory.com/blog/System%20monitoring%20with%20InfluxDB%20vs%20Elasticsearch

https://www.youtube.com/watch?v=PfsNHGy9EsE Installing InfluxDB in Docker