An open-source time-series database programmed in Go to be fast and scalable
Overview
This page contains my notes on learning and using InfluxDB for managing and analyzing time-series data.
Advantages
InfluxDB is/has:
- Open-source (MIT)
- No external dependencies (written in Go)
- SQL-like query language
- Input data in its own “Line Protocol” format (not JSON)
- Stores data in compressed format (to save space)
- Horizontally scalable (across several servers)
- Kapacitor component recognizes anomalies
- Integrates with TensorFlow
https://github.com/mark-rushakoff/awesome-influxdb
Architecture
The components around InfluxDB are called the “TICK” stack, for Telegraf, InfluxDB, Chronograf, and Kapacitor.
- Telegraf is a plugin-driven server agent for collecting and reporting metrics. It pulls into one place metrics from StatsD, Redis, Elasticsearch, PostgreSQL, and more.
- InfluxDB is a time-series database built from the ground up to handle high write and query loads.
- Chronograf is a graphing and visualization application for performing ad hoc exploration of data.
- Kapacitor is a data processing framework providing alerting, anomaly detection, and action frameworks.
PROTIP: Instead of Chronograf, many use Grafana to display dashboards.
Another way of looking at this:
Charts (files that describe a set of Kubernetes resources) read by Helm.
Open Source
The platform (https://www.influxdata.com/products/) is open-sourced at https://github.com/influxdata/influxdb
Client libraries:
- Node: https://github.com/node-influx/node-influx (documentation at https://node-influx.github.io)
- Python: https://github.com/influxdata/influxdb-python
- .NET: https://github.com/ziyasal/InfluxDB.Net
- Java: https://github.com/influxdata/influxdb-java
Cloud
InfluxData makes its money on its Cloud platform: 14 days free, then a paid plan (https://cloud.influxdata.com/plan-picker) hosted in the AWS EC2 us-west-2 region:
- $0.35 per hour
- $2.08 per hour
Sandbox
Get hands-on:
- https://github.com/influxdata/TICK-docker/blob/master/README.md
- https://github.com/influxdata/sandbox
Install Influx CLI
https://github.com/Dieterbe/influx-cli
https://github.com/influxdata/influxdb-ruby
Create Docker
https://www.youtube.com/watch?v=PfsNHGy9EsE
- Create Docker volume:
sudo docker volume create influxdb_data
The response is the name of the volume created.
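- Run Docker container:
docker run -d --name=influxdb --restart on-failure \
  -p 8086:8086 -v influxdb_data:/var/lib/influxdb \
  influxdb -config /etc/influxdb/influxdb.conf
Docker options such as --restart must come before the image name; anything after the image name is passed to InfluxDB itself.
This pulls the image from library/influxdb on Docker Hub if it is not already present locally.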
https://www.portainer.io/
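- Open a shell in the container:
docker exec -it influxdb /bin/bash
This returns a shell prompt inside the Docker container. From there, start the influx client, which connects to the instance at localhost:8086:
influx
- Create a database, then leave the client:
CREATE DATABASE home_assistant
exit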
- Add the InfluxDB component to Home Assistant (HA) by editing its configuration with the vi editor:
vi configuration.yaml
- Insert:
influxdb:
  host: 10.10.10.20
  include:
    domains:
      - sensor
Dashboard
https://github.com/CymaticLabs/InfluxDBStudio
https://github.com/anryko/grafana-influx-dashboard
Ingest
Line Protocol
InfluxDB does not use JSON, but a “line protocol” of its own design.
Numbers are assumed to be floating point by default.
Integer values must have a trailing “i” (such as value=12i).
Since InfluxDB is built for time-series data, the timestamp has no label: it is simply the last element on each line.
Timestamps are 19 digits: the 10-digit number of seconds since the 1970 Unix epoch,
followed by 9 more digits for nanosecond precision, as in “1445299200000000000”.
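To make the typing rules above concrete, here is a minimal Python sketch that builds one line of line protocol and splits a nanosecond timestamp back into seconds and nanoseconds. The function names (format_field, to_line, split_timestamp) are my own, not part of any InfluxDB library, and the sketch assumes tag keys, tag values, and field keys contain no spaces, commas, or equals signs (the real protocol requires escaping those).

```python
from datetime import datetime, timezone


def format_field(value):
    """Render one field value using the line-protocol typing rules."""
    if isinstance(value, bool):   # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"        # integers need the trailing "i"
    if isinstance(value, float):
        return repr(value)        # bare numbers are read as floats
    # anything else is written as a double-quoted string
    return '"' + str(value).replace('"', '\\"') + '"'


def to_line(measurement, tags, fields, timestamp_ns):
    """Build one line: measurement,tag=value field=value timestamp."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={format_field(v)}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


def split_timestamp(timestamp_ns):
    """Split a nanosecond timestamp into (UTC datetime, leftover nanoseconds)."""
    seconds, nanos = divmod(timestamp_ns, 1_000_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc), nanos


if __name__ == "__main__":
    print(to_line("cpu", {"host": "server01"}, {"busy": 12, "load": 0.64},
                  1445299200000000000))
    print(split_timestamp(1445299200000000000))
```

The sample timestamp splits into 1445299200 seconds and 0 nanoseconds, which is midnight UTC on 20 October 2015.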
QUESTION: Sample code and libraries for clients communicating with the database.
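A partial answer to the question above, using only the Python standard library rather than one of the client libraries: InfluxDB 1.x accepts line protocol over HTTP by POSTing to its /write endpoint. The sketch below assumes a 1.x server at http://localhost:8086 and a database named mydb; the helper names build_write_request and write_points are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_write_request(lines, db, base_url="http://localhost:8086", precision="ns"):
    """Prepare (but do not send) a POST to the InfluxDB 1.x /write endpoint."""
    query = urlencode({"db": db, "precision": precision})
    body = "\n".join(lines).encode("utf-8")   # one point per line
    return Request(f"{base_url}/write?{query}", data=body, method="POST")


def write_points(lines, db, **kwargs):
    """Send the request; InfluxDB 1.x answers 204 No Content on success."""
    with urlopen(build_write_request(lines, db, **kwargs)) as response:
        return response.status == 204


if __name__ == "__main__":
    request = build_write_request(
        ["cpu,host=server01 busy=12i 1445299200000000000"], "mydb")
    print(request.get_method(), request.full_url)
    # write_points(...) needs a running server, so it is not called here.
```

Separating the request builder from the network call keeps the example checkable without a running database.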
Data Import
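Download the NOAA sample data and import it from the shell:
curl https://s3-us-west-2.amazonaws.com/influx-sample-data/NOAA.txt > NOAA_data.txt
influx -import -path=NOAA_data.txt -precision=s
Then start the influx CLI and explore the imported database (SHOW FIELD KEYS also lists each field's data type):
influx
USE NOAA_water_database
precision rfc3339
SHOW SERIES
SHOW FIELD KEYS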
Query
The InfluxDB CLI works with a SQL-like language.
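To start the CLI and run some sample statements:
influx
CREATE DATABASE mydb
SHOW DATABASES
USE mydb
SELECT PERCENTILE(busy, 90) FROM cpu WHERE time > now() - 1h
SELECT MEAN(busy) FROM ... GROUP BY location
SELECT MEAN(index) FROM h2o_quality WHERE time > now() - 1h GROUP BY time(10m) ORDER BY time DESC LIMIT 4
WHERE must come before GROUP BY, and GROUP BY time() requires an aggregate function such as MEAN(). Here index is the field of h2o_quality in the NOAA sample data.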
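To show what GROUP BY time(10m) does conceptually, here is a small Python sketch that groups points into 10-minute buckets aligned to the epoch and averages each bucket. It is an illustration, not InfluxDB's implementation: the function name mean_by_time_bucket is my own, and unlike InfluxDB the sketch simply omits empty buckets.

```python
from collections import defaultdict


def mean_by_time_bucket(points, bucket_seconds=600):
    """points: iterable of (epoch_seconds, value) pairs.

    Returns {bucket_start_seconds: mean of the values in that bucket},
    with buckets aligned to multiples of bucket_seconds since the epoch.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)  # round down to bucket start
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


if __name__ == "__main__":
    sample = [(1445299200, 1.0), (1445299500, 3.0), (1445299800, 10.0)]
    print(mean_by_time_bucket(sample))
```

In the sample, the first two points fall in the same 10-minute bucket and are averaged to 2.0, while the third starts a new bucket.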
Kapacitor recognizes anomalies
https://github.com/poxet/Influx-Capacitor
// detect system down
var period = 2m
var every = 30s

// select the stream
var sys_data = stream
    |from()
        .database('telegraf')
        .measurement('system')
        .where(lambda: "host" =~ /tot.*/ OR "host" =~ /prod.*/)
        .groupBy('host', 'cluster_id')
    |window()
        .period(period)
        .every(every)
from [1]
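The interplay of period and every in the window above can be pictured with a short Python sketch: every 30 seconds a batch is emitted that contains the last 2 minutes of points, so consecutive batches overlap. This is only an illustration of the idea. The function name sliding_windows is hypothetical, and the exact boundary handling (here the half-open interval from end - period up to but excluding end) is my assumption, not Kapacitor's documented behavior.

```python
def sliding_windows(points, period=120, every=30):
    """points: list of (time_seconds, value) pairs sorted by time.

    Returns a list of (window_end, values) where each window holds the
    values with end - period <= time < end, emitted every `every` seconds.
    """
    if not points:
        return []
    first, last = points[0][0], points[-1][0]
    windows = []
    end = first + every
    while end <= last + every:   # continue until the last point has been emitted
        batch = [v for t, v in points if end - period <= t < end]
        windows.append((end, batch))
        end += every
    return windows


if __name__ == "__main__":
    data = [(0, "a"), (30, "b"), (60, "c"), (90, "d"), (120, "e"), (150, "f")]
    for end, batch in sliding_windows(data):
        print(end, batch)
```

With one point every 30 seconds, each full window holds four points, and each new window drops the oldest point and adds the newest.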
Benchmarks
- https://docs.influxdata.com/influxdb/v1.2/guides/hardware_sizing/
- https://github.com/influxdata/influxdb-comparisons
InfluxData created its own storage engine to handle sharding (distributing reads and writes across several disks) and indexing. Storage engines in other databases include InnoDB, MyISAM, Falcon, and XtraDB (MySQL, etc.).
Influx claims “350,000 writes per second on commodity hardware”.
“Benchmarking InfluxDB Storage Engines: v0.10, v0.9, and v0.8” by Todd Persen, VP of Engineering, January 28, 2016
The storage engine uses a Time-Structured Merge (TSM) Tree, with a built-in write-ahead log (WAL) and a queryable in-memory cache.
TODO: Compare against ELK stack and NoSQL databases.
Transformer functions
The InfluxDB server is written in the Go programming language.
Integrates with TensorFlow
See video “How to Manage TensorFlow with InfluxData” (The TensorFlow Jupyter notebook for weather prediction is shown from 17:24)
Social
- https://www.influxdata.com
- #influxdb #influxdata on Twitter ???
- @influxdb on Twitter
- @influxdbNews on Twitter
- @hostedinflux on Twitter
- LinkedIn group of InfluxData Developers
- LinkedIn company profile
- YouTube channel has no videos. Related: Grafana
- LinkedIn group
- Meetup.com - none
- Google+
- Google Groups for developers
- 100% recommendation from 2 employees on Glassdoor
- Paysa
Meet People
Paul Dix, CTO of InfluxData (NYC resident)
- Monitoring InfluxCloud with InfluxDB [28:06] at GrafanaCon 1 Dec 2016 (1)
- Metric-driven development with Ansible, InfluxDB, and Grafana at Red Hat Summit
- Introduction to InfluxDB [53:08] at Hakka Labs
- InfluxDB Storage Engine Internals [43:42] at Hakka Labs
Nathaniel Cook:
- Watch Everything, Watch Anything: Anomaly Detection at Salt Lake City DevOps Days
GrafanaCon 2017
Open Source Summit North America 2017 Monday, September 11, 2017 - Thursday, September 14, 2017
- Time-Series Dashboards with Grafana and Influx DB September 2016 says he uses Grafana because he didn’t need text search. Grafana does not support Mongo so I had to bring in Influx DB to integrate with Grafana.
Additional information:
- https://gist.github.com/travisjeffery/43f424fbd7ac677adbba304cef6eb58f
- https://github.com/influxdata/influxdb-testing
Resources
https://www.spectory.com/blog/System%20monitoring%20with%20InfluxDB%20vs%20Elasticsearch
https://www.youtube.com/watch?v=PfsNHGy9EsE Installing InfluxDB in Docker