System Monitoring

He sees you when you’re sleeping. He knows when you’re awake …

Overview

Options
Metrics collection
Metrics
StatsD
StatsD Graphite
Dynatrace in AWS
More on cloud

This is a hands-on narrated tour of how metrics are collected, aggregated, and graphed for system monitoring.

I want you to feel confident that you’ve mastered this skill. That’s why this takes a hands-on approach where you type in commands and we explain the responses and possible troubleshooting. This is a “deep dive” because all details are presented.

Like a good music DJ, I’ve carefully arranged the presentation of concepts into a sequence for easy learning, so you don’t have to spend as much time as me making sense of the flood of material around this subject.

Sentences that begin with PROTIP are a high point of this website, pointing out wisdom and advice from experience. Sentences that begin with NOTE point out observations that many miss. Search for them if you only want “TL;DR” (Too Long Didn’t Read) highlights.

Stuck? Contact me and I or one of my friends will help you.

Options

There are several

Install node.js
Define a project folder.
Clone the project
Create a config file based on exampleConfig.js
Edit the file for debugging:

debug - log exceptions and print out more diagnostic info

dumpMessages - print debug info on incoming messages
Use the default UDP server on localhost?
Start the StatsD daemon:
```
node stats.js /path/to/config
```

Send in metrics from your command line:


echo "foo:1|c" | nc -u -w0 127.0.0.1 8125
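The same counter can be sent from a program instead of the shell. Here is a minimal Python sketch, assuming a StatsD daemon is listening on UDP port 8125 of localhost as in the command above. The helper name `send_metric` is my own invention for illustration, not part of StatsD.

```python
import socket


def send_metric(name, value, metric_type="c", host="127.0.0.1", port=8125):
    """Send one StatsD-style line over UDP and return the bytes sent.

    send_metric is a hypothetical helper, not a StatsD API.
    """
    payload = f"{name}:{value}|{metric_type}".encode("ascii")
    # UDP is fire-and-forget: the sender gets no acknowledgement,
    # so a missing daemon usually goes unnoticed here.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload
```

Calling `send_metric("foo", 1)` is roughly what the `echo ... | nc` command does (minus the trailing newline). With dumpMessages turned on in the config file, the daemon should print the incoming message.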

Execute tests
```
chmod +x run_tests.sh
./run_tests.sh
```
IRC channel: #statsd on freenode
Mailing list: statsd@librelist.com

Metrics collection

Servers export metrics data to collection servers.

Data collected are coalesced into aggregate metrics visualized by graphing systems such as Graphite and Kibana.

Measures are concrete, usually measure one thing, and are quantitative in nature (e.g. I have five apples).

Metrics describe a quality and require a measurement baseline (I have five more apples than I did yesterday).

Metrics

A metric is a measurement composed of a name, a value, a type, and sometimes additional information describing how a metric should be interpreted.

The StatsD metric collection protocol, which originated in 2008 at Flickr and was carried into Etsy’s statsd (daemon) by Erik Kastner, has the form:

   <metric name>:<value>|<type>[|@<sample rate>]

The “type” code following the pipe character is one of:

“g” for gauge, which provides instantaneous measurements such as the gas gauge in a car, calculated at the client (rather than the server).

“c” for counter, a gauge calculated at the server. This differentiation is necessary because metrics sent by the client increment or decrement the value of the counter rather than giving its current value. Counters may also have an associated sample rate, given as a decimal of the number of samples per event count (for example, a rate of 1 in 10 is sent as @0.1).

“m” for meter, which measures the rate of events over time, calculated at the server.

“ms” for timer, which measures the number of milliseconds elapsed between a start and end time, such as time to complete rendering of a web page for a user.

“h” for histogram, which presents a distribution of timer values over time, calculated at the server.
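To make the line format concrete, here is a small Python sketch that splits one metric line into its parts. The names `Metric` and `parse_metric` are hypothetical, made up for this page. The sketch handles only a single metric per line and treats every value as a plain number, which is simpler than what a real daemon does.

```python
from typing import NamedTuple


class Metric(NamedTuple):
    name: str
    value: float
    kind: str           # the type code, e.g. "c" or "g"
    sample_rate: float  # 1.0 when no "|@<sample rate>" is given


def parse_metric(line):
    """Parse '<metric name>:<value>|<type>[|@<sample rate>]' into a Metric."""
    name, sep, rest = line.strip().partition(":")
    if not sep or not name:
        raise ValueError(f"missing metric name or ':' in {line!r}")
    fields = rest.split("|")
    if len(fields) < 2 or not fields[1]:
        raise ValueError(f"missing type in {line!r}")
    value = float(fields[0])  # raises ValueError if not numeric
    sample_rate = 1.0
    if len(fields) > 2:
        if not fields[2].startswith("@"):
            raise ValueError(f"sample rate must start with '@' in {line!r}")
        sample_rate = float(fields[2][1:])
    return Metric(name, value, fields[1], sample_rate)
```

For example, `parse_metric("foo:1|c|@0.1")` yields a counter named `foo` with value 1.0 and a sample rate of 0.1.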

See Counting-timing blog by Cal Henderson.
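A timer value like the one described above can be produced by wrapping the code you want to measure. This is a sketch under my own assumptions: `timed` is a hypothetical helper, `emit` is any function you supply that accepts the finished metric line, and the type code defaults to `ms`, which is what Etsy's statsd daemon uses for timers.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(name, emit, type_code="ms"):
    """Time the enclosed block and pass a '<name>:<ms>|<type>' line to emit."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Report whole milliseconds, even if the block raised an exception.
        elapsed_ms = int(round((time.perf_counter() - start) * 1000))
        emit(f"{name}:{elapsed_ms}|{type_code}")
```

A quick way to try it is `with timed("page.render", print): ...`, which prints the line instead of sending it anywhere.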

StatsD

StatsD was influenced by Coda Hale’s Metrics.

https://github.com/etsy/statsd/wiki

StatsD, originally written in Node, is a front-end proxy for the Graphite/Carbon metrics server. Implementations in other languages have appeared since then.

It is based on ideas from Flickr and a post by Cal Henderson: Counting and Timing.

StatsD Graphite

http://graphite.wikidot.com/

http://graphite.readthedocs.io/en/latest/

Key Concepts

buckets - Each stat is in its own "bucket". They are not predefined anywhere. Buckets can be named anything that will translate to Graphite (periods make folders, etc).

values - Each stat will have a value. How it is interpreted depends on modifiers. In general values should be integer.

flush - After the flush interval timeout (defined by config.flushInterval, default 10 seconds), stats are aggregated and sent to an upstream backend service.
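The three concepts above fit together as shown in this toy Python aggregator. It is only a sketch to illustrate the idea, not how StatsD itself is written: the `Aggregator` class is hypothetical, it handles only counters and gauges, it drops counters after a flush, and it leaves out the name prefixes a real daemon adds. The output lines follow Graphite's plaintext format of path, value, and timestamp.

```python
import time
from collections import defaultdict


class Aggregator:
    """Toy in-memory aggregator (hypothetical; real StatsD does much more)."""

    def __init__(self):
        self.counters = defaultdict(float)  # buckets are created on first use
        self.gauges = {}

    def record(self, bucket, value, kind, sample_rate=1.0):
        if kind == "c":
            # A counter sampled at rate r stands for 1/r real events.
            self.counters[bucket] += value / sample_rate
        elif kind == "g":
            # A gauge keeps only the most recent reading.
            self.gauges[bucket] = float(value)
        else:
            raise ValueError(f"unsupported type {kind!r} in this sketch")

    def flush(self, timestamp=None):
        """Return '<path> <value> <timestamp>' lines and reset the counters."""
        ts = int(time.time()) if timestamp is None else int(timestamp)
        lines = [f"{b} {v} {ts}" for b, v in sorted(self.counters.items())]
        lines += [f"{b} {v} {ts}" for b, v in sorted(self.gauges.items())]
        self.counters.clear()
        return lines
```

In use, something would call `flush()` once per flush interval and write the lines to the backend. A bucket named `web.hits` would then show up in Graphite as `hits` inside a `web` folder.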

Dynatrace in AWS

You can use Dynatrace in place of or in addition to Amazon CloudWatch logging. Here are the steps:

Download the installer from Dynatrace.com.

BLAH: I wish Dynatrace hosted its own installers on S3.

This can be either/both a Windows or Linux instance.
Put the Dynatrace installer in an S3 bucket so that the Ansible scripts that build up a server have a stable reference.
Create a new AWS instance.

Again, this can be either a Windows or Linux instance.
Install the Dynatrace agent on the server.
Connect the agents to the Dynatrace controller so you see metrics being recorded.
Impose some artificial load on the machine to see metrics in their full glory.
Repeat the above in an automated script:
1. Jenkins is invoked when a commit occurs to a branch on GitHub.
2. The Jenkins v2 Pipeline Groovy script downloads the build script from GitHub.
3. The build script downloads the installers to assemble.
4. The build script creates an image in DockerHub.
5. The script instantiates an AWS instance with the Docker image.
6. It sends an email when the image is ready for use.
7. It starts a performance testing run.
8. It sends SMS texts with the results of the test run.
9. If all is well, it commits into the next branch in GitHub.

More on cloud

This is one of a series on cloud computing:

Wilson Mar