Wilson Mar

Identify relationships between TPS and metrics


Overview

Here is a plan for analyzing system performance automatically.

Objectives

Our objective is to create a structure that proactively identifies issues before they occur, using a regression formula that correlates processing data with system metrics.

Processing data consists of HPS (hits per second) and TPS (transactions per second).
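
HPS and TPS are plain per-second counts. As a minimal Python sketch, assuming each hit or completed transaction is recorded with an epoch timestamp in milliseconds, the hypothetical helper below buckets events by whole second and reports 0 for idle seconds so the series has no gaps:

```python
from collections import Counter
from typing import Dict, Iterable


def per_second_rates(timestamps_ms: Iterable[int]) -> Dict[int, int]:
    """Count events per whole second (hypothetical helper).

    Pass hit timestamps to get HPS, or completed-transaction timestamps
    to get TPS. Keys are epoch seconds; seconds with no events map to 0.
    """
    counts = Counter(ts // 1000 for ts in timestamps_ms)  # ms -> whole seconds
    if not counts:
        return {}
    first, last = min(counts), max(counts)
    # Fill idle seconds with 0 so the series is continuous.
    return {sec: counts[sec] for sec in range(first, last + 1)}


if __name__ == "__main__":
    hits = [1000, 1200, 1999, 2500, 4000]  # sample epoch-ms timestamps
    print(per_second_rates(hits))  # {1: 3, 2: 1, 3: 0, 4: 1}
```

Filling idle seconds with zeros matters later, when per-second processing rates are lined up against per-second monitoring metrics.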

Architecture

We prefer to use only open source software components.

For example, the “TICK” stack:

[Diagram: the TICK stack]

  • Telegraf is a plugin-driven server agent for collecting and reporting metrics. It gathers metrics from StatsD, Redis, Elasticsearch, PostgreSQL, and more into one place.

  • InfluxDB is a time-series database built from the ground up to handle high write and query loads.

  • Chronograf is a graphing and visualization application for performing ad hoc exploration of data.

  • Kapacitor is a data processing engine that provides alerting, anomaly detection, and action frameworks.
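
Telegraf writes the metrics it collects to InfluxDB using InfluxDB's text-based line protocol: a measurement name, optional comma-separated tags, one or more fields, and an optional timestamp (nanoseconds by default). Where custom code writes points directly (as in plan steps 6 and 7), a minimal Python formatter might look like the sketch below. The function names are hypothetical, only the common escaping rules are handled, and an official InfluxDB client library would normally be the better choice.

```python
from typing import Mapping, Optional, Union

FieldValue = Union[bool, int, float, str]


def _escape(text: str, special: str) -> str:
    """Backslash-escape each character in `special`."""
    for ch in special:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value: FieldValue) -> str:
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # String fields are double-quoted; escape backslashes, then quotes.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(
    measurement: str,
    tags: Mapping[str, str],
    fields: Mapping[str, FieldValue],
    timestamp_ns: Optional[int] = None,
) -> str:
    """Format one point as a line of InfluxDB line protocol (hypothetical helper)."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape(measurement, ", ")  # measurement: escape commas and spaces
    # Tags sorted by key, which InfluxDB recommends for write performance.
    for key in sorted(tags):
        head += f",{_escape(key, ',= ')}={_escape(tags[key], ',= ')}"
    body = ",".join(
        f"{_escape(key, ',= ')}={_format_field(val)}" for key, val in fields.items()
    )
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


if __name__ == "__main__":
    print(to_line_protocol(
        "cpu",
        {"region": "us-west", "host": "web 1"},
        {"usage_user": 12.5, "threads": 8, "ok": True},
        1500000000000000000,
    ))
    # cpu,host=web\ 1,region=us-west usage_user=12.5,threads=8i,ok=true 1500000000000000000
```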

[Diagram: the TICK stack plus additional components]

Helm reads charts (files that describe a set of Kubernetes resources) to deploy applications onto Kubernetes.

https://github.com/influxdata/sandbox

Additional information:

  • https://docs.influxdata.com/influxdb/v1.2/guides/hardware_sizing/

  • Tutorial on running on DigitalOcean’s cloud using CentOS 7

  • https://github.com/influxdata/TICK-docker/blob/master/README.md

  • https://gist.github.com/travisjeffery/43f424fbd7ac677adbba304cef6eb58f

  • https://github.com/influxdata/influxdb-comparisons

  • https://github.com/influxdata/influxdb-testing

Where custom programming is needed, the preference is Python and Java.

Plan

App under load

  1. Select a sample target application to analyze (for example, Dynatrace’s easyTravel Java/Spring app, whose source code is available)
  2. Ansible: Establish a test environment for the application
  3. Telegraf: Instrument the application environment to collect metrics as time-series data

    Server load data

  4. JMeter: Create load scripts that impose a gradually increasing artificial load until the system is overloaded
  5. JMeter: Conduct load-induced runs to collect processing and monitoring data along the same time series
  6. Custom: Extract run results into a format that can be loaded into the database (https://github.com/influxdata/whisper-migrator)
  7. Custom: Load into database

  8. Custom: Identify period with peak rate of processing (transactions per second)
  9. Custom: Obtain metadata around peak periods (number of users imposing load)
  10. Custom: Match metrics observed during peak steady-state period

    Initial metrics

  11. InfluxDB: Create a database using the appropriate vendor’s technology
  12. Load initial metrics into database and index
  13. Verify backup and recovery procedures
  14. Chronograf: Produce initial (simple) visualizations

    Regression formula

  15. Calculate regression formula

    Predictive

    Identify the conditions that precede the point of inflection, to serve as triggers.

  16. Chronograf: Explore by visualizing ratios (such as memory used per active user)
  17. Kapacitor: Set up alerts

    Kapacitor has its own DSL called TICKscript.

    Action

  18. Custom: Respond automatically to alerts (such as by bringing additional servers up)
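
Plan steps 8 through 10, 15, and 17 can be sketched in Python. This is a minimal illustration under several assumptions, not the author's implementation: the run has already been reduced to one sample per second; the “peak steady-state period” is approximated as the fixed-width window with the highest mean TPS; the regression formula is an ordinary least-squares straight line of one metric (here, memory used) against TPS; and the alert trigger is simply the TPS at which that line reaches a chosen limit. All function names, sample values, and the 800 MB limit are hypothetical.

```python
from typing import Sequence, Tuple


def peak_window(tps: Sequence[float], width: int) -> int:
    """Start index of the `width`-second window with the highest mean TPS (step 8)."""
    if not 0 < width <= len(tps):
        raise ValueError("width must be between 1 and len(tps)")
    window_sum = sum(tps[:width])
    best_start, best_sum = 0, window_sum
    for start in range(1, len(tps) - width + 1):
        # Slide the window right by one sample: add the new value, drop the old.
        window_sum += tps[start + width - 1] - tps[start - 1]
        if window_sum > best_sum:
            best_start, best_sum = start, window_sum
    return best_start


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x (step 15)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary to estimate a slope")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def tps_at_limit(intercept: float, slope: float, limit: float) -> float:
    """TPS at which the fitted line reaches `limit`: a candidate alert trigger (step 17)."""
    if slope <= 0:
        raise ValueError("metric does not rise with TPS, so there is no trigger point")
    return (limit - intercept) / slope


if __name__ == "__main__":
    # Hypothetical per-second samples from one load run.
    tps = [10, 20, 40, 60, 62, 61, 45, 20]
    users = [5, 10, 20, 30, 30, 30, 30, 30]
    mem_mb = [200, 260, 380, 500, 512, 506, 410, 260]

    start = peak_window(tps, width=3)
    # Steps 8-10: peak window, users imposing load, metrics observed there.
    print(start, users[start:start + 3], mem_mb[start:start + 3])
    # 3 [30, 30, 30] [500, 512, 506]

    intercept, slope = fit_line(tps, mem_mb)
    print(intercept, slope)  # 140.0 6.0

    print(tps_at_limit(intercept, slope, limit=800))  # 110.0
```

A straight line is the simplest choice. Past the point of inflection, resource use may grow nonlinearly, so a real analysis would likely fit only the samples before the peak or try other model forms. In the TICK stack, the resulting threshold would be expressed as a Kapacitor TICKscript alert rather than in Python.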