Identify relationships between TPS and metrics
Here is how to analyze system performance automatically.
Our objective is to create a structure that proactively identifies issues before they occur,
using a regression formula that correlates processing data with monitoring metrics.
Processing data here means HPS (hits per second) and TPS (transactions per second).
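As a minimal sketch of what a per-second rate means, the snippet below counts events in one-second buckets. The function name `per_second_rate` and the sample timestamps are hypothetical, chosen only for illustration. The same counting applies to HPS if the input is a list of hit timestamps rather than transaction timestamps.

```python
from collections import Counter


def per_second_rate(timestamps_ms):
    """Count events in each one-second bucket.

    timestamps_ms: iterable of event times in epoch milliseconds.
    Returns a dict mapping epoch second -> number of events in that second.
    """
    return dict(Counter(ts // 1000 for ts in timestamps_ms))


# Hypothetical sample: five transactions spread across two seconds.
sample = [1000, 1200, 1999, 2000, 2500]
print(per_second_rate(sample))  # {1: 3, 2: 2}
```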
We prefer to use all open source software components.
For example, the “TICK” stack:
- Telegraf is a plugin-driven server agent for collecting and reporting metrics. It pulls metrics from StatsD, Redis, Elasticsearch, PostgreSQL, and more into one place.
- InfluxDB is a time-series database built from the ground up to handle high write and query loads.
- Chronograf is a graphing and visualization application for performing ad hoc exploration of data.
- Kapacitor is a data processing framework providing alerting, anomaly detection, and action frameworks.
Helm reads Charts, which are files that describe a set of Kubernetes resources.
Where custom programming is needed, the preference is Python and Java.
### App under load
- Select a sample target application to analyze (for example, the EasyTravel Java/Spring app from Dynatrace, which has code available)
- Ansible: Establish a test environment for the application
- Telegraf: Instrument the application environment to collect metrics as time-series data
### Server load data
- JMeter: Create load scripts that impose a gradual increase in artificial load until overload
- JMeter: Conduct load runs to collect processing and monitoring data along the same timeline
- Custom: Extract run results into a format to load into database (https://github.com/influxdata/whisper-migrator)
- Custom: Load into database
- Custom: Identify period with peak rate of processing (transactions per second)
- Custom: Obtain metadata around peak periods (number of users imposing load)
- Custom: Match metrics observed during peak steady-state period
- InfluxDB: Create a database using the appropriate technology vendor
- Load initial metrics into database and index
- Verify backup and recovery procedures
- Chronograf: Produce initial (simple) visualizations
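The "identify peak period" and "match metrics" steps above could be sketched as follows. This is only an illustration under stated assumptions: the peak is taken as the contiguous window with the highest mean TPS, the window length is arbitrary, and the function names and the sample record layout (`time`, `users`) are hypothetical rather than a fixed schema.

```python
def peak_window(tps_by_second, window=3):
    """Return (start_second, mean_tps) for the window with the highest mean TPS.

    tps_by_second: dict mapping epoch second -> transactions in that second.
    Seconds missing from the dict count as zero. Ties keep the earliest window.
    """
    if not tps_by_second:
        raise ValueError("no TPS data")
    first, last = min(tps_by_second), max(tps_by_second)
    best_start, best_mean = first, float("-inf")
    for start in range(first, max(first, last - window + 1) + 1):
        total = sum(tps_by_second.get(s, 0) for s in range(start, start + window))
        mean = total / window
        if mean > best_mean:
            best_start, best_mean = start, mean
    return best_start, best_mean


def metrics_in_window(samples, start, window=3):
    """Keep monitoring samples whose time falls inside [start, start + window)."""
    return [s for s in samples if start <= s["time"] < start + window]


# Hypothetical data: TPS per second, plus user counts sampled on the same timeline.
tps = {0: 10, 1: 20, 2: 50, 3: 55, 4: 52, 5: 30}
samples = [{"time": t, "users": u} for t, u in [(1, 20), (2, 50), (3, 50), (4, 50), (5, 30)]]

start, mean_tps = peak_window(tps, window=3)
peak_samples = metrics_in_window(samples, start, window=3)
print(start, round(mean_tps, 1), [s["users"] for s in peak_samples])  # 2 52.3 [50, 50, 50]
```

Averaging over a window rather than taking the single highest second is a design choice here: it favors a steady-state period over a momentary spike.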
### Calculate regression formula
The goal is to identify the conditions that precede the point of inflection, so that they can serve as triggers.
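The notes do not say which regression form is intended, so the following is a sketch that assumes ordinary least squares on a single metric against TPS. It fits a line to the low-load observations, then reports the first observation that exceeds the line's prediction by more than a tolerance, as one possible stand-in for the point of inflection. The function names, the 20% tolerance, and the sample numbers are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values are all identical")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def first_departure(xs, ys, slope, intercept, tolerance=0.2):
    """Index of the first point whose y exceeds the fitted line by more than tolerance."""
    for i, (x, y) in enumerate(zip(xs, ys)):
        predicted = slope * x + intercept
        if y > predicted * (1 + tolerance):
            return i
    return None


# Hypothetical run: response time grows linearly with TPS, then jumps at overload.
tps = [10, 20, 30, 40, 50]
response_ms = [100, 110, 120, 130, 200]

slope, intercept = fit_line(tps[:3], response_ms[:3])  # fit on the low-load points only
print(slope, intercept, first_departure(tps, response_ms, slope, intercept))  # 1.0 90.0 4
```

The conditions observed just before the returned index (here, index 3) are the candidates for triggers.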
- Chronograf: Explore by visualizing ratios (such as memory used per active user)
- Kapacitor: Set up alerts (Kapacitor has its own DSL, called TICKscript)
- Custom: Automatic response to alerts (such as bringing additional servers up)
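In the stack described here, the alert rule itself would live in Kapacitor as TICKscript. The Python below is only a sketch of the custom side: it computes a ratio (memory used per active user, as in the example above) and invokes a response when a threshold is crossed. The function name `check_ratio`, the threshold value, and the `on_alert` callback are hypothetical, and the callback stands in for whatever action actually brings servers up.

```python
def check_ratio(memory_mb, active_users, threshold_mb_per_user, on_alert):
    """Call on_alert(ratio) when memory per active user exceeds the threshold.

    Returns the ratio, or None when there are no active users to divide by.
    """
    if active_users <= 0:
        return None
    ratio = memory_mb / active_users
    if ratio > threshold_mb_per_user:
        on_alert(ratio)
    return ratio


# Hypothetical response: record the alert instead of really starting a server.
triggered = []
check_ratio(2048, 100, 30.0, triggered.append)  # 20.48 MB per user: below threshold
check_ratio(4096, 100, 30.0, triggered.append)  # 40.96 MB per user: alert fires
print(triggered)  # [40.96]
```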