Wilson Mar

Analyze and visualize across multiple dimensions, automatically


Overview

This repository is a set of software to enable processing, alerting, and visualization of data collected from various sources.

This effort is necessary especially for data related to load/stress testing, because current solutions visualize only individual metrics for individual runs, without the additional context needed to interpret them.

The context needed spans many dimensions, in order to identify:

  • Creeping increases in response times for the same metric over several versions of the application.

viz-pt-creeping-650x285-34125

  • Sudden change in memory per user (or other ratios) within a specific environment, when compared against the same metric across environments from dev to production.

viz-pt-env-diff-650x270-79779

  • Differences that appear when integrating data gathered from different sources (CA-APM GC metrics, LoadRunner Transactions Per Second).

  • Differences between similar data gathered from multiple tools (from multiple vendors), such as LoadRunner, Gatling, JMeter, Selenium, etc. so that they can be evaluated for consistency.

More importantly, a different approach to alert triggers is needed than what current vendors provide. Instead of requiring people to diligently pore over all the various graphs across runs in various environments for every release, this effort aims for people to receive alerts after intelligent software identifies trends and anomalies by assimilating data from various runs and sources, around the clock.

We want a way to use what I call “peak summary metrics”, such as the amount of Memory or average response time during the “steady state period” between ramp-up and ramp-down:

lr-response-time-sustained-500x208-57305

In the above example, when the Hits Per Second averaged 110 for the transaction at the top line (in purple), other metrics during the same “steady state period” were:

  • Memory averaged 3.2 GB
  • CPU averaged 65%
  • etc.

What is different about these numbers is that each metric summarizes conditions over the entire run (not a single point in time).
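As a minimal sketch of how such summary numbers could be derived, the following averages each metric over the samples that fall inside the steady-state window. The sample layout, the field names (`t`, `hits_per_sec`, `memory_gb`, `cpu_pct`), and the window boundaries are assumptions made for illustration, not part of any tool's export format.

```python
from statistics import mean

def steady_state_summary(samples, start, end):
    """Average each metric over samples whose timestamp lies in [start, end]."""
    window = [s for s in samples if start <= s["t"] <= end]
    if not window:
        raise ValueError("no samples inside the steady-state window")
    metrics = [name for name in window[0] if name != "t"]
    return {name: mean(s[name] for s in window) for name in metrics}

# Hypothetical samples; "t" is seconds since the start of the run.
samples = [
    {"t": 0,   "hits_per_sec": 10,  "memory_gb": 1.0, "cpu_pct": 20},  # ramp-up
    {"t": 60,  "hits_per_sec": 108, "memory_gb": 3.1, "cpu_pct": 64},
    {"t": 120, "hits_per_sec": 110, "memory_gb": 3.2, "cpu_pct": 65},
    {"t": 180, "hits_per_sec": 112, "memory_gb": 3.3, "cpu_pct": 66},
    {"t": 240, "hits_per_sec": 15,  "memory_gb": 1.2, "cpu_pct": 22},  # ramp-down
]

summary = steady_state_summary(samples, start=60, end=180)
print(summary)
```

Excluding the ramp-up and ramp-down samples is what keeps the averages representative of sustained load.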

Repeated over several runs, sets of summary numbers enable trends and anomalies to be identified in the association between various metrics (such as between memory used and Hits per Second, etc.).

When runs are repeated several times, several sets of metrics can be plotted together to yield scatterplots such as this:

analysis-memtotal-650x357-44138

The trend line calculated from various runs provides a formula of what to expect in the relationship between the two metrics. Several other formulas (relationships) can be calculated from various combinations of measurements captured during runs.

The formula derived from the dotted trend line can be used to proactively identify anomalies. During live runs the measurement system can refer to each formula to determine whether the system is functioning normally at point “p” (illustrated above), or is using more memory than expected at point “q”.
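One way to sketch this idea is an ordinary least-squares line fitted to past runs, with a live measurement flagged when it departs from the line by more than a tolerance. The data values, the tolerance of 0.3 GB, and the function names are hypothetical; a straight line is only one of the formulas that could be tried.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def is_anomaly(x, y, slope, intercept, tolerance):
    """True when y departs from the trend line by more than tolerance."""
    expected = slope * x + intercept
    return abs(y - expected) > tolerance

# Hypothetical history: one (hits per second, memory in GB) pair per past run.
hps = [50, 75, 100, 125, 150]
mem = [2.0, 2.5, 3.0, 3.5, 4.0]
slope, intercept = fit_line(hps, mem)

p = (110, 3.2)  # close to the trend line
q = (110, 4.1)  # more memory than the formula predicts
for name, (x, y) in (("p", p), ("q", q)):
    print(name, "anomaly" if is_anomaly(x, y, slope, intercept, 0.3) else "normal")
```

In practice the tolerance would likely be derived from the scatter of the historical points around the line rather than fixed by hand.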

Ratios

Now instead of just total Memory used, we calculate a ratio of peak Memory per user for each run. We then plot it against peak hits per second for the same run.

analysis-memperuser-650x344-50875

We get a flat line because the metric factors in the cause of memory usage.

That’s until we reach the maximum hits per second that can be handled, when swapping behavior occurs, causing more memory to be used.

Financial analysts have long calculated various ratios to identify relationships among all companies, and then defined thresholds for alerts when those ratios are applied to specific companies.

A ratio can be more useful than a static number because it involves a relationship between two metrics. In the case of Memory per user, history often shows that it is relatively stable for a range of HPS (Hits Per Second) levels until the HPS goes beyond a threshold level.

If in production Memory per User spikes beyond the normal level, we know there is a problem.
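A minimal sketch of such a check follows. It computes Memory per User for past runs and sets an alert threshold at the historical mean plus three standard deviations. The three-sigma rule, the run figures, and the function names are assumptions chosen for illustration; any other way of defining "normal level" could be substituted.

```python
from statistics import mean, pstdev

def memory_per_user(memory_mb, users):
    """Ratio of memory used to concurrent users."""
    return memory_mb / users

def ratio_threshold(history, sigmas=3.0):
    """Upper alert threshold: historical mean plus `sigmas` standard deviations."""
    return mean(history) + sigmas * pstdev(history)

# Hypothetical past runs: (peak memory in MB, peak users).
runs = [(2000, 100), (4100, 200), (5900, 300), (8000, 400)]
history = [memory_per_user(m, u) for m, u in runs]
threshold = ratio_threshold(history)

normal = memory_per_user(6060, 300)  # about 20 MB per user, like past runs
live = memory_per_user(9000, 300)    # 30 MB per user
for label, ratio in (("normal", normal), ("live", live)):
    status = "ALERT" if ratio > threshold else "ok"
    print(f"{label}: {ratio:.1f} MB/user ({status})")
```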

Hopefully the real-time monitor (in NGINX, etc.) would understand the context of loads and automatically limit sending more than what each server can handle.

Data Flow

Here is how data is processed:

  1. A performance test is run to emulate load on the application system of servers and networks.
  2. Monitoring software captures JVM dynamics such as Garbage Collection, memory and CPU used, etc.
  3. The run generates an HTML file to summarize statistics. This is usually kept for historical review and comparison.
  4. A batch program parses HTML pages and writes the numbers from it into a database.
  5. A program analyses the accumulated data and identifies trends and anomalies (described below).
  6. Trends and anomalies in the most recent run beyond a threshold cause alerts to be issued.
  7. Visualizations of the trends and anomalies are generated along with alerts about thresholds being breached.
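Steps 3 and 4 above can be sketched with only the Python standard library. This example assumes a hypothetical summary page whose statistics sit in a two-column HTML table (metric name, numeric value); real report layouts differ by tool. The table name `run_metrics` and the run identifier are also made up. BeautifulSoup and SQLAlchemy, linked later in this page, could replace `html.parser` and `sqlite3` here.

```python
import sqlite3
from html.parser import HTMLParser

class TableCells(HTMLParser):
    """Collect the text of each <td>/<th>, grouped by table row."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._in_cell = False
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

def load_summary(html, run_id, conn):
    """Parse metric/value rows out of html and store them under run_id."""
    parser = TableCells()
    parser.feed(html)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS run_metrics "
        "(run_id TEXT, metric TEXT, value REAL)"
    )
    for row in parser.rows:
        if len(row) != 2:
            continue
        try:
            value = float(row[1])
        except ValueError:
            continue  # header row or non-numeric cell
        conn.execute(
            "INSERT INTO run_metrics VALUES (?, ?, ?)", (run_id, row[0], value)
        )
    conn.commit()

# Hypothetical summary page for one run.
html = """<table>
<tr><th>Metric</th><th>Average</th></tr>
<tr><td>Hits per second</td><td>110</td></tr>
<tr><td>Memory GB</td><td>3.2</td></tr>
</table>"""

conn = sqlite3.connect(":memory:")
load_summary(html, "run-001", conn)
rows = conn.execute(
    "SELECT metric, value FROM run_metrics ORDER BY metric"
).fetchall()
print(rows)
```

Keeping one row per run and metric lets the later analysis step select any pair of metrics across runs with a plain SQL query.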

Machine Learning

The innovation here is that:

1) the system accumulates its data from past runs (by reading HTML files summarizing prior runs)

2) the system tries various mechanisms to detect trends and anomalies by generating formulas and seeing how historical data fits. Different ratios are examined. For example, instead of Memory vs. HPS it can be CPU vs. HPS.

3) The extent of “fit”
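As one possible illustration of examining different relationships and seeing how well historical data fits, the sketch below scores several candidate metrics against HPS using the coefficient of determination (R squared) of a straight-line fit, then ranks them. Using R squared as the measure of fit, the metric names, and the sample values are all assumptions; the text above does not specify how fit is measured.

```python
def r_squared(xs, ys):
    """R squared of a least-squares straight line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so no linear relationship to measure
    return (sxy * sxy) / (sxx * syy)

def rank_relationships(runs, x_name, y_names):
    """Score each candidate metric against x_name, best fit first."""
    xs = [run[x_name] for run in runs]
    scores = {y: r_squared(xs, [run[y] for run in runs]) for y in y_names}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical summary numbers, one dict per past run.
runs = [
    {"hps": 50,  "memory_gb": 2.0, "cpu_pct": 30},
    {"hps": 100, "memory_gb": 3.0, "cpu_pct": 70},
    {"hps": 150, "memory_gb": 4.0, "cpu_pct": 50},
    {"hps": 200, "memory_gb": 5.0, "cpu_pct": 60},
]

ranked = rank_relationships(runs, "hps", ["memory_gb", "cpu_pct"])
for metric, score in ranked:
    print(f"{metric} vs hps: R^2 = {score:.2f}")
```

A relationship with a high score is a better candidate for an alerting formula than one where the historical points scatter widely.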

Implementation

This requires skill with different technologies and thus cross-department coordination:

  • Performance testing
  • Extracts from utilities that monitor the JVM for GC and app statistics
  • Python to parse historical docs
  • Web Server (provisioning, logging, metrics, hardening, etc.)
  • Back-end Database (Big Data) schema design
  • Statistics (Data Science)
  • Visualization
  • Deep Learning to recognize

The bottom line here is for the performance testing effort to come up with a formula to hand to production operations.

The trouble with the current situation is that APM software looks empirically at what is happening but does not make use of what has been discovered during performance testing.

Python, Go, Java?

I chose Python to parse HTML, etc. on the Google cloud because of its popularity for Machine Learning.

http://docs.python-guide.org/en/latest/scenarios/scrape/ uses the Requests module instead of the built-in urllib2.

https://medium.freecodecamp.org/how-to-scrape-websites-with-python-and-beautifulsoup-5946935d93fe

https://www.sqlalchemy.org/ for SQL

DSL?

Domain-Specific Language like Rapid7.

JetBrains MPS makes it easier to work with DSLs and their projection editor of AST trees. This enables code to be edited as text, tables, diagrams, a form, or a combination of those.

First Date, Second Date and Getting started by tomasetti.me

1) Projects 3) Constraints