A high-level explanation of Why do it? And how
This is about the economics of performance testing and performance engineering work.
Performance testing is done to:
- ensure productivity of employees
- prevent losing sales from vistors abandoning interactions
- avoid damaged to the company’s reputation
A central “Center of Excellence” (CoE) for performance testing provides an injection of experience and skill into teams for quick start. This also saves money from coordinated use of tools.
The central team maintains off-shore skills to provide scripting expertise and other specialized service done efficiently.
Teams experienced with performance testing processes can reduce cycle time by communication of only exceptions.
For strategic program managers
ANALOGY: Spending money on performance testing is like companies paying for pre-employment drug tests. All employers inconvenience new hires with drug testing even though the vast majority of employees don’t use drugs because the consequences of that one bad apple potentially devastating trust among employees.
Just as there are several types of drugs potential employees can be using, there are different performance risks that may undermine enterprise systems.
Below is a summary of the work processes and vocabulary of this work.
Various types of performance testing address different concerns:
Is the environment used for testing capable of being used?
“Smoke tests” are run to ensure that services respond to requests and other aspects of servers under test.
Most organizations make use of “server virtualization” programs that mimic working systems to remove a dependency.
Where can programming “hang” the system?
For the same reason as 100% of candidates for employment are required to take drug tests, the more coverage of user functionality in performance tests, the more likely we are to find hidden issues.
“Performance tests” are done to identify how quickly servers respond to requests from customers or employees. The measurement is called
response time, expressed in seconds or thousands of a second (milliseconds).
Many performance defects do not degrade response time until there are many users using the systema at the same time.
The “Shift-left” movement to catch performance problems earlier involves watching out for slow performance while developers create the system, rather than after development has finished.
“network latency tests” are conducted to determine whether programs being built adequately handle delays in network latency (transmission time).
As more and more processing moves to software on devices (operated by end-users) rather than on servers, many organizations now measure how much time client programs take to “render” data obtained from servers.
Measurement of “Key-to-glass” time is done by programs that automate how real people touch a screen, type on keyboards, or click on mice.
Traditionally, programs in emulate the communication between client and server by capturing the communication as the basis for creating an automated program that mimics the load caused by hundreds or thousands of real users.
Advanced editions of such programs release a swarm of artificial users at the same time at a specific point in the workflow to see if there are “concurrency” defects.
What is the capacity of the system?
What is the highest rate the system processes business transactions?
Some servers work until a certain level of work burns them out. It is always helpful to have a statement of the likely peak number of users that might be expected to use the system at peak times.
Peak demand in a system is defined practically by the number of users who are actively using the system at the same time (concurrently).
“Load tests” are conducted to identify the rate business transactions can be processed without failure at various levels of expected load (demand from a set number of users).
“Stress tests” reveals how much servers can handle before they run out of memory, processing power, disk space, or other resource needed.
Stress testing is normally used to understand the upper limits of capacity within the system. This kind of test is done to determine the system’s robustness in terms of extreme load and helps application administrators to determine if the system will perform sufficiently if the current load goes well above the expected maximum.
As a system becomes overloaded, users experience slower response times while they “stand in line” waiting for others to be processed. So average response times and response times at the 95th percentile are defined to set maximum allowable service levels.
“Fail over tests” are sometimes an aspect of stress tests to observe what events lead up to and following the point when a server fails due to load.
How long can the system run continuously?
Sometimes programs “leak” memory and require more and more memory until it runs out.
“Soak tests” run the system over several hours or days to determine if a system is exposed to that risk. These are also called “longevity tests” or “endurance tests” done to determine if the system can sustain the continuous expected load. During soak tests, memory utilization is monitored to detect potential leaks. Also important, but often overlooked is performance degradation, i.e. to ensure that the throughput and/or response times after some long period of sustained activity are as good as or better than at the beginning of the test. It essentially involves applying a significant load to a system for an extended, significant period of time. *
The goal is to discover how the system behaves under sustained use.
Can the system handle momentary sudden jumps in demand?
“Spike tests” are included within soak tests to see whether the system has enough “buffer” capacity to process sudden abnormal spikes in requests.
Spike testing is done by suddenly increasing or decreasing the load generated by a very large number of users, and observing the behaviour of the system.
Such tests are among a set of tests which reveal conditions that can cause performance to suffer or make the system fail.
Performance Engineering works to fix the underlying coding or configurations to ensure resiliency.
How many additional servers should be added (or removed) to meet anticipated demand?
“Scalability tests” add more servers and/or larger servers, or remove them.
Usually, doubling the amount of hardware does not double the rate of processing because of inefficiencies added, such as additional communications necessary. Planning and testing reveal limits in supporting infrastructure such as the capacity in networking equipment.
Are configuration settings set at optimal levels?
“Configuration tests” determine the effects of configuration changes on the system’s performance and behaviour.
These are also called “tuning” runs.
The objective is savings from less hardware being required to handle the same capacity.
Indiviual responses from the database, application server, etc. are monitored during test runs in order to identify bottlenecks in the application software and the hardware the software is installed on.
Will growth in amount of data over time affect performance?
“Data volumn testing” determines whether the performance of the application is affected by the amount (volume) of data being handled by the application. The structure of data may not provide adequate performance as databases grow over time with historical information.
Performing such testing requires a huge volume of data needs to be artifically created in the database. The enormity of such a task is beyond most organizations.
For project leads
Due to variations in momentary network speeds and other aspects that might vary during test runs, test runs are often repeated before a baseline is selected as the basis for comparison against alternative runs.
Performance test plans answer these questions:*
What is the performance test scope? What subsystems, interfaces, components, etc. are in and out of scope for the test?
How many concurrent users are expected for each user interfaces (UIs) involved?
Specify peak vs. nominal.
Whether a system performs well under concurrent usage is “stability” attribute, one of several “Non-functional requirements” (NFRs), with the “functional” referring to functionality operated by end-users (not whether servers are functioning).
What are the time requirements for any/all back-end batch processes?
Specify peak vs. nominal.
What is the target hardware under test?
Specify all server and network appliance configurations.
What is the Application Workload Mix by each system component?
For example: Workload A consists of 20% log-in, 40% search, 30% item select, 10% checkout, etc..
Specifications also include the “think time” between user actions (to reflect the time users take to look at reference material, etc.) and the “pacing” of time between transactions of each user.
What is the Workload Mix? [Multiple workloads may be simulated in a single performance test
For example: 30% Workload A, 20% Workload B, 50% Workload C.
What metrics is measured before and during runs?
In addition to “standard” one such as Memory, CPU utilitization, Disk space usage, etc. there are others such as VMware MHz used.