I tune software systems for optimal performance and scalability (capacity).
Below is a sample approach based roughly on Agile DevOps principles:
Create small usable increments and bring each one through to production, each with increasing capability. Doing so reveals the technical components and human processes that cause bottlenecks.
1. Create production-like environments temporarily for testing.
This means server-creation scripts that bring up servers in a cloud automatically, quickly, and repeatably.
The difference between production and test should be just a few configuration settings.
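One way to keep that difference small is to derive the test configuration from the production one and override only what must change. The following is a minimal sketch, not a real provisioning script: `EnvConfig`, its fields, the instance type label, and the database URLs are all hypothetical names chosen for illustration.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class EnvConfig:
    """Hypothetical settings that a server-creation script would consume."""
    name: str
    instance_type: str
    instance_count: int
    db_url: str
    log_level: str = "INFO"


# Illustrative values only; substitute whatever your cloud provider uses.
PRODUCTION = EnvConfig(
    name="production",
    instance_type="standard-4cpu",
    instance_count=6,
    db_url="postgres://db.prod.example/app",
)

# The test environment is production plus a few explicit overrides.
TEST = replace(
    PRODUCTION,
    name="test",
    instance_count=2,
    db_url="postgres://db.test.example/app",
)


def config_diff(a: EnvConfig, b: EnvConfig) -> dict:
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    da, db = asdict(a), asdict(b)
    return {key: (da[key], db[key]) for key in da if da[key] != db[key]}


if __name__ == "__main__":
    for field, (prod_value, test_value) in config_diff(PRODUCTION, TEST).items():
        print(f"{field}: production={prod_value!r} test={test_value!r}")
```

Printing the diff as part of environment creation makes any drift between production and test visible at a glance.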
2. Measure all environments.
Install log management and metrics on servers, networks, load balancers, etc.
Do the same in test environments to make sure automatic triggers work.
3. Measure environment variability.
Are there spikes when the server is serving only landing pages?
How can you tell when spikes and other anomalies occurred?
How can you tell what caused an anomaly?
How can you predict when anomalies will occur if you don’t have history to analyze?
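To make the question of spotting spikes concrete, here is a minimal sketch that flags a sample when it sits far above the average of the samples just before it. The function name `find_spikes`, the window size, the threshold, and the latency numbers are assumptions for illustration, not measurements from a real system. Real log management and metrics tools offer more robust detectors.

```python
from statistics import mean, stdev


def find_spikes(samples, window=5, threshold=3.0, min_sigma=1.0):
    """Return indexes of samples that exceed the mean of the preceding
    `window` samples by more than `threshold` standard deviations.

    `min_sigma` is a floor on the standard deviation (in the metric's own
    units) so that a perfectly flat baseline does not flag tiny changes.
    """
    spikes = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        baseline = mean(history)
        sigma = max(stdev(history), min_sigma)
        if (samples[i] - baseline) / sigma > threshold:
            spikes.append(i)
    return spikes


if __name__ == "__main__":
    # Placeholder response times in milliseconds, invented for the example.
    latencies_ms = [100, 102, 98, 101, 99, 100, 450, 101, 99, 100]
    print(find_spikes(latencies_ms))
```

The sketch also shows why history matters: the first `window` samples can never be flagged, because there is nothing yet to compare them against.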
4. Impose an artificial load to test limits.
This identifies hidden issues throughout the system stack and human processes.
This begins with micro-benchmarks during development.
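As a rough illustration of imposing load in steps, the sketch below calls an operation from a growing number of threads and reports throughput and worst-case latency at each step. `handle_request` is a hypothetical stand-in for the code under test, and `run_load` is a name invented here. A dedicated load-testing tool would be the usual choice against a real deployment.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request():
    """Hypothetical stand-in for the operation under test."""
    time.sleep(0.001)


def run_load(operation, concurrency, requests_per_worker):
    """Call `operation` from `concurrency` threads, each making
    `requests_per_worker` calls.

    Returns (calls per second, worst single-call latency in seconds).
    """
    def worker():
        latencies = []
        for _ in range(requests_per_worker):
            start = time.perf_counter()
            operation()
            latencies.append(time.perf_counter() - start)
        return latencies

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(worker) for _ in range(concurrency)]
        latencies = [lat for future in futures for lat in future.result()]
    elapsed = time.perf_counter() - started
    return len(latencies) / elapsed, max(latencies)


if __name__ == "__main__":
    # Step the load up and watch where throughput stops growing
    # or worst-case latency starts climbing.
    for concurrency in (1, 2, 4, 8):
        throughput, worst = run_load(handle_request, concurrency, 50)
        print(f"{concurrency:>2} threads: {throughput:8.1f} calls/s, "
              f"worst {worst * 1000:.2f} ms")
```

The point at which adding threads no longer raises throughput is a first hint at where a limit lies.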
5. Run experiments to find the optimal configuration.
This identifies the least-cost solutions (effort, time, and equipment).
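Once experiments have produced results, picking the least-cost option can be as simple as filtering for the trials that meet the target and taking the cheapest. This is a minimal sketch under assumed names: `Trial`, `cheapest_passing`, the labels, and all of the numbers are placeholders, not real measurements. It considers only equipment cost; effort and time would need their own columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One experiment result: a configuration, its cost, and what it achieved."""
    label: str
    monthly_cost: float
    p95_latency_ms: float


def cheapest_passing(trials, max_p95_ms):
    """Return the lowest-cost trial whose 95th-percentile latency meets the
    target, or None if no trial meets it."""
    passing = [t for t in trials if t.p95_latency_ms <= max_p95_ms]
    return min(passing, key=lambda t: t.monthly_cost) if passing else None


if __name__ == "__main__":
    # Placeholder results invented for the example.
    trials = [
        Trial("2 small servers", monthly_cost=200.0, p95_latency_ms=480.0),
        Trial("4 small servers", monthly_cost=400.0, p95_latency_ms=240.0),
        Trial("2 large servers", monthly_cost=600.0, p95_latency_ms=210.0),
    ]
    print(cheapest_passing(trials, max_p95_ms=250.0))
```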
For more detail, see my list at Finding fault in JVM.