Distribute load to scale and upgrade servers with no downtime. Also route transactions across clouds using ZeroLB.
- Why - Who Needs Load Balancing?
- The Trouble with Load Balancing
- Global Load Balancing in Service Mesh
- Load Balancing in Azure
- Load Balancing in AWS
- Algorithms for routing
- Load Balancing in the Cloud
The contribution of this article is a deep yet succinct, hands-on, step-by-step tutorial, logically sequenced, with “PROTIP” marking hard-won, little-known but significant facts available nowhere else on the internet.
NOTE: Content here represents my personal opinions and is not intended to represent any employer (past or present). “PROTIP:” highlights information I haven’t seen elsewhere on the internet, based on my personal research and experience.
Why - Who Needs Load Balancing?
Server load balancing scales “horizontally” to accommodate higher volumes of work by distributing (balancing) client requests among several servers. This allows more servers (hosts) to be added to handle additional load.
Even if a single server can handle the current load, it is still a good idea to design and test your system to use a load balancer to distribute load so that the cluster stays up.
Having your applications on more than one machine allows operations personnel to work on one machine while the others keep serving users.
Having “fault resilience” means that an alternative server can take over when, inevitably, a server crashes or just needs maintenance work done.
This is called ensuring uninterrupted, continuous availability of mission-critical applications.
It’s also a good idea to test your system for its ability to handle growth while the design is still fresh in developers’ minds, since load balancing and fail-over issues can be quite complex.
The oldest load balancers operated at network Layer 4 (L4, transport), routing based on IP address and port.
Newer “application” load balancers, operating at network Layer 7 (L7, application), allocate traffic based on the URL, such as routing to web vs. API servers.
The Trouble with Load Balancing
Load balancers may not evenly distribute load among active machines in the cluster. This can happen for several reasons:
Session stickiness. Once the IP of a client gets assigned to a particular server (at login), it stays with that server until the session ends. This is the case with SAP R/3 sessions opened through SAPGUI, which matters for SAP application performance tuning.
Session stickiness is inherently fragile because the backend hosting the session may die.
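Ring hash (one of the allocation algorithms listed in this section) is one way to make stickiness less fragile. This sketch is an illustration, not any product's implementation. It assumes client IPs as the hash key, SHA-256-derived ring positions, and 100 virtual points per server, all of which are arbitrary choices. The sketch maps each client to a server. When a server dies, only the clients pinned to that server are remapped, and every other client keeps its server. Session state held in the dead server's memory is still lost.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    # Stable 64-bit position; Python's built-in hash() is randomized per process.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring in which each server owns many virtual points."""

    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> server
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.vnodes):
            pos = ring_position(f"{server}#{i}")
            self._owner[pos] = server
            bisect.insort(self._points, pos)

    def remove(self, server):
        for i in range(self.vnodes):
            pos = ring_position(f"{server}#{i}")
            del self._owner[pos]
            self._points.remove(pos)

    def lookup(self, client_key):
        # First point clockwise from the client's position, wrapping past the end.
        idx = bisect.bisect(self._points, ring_position(client_key)) % len(self._points)
        return self._owner[self._points[idx]]


# Hypothetical pool and client IPs, for illustration only.
ring = HashRing(["app1", "app2", "app3"])
clients = [f"10.0.0.{i}" for i in range(1, 201)]
before = {c: ring.lookup(c) for c in clients}
ring.remove("app2")                      # app2 crashes
after = {c: ring.lookup(c) for c in clients}
moved = [c for c in clients if before[c] != after[c]]
```

Removing a server only deletes its points from the ring. Every key whose nearest point belonged to another server therefore keeps that same point. Only the dead server's clients move.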
Also, since each session imposes a different level of load, load balancers that allocate based on frequency of allocation (such as the round-robin algorithm) may over-allocate sessions to a server that ends up with users and sessions consuming a higher-than-average amount of resources.
Algorithms for allocation/splitting of traffic include:
- round robin
- least request
- ring hash
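The first two algorithms in that list can be sketched in a few lines. This is a minimal, single-threaded illustration with hypothetical server names:

- Round robin rotates through the servers blindly.
- Least request tracks the requests still in flight on each server, which protects against the "heavy session" skew described above.

Production proxies such as Envoy avoid scanning every host. Instead, they compare two randomly chosen hosts.

```python
import itertools


class RoundRobin:
    """Hand out servers in a fixed rotation, regardless of how busy each one is."""

    def __init__(self, servers):
        self._rotation = itertools.cycle(servers)

    def pick(self):
        return next(self._rotation)


class LeastRequest:
    """Send each request to the server with the fewest requests still in flight."""

    def __init__(self, servers):
        self.in_flight = {server: 0 for server in servers}

    def pick(self):
        # min() returns the first server with the lowest count, so ties go to list order.
        server = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[server] += 1
        return server

    def finish(self, server):
        # Call when the server's response completes.
        self.in_flight[server] -= 1


rr = RoundRobin(["web1", "web2", "web3"])
rr_picks = [rr.pick() for _ in range(4)]     # web1, web2, web3, web1

lr = LeastRequest(["web1", "web2", "web3"])
first_three = [lr.pick() for _ in range(3)]  # one request on each server
lr.finish("web2")                            # web2's (short) request completes
next_pick = lr.pick()                        # web2 now has the fewest in flight
```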
PROTIP: Monitor in real-time the load on each individual resource in a pool. Also measure the extra time load balancers take.
Use of a load balancer takes an extra hop in the network, which adds latency.
Global Load Balancing in Service Mesh
Kong Mesh claims a reduction in “network latency by 2x on request lifecycles and 4x on full roundtrips” plus “zone aware load balancing to reduce egress costs” across multi-cloud Kubernetes clusters, VMs, and bare metal.
Such progress can be made with new “decentralized” applications using the Service Mesh architecture pattern within Kubernetes, which deploys (bootstraps) intelligent proxies as “sidecars” running in each pod alongside each app container.
Decentralized microservices apps Dockerized to run in containers route all network communication through their sidecar proxy (like handing a box to UPS to deliver).
Each sidecar proxy can communicate with backends in different zones (generically named A, B, and C in the diagram). Policies sent to each sidecar can specify zones and the amount of traffic to send to each zone. Each zone can be in a different cloud (thus multi-cloud).
This approach also enables security-related policies to be applied (such as limiting outflows from some countries) and detection of zonal failure and automatic rerouting around traffic anomalies (including DDoS attacks).
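Zone-weighted routing with automatic rerouting can be sketched as follows. The zone names and the 70/20/10 weights are hypothetical. Real meshes also weigh latency, cost, and load. This sketch only shows the core idea: a failed zone's share is redistributed proportionally among the healthy zones.

```python
import random


def effective_weights(policy, healthy_zones):
    """Drop failed zones and rescale the remaining weights so they sum to 1."""
    live = {zone: w for zone, w in policy.items() if zone in healthy_zones and w > 0}
    total = sum(live.values())
    if total == 0:
        raise RuntimeError("no healthy zone left to send traffic to")
    return {zone: w / total for zone, w in live.items()}


def pick_zone(policy, healthy_zones, rng=random):
    shares = effective_weights(policy, healthy_zones)
    zones = list(shares)
    return rng.choices(zones, weights=[shares[z] for z in zones])[0]


# Hypothetical policy: 70% of traffic to zone A, 20% to B, 10% to C.
policy = {"A": 70, "B": 20, "C": 10}
normal = effective_weights(policy, {"A", "B", "C"})
zone_a_down = effective_weights(policy, {"B", "C"})   # A's share is spread over B and C
rng = random.Random(7)
picks = [pick_zone(policy, {"B", "C"}, rng=rng) for _ in range(50)]
```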
Each sidecar proxy and backend service report periodic state to the global load balancer (GLB) so it can make decisions that take into account latency, cost, load, current failures, etc. This enables centralized visualizations engineers use to understand and operate the entire distributed system in context.
This means app developers no longer need to code for a long list of operational cross-cutting concerns:
- collection and reporting of telemetry (health checks, logs, metrics, traces)
- TLS termination (certificate and private key handling)
- handling of HTTP/2, WebSocket, gRPC, and Redis protocols as well as plain TCP traffic
- rate limiting (DoS mitigation)
- timeout and retry/back-off handling when a response is not received
- fault injection (for chaos engineering to improve reliability)
- enforcement of policy decisions
- load balancing
- staged rollouts with percentage-based traffic splits
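Rate limiting in proxies is commonly implemented as a token bucket; Envoy's local rate limit filter, for example, is configured with one. This is a minimal single-threaded sketch, and the rate and capacity values are illustrative. The caller supplies the current time (for example from `time.monotonic()`), which keeps the sketch deterministic to test.

```python
class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)   # start full
        self.last = now

    def allow(self, now):
        # Add tokens for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # over the limit: a proxy would typically answer HTTP 429


bucket = TokenBucket(rate=2, capacity=3)
burst = [bucket.allow(now=0.0) for _ in range(5)]   # 3 allowed, then rejected
later = bucket.allow(now=0.5)                       # 0.5 s x 2 tokens/s = 1 token
```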
Embedding the above functionality in each app program may provide the best performance and scalability, but requires polyglot coding to implement the library in many languages. It can also be cumbersome to coordinate upgrades of new versions of each library across all services.
There are several sidecar programs:
Logically, packets/requests travel through a “data plane”.
There is also a “control plane” which, rather than carrying packets/requests, distributes policies and configuration settings to enable services such as:
- deploy control (blue/green and/or traffic shifting),
- authentication and authorization settings,
- route table specification (e.g., when service A requests /foo what happens), and
- load balancer settings (e.g., timeouts, retries, circuit breakers, etc.).
Several products provide a “control plane UI” (web portal/CLI) to set global system configuration settings and policies as well as
- Dynamic service discovery
- certificate management (acts as a Certificate Authority (CA) and generates certificates to allow secure mTLS communication in the data plane).
- automatic self-healing and zone failover (to maximize uptime)
Several control plane vendors compete on features, configurability, extensibility, and usability:
- Open-source Nelson uses Envoy as its proxy and builds a robust service mesh control plane around the HashiCorp stack (e.g., Nomad).
- Kong Mesh (part of the Konnect Connectivity Platform) makes use of Envoy.
- SmartStack creates a control plane using HAProxy or NGINX.
Load Balancing in Azure
If an availability zone goes down within the Azure cloud, Microsoft takes care of switchover transparently to users.
Load Balancing in AWS
Amazon requires that each user set up their own load balancer distributing traffic to running nodes in several AZs, in case one Availability Zone goes down.
To distribute traffic among several regions, set up DNS routing (using Amazon Route 53). Public load balancers need a public DNS entry (in Route 53).
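Weighted DNS records are one way to split traffic among regional load balancers. This sketch builds the `ChangeBatch` argument for boto3's `route53.change_resource_record_sets()`. All the following values are placeholders:

- the record name
- the set identifiers
- the load balancer DNS names
- the hosted zone IDs

Weighted routing is just one Route 53 routing policy; latency-based and failover routing are others.

```python
def weighted_alias(record_name, set_id, weight, lb_dns_name, lb_hosted_zone_id):
    """One UPSERT for a weighted alias A record that points at a load balancer."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": record_name,
            "Type": "A",
            "SetIdentifier": set_id,      # distinguishes records that share one name
            "Weight": weight,             # relative share of DNS answers (0-255)
            "AliasTarget": {
                "HostedZoneId": lb_hosted_zone_id,  # the load balancer's own zone ID
                "DNSName": lb_dns_name,
                "EvaluateTargetHealth": True,       # skip this record if the LB is unhealthy
            },
        },
    }


# Placeholder values: substitute your own domain, ELB DNS names, and ELB zone IDs.
change_batch = {
    "Comment": "Split traffic 80/20 between two regions",
    "Changes": [
        weighted_alias("app.example.com.", "us-east-1", 80,
                       "my-lb-east.us-east-1.elb.amazonaws.com", "ZEXAMPLEEAST"),
        weighted_alias("app.example.com.", "us-west-2", 20,
                       "my-lb-west.us-west-2.elb.amazonaws.com", "ZEXAMPLEWEST"),
    ],
}
# Applying it (requires AWS credentials):
# boto3.client("route53").change_resource_record_sets(
#     HostedZoneId="<your hosted zone ID>", ChangeBatch=change_batch)
```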
Load Balancing is covered in the $300 AWS Certified Advanced Networking - Specialty certification exam (65 questions in 175 minutes). Video courses:
- Pluralsight: ALB
- Pluralsight: Implementing AWS Load Balancing by Mike Pfeiffer 1 hour
- VIDEO: https://aws.amazon.com/elasticloadbalancing
Internal load balancers use only private IPs within AWS.
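An Elastic Load Balancing (ELB) load balancer is reached through a DNS name whose “A” records point at one or more load balancer nodes in each enabled AZ. AWS scales those nodes automatically; an Auto Scaling Group (ASG) typically scales the targets behind them.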
ELB consists of four types of load balancers that all feature the high availability, automatic scaling, and robust security necessary to make your applications fault tolerant.
AWS load balancers come in two generations:
- Version 1, introduced in 2009, is the Classic Load Balancer, which supports only one SSL certificate per listener (no SNI).
- Version 2 load balancers are faster and support target groups and rules:
  - ALB (Application LB) handles HTTP and WebSocket traffic
  - NLB (Network LB) handles TCP, TLS, and UDP traffic
| ELB type: | Classic LB | App LB | Network LB | Gateway LB |
|---|---|---|---|---|
| Layer: | 4 & 7 | 7 | 4 | 3 |
| Protocols: | HTTP/S, TCP/SSL | HTTP/S, HTTP/2 | TCP, TLS, UDP | IP (GENEVE, port 6081) |
| $ per LB-hour (us-east-1): | $0.025 | $0.0225 | $0.0225 | $0.0125 |
| Usage charge: | $0.008 per GB processed | $0.008 per ALCU-hour | $0.006 per NLCU-hour | $0.004 per GLCU-hour |
Classic Load Balancer provides basic load balancing across multiple Amazon EC2 instances and operates at both the request level and the connection level. It is intended for applications that were built within the EC2-Classic network. Charges are per load balancer hour plus $0.008 per GB of data processed (the only type billed by data volume rather than by capacity units). It is more expensive, to wean users off the legacy product. Each port handled by a Classic Load Balancer needs a separate load balancer setup.
Application Load Balancer distributes HTTP and HTTPS traffic and provides advanced request routing targeted at microservices and containers. It routes traffic to targets within Amazon VPC based on the content of the request. Charges are in Application Load Balancer Capacity Units (ALCU) used per hour.
Only ALB handles:
- HTTP host header
- HTTP method
- Source IP address or CIDR
- Arbitrary HTTP header
- HTTP query string parameter
- Combined rules
- AWS WAF
- Weighted Target Group routing for A/B testing, migrations, gradual Blue/Green deployments
- Network Load Balancer distributes Transmission Control Protocol (TCP), User Datagram Protocol (UDP), and Transport Layer Security (TLS) traffic where extreme performance is required. Network Load Balancer routes traffic to targets within Amazon VPC and is capable of handling millions of requests per second while maintaining ultra-low latencies. Charges are in Network Load Balancer Capacity Units (NLCU) per hour.
- Gateway Load Balancer makes it easy to deploy, scale, and run fleets of third-party virtual networking appliances (such as F5) for security, network analytics, etc. Gateway Load Balancer is transparent to the source and destination of traffic. Charges are in Gateway Load Balancer Capacity Units (GLCU) per hour. Gateway Load Balancer uses Gateway Load Balancer Endpoint (GWLBE), a new type of VPC Endpoint powered by AWS PrivateLink technology that simplifies how applications can securely exchange traffic with GWLB across VPC boundaries. GWLBE is priced and billed separately.
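REMEMBER: AWS asks for each load balancer subnet to be at least a /27 with at least 8 free IP addresses, so the load balancer can scale to as many as 8 nodes in that subnet. (A /28 is actually enough: AWS reserves 5 of its 16 addresses, leaving 11.)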
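The subnet arithmetic can be checked with Python's standard `ipaddress` module. This assumes the AWS rule that five addresses in every subnet are reserved: the first four and the last.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5   # network address, VPC router, DNS, future use, broadcast


def usable_ips(cidr):
    """Addresses left for load balancer nodes and other resources in an AWS subnet."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


for cidr in ("10.0.0.0/27", "10.0.1.0/28"):
    print(cidr, usable_ips(cidr), "usable; enough for 8 LB nodes:", usable_ips(cidr) >= 8)
```

Other resources placed in the same subnet also consume addresses, which is one reason to keep load balancer subnets separate.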
Put AWS WAF in front for DDoS protection.
Define separate subnets. Scheme: internet-facing (in public subnets).
Scheme: internal: in the security group of targets in a private subnet, allow inbound traffic only from the load balancer (referenced by its security group ID).
AWS Distribution Rules
VIDEO: Define one rule for each app group; an ALB allows up to 100 rules. For example:
To avoid impacting app servers used by humans, route automated spiders to app servers not used by humans.
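The rule list can be modeled as a priority-ordered match. This model loosely imitates how ALB listener rules are evaluated: the lowest priority number goes first, the first match wins, and otherwise the default action applies. This is a local simulation, not the AWS API. The target group names and the `*bot*` User-Agent pattern are hypothetical.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Rule:
    priority: int                   # lower numbers are evaluated first
    target_group: str
    host: str = "*"                 # wildcard pattern for the Host header
    path: str = "*"                 # wildcard pattern for the URL path
    headers: dict = field(default_factory=dict)   # header name -> wildcard pattern


def route(rules, default_group, host, path, headers):
    """Return the target group of the first matching rule, else the default group."""
    headers = {name.lower(): value for name, value in headers.items()}  # names are case-insensitive
    for rule in sorted(rules, key=lambda r: r.priority):
        if not fnmatchcase(host.lower(), rule.host.lower()):
            continue
        if not fnmatchcase(path, rule.path):
            continue
        if all(fnmatchcase(headers.get(name.lower(), ""), pattern)
               for name, pattern in rule.headers.items()):
            return rule.target_group
    return default_group


# Hypothetical target groups: crawlers get their own servers, away from human users.
rules = [
    Rule(priority=20, target_group="tg-api", path="/api/*"),
    Rule(priority=10, target_group="tg-crawlers", headers={"User-Agent": "*bot*"}),
]
bot = route(rules, "tg-web", "shop.example.com", "/api/items",
            {"User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1)"})
human_api = route(rules, "tg-web", "shop.example.com", "/api/items", {"User-Agent": "Mozilla/5.0"})
human_page = route(rules, "tg-web", "shop.example.com", "/", {"User-Agent": "Mozilla/5.0"})
```

Because the crawler rule has the lower priority number, spiders are diverted even when they request API paths.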
When Lambda functions are ALB targets, request and response bodies are limited to 1 MB, and the Lambda functions and the ALB must be in the same AWS account.
Workload metric: ConsumedLCUs is based on whichever of these dimensions is highest during each hour:
- New connections per second
- Active connections per minute
- Processed bytes (payload bandwidth, in GB per hour)
- Rule evaluations per second
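The "highest dimension wins" rule is easy to compute. This sketch uses the published per-LCU allowances for an Application Load Balancer as I understand them:

- 25 new connections per second
- 3,000 active connections per minute
- 1 GB processed per hour for EC2, container, or IP targets
- 1,000 rule evaluations per second

Check the current AWS pricing page before relying on these figures. The traffic figures are hypothetical.

```python
# Per-LCU allowances for an Application Load Balancer. Processed bytes uses the
# allowance for EC2 instance, container, and IP targets (Lambda targets get 0.4 GB/hour).
PER_LCU = {
    "new_connections_per_sec": 25,
    "active_connections_per_min": 3_000,
    "processed_gb_per_hour": 1.0,
    "rule_evaluations_per_sec": 1_000,   # evaluations beyond the first 10 free rules
}


def consumed_lcus(usage):
    """LCUs billed for one hour: the largest ratio across the four dimensions."""
    return max(usage[name] / allowance for name, allowance in PER_LCU.items())


# Hypothetical hour of traffic with many long-lived (e.g., WebSocket) connections.
usage = {
    "new_connections_per_sec": 10,        # 0.4 LCU
    "active_connections_per_min": 4_500,  # 1.5 LCU, the dimension that is billed
    "processed_gb_per_hour": 0.5,         # 0.5 LCU
    "rule_evaluations_per_sec": 200,      # 0.2 LCU
}
lcus = consumed_lcus(usage)
```

At the $0.008 per ALCU-hour listed in the table, 1.5 LCUs add about $0.012 for that hour on top of the hourly charge.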
Target metrics: connection errors (TargetConnectionErrorCount) and request time, i.e., latency (TargetResponseTime).
AWS Health checks
External (Route 53) health checks test each ELB node IP from locations around the world. IPs that fail their health checks are removed from DNS.
The load balancer also health-checks each target and sends requests only to healthy targets.
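Target health checks usually flip a target's state only after several consecutive results, so one slow response does not eject a server. This sketch models that behavior with a healthy threshold and an unhealthy threshold. The defaults of 5 and 2 mirror the ALB target group defaults. The probing itself (the HTTP request, interval, and timeout) is left out, and starting every target as unhealthy is a simplification of this sketch.

```python
class TargetHealth:
    """A target changes state only after N consecutive results that disagree with it."""

    def __init__(self, healthy_threshold=5, unhealthy_threshold=2):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = False   # this sketch starts every target as unhealthy
        self._streak = 0       # consecutive results contradicting the current state

    def record(self, check_passed):
        if check_passed == self.healthy:
            self._streak = 0   # result agrees with the current state
            return self.healthy
        self._streak += 1
        needed = self.healthy_threshold if check_passed else self.unhealthy_threshold
        if self._streak >= needed:
            self.healthy = check_passed   # flip: start or stop sending requests
            self._streak = 0
        return self.healthy


target = TargetHealth(healthy_threshold=3, unhealthy_threshold=2)
states = [target.record(ok) for ok in [True, True, True, False, True, False, False]]
```

A single failure between passes does not remove the target. Only two failures in a row do.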
AWS Routing Algorithms
Classic LBs route HTTP/S based on “Least outstanding requests” and TCP/TLS(SSL) using “round robin” algorithms.
NLBs route based on “flow hashing”.
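AWS documents that, for TCP, an NLB hashes the protocol, the source and destination IPs and ports, and the TCP sequence number to pick a target. This simplified illustration is not AWS's actual hash function. It hashes only the 5-tuple with SHA-256 and takes the result modulo the number of targets. It shows why all packets of one connection stick to one target while different connections spread out.

```python
import hashlib


def flow_target(protocol, src_ip, src_port, dst_ip, dst_port, targets):
    """Map a flow's 5-tuple to one target; no per-connection table is needed."""
    flow = f"{protocol}|{src_ip}|{src_port}|{dst_ip}|{dst_port}".encode()
    digest = int.from_bytes(hashlib.sha256(flow).digest()[:8], "big")
    return targets[digest % len(targets)]


targets = ["10.0.1.10", "10.0.2.10", "10.0.3.10"]   # hypothetical target IPs
first = flow_target("tcp", "198.51.100.7", 50123, "203.0.113.5", 443, targets)
again = flow_target("tcp", "198.51.100.7", 50123, "203.0.113.5", 443, targets)
spread = {flow_target("tcp", "198.51.100.7", port, "203.0.113.5", 443, targets)
          for port in range(50000, 50100)}   # new connections from the same client
```

Taking a plain modulo means that adding or removing a target remaps most flows. Ring hash avoids that.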
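Name each EC2 instance launched for this exercise with the prefix “LB-”.
Launch without an SSH key pair: instances are configured only by their startup script, never by hand, so every launch is repeatable (idempotent).
Under “Advanced details”, enter a User data shell script that installs Apache, creates a folder for the path the instance serves, and displays a message: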
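```bash
#!/bin/bash
# EC2 user data: runs once as root when the instance first boots.
yum install httpd -y                 # install Apache
systemctl enable httpd               # start Apache on every boot
mkdir -p /var/www/html/orders        # folder for the path this instance serves
echo "<h1>Orders LB here</h1>" > /var/www/html/orders/index.html
systemctl start httpd
```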
PROTIP: For a “Zero Trust” strategy, do not have the LB terminate HTTPS and pass plain HTTP to targets internally; use HTTPS throughout. Redirect HTTP to HTTPS.
AWS “Elastic Load Balancing” distributes incoming application traffic across multiple registered targets such as Amazon EC2 instances, containers, IP addresses, Lambda functions, and virtual appliances.
LBs can handle application traffic in a single or multiple Availability Zones.
SNI (Server Name Indication) lets a single HTTPS listener present different certificates to clients requesting different hostnames, with automatic certificate renewal by ACM (AWS Certificate Manager).
Client apps need to retry with back-off on failure (up to a point of abandonment).
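Client-side back-off can be sketched as exponential back-off with "full jitter". This is a minimal illustration, not any AWS SDK's retry logic. The following values are illustrative:

- the 0.1 s base delay
- the 5 s cap
- the five attempts
- the choice of `ConnectionError`/`TimeoutError` as the retryable errors

`sleep` and `rng` are injectable so the delays can be checked without waiting.

```python
import random
import time


def call_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      retry_on=(ConnectionError, TimeoutError),
                      sleep=time.sleep, rng=random.random):
    """Call `operation`, retrying transient failures with exponential back-off
    and full jitter; re-raise the last error once `max_attempts` is used up."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise   # point of abandonment: let the caller handle the failure
            # Ceiling doubles each time (0.1, 0.2, 0.4, ... s) but never exceeds max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # A random wait below the ceiling keeps many clients from retrying in lockstep.
            sleep(rng() * ceiling)


# Hypothetical flaky backend call: fails twice, then succeeds.
attempts = []


def flaky_request():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("backend busy")
    return "200 OK"


waits = []
result = call_with_backoff(flaky_request, sleep=waits.append, rng=lambda: 1.0)
```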
Algorithms for routing
There are several basic types of load balancing:
Round robin DNS (RRDNS) has DNS servers resolve a hostname to multiple IP addresses (and thus multiple machines).
The problem with “round robin” is that it blindly hits each server regardless of its ability to accept work. This scheme requires every server to be homogeneous (nearly identical).
RRDNS is not recommended because it places load balancing outside the organization, in DNS resolvers that cache DNS A records. Changes to those records can take a long time to propagate throughout the internet.
Network-based load balancers offer more sophisticated allocation algorithms. Network Load Balancing operates at the NIC driver level to detect the failure of a server and reassigns client computer traffic among the remaining servers.
Microsoft’s NLB service can take up to eight seconds to redirect load. The Microsoft article “How To Set Up TCP/IP for Network Load Balancing in Windows Server 2003” notes that:
TCP/IP must be the only network protocol present on the cluster adapter. You must not add any other protocols (for example, Internetwork Packet Exchange [IPX]) to this adapter.
Because attack traffic has already entered the server network by the time it is balanced, network load balancing cannot withstand even moderate DoS (Denial of Service) attacks as well as large-capacity front-end network appliances, which detect and filter out malicious traffic before it gets to the server.
Dispatcher or proxy/switch-based load balancers receive requests from clients directed to a single virtual IP address (VIP). The load balancer then uses a load balancing algorithm to distribute work to real IP addresses.
Such “server-based” load balancing is done either by an application running on a regular server or an appliance.
* Using regular PC machines (on standard operating systems such as Windows or UNIX) has the advantage that, once configured, the machine can be replaced with another machine that is familiar to the staff. There is less hassle from dealing with another vendor. Vendors of such an approach include Resonate, Rainfinity, and Stonebeat.
* Using dedicated appliance hardware has the advantage of speed, since appliances use specialized Application Specific Integrated Circuit (ASIC) chips. They "have internal-bandwidth backbones capable of handling a Gbps worth of traffic."[2,p31] Vendors of such an approach include F5 and Radware.
"Firewall load balancers tend to max out at around 70 to 80 Mbps"[2,p9,60]
BIG-IP from F5 also compresses HTTP 1.1 traffic, handles SSL encryption of network traffic, and centralizes the SSL key store onto a single machine.
However, an individual central dispatcher can become a single point of failure. So:
A hot spare is a machine configured as a mirror image of the machine it replaces if that machine fails. It’s also called a passive node if it sits unused until it’s needed to support a failover.
Both the active and standby load balancers periodically send out heartbeat messages. VRRP (Virtual Router Redundancy Protocol, RFC 2338) sends its advertisements as IP protocol 112 packets to multicast address 224.0.0.18. Cisco's HSRP (Hot Standby Router Protocol) uses UDP port 1985 and multicast address 224.0.0.2. (IPv4 multicast addresses fall in the range 224.0.0.0 to 239.255.255.255.) Extreme Networks has its ESRP (Extreme Standby Router Protocol).
This communication can occur over a serial cable between two machines.
Multicast heartbeats can span several subnets, but they must reach their final destination within a network latency of no more than 200 to 300 milliseconds.
The passive node takes over (assumes “master” status) when it does not hear the “I’m alive” heartbeat from the other machine.
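The takeover decision can be sketched as a timer on the standby node. The sketch is loosely modeled on VRRP, which declares the master down after roughly three missed advertisement intervals. The 3-second dead interval is illustrative. Failback and the actual claiming of the VIP are left out.

```python
class StandbyNode:
    """Passive load balancer that assumes master status when no heartbeat
    has arrived for more than `dead_interval` seconds."""

    def __init__(self, dead_interval=3.0, now=0.0):
        self.dead_interval = dead_interval
        self.last_heartbeat = now
        self.is_master = False

    def on_heartbeat(self, now):
        self.last_heartbeat = now   # the active node said "I'm alive"

    def tick(self, now):
        if not self.is_master and now - self.last_heartbeat > self.dead_interval:
            # A real node would now claim the floating VIP (e.g., via gratuitous ARP).
            self.is_master = True
        return self.is_master


standby = StandbyNode(dead_interval=3.0)
standby.on_heartbeat(now=1.0)
still_passive = standby.tick(now=3.5)   # 2.5 s of silence: keep waiting
took_over = standby.tick(now=4.5)       # 3.5 s of silence: assume master
```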
Because failover can cause a VIP to be the destination address at different devices at different times, a VIP can be called “floating”.
Failback occurs when the failed server comes back online and takes load back from the failover node. This happens after a transfer of client state information.
To avoid “bridging loops” when each load balancer thinks the other is inactive, the OSI Layer 2 Spanning Tree Protocol (STP) is used to set a cost for each port. This provides a way to block redundant paths by leaving open only the lowest-cost (highest-priority) port. However, “STP is almost never used since it can take 10 seconds or more to react. Typically, a proprietary variation of a hot-standby protocol is used.”[2,p59]
CAUTION: Proxies forward both client requests to servers and server responses to clients, so unless a proxy can handle as much throughput as all the machines in its subnet, it can easily become a throughput bottleneck.
Note: Although not necessary for production work, “Individual Pass-Through VIPs” can be defined so that individual servers can be reached through the load balancer for performance measurement and troubleshooting.
There are several distinguishing features among load balancers:
“Flat-based” IP topology means the VIP and real IPs are on the same subnet (usually behind a firewall). FTP and streaming applications use this topology along with Direct Server Return (DSR).
Many load balancers also have firewall capabilities (packet filtering, stateful inspection, intrusion detection, etc.)
More advanced load balancers enable the VIP to be on a different (more private/secure) subnet than IPs on servers receiving traffic. In this case, the load balancer acts as a gateway between the two LANs.
But this only allows one pair of load balancer machines.
Load allocation algorithms run within daemons (or Windows services) that dynamically:
1) listen for a “heartbeat”; or
2) periodically send ICMP ECHO_REQUEST “pings” to measure RTT (Round Trip Time); or
3) detect bandwidth usage or CPU utilization on each server; or
4) count the number of connections active on each server.
BEA WebLogic 8.1 uses server affinity to reduce the number of RMI sockets.
SAP R/3’s login balancing (SMLG) clustering technology decides which server receives a user at login based on the number of users currently logged in and response time (not how many work processes are in use at any moment).
Once a SAP user is logged on a particular app server, the user session is pinned to that server. Background jobs are also assigned to a server, but a new job might get assigned to another server.
Failover is the automatic repartitioning of client traffic among the remaining servers in the cluster so that users experience continuous performance Quality of Service (QoS).
Within the BEA WebLogic 8.1 HttpClusterServlet Plug-in, sessions are attached to a particular server by the client’s IP address. This means that a server must stay up until all sessions finish (which can take many minutes).
A failover mechanism must have an awareness of the progress of all activities so that client sessions can continue to complete if processing stops.
Each WebLogic server sends IP multicast messages to broadcast its status. Each WebLogic server listens for these one-to-many messages to update its own JNDI tree.
Cisco’s LocalDirector uses a “Cookie Sticky” feature that redirects traffic to the same physical machine by examining cookies.
“Virtualization” software (such as EMC VMware) automatically installs application software to meet demand.
- Global load balancing that routes requests among several data centers over the WAN (Wide Area Network). “With cross-country latency at around 60 ms or more” [2,p10]
- Intelligent response to denial of service flooding attacks.
The least sophisticated load balancers use “Route-path”, where the load balancer acts as a Layer 3 router. This arrangement does allow 3 or more load balancers.
More sophisticated load balancers use “Bridge-path”, where the load balancer acts as a Layer 2 bridge between two LANs. However, this only works with flat-based networks.
Most sophisticated of all are when real servers (especially streaming and FTP servers) can bypass going back to the Load Balancer and send responses directly to clients by using Direct Server Return (DSR). This is desirable because “web traffic has a ratio of about 1:8, which is one packet out for every eight packets in.”[2,p26]
F5 calls this “nPath”. Foundry calls it “SwitchBack”.
DSR works by configuring the IP alias on the server’s loopback interface “lo” with the IP address of the VIP. This is done using the ifconfig command on Unix.
The server still needs to bind to its real IP address as well, so the load balancer can perform health checks on the server.
The default route path of the server needs to point to the router rather than the load balancer.
MAC Address Translation (MAT) ???
Load Balancing in the Cloud
The internal processes (life-cycle) of a "redundant, self-curing, and self-scaling" computing facility are
- Authorizing those who need to perform various activities with rights for their role (who can do what when)
- Imaging the OS and software apps into server snapshot image templates used to create working instances
- Provisioning (condensing) instances (with unique URLs and OS-level settings)
- Configuring each instance (with unique application-level settings) [Puppet]
- Persisting data in and out of each instance (with or without encryption)
- Storing logs emanating from instances in a centralized location
- Monitoring health and other metrics by instance type
- Deciding when alerts are appropriate, when an instance should be registered or un-registered with its load balancer, when instances should be added or removed
- Switching among instances from a static IP address [Scalr]
- De-Provisioning (evaporating) instances
- Analyzing logs for trends (such as cost per transaction over time) and their implications for adjusting the number and size of instances most appropriate for each instance type/purpose and time frame [capacity management like VKernel]
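The "deciding" step in that list can be sketched as a polling controller. The sketch makes the following assumptions:

- Average CPU is the only scaling metric.
- The pool is resized with a proportional rule, similar in spirit to target-tracking scaling.
- The 60% target and the bounds of 2 to 10 instances are arbitrary illustrative values.
- `read_avg_cpu` and `resize` are hypothetical stand-ins for whatever metrics source and provisioning API you actually use.

```python
import math


def desired_instances(current, avg_cpu, target_cpu=60.0, min_n=2, max_n=10):
    """Proportional rule: resize the pool so average CPU moves toward target_cpu."""
    if current <= 0:
        return min_n
    wanted = math.ceil(current * avg_cpu / target_cpu)
    return max(min_n, min(max_n, wanted))


def poll_once(current, read_avg_cpu, resize):
    """One pass of a polling controller: read the metric, decide, act only on change."""
    wanted = desired_instances(current, read_avg_cpu())
    if wanted != current:
        resize(wanted)   # e.g., launch or terminate instances and (de)register them with the LB
    return wanted


resize_calls = []
busy = poll_once(4, lambda: 90.0, resize_calls.append)    # 4 * 90 / 60 = 6 instances
quiet = poll_once(4, lambda: 20.0, resize_calls.append)   # ceil(1.33) = 2 instances
capped = desired_instances(8, 95.0)                       # ceil(12.67) = 13, capped at 10
```

A real controller would run `poll_once` on a timer and add a cooldown, so the pool does not oscillate between sizes.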
Scalr (a $50/month open-source offering from Intridea, a Washington DC Ruby on Rails development firm) illustrates its approach for a polling controller this way:
## MS NLB Services
In 1998 Microsoft acquired from Valence Research the "Convoy Cluster" software, which became a component of Windows NT 4.0 Enterprise Edition. Microsoft then introduced its Network Load Balancing (NLB) service as a free add-on service to the Advanced and Datacenter versions of Windows 2000 Server (Advanced Server is known as Enterprise Edition in Windows Server 2003).
The acronym for Windows Load Balancing Service (WLBS) is also the name of the utility that verifies whether load-balanced hosts "converge":
- wlbs stop
- wlbs start