
Considerations for performance, high availability, latency, at lowest cost


Overview

This is a "deep-dive documentary" about how to ensure performance, scalability, availability, resilience, and affordability when building, testing, and running computer software applications in production on various cloud environments.

The aim here is to cover, in a logical sequence, tactics and strategies for several alternative architectures:

TODO: Split up this long page into separate pages, and re-published on Medium as separate parts.

URL Landing Page Efficiencies

It is even more important for an organization's marketing landing page to be fast than for its headquarters lobby to be stylish. More potential and actual customers visit on-line than in person.

It is now common for developers to manage direct URLs to images and other resources, such as this one on Amazon's cloud:

  • https://d20vrrgs8k4bvw.cloudfront.net/documents/en-US/nd209_Robo_syllabus_v2.pdf

QUESTION: How much difference does using a CDN (Content Distribution Network) make? There are several companies offering the service. Having a public cloud enables test clients to be quickly installed around the world to evaluate customer experience. Several sites track uptime availability and how fast landing pages load from various points in the world:

Google's PageSpeed Insights points out internal issues such as whether images are compressed enough, along with many other specific tricks to make a site as fast as possible. testmysite.ThinkWithGoogle.com evaluates mobile performance over 3G and 4G networks. Google's Speed Scorecard compares the speed of various sites in one table.

Some of these tools also show what users see: slow-motion views of pages in various stages of completeness.

Impact of language

Websites that serve static HTML files run fast because such files do not have to be generated for each request. And the files can be distributed around the world via CDNs (Content Distribution Networks) such as Akamai, AWS CloudFront, Cloudflare, Fastly, etc.

Websites that run WordPress are slower than static sites because WordPress generates HTML for every request. The programming that does the generation is written in the PHP programming language. PHP is an interpreted language, meaning that PHP source code is processed by the PHP interpreter each time it responds to a new request.

WordPress is still among the most popular programs running on the internet because of its vast ecosystem of developers and add-on functionality. To many, the overhead of PHP is worth the features provided by PHP-based software such as SugarCRM, WooCommerce, and many others.

PHP and Python programs are usually slower than programs written in Java, Go, or other languages compiled into low-level run-time files that computers execute directly.

Java programs require the additional installation of a JVM (Java Virtual Machine) that allocates memory among programs. Go comes with its own run-time environment.

Dealing with hardware and patching

The difficulty with both WordPress and compiled applications is that one must set up a server, populate it with the software, and then configure it.

Business owners who had a WordPress site built must continue to pay thousands of dollars each year for "maintenance" to avoid falling behind. Patches for operating system security, the PHP interpreter – every aspect of the technology – must be applied occasionally. This constant maintenance adds no new functionality for the end user, so it feels like a disruption and a waste of time and money.

Enter SaaS in a cloud.

SaaS (Software as a Service)

To take advantage of the availability of the internet, in the late 1990s software vendors such as Salesforce emerged to offer users functionality completely through an internet browser. Such vendors take care of providing the underlying technologies such as operating system software, databases, and the "framework" that enables customization of functionality. SaaS vendors also handle hardware provisioning, making sure that whatever number of servers is needed is available behind the scenes, like dining at a fine restaurant.

As with regular dining at fine restaurants, SaaS offerings can seem expensive to some, costing thousands of dollars for every user, plus additional costs for storage. For example, Salesforce charges $5,000 per month to store one terabyte of data, compared to a one-time purchase of about $100 for a multi-terabyte USB drive.

Those who develop using a framework such as Salesforce must first learn all the intricacies of that framework and its unique programming language – which can take many months of serious study.

Enter Serverless.

“Serverless”

Since "serverless" software development capabilities were made available by cloud vendors beginning in 2016, they have captured the fascination of application developers because a developer only needs an internet browser (such as Google Chrome) to create and run applications.

The term "serverless" means that the developer does not need to hassle with acquiring server hardware. The cloud vendor (such as Amazon) provides all the processing.

The advantage of serverless over SaaS is that the programmer can develop individual "functions" within a more "lightweight" framework, reusing others' code. Functions can be written in several languages (Python, Go, .NET C#, etc.).
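
For illustration, such a "function" can be as small as a single handler. Below is a minimal sketch of an AWS Lambda function written in Python; the greeting logic and field names are arbitrary illustrations, not part of any particular application.

    # Minimal AWS Lambda handler in Python. "lambda_handler" is the
    # conventional default entry point, configurable per function.
    import json

    def lambda_handler(event, context):
        # "event" carries the request payload (e.g., from API Gateway);
        # "context" carries runtime metadata such as remaining execution time.
        name = event.get("name", "world")
        return {
            "statusCode": 200,
            "body": json.dumps({"message": f"Hello, {name}!"})
        }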

But there is a “hassle factor” with Serverless.

Tuning required

With SaaS, serverless, and other shared-cloud environments, programmers must ensure that their code does not run too long or take too much memory, lest the cloud vendor issue errors that prevent execution. This is needed both because custom code runs within an infrastructure shared with others and because costs accumulate for each request made.

Such concerns do not seem as important to those who stand up their own server to host WordPress. Inefficiencies in WordPress configurations and programming simply show up as lower capacity, and stay hidden unless investigated.

Single instance hosting

Many web hosting companies have sprung up to offer hosting of executables on the internet. Several charge just $5 a month or less for a small site. Such offerings provide a single "process" for each website, with other websites running as other processes on the same server.

An issue with shared hosting is that several websites on the same server share the same IP address. So if one of the websites is marked as being abusive, all other websites sharing that address also become blocked.

The trouble with a single stand-alone instance is that when a gradually increasing load is placed on it, eventually the server becomes overloaded and fails. The level of transaction throughput at the point of failure is determined by "stress tests".

There are several ways to increase capacity on individual servers.

One alternative is to allocate more RAM on the server to cache and buffer transactions within each server.

In order to get a given server to process more load, its hardware components can be upgraded manually. This is called “scaling up”.

Magnetic hard drives are slow: many times slower than the rate at which CPUs transfer data. But the SSD (Solid State Drive) storage used today is very fast. (That's the reason AWS does not allow the use of magnetic drives as a server's boot-up data volume.)

Server manufacturers usually provide more speed along with larger capacity:

  • More cores (vCPUs) and faster CPUs come with larger RAM
  • Faster disk types come with larger capacity disks
  • Faster network performance (speed) interfaces come with larger capacity network pipes

For example, to get a server with more RAM, you also pay for more cores (vCPUs), whether you want them or not.

A doubling of RAM usually costs twice as much, or more. However, upgrading usually doesn't yield a proportional increase in how much is processed. For example, a doubling of RAM does not usually yield a doubling of transaction throughput. So one question performance engineers are asked to answer is whether running two smaller servers processes more transactions than one big server with the equivalent memory of the smaller servers combined.*

BTW, Amazon sells RAM by the "GiB" (gibibyte) rather than the "gigabyte" traditionally used by hard-drive manufacturers to mean 1,000,000,000 bytes, counted in "base 10", where each digit can have 10 values (0 through 9): 10 to the 9th power.* A gibibyte is based on the "base 2" (1 or 0) counting that computers use internally: 2 to the 30th power, or 1,073,741,824 bytes. The difference between the two grows as the units get larger: about 7% at the gibibyte/gigabyte level, and about 10% at the tebibyte/terabyte level (a tebibyte being 2 to the 40th power, or 1,099,511,627,776 bytes).
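
A quick sketch to verify the arithmetic above (plain Python, no cloud dependencies):

    # base-10 vs. base-2 storage units
    GB  = 10**9     # gigabyte  = 1,000,000,000 bytes
    GiB = 2**30     # gibibyte  = 1,073,741,824 bytes
    TB  = 10**12    # terabyte
    TiB = 2**40     # tebibyte  = 1,099,511,627,776 bytes

    print(f"GiB is {GiB / GB - 1:.1%} larger than GB")   # ~7.4%
    print(f"TiB is {TiB / TB - 1:.1%} larger than TB")   # ~10.0%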

More "advanced" types of servers can be configured to use Single-Root I/O Virtualization (SR-IOV) and Elastic Network Adapters (ENA), which deliver 20 Gbps (gigabits per second) of network speed. Low-latency instances are logically spread within a single cluster placement group, which is defined within a single Availability Zone. BTW, partition placement groups are defined to ensure that instances in one partition do not share underlying hardware with instances in other partitions.

QUESTION: Is the higher packets-per-second (PPS) performance of the above enhanced networking mechanisms worth the price? Nodes within the same placement group communicate at the full line rate of 10 Gbps per flow (25 Gbps aggregate) without any slowing due to over-subscription. PROTIP: If you are using AWS Direct Connect, private pipes to other data centers run at 50 Mbps to 10 Gbps, depending on what you pay for.

Another alternative for increasing throughput is to add a separate caching server (such as Redis or Memcached, offered as a managed service in AWS ElastiCache) that tries to respond to requests before they hit the web server or database server. Cache servers typically hold responses in a large amount of memory. But to ensure that money for a caching server is not wasted, the cache hit ratio should be measured when running under simulated load.
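
As a sketch of how that cache hit ratio might be measured, the snippet below reads Redis's built-in counters using the redis-py client; the host name is a placeholder, and the same idea applies to Memcached or ElastiCache statistics.

    # Requires: pip install redis
    import redis

    r = redis.Redis(host="my-cache.example.internal", port=6379)  # placeholder host
    stats = r.info("stats")                  # server-side counters
    hits = stats["keyspace_hits"]
    misses = stats["keyspace_misses"]
    ratio = hits / (hits + misses) if (hits + misses) else 0.0
    print(f"cache hit ratio: {ratio:.1%}")   # measure this while under simulated load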

“After you have identified your architectural approach, you should use benchmarking and load testing data to drive your selection of resource types and configuration options” – 512 in Amazon’s “Performance Efficiency Pillar: AWS Well-Architected Framework (AWS Whitepaper)

The potential for failure due to load may not be of concern for “vanity” websites which don’t anticipate a lot of traffic.

But most businesses prefer their websites to be able to handle more business without much manual vigilance.

The Business Objective

The big takeaway from this line of thinking is that the focus here is on the business objective: finding the safest way to achieve the highest rate of business transactions at the least total cost.*

The total cost calculation should include the cost of dissatisfied customers who cannot reach the website or abandon the site (and not buy) when it’s too slow due to it being overloaded.

Total costs also include the time to build and maintain the software.

And also for testing.

Load Testing

A business can't wait for production (paying) users to generate the load to see if the system really works, because by then it would be too late.

So during development, special programs and programming artificially generate load by pretending to be real users. Such programs include JMeter, LoadRunner, Neotys, Flood.io, etc. Load-testing programs stand in for the clients they emulate.
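
As an illustration of protocol-level load generation, here is a minimal sketch using the open-source Locust tool (an option not listed above): emulated users repeatedly request a landing page. The host URL and path are placeholders.

    # Requires: pip install locust
    # Run with:  locust -f loadtest.py --host https://www.example.com
    from locust import HttpUser, task, between

    class LandingPageUser(HttpUser):
        wait_time = between(1, 3)   # think time between requests, in seconds

        @task
        def load_landing_page(self):
            # each emulated user repeatedly fetches the landing page
            self.client.get("/")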

Traditionally, load-testing scripts are created by capturing communications (the HTTP traffic passing between client and server). This allows as many users as possible to be emulated on each artificial load generator by limiting the scope of client processing.

However, innovations such as HTTP/2 asynchronous communication and AngularJS code running within browsers now require load testers to adapt functional UI testing tools, which control each client user's keyboard and mouse. This approach of running many instances of tools previously designed for use by an individual tester means fewer users can be emulated on each load-generator machine.

Nevertheless, compared with the negative consequences of business risks, load testing is needed to identify risks that would otherwise lie hidden.

Programs that open a new connection to the database for every user (rather than "pooling" connections for reuse) require additional memory to be allocated on the database server. So load tests are needed to determine optimal configuration settings. By definition, dynamic security scans are conducted while the system is under load.
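
To illustrate connection pooling (as opposed to opening a new database connection per request), here is a sketch using SQLAlchemy; the connection string and pool sizes are placeholder values to be tuned using load-test results.

    # Requires: pip install sqlalchemy (plus a database driver)
    from sqlalchemy import create_engine, text

    # One pooled engine shared by the whole process, instead of
    # opening a fresh connection for every incoming request.
    engine = create_engine(
        "postgresql://app:secret@db.example.internal/shop",  # placeholder DSN
        pool_size=10,        # steady-state connections held open
        max_overflow=20,     # extra connections allowed under burst load
        pool_pre_ping=True,  # discard connections the database has closed
    )

    def handle_request():
        with engine.connect() as conn:   # borrows a connection from the pool
            return conn.execute(text("SELECT 1")).scalar()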

Load testing is also done to identify errors in design, such as memory "leaks" that consume more and more memory over time, requiring each production server to be rebooted. Load testing is needed to determine how often, rather than using some arbitrary interval such as once every night. Some data centers find they need to reboot every hour.

Tests to identify such issues require "soak" test runs. Such long runs can consume a lot of unique data values, so it can be time-consuming to manufacture enough data for this purpose. But doing so also enables the testing of future conditions, when databases have grown larger over time. Laws require that data taken from production be "scrubbed" of personally identifiable information.

Traditionally, load testing occurred near the end of projects. But to enable Agile practices, many businesses today seek to "shift left" (earlier in time) so that risks are exposed as development occurs and can be fixed while the code is still fresh in developers' minds. To facilitate that, load tests (along with monitoring) can be made to start automatically (by a Continuous Integration utility such as Jenkins) when code is uploaded to a team source repository.

Planning for load testing includes characterizing the load coming from various use cases (how many people registering, browsing, buying, etc. at the same time).

Running servers in the cloud makes performance testing easier and more economical than duplicating the set of production equipment on-premises, which include not just web servers but also utility servers such as DNS, Active Directory/LDAP, etc.

Server images

Many organizations today build all aspects of the servers they use by defining "configuration as code" with tools such as Chef, Ansible, CloudFormation, Terraform, Pulumi, etc. Such an approach includes storing the configuration code in a source version-control repository, which can retrieve the full set of files as they were at specific points in the past. Version-control systems such as GitHub and GitLab also track who made changes and why (in commit messages).

Server images created by the configuration code can be saved in binary repositories such as Nexus and Artifactory. The server images are used to spin up each server instance. When developers share an image with testers, what is tested is exactly what developers ended up with. When testers and operations share an image, what runs in production is what has been tested.

There is another advantage to using server images. WordPress, for example, is an open-source application, so anyone can customize it. Various teams have created server images that incorporate a pre-tested set of components and features, such as one containing a storefront, or one that has been tuned to run efficiently and fast.

There are several different types of server images:

  • AMI (Amazon Machine Images) within AWS (Amazon Web Services)
  • Virtual Machine Disk files (VMDKs) running on VMware or VirtualBox
  • Virtual Hard Disk (VHD) files used with Microsoft Virtual Server and Hyper-V hypervisors
  • Docker containers from DockerHub.com, Quay.io, etc.

All the image types except Docker containers include the underlying operating system and utilities.

Some AMI creators charge their users money. But many pay because it saves them hassle and time, which means money.

Is the extra cost worth the extra savings? Load testing can answer that question. QUESTION: To determine the cost of processing using any given server configuration, one needs to measure the use of processing, storage, and network data transfer at various levels of user load as load increases.*

Such tests can turn load-engineering work into a profit center by identifying cost savings.

NEXT: Server images are necessary to create multiple instances of the same application.

Multiple instances for elasticity, reliability

If your website succeeds in attracting visitors, peak load will eventually grow beyond what a single server can handle.

Then multiple servers would be needed for “elasticity” – the ability to deal with variations in load by adding more resources during high load or consolidating when the load decreases.

Amazon brands several of their services with the name “elastic” to highlight that aspect of their offering.

Multiple servers are also needed to ensure reliability – to have another server take over in case a particular server fails, to ensure “high availability” (“HA” for short).

Fail-over tests measure whether fault tolerance can really occur. Testing that deliberately downs a server to measure the speed of recovery is called "resiliency testing".*

CloudFormation templates automate the creation of the various components around a cluster of EC2 servers. An alternative is Terraform specifications, which are multi-vendor (Azure, Google, etc., as well as Amazon).

Monitoring

On AWS, to collect measurements and stream them to CloudWatch, a CloudWatch Logs Agent needs to be installed on each server instance.

AWS CloudWatch Log Groups are defined to capture specific errors and send alerts about them via SNS (Simple Notification Service) emails.

After 60 days, logs can be sent to AWS Glacier for lower-cost, longer-term retention if an S3 Lifecycle policy is defined.
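
A minimal sketch of such a lifecycle policy using boto3 follows; the bucket name, prefix, and expiration period are illustrative assumptions.

    # Requires: pip install boto3 (with AWS credentials configured)
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-elb-logs-bucket",                     # placeholder bucket
        LifecycleConfiguration={
            "Rules": [{
                "ID": "archive-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},           # only log objects
                "Transitions": [{"Days": 60, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 365},             # delete after a year
            }]
        },
    )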

Load Balancing

When multiple server instances are involved, a Load Balancer is needed to balance (distribute) work among instances. Load Balancers can also use installed (X.509) SSL/TLS certificates to convert "https://" (port 443) encrypted requests into unencrypted "http://" (port 80) requests passed on to web servers. This reduces the decryption and encryption workload on individual servers on the back-end. But some prefer end-to-end security between all servers by generating and installing SSL certs in every server instance.

Some load balancers (such as F5) are specialized hardware (with ASIC chips) to process faster than standard computers. F5 itself, NGINX, Cisco, and others also have software-based load balancers which can be used instead of AWS offerings.

To duplicate a running production instance containing the latest version of all data, first set up EC2 instances to save incremental data snapshots into S3 (for Disaster Recovery). But a volume in a running instance should be briefly stopped and flushed of data before taking snapshots.

Each Elastic Load Balancer (ELB) and EC2 Auto Scaling Group (ASG) can keep its own set of logs as S3 objects. The default is only EC2 status checks. So set the S3 bucket's Properties > Logging ("aws-bucket-logging") to enabled.

BTW, for higher security, accounts writing logs to S3 buckets are set to write-only, with separate accounts for transfer, read-only access, and deletion.

To determine whether each instance within an ASG is "OutOfService" and needs to be replaced, listeners periodically check the health of each instance with "pings". The "Grace Period" (such as 300 seconds) sets how long after launch to wait before health-check failures count against a new instance.*
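
A sketch of setting those health-check parameters with boto3; the Auto Scaling group name is a placeholder.

    # Requires: pip install boto3
    import boto3

    autoscaling = boto3.client("autoscaling")
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName="web-asg",   # placeholder ASG name
        HealthCheckType="ELB",            # replace instances the ELB marks unhealthy
        HealthCheckGracePeriod=300,       # seconds to wait after launch before checking
    )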

AWS can keep time-series ELB Access Logs of requests processed by a Load Balancer, recording response latencies along with time of occurrence, client IP address, request paths, and server responses. But the logs need to be activated, at publishing intervals of either 5 or 60 minutes.

AWS does NOT provide a UI to process and present analytics visualizations of the logs it stores in S3. So filtering and analytics visualization are done using additional tools:

[Image: ELB logs streamed through AWS Kinesis for analysis]

[Image: Sumo Logic dashboard of ELB logs]

  • Splunk has its custom query language



SQL queries against ELB Logs can filter for response codes that are not 200, the time frame of calls, etc.



Trends identified would include the time between acceptance of a connection and the first byte sent to an instance. Timings also include processing of a public key to match the one in the ELB setup with a back-end instance authentication policy.
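
For a rough idea of such analysis outside of a managed tool, the sketch below parses classic ELB access-log lines and reports latency percentiles; it assumes the classic ELB space-delimited field order and a placeholder file name.

    # Rough parser for classic ELB access-log lines (space-delimited fields).
    # Assumes the classic ELB field order; ALB logs have additional fields.
    import statistics

    def backend_seconds(line: str) -> float:
        fields = line.split(" ")
        # fields[4], [5], [6] = request, backend, and response processing times
        return float(fields[5])

    with open("elb-access.log") as f:              # placeholder log file name
        latencies = sorted(backend_seconds(l) for l in f if l.strip())

    p95 = latencies[int(len(latencies) * 0.95) - 1]
    print(f"requests: {len(latencies)}  "
          f"median: {statistics.median(latencies):.3f}s  p95: {p95:.3f}s")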

Bastion Hosts

BTW, when servers behind a firewall use unencrypted traffic, they should not have a connection to the public internet. But to obtain files from the open internet, traditionally a "Bastion host" is set up for administrators (on pre-defined IP addresses). Such a server is the only one that goes through a NAT (Network Address Translation) "Gateway", which hides internal IP addresses from the outside world.

Once vetted, files needed by application servers are obtained from an internal Network File Share (NFS) or file repository server managed by utility software such as Nexus or Artifactory.

Bastion hosts, inbound ports, and SSH keys can be replaced by AWS Systems Manager Session Manager, which also maintains an audit trail. But its IAM setup is tricky.*

A/B Testing

Cloud-based DNS (Domain Name Service) servers (within Amazon's Route 53 service) resolve host names to IP addresses. They can also allocate a percentage of traffic to different sets of servers for Blue/Green Deployment or A/B testing. Blue/Green Deployment is used to transition users to a new app environment for a new version. A/B testing allocates varying percentages of users to variations of an app to compare user reaction/satisfaction.

Instead of directly interacting with Route 53, the switchover can be specified in the OpsWorks and Elastic Beanstalk consoles or via CloudFormation templates.
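
For illustration, weighted records can also be set directly through the Route 53 API. In this boto3 sketch, the hosted zone ID, record name, and the two ELB addresses are placeholders, and the 90/10 split sends 10% of traffic to the "green" environment.

    # Requires: pip install boto3
    import boto3

    route53 = boto3.client("route53")

    def weighted_record(identifier, weight, dns_value):
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "www.example.com.",       # placeholder record name
                "Type": "CNAME",
                "SetIdentifier": identifier,      # distinguishes blue vs. green
                "Weight": weight,                 # relative share of traffic
                "TTL": 60,
                "ResourceRecords": [{"Value": dns_value}],
            },
        }

    route53.change_resource_record_sets(
        HostedZoneId="Z123EXAMPLE",               # placeholder hosted zone
        ChangeBatch={"Changes": [
            weighted_record("blue",  90, "blue-elb.example.com"),
            weighted_record("green", 10, "green-elb.example.com"),
        ]},
    )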

Time to Additional Capacity

QUESTION: The concern with scaling is how quickly additional capacity can be added or removed before/after it is needed.

The traditional on-premises approach is to order and buy excess server hardware based on projected peaks many months or years in advance. Thus, servers use only a fraction of their capacity, and the rest sits idle much of the time. And if processing volume exceeds the projected peak, the whole system degrades or fails. In a cloud, although capacity can be added dynamically, it needs to be added slightly before it is needed, to provide a margin for handling growth while additional instances are brought up. Bootstrapping instances in an ASG can take 10 minutes or more. To avoid false alarms from an instance being counted as ready before bootstrapping completes, create an ASG Lifecycle Hook to hold the instance in a "Pending:Wait" state until bootstrapping completes. Hooks time out after 60 minutes, but an API call in the bootstrapping script can release the hook earlier.*
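
A sketch of such a hook with boto3 follows; the group, hook, and instance names are placeholders. The bootstrapping script calls complete_lifecycle_action when it finishes, releasing the instance from Pending:Wait.

    # Requires: pip install boto3
    import boto3

    autoscaling = boto3.client("autoscaling")

    # Hold each new instance in Pending:Wait until bootstrapping finishes.
    autoscaling.put_lifecycle_hook(
        AutoScalingGroupName="web-asg",                        # placeholder
        LifecycleHookName="wait-for-bootstrap",
        LifecycleTransition="autoscaling:EC2_INSTANCE_LAUNCHING",
        HeartbeatTimeout=3600,       # seconds before the hook times out
        DefaultResult="ABANDON",     # terminate the instance if bootstrapping never completes
    )

    # At the end of the bootstrapping script, release the hook:
    autoscaling.complete_lifecycle_action(
        AutoScalingGroupName="web-asg",
        LifecycleHookName="wait-for-bootstrap",
        LifecycleActionResult="CONTINUE",
        InstanceId="i-0123456789abcdef0",   # placeholder: the instance's own ID
    )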


A cloud of servers such as Amazon AWS pools unused capacity for allocation when needed.

When spinning up AWS EC2 (Elastic Compute Cloud) server instances, there is a concern about how quickly additional capacity can be added. Currently, it can take 20 minutes or more between the request and when a new server is able to process application transactions. It helps to track the actual time in order to design auto-scaling settings.*

So some operators define one or more "standby" server instances to instantly absorb sudden increases in load while additional servers spin up. The number of such servers is determined by "spike tests", which emulate sudden increases in load.*

TODO: The complex way that AWS charges for disk drive input/output makes spike tests useful for determining real costs.

TODO: Auto Scaling Limits

Affinity Groups

A big concern with measurement during load testing is the time between the client request and (the first byte of) the response from the server. Time over the network is significant and can take up 75% of the total response time. To eliminate that time, ideally, load generators would sit next to the web servers. That would enable accurate diagnosis of response times purely on the server (and underlying services).

On AWS, an "affinity group" setting became available in 2018 to keep a set of servers close to each other, to minimize the latency of communication between servers.

One advantage of using a cloud vendor is that they make it easier to distribute traffic across several data centers so that if one center is hit by a disaster, a stand-by center can take over. Amazon calls them different “Availability Zones”. Amazon makes two or more “AZ’s” available for each of several dozen “Regions” around the world.

But does that really work, and how much time does it take to switch between Availability Zones? That's the job of "fail-over tests".*

The redundancy of hosting and syncing data across several regions is more complex and costly than hosting across several Availability Zones.

Hosting across zones requires the use of multiple VPC (Virtual Private Cloud) network settings that define the network security settings used.

TODO: Detailed comparison of various cloud vendor service names and offerings (Amazon, Azure, Google, Alibaba, etc.)?


References:

Based on the Deep Lizard’s AWS - Amazon Web Services EC2 Management video series from November 2017.

Auto-scaling

Amazon continues to offer traditional elastic load balancer service with auto-scaling groups of individual servers. The service is controlled using Chef specifications.

The concern with clusters of executable programs is sticky sessions, which stay on a particular server instance until time-out, which can be several hours. Meanwhile, that particular server instance cannot be taken down for security updates, memory reclamation, or whatever. In other words, it takes a long time to "bleed" instances of user sessions. This situation is caused by programs that were written to depend on the exchange of cookies in HTTP headers between client and server. With such an architecture, Load Balancers need to return each client to a specific server instance, and thus the app is not "stateless".

Apps that are “stateless” can better take advantage of advanced scaling features.

Sticky vs. Stateless (more scalable and cheaper)

Apps need to be "stateless" in order to make use of server instances that can disappear at any time, such as AWS EC2 Spot instances purchased at "spot rates", which fluctuate under an auction system established by Amazon. Such rates are usually the lowest cost among all ways of charging. Thus, a system can be considered financially defective if it cannot take advantage of the lowest-cost instances. Such a situation can and should be identified during the technical planning stage. That is the rationale for considering performance issues early on rather than shortly before production, when not much can be changed.
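
As a sketch of how a stateless worker might be launched on spot capacity with boto3 (the AMI ID and instance type are placeholders):

    # Requires: pip install boto3
    import boto3

    ec2 = boto3.client("ec2")
    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder AMI (e.g., a pre-built server image)
        InstanceType="t3.medium",
        MinCount=1,
        MaxCount=1,
        InstanceMarketOptions={            # request spot rather than on-demand pricing
            "MarketType": "spot",
            "SpotOptions": {"SpotInstanceType": "one-time"},
        },
    )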

Speaking of “sticky”, there are sticky service charges …

Automation to avoid runaway bills

One of the risks with being able to get a lot of capacity quickly is that bills can pile up just as quickly, and sometimes inexplicably. Runaway bills are a concern when using clouds.

For example, when I kept being charged $35 a month on an account where I had provisioned server instances that I shortly terminated, investigation by Tad Einstein from Google revealed that Google's shutdown script doesn't automatically remove Forwarding Rules created when servers run within a cluster.

[Image: GCP console listing of a leftover forwarding rule]

To delete Forwarding rules in a Bash script:

# Delete the orphaned rule non-interactively (add --region or --global as appropriate):
gcloud compute forwarding-rules delete "$FORWARDING_RULE" --quiet

To obtain the FORWARDING_RULE value, one can get a list manually via the UI at Networking -> Load Balancing -> advanced options -> Forwarding rules. Alternatively, this command lists them:

RESULT=$(gcloud compute forwarding-rules list)

The RESULT variable above captures the list of forwarding rules created. If there may be several, it is necessary to select the specific rule to delete. So ideally you would build up the whole environment fresh each time, so there is no question of lingering rules.

Automation options

The advice here is to run clouds using automated scripts so that commands such as the above can be inserted when needed.

AWS's CloudFormation YAML declarative specifications are "configuration as code", stored in a version-control repository such as GitHub, which enables falling back to the complete set of files at various points in time. Puppet then puts instances into a specific state.

HashiCorp's Terraform equivalent, HCL, is "multi-cloud" (it stands up instances in AWS, Azure, GCP, etc.).

There are some differences in settings during testing vs. during production. For example, production Auto Scaling Termination Policies would use "ClosestToNextInstanceHour" to save some money on Windows instances, which are charged by the hour, rather than Linux instances, which are charged by the minute. But when testing a new launch configuration, it may be easier to terminate the "NewestInstance" first.

Also, to enable multi-cloud capability, some companies put their public-facing load balancers in their own data centers, then route to the cloud of their choice. QUESTION: How much latency does that introduce?

Monitoring granularity and fidelity

Automated monitoring and alerts replace the need for constant human vigilance, so you can sleep better at night rather than worrying.

Some organizations prefer to automate all aspects of setting up computing capabilities – installing the operating system, drivers, etc. This enables the organization to quickly respond to “zero day” security vulnerabilities which can crop up in any part of a system. This would also enable the organization to take advantage of lower prices for “bare metal” server instances from IBM and (since 2018) AWS. But is the total cost of running bare-metal boxes really cheaper than other approaches?

The default granularity of AWS's monitoring service (CloudWatch) is one data point every 5 minutes, and it does not include monitoring of memory usage. Monitoring of memory usage and a granularity of 1 minute can be configured (at additional cost). But that still doesn't cover situations where sub-second granularity is needed to inform debugging of "micro events".

To save on disk space, many monitoring vendors sample readings from among servers, keeping perhaps just 1% of all readings captured. This reduces the fidelity of data about any specific server even more.

To further save on disk space, many traditional monitoring utilities truncate the more granular detail over time. For example, individual data points are deleted after a week, and some tools keep just the average of each day's measurements. This is not a useful practice for helping to debug issues over time. A compromise is to calculate and store, in addition to averages, 90th- or 95th-percentile values.*

So when there is a cluster of machines, use general metrics to determine whether they are all using comparable amounts of CPU, memory, etc. (An example of such a metric is a running Coefficient of Variation (CV), obtained by dividing the standard deviation by the average.)
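
A sketch of that calculation in Python, using made-up CPU readings from four servers in a cluster:

    import statistics

    # percent CPU utilization sampled from each server (illustrative values)
    cpu_by_server = {"web-1": 42.0, "web-2": 45.5, "web-3": 39.8, "web-4": 71.2}

    readings = list(cpu_by_server.values())
    mean = statistics.mean(readings)
    cv = statistics.stdev(readings) / mean       # coefficient of variation: std dev / mean
    print(f"mean CPU {mean:.1f}%, CV {cv:.2f}")  # a high CV flags an imbalanced cluster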

More granular metrics can then be collected on just one of the servers within the cluster. This reduces disk-space usage for metrics. It also provides an indicator of the impact of adding more granular measurements to a machine.

On the metrics dashboard, one line representing whether all servers are at a similar level of load can replace a graph containing separate lines for each server. Taking that further, one line representing whether all metrics about a cluster are "nominal" can replace a whole set of lines for each metric about the cluster. That's somewhat like a person's FICO (financial) score, which condenses several aspects of creditworthiness into one number.

QUESTION: How much time elapses between alarm and response? Answering this involves recording events in a database, with analytics on that database. Within AWS, CloudWatch would store a new row within RDS.

Furthermore, an email, SMS text, or Slack notification can be sent out when thresholds are crossed or events occur. Within AWS, send SNS notifications when an Auto Scaling group launches or terminates instances.
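
A sketch of wiring that up with boto3; the ASG name and SNS topic ARN are placeholders.

    # Requires: pip install boto3
    import boto3

    autoscaling = boto3.client("autoscaling")
    autoscaling.put_notification_configuration(
        AutoScalingGroupName="web-asg",                             # placeholder
        TopicARN="arn:aws:sns:us-east-1:123456789012:asg-events",   # placeholder topic
        NotificationTypes=[
            "autoscaling:EC2_INSTANCE_LAUNCH",
            "autoscaling:EC2_INSTANCE_TERMINATE",
        ],
    )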

To get ahead of events, how far in advance could the alarm event be predicted? That's where ratios might be used.

AWS CloudTrail logs report configuration changes such as which requests were made, the source IP addresses the requests came from, who made each request, when it was made, etc.


TODO: Complete this article:

AWS Elastic Beanstalk to deploy apps

Instance limits.

Blue-Green Deployments

A/B testing

AWS Lightsail

In 2018 Amazon introduced its Lightsail service, which automatically scales EC2 instances running executables without the need to set up VPCs and auto-scaling groups. And rates are comparable to those of public hosting companies (starting at $5 per month).

Each Lightsail plan has a limit beyond which additional storage and data transfer costs are incurred.

Among Linux Academy’s diagrams

TODO: Serverless

Istio and Envoy for Tracing

Technologies emerging since 2018

https://wilsonmar.github.io/service-mesh

Tracing

Control plane

Project Management Life Cycle (PMLC)

Recap of tasks

Here is a list of tasks mentioned above, in usual sequence of execution:

  1. Programmers ensure that their code does not run too long or take too much memory.

  2. Scalability tests: Does running two smaller servers process more transactions than one big server with the equivalent memory of the two smaller servers?

  3. What yields the highest rate of business transactions at the least total cost?

  4. How quickly additional capacity is added after a request.

References