Automate High Availability in the cloud
Overview
- Human interaction
- Launch
- Advanced User Data
- Boot-up
- Security
- Retrieve using help scripts
- DNS
- EC2 HPC Placement Groups
- VPC (Virtual Private Cloud)
- Security Groups
- Bastion Host
- EBS
- Glacier
- DNS (Domain Name Service) Route 53
- ELB (Elastic Load Balancer)
- Elastic IPs
- ACL
- NAT
- VPN (Virtual Private Network)
- Auto-Scale
- Rolling Updates
- CloudWatch
- CloudFront
- Direct connect
- Resources
- Shell script
- More on Amazon
This tutorial focuses on the automated setup of multi-stage (dev+QA+prod) enterprise environments within AWS. It specifies CloudFormation resources in the same sequence needed during manual configuration in the AWS Management Console.
There are several ways to set up enterprise environments within AWS, listed from the most manual (most difficult) to the easiest (most automated):
- Manually typing in the AWS Management Console
- Interactively using the AWS CLI (Command Line Interface) to type commands with parameters
- Creating shell scripts that call AWS CLI commands to specify the sequence of construction
- Creating CloudFormation templates that declare what Amazon creates
- Using Amazon’s Elastic Beanstalk or OpsWorks (Chef) deployment services.
NOTE: This page is in draft form at the moment.
Human interaction
The two main types of people interacting with AWS are 1) Administrators who define the environment and 2) end-users of the whole setup.
My AWS basic on-boarding tutorial describes setting up an account, use of the AWS Management Console, and installation of AWS Command-line Interface.
This tutorial shows how to configure CloudFormation JSON for each aspect of AWS for an enterprise, by sequence of dependencies:
Launch
- In the AWS Management Console Services gallery, click CloudFormation (instead of EC2).
- Select the Region.
PROTIP: New services are initially supported in only a single region, so choose the region based on the services it supports.
- Click Create New Stack.
NOTE: One CF template can be used to create multiple stacks of servers.
A CF stack can span two or more Availability Zones in a multi-subnet VPC.
- In the Stack Name box, type “Lab” or another name.
- Select Specify an Amazon S3 template URL.
- Paste the URL. An example is provided at:
http://us-east-1-aws-training.s3.amazonaws.com/self-paced-lab-15/VPC1.json
This one has additional security groups and output parameters:
http://us-east-1-aws-training.s3.amazonaws.com/self-paced-lab-15/VPC2.json
PROTIP: Given this complexity, it is better to use a JSON file that contains all the resource specifications.
- Click Next.
- On the Specify Parameters page, leave the default settings.
- Click Next on the Options page (no Tags).
- Click Create after reviewing.
- Click the Events tab.
- Click Refresh occasionally.
- On the Services menu, click VPC to see the results.
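The console steps above can also be scripted with the AWS CLI. This is a minimal sketch, assuming the CLI is installed and configured and that the lab template needs no extra capabilities. The `run` helper is hypothetical: it only prints each command unless `DRY_RUN=0` is set.

```bash
#!/bin/bash
# Print commands by default; set DRY_RUN=0 to actually call AWS.
run() { if [[ "${DRY_RUN:-1}" == 1 ]]; then echo "$*"; else "$@"; fi; }

STACK_NAME="Lab"
TEMPLATE_URL="http://us-east-1-aws-training.s3.amazonaws.com/self-paced-lab-15/VPC1.json"
REGION="us-east-1"

create_lab_stack() {
  # Same as Create New Stack > Specify an Amazon S3 template URL > Create.
  run aws cloudformation create-stack --stack-name "$1" --template-url "$2" --region "$REGION"
  # Block until creation finishes, instead of clicking Refresh.
  run aws cloudformation wait stack-create-complete --stack-name "$1" --region "$REGION"
  # Equivalent of the Events tab.
  run aws cloudformation describe-stack-events --stack-name "$1" --region "$REGION"
}

create_lab_stack "$STACK_NAME" "$TEMPLATE_URL"
```

The `wait` subcommand replaces manual refreshing; it returns a non-zero exit code if the stack ends in a failed state.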
To be able to delete the stack, first turn off Termination Protection:
- Select EC2 from the Services gallery.
- Click Instances.
- Right-click the running instance and choose Change Termination Protection.
- Click Yes, Disable.
- Go back to the CloudFormation service.
- Select the box for the stack to be deleted.
- Click Delete Stack.
- Click Yes, Delete.
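The same teardown as a CLI sketch, assuming one termination-protected instance whose ID you already know (the ID below is a placeholder). The `run` helper is hypothetical and only prints commands unless `DRY_RUN=0`.

```bash
#!/bin/bash
run() { if [[ "${DRY_RUN:-1}" == 1 ]]; then echo "$*"; else "$@"; fi; }

REGION="us-east-1"

delete_lab_stack() {
  local stack="$1" instance_id="$2"
  # Equivalent of Change Termination Protection > Yes, Disable.
  run aws ec2 modify-instance-attribute --instance-id "$instance_id" --no-disable-api-termination --region "$REGION"
  # Equivalent of Delete Stack > Yes, Delete.
  run aws cloudformation delete-stack --stack-name "$stack" --region "$REGION"
  run aws cloudformation wait stack-delete-complete --stack-name "$stack" --region "$REGION"
}

# i-0123456789abcdef0 is a placeholder instance ID.
delete_lab_stack Lab i-0123456789abcdef0
```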
NOTE: Unlike with the manual Console, AMIs are specified in the CloudFormation template JSON.
CF Front Matter
This section is based on the Template Reference at https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/template-reference.html.
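CF templates declare a format version, even though it’s always “2010-09-09” (the AWSTemplateFormatVersion section is optional; when it is omitted, CloudFormation assumes the latest version):
PROTIP: Specify a URL to documentation and discussion about the template.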
PROTIP: Use #tags as metadata in the Description to automate search among multiple files.
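A sketch of such front matter, written from a shell script with a heredoc. The file name, the `Documentation` key inside `Metadata`, its URL, and the hashtags are all hypothetical; a template also needs at least one resource, so a bare SNS topic is included.

```bash
#!/bin/bash
# Write the front matter of a CloudFormation template (file name is hypothetical).
TEMPLATE_FILE="${TEMPLATE_FILE:-lab-template.json}"

cat > "$TEMPLATE_FILE" <<'EOF'
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Lab VPC with NAT and bastion host. #vpc #nat #bastion #dev",
  "Metadata": {
    "Documentation": "https://example.com/wiki/lab-template"
  },
  "Resources": {
    "LabTopic": { "Type": "AWS::SNS::Topic" }
  }
}
EOF

# To find every template tagged #bastion in a folder: grep -l '#bastion' ./*.json
grep -q '#bastion' "$TEMPLATE_FILE" && echo "$TEMPLATE_FILE is tagged #bastion"
# With credentials configured, check the syntax with:
# aws cloudformation validate-template --template-body "file://$TEMPLATE_FILE"
```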
NOTE: CloudFormation follows an “idempotent” approach of specifying what is desired (an end state) rather than specific sequences of actions.
By contrast, a script that adds resources would create another new resource each time it is run.
With CloudFormation, no matter how many times the specification is run, the end result is the same.
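The difference can be shown with a toy shell example, where a local file stands in for the list of resources (the file and function names are hypothetical; nothing here touches AWS).

```bash
#!/bin/bash
# A stand-in "resource list" kept in a local file (for illustration only).
STATE_FILE="$(mktemp)"

# Imperative: every run adds another resource.
add_server() { echo "$1" >> "$STATE_FILE"; }

# Declarative/idempotent: describe the end state; only add what is missing.
ensure_server() { grep -qxF "$1" "$STATE_FILE" || echo "$1" >> "$STATE_FILE"; }

for i in 1 2 3; do ensure_server web-1; done
for i in 1 2 3; do add_server web-2; done

echo "web-1 copies: $(grep -c '^web-1$' "$STATE_FILE")"   # 1, however many runs
echo "web-2 copies: $(grep -c '^web-2$' "$STATE_FILE")"   # 3, one per run
```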
Jinja2 templates (like Jekyll)
Jinja2 templates (described at http://jinja.pocoo.org/docs/dev/templates) can be used to expand “moustache” variables in CloudFormation JSON:
- {{ stage }} : The name of the stage where you are applying your project (dev, test, prod, etc.).
- {{ aws_region }} : The name of the AWS region where you are applying your project.
- {{ aws_account_id }} : The ID of the account you are using to apply your project.
- {{ env }} : All available environment variables.
- {{ ip-xxx }} : A specific IP address.
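Jinja2 itself is a Python library. As a rough shell stand-in (plain `sed`, not Jinja2: no filters, loops, or `env`), simple `{{ name }}` placeholders can be expanded like this. The `render` function and the bucket-name template are hypothetical, and values containing `/` would need escaping.

```bash
#!/bin/bash
# Minimal moustache-style expansion with sed; NOT Jinja2.
render() {
  local template="$1" stage="$2" aws_region="$3" aws_account_id="$4"
  sed -e "s/{{ *stage *}}/${stage}/g" \
      -e "s/{{ *aws_region *}}/${aws_region}/g" \
      -e "s/{{ *aws_account_id *}}/${aws_account_id}/g" <<< "$template"
}

TEMPLATE='{ "BucketName": "lab-{{ stage }}-{{ aws_region }}-{{ aws_account_id }}" }'
render "$TEMPLATE" dev us-west-2 123456789012
# → { "BucketName": "lab-dev-us-west-2-123456789012" }
```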
Advanced User Data
The “User” here refers to you, the DevOps person using CloudFormation, not the ultimate end user of the system being built.
The EC2 Advanced User Data field contains script code that “bootstraps” a server, executing after the server becomes active.
This example is a Windows machine running the PowerShell iex command to download and install Chocolatey, the package manager for Windows, used here to install Python. Lastly, the awscli is installed.
PROTIP: Many enterprises reference Artifactory to obtain installers instead of downloading whatever is published on Chocolatey. This ensures that all packages have been reviewed by internal Security personnel.
An example to bootstrap instance with CloudWatch sample scripts on Linux servers:
CloudWatch Custom Metrics
The default metrics do not include all the metrics needed, such as disk metrics. See http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/mon-scripts.html
For example, this AWS CLI command sends one data point for the amount of free memory in gigabytes; run it every 5 minutes to keep the metric current:
aws cloudwatch put-metric-data --metric-name FreeMemoryGB --namespace Windows --value 5 --region us-west-2 --dimensions 'Name=Server,Value=WIN'
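The command above sends a fixed value (5). A Linux sketch that measures the value instead, assuming `/proc/meminfo` provides `MemAvailable` (Linux 3.14+); the `Linux` namespace, the crontab path, and the `run` helper (prints unless `DRY_RUN=0`) are hypothetical.

```bash
#!/bin/bash
run() { if [[ "${DRY_RUN:-1}" == 1 ]]; then echo "$*"; else "$@"; fi; }

# Convert kB (as reported by /proc/meminfo) to GB, 1024-based, two decimals.
kb_to_gb() { awk -v kb="$1" 'BEGIN { printf "%.2f", kb / 1024 / 1024 }'; }

free_mem_kb() { awk '/^MemAvailable:/ { print $2 }' /proc/meminfo; }

send_free_memory() {
  local gb
  gb=$(kb_to_gb "$(free_mem_kb)")
  run aws cloudwatch put-metric-data --metric-name FreeMemoryGB --namespace Linux \
    --value "$gb" --unit Gigabytes --region us-west-2 \
    --dimensions "Name=Server,Value=${HOSTNAME:-unknown}"
}

send_free_memory
# Hypothetical crontab entry to send the metric every 5 minutes:
# */5 * * * * DRY_RUN=0 /usr/local/bin/send-free-memory.sh
```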
An example to install CodeDeploy agent on Linux servers:
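A sketch of such user data, assuming Amazon Linux with `yum` and the regional `aws-codedeploy-<region>` S3 bucket that AWS documents for the agent installer; check the current CodeDeploy documentation before relying on the URL. The `run` helper is hypothetical and only prints commands unless `DRY_RUN=0` (user data runs as root, so `sudo` is not needed there).

```bash
#!/bin/bash
run() { if [[ "${DRY_RUN:-1}" == 1 ]]; then echo "$*"; else "$@"; fi; }

REGION="${REGION:-us-west-2}"

# Installer location in the region's CodeDeploy bucket.
codedeploy_install_url() { echo "https://aws-codedeploy-$1.s3.$1.amazonaws.com/latest/install"; }

install_codedeploy_agent() {
  run yum install -y ruby wget
  run wget -O /tmp/codedeploy-install "$(codedeploy_install_url "$REGION")"
  run chmod +x /tmp/codedeploy-install
  run /tmp/codedeploy-install auto
  run service codedeploy-agent status
}

install_codedeploy_agent
```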
Boot-up
- Use a text editor to open the boot script used by many Linux AMIs:
/etc/rc.local
- Add commands there to run additional configuration scripts as the root user when the instance is launched.
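A sketch of adding a boot command safely: it is added only once, and placed before a trailing `exit 0` (common in rc.local files) so that it actually runs. The `add_boot_command` function and `/opt/lab/configure.sh` are hypothetical; the demo uses a scratch file because editing `/etc/rc.local` needs root.

```bash
#!/bin/bash
# Add a command to an rc.local-style boot script once, placed before a final "exit 0".
add_boot_command() {
  local rc_file="$1" cmd="$2" tmp
  grep -qxF "$cmd" "$rc_file" && return 0        # already there: nothing to do
  if grep -qx 'exit 0' "$rc_file"; then
    tmp=$(mktemp)
    # Print the command just before the first "exit 0" line.
    awk -v cmd="$cmd" '$0 == "exit 0" && !done { print cmd; done = 1 } { print }' "$rc_file" > "$tmp"
    cat "$tmp" > "$rc_file" && rm -f "$tmp"        # rewrite in place, keeping permissions
  else
    echo "$cmd" >> "$rc_file"
  fi
}

RC_FILE=$(mktemp)
printf '#!/bin/sh -e\nexit 0\n' > "$RC_FILE"
add_boot_command "$RC_FILE" "/opt/lab/configure.sh"
add_boot_command "$RC_FILE" "/opt/lab/configure.sh"   # second call changes nothing
cat "$RC_FILE"
```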
User Data CloudInit
Amazon provides some AMIs (Ubuntu Linux AMI and the Amazon Linux AMI) which contain a version of CloudInit. CloudInit accepts commands in the EC2 Advanced User Data field to customize instances at launch time.
REMEMBER: The EC2 Advanced User Data field has a limit of 16 KB (16,384 bytes) of raw data, before base64 encoding.
WARNING: CloudInit takes time to run, which delays auto-scaling.
Dynamically configuring an AMI at startup means you can use a common base AMI for different use cases.
For example, user data can supply the database connection string, which differs between dev, QA, and prod.
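A sketch that builds per-environment user data and checks it against the 16 KB limit before launch. The `make_user_data` function, the connection string, and `/etc/lab-app.env` are hypothetical; per the Security section below, do not put credentials in user data.

```bash
#!/bin/bash
MAX_USER_DATA_BYTES=16384   # 16 KB limit on raw user data (before base64 encoding)

# Hypothetical user data that hands the app its database connection string.
make_user_data() {
  local db_conn="$1"
  cat <<EOF
#!/bin/bash
echo "DB_CONNECTION=${db_conn}" > /etc/lab-app.env
EOF
}

check_user_data_size() {
  local bytes
  bytes=$(printf '%s' "$1" | wc -c)
  if (( bytes > MAX_USER_DATA_BYTES )); then
    echo "too large: $bytes bytes" >&2
    return 1
  fi
  echo "ok: $bytes bytes"
}

USER_DATA=$(make_user_data "mysql://db.internal:3306/lab")
check_user_data_size "$USER_DATA"
```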
Security
You can pass AWS credentials to an instance through user data; however, if you do this, we recommend that you pass the AWS access key ID and secret key of an AWS Identity and Access Management (IAM) user with limited permissions. You should grant the IAM user only the read access permissions it needs to retrieve the configuration information. You can use AWS CloudFormation to create IAM users, groups, and policies. From within a single template, you can create a user, set appropriate policies, create an access key ID and secret key pair, and then add the credentials to the instance through the user data. By adding the IAM user in the template, you have a user whose existence is tied to the lifetime of the stack, with each new stack having a separate, unique user.
For bootstrap scripts, we strongly recommend that you assign an IAM role to the EC2 instance when the instance is launched. By using an IAM role, no long-term secrets are defined in the template or stored in the metadata on the EC2 instance. When the IAM role is used, temporary security credentials are created and used to access AWS services such as AWS CloudFormation. These temporary credentials expire after a short time, making them harder to compromise and reducing the risk and exposure if they are compromised.
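A CLI sketch of the recommended approach: attach the role through an instance profile, either at launch or later. The AMI ID, instance ID, instance type, and `lab-readonly-profile` are placeholders, and the `run` helper is hypothetical (prints unless `DRY_RUN=0`). On the instance, the AWS CLI and SDKs then pick up the temporary role credentials automatically.

```bash
#!/bin/bash
run() { if [[ "${DRY_RUN:-1}" == 1 ]]; then echo "$*"; else "$@"; fi; }

# At launch: attach the role through its instance profile.
launch_with_role() {
  run aws ec2 run-instances --image-id "$1" --instance-type t2.micro \
    --iam-instance-profile "Name=$2"
}

# Later: attach a role to an instance that is already running.
attach_role() {
  run aws ec2 associate-iam-instance-profile --instance-id "$1" \
    --iam-instance-profile "Name=$2"
}

launch_with_role ami-0123456789abcdef0 lab-readonly-profile
attach_role i-0123456789abcdef0 lab-readonly-profile
```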
Retrieve using help scripts
Within servers, user data is retrieved from the instance metadata service at a fixed IP address:
http://169.254.169.254/latest/user-data
NOTE: The same IP address is used for both Linux and Windows.
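A sketch of that retrieval with `curl`, using the session-token flow (IMDSv2) that instances may be configured to require. The `run` helper is hypothetical: in its default dry-run mode it prints the request with a placeholder token instead of calling the metadata service.

```bash
#!/bin/bash
run() { if [[ "${DRY_RUN:-1}" == 1 ]]; then echo "$*"; else "$@"; fi; }

IMDS="http://169.254.169.254"

get_user_data() {
  local token="DRY-RUN-TOKEN"
  if [[ "${DRY_RUN:-1}" != 1 ]]; then
    # IMDSv2: request a session token first (valid here for 6 hours).
    token=$(curl -s -X PUT "$IMDS/latest/api/token" \
      -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
  fi
  # Send the token with each metadata request.
  run curl -s -H "X-aws-ec2-metadata-token: $token" "$IMDS/latest/user-data"
}

get_user_data
```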
Instance metadata, including user data, can also be retrieved with this command:
/opt/aws/bin/ec2-metadata --help
On Amazon Linux AMIs, a helper script can extract it. CloudFormation helper scripts are installed by default on the Amazon Linux AMI in
/opt/aws/bin
and are also available from the Amazon Linux AMI yum repository (package name aws-cfn-bootstrap).
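A sketch of how user data typically calls two of those helper scripts: `cfn-init` applies the resource's metadata, and `cfn-signal` reports its exit code back to the stack (used with a creation policy or wait condition). The stack name, region, and `WebServer` logical ID are hypothetical, and the `run` helper only prints unless `DRY_RUN=0`.

```bash
#!/bin/bash
run() { if [[ "${DRY_RUN:-1}" == 1 ]]; then echo "$*"; else "$@"; fi; }

STACK="Lab"; REGION="us-west-2"; RESOURCE="WebServer"   # hypothetical logical ID

bootstrap() {
  # Apply the AWS::CloudFormation::Init metadata of the resource.
  run /opt/aws/bin/cfn-init -v --stack "$STACK" --resource "$RESOURCE" --region "$REGION"
  local status=$?
  # Tell CloudFormation whether bootstrapping succeeded (0) or failed.
  run /opt/aws/bin/cfn-signal -e "$status" --stack "$STACK" --resource "$RESOURCE" --region "$REGION"
}

bootstrap
```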
CloudFormation helper scripts are available from:
EC2 Instances
An example CF template:
DNS
DNS servers resolve host names (such as the host part of a URL) to IP addresses, forwarding requests they cannot resolve from their own tables to other DNS servers.
Clients – called resolvers – make requests of DNS name servers.
Two DNS servers are usually specified (in client machine TCP/IP properties) for load balancing and fault tolerance.
DNS servers refer to 3 types of records to answer 3 types of queries:
- A (host Address) records are used to answer forward lookup of an FQDN (host name) to a specific IP address. The host name to IP address mappings for a zone are stored in the Domain.dns file in the %systemroot%\System32\Dns folder.
- PTR (Pointer resource) records are used to answer a reverse lookup of an IP address to a host name (another DNS domain name location). IP address to host name mappings for addresses w.x.y.z are in the z.y.x.w.in-addr.arpa file (octets reversed). For example, create the 1.0.0.127.in-addr.arpa zone file for reverse lookup of 127.0.0.1.
- SRV (Service location) records, first used by Windows 2000 DNS to locate domain controllers, specify which servers provide a particular service. Windows 2000 Server requires DNS to locate domain controllers. On Windows 2000, DNS is installed as a Windows component on a domain controller with a static (not dynamic) IP address.
Other types of resource records:
- NS records identify which DNS servers are designated as authoritative for the zone.
- SOA (Start Of Authority) records indicate the name of origin and other basic properties for each zone, including the name of the primary server that is the source of information about the zone.
- CNAME (Canonical name) records define aliases.
- MX (Mail exchanger) records define the owner and mail exchange server DNS name, with preference number.
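The reversed-octet naming used for PTR lookups can be computed with a small bash function (the `reverse_name` helper is hypothetical; `dig -x` does the same conversion when it sends the query).

```bash
#!/bin/bash
# Build the in-addr.arpa name used for a PTR (reverse) lookup of an IPv4 address.
reverse_name() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo "${o[3]}.${o[2]}.${o[1]}.${o[0]}.in-addr.arpa"
}

reverse_name 127.0.0.1     # → 1.0.0.127.in-addr.arpa
reverse_name 10.10.0.25    # → 25.0.10.10.in-addr.arpa
# With dig installed, the actual lookup is: dig -x 10.10.0.25
```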
Availability zones
Unlike the Console web page, which shows the current region in the upper right corner, on an instance you get its Availability Zone with this command:
```
ec2-metadata -z
```
EC2 HPC Placement Groups
- From the EC2 Management Console, create High Performance Computing (HPC) placement groups for low latency and high bandwidth within an Availability Zone. Placement group names must be unique within each AWS account.
EC2 instances can’t be moved into a placement group. They must all be launched at the same time.
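A CLI sketch: create a cluster placement group, then launch all its instances in a single request. The group name, AMI ID, count, and the `c5n.18xlarge` instance type are placeholders, and the `run` helper is hypothetical (prints unless `DRY_RUN=0`).

```bash
#!/bin/bash
run() { if [[ "${DRY_RUN:-1}" == 1 ]]; then echo "$*"; else "$@"; fi; }

create_hpc_cluster() {
  local group="$1" ami="$2" count="$3"
  # "cluster" packs instances close together for low latency and high bandwidth.
  run aws ec2 create-placement-group --group-name "$group" --strategy cluster
  # Launch all instances in one request, since they should start together.
  run aws ec2 run-instances --image-id "$ami" --instance-type c5n.18xlarge \
    --count "$count" --placement "GroupName=$group"
}

create_hpc_cluster lab-hpc-group ami-0123456789abcdef0 4
```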
Mapping of AMIs by Region
An AMI (Amazon Machine Image) is an image of a server containing all its software.
Each AMI is created by taking a snapshot of what has been configured on a server.
In CF static mapping for key “AWSRegionArch2AMI”, each AMI value is specific to a unique region key:
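In a template, such a value is typically read with `{"Fn::FindInMap": ["AWSRegionArch2AMI", {"Ref": "AWS::Region"}, "64"]}`. A bash analog of that lookup (not CloudFormation itself), assuming bash 4+ for associative arrays; the AMI IDs are placeholders.

```bash
#!/bin/bash
# Bash analog of Fn::FindInMap on AWSRegionArch2AMI (AMI IDs are placeholders).
declare -A AWSRegionArch2AMI=(
  ["us-east-1:64"]="ami-11111111"
  ["us-west-2:64"]="ami-22222222"
  ["eu-west-1:64"]="ami-33333333"
)

find_ami() {
  local key="$1:$2"
  if [[ -z "${AWSRegionArch2AMI[$key]+set}" ]]; then
    echo "no AMI mapped for $key" >&2
    return 1
  fi
  echo "${AWSRegionArch2AMI[$key]}"
}

find_ami us-west-2 64    # → ami-22222222
```

As in CloudFormation, a missing region key is an error rather than a silent default.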
Mapping of Instance Types
Within the Console, machine types are called Instance Types.
In a CF JSON file, instance types are defined in Mappings:
NOTE: Nowadays, only 64-bit servers are used.
QUESTION: Why is this necessary if all the values are the same?
VPC (Virtual Private Cloud)
Security Groups
SGs define which ports are open.
PROTIP: Agree on a naming convention for security groups, perhaps using codes not also used by Amazon. Examples: “US_VA5_LX_WEB_P_001” and “IR_2_W12_DMZ_F_002”:
- “US” for the United States legal domicile entities. “G” for Germany, etc.
- “VA5” for Virginia 5.
- “LX” for Amazon Linux, “W12” for Windows Server 2012, etc.
- “WEB” for web server tier, “DMZ”, “SQL” database, “NOS” for NO-SQL database, etc.
- “P” for prod, “F” for functional testing, “C” for capacity test environment.
- “001” a sequential number, zero-filled to ensure proper sorting over time.
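A sketch of the naming convention as two bash helpers, one to build names and one to check them. The function names and the exact pattern are assumptions based on the two examples above.

```bash
#!/bin/bash
# Build a security group name, e.g. US_VA5_LX_WEB_P_001.
sg_name() {
  local domicile="$1" location="$2" os="$3" tier="$4" env="$5" seq="$6"
  # 10# forces base 10, so inputs like "008" are not read as octal.
  printf '%s_%s_%s_%s_%s_%03d\n' "$domicile" "$location" "$os" "$tier" "$env" "$((10#$seq))"
}

# Check six underscore-separated parts ending in a zero-filled 3-digit sequence.
is_valid_sg_name() {
  [[ "$1" =~ ^[A-Z]+_[A-Z0-9]+_[A-Z0-9]+_[A-Z]+_[A-Z]_[0-9]{3}$ ]]
}

sg_name US VA5 LX WEB P 1          # → US_VA5_LX_WEB_P_001
is_valid_sg_name IR_2_W12_DMZ_F_002 && echo valid
```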
Public-facing NAT should be protected with Multi-factor authentication (MFA).
* SSH and RDP ports should be open only to specific source and destination IPs,
not to the global network (0.0.0.0/0), and only to static exit IPs, not dynamic exit IPs.
* http://www.howtogeek.com/121650/how-to-secure-ssh-with-google-authenticators-two-factor-authentication/
* http://www.rohos.com/2013/02/google-authenticator-windows-login/
PROTIP: “Tier” security groups so servers on each tier cannot access all ports. Don’t use same security group for multiple tiers of instances.
By default, no inbound ports are open (all inbound traffic is blocked). Ports to consider opening:
- DNS
- ICMP for pings. But only for internal, not external servers.
A template can have additional output parameters.
This advice:
“People have tendency to open for port 8080 to 10.10.0.0/24 (web layer) range. Instead of that, open port 8080 to web-security-group. This will make sure only web security group instances will be able to contact on port 8080. If someone launches NAT instance with NAT-Security-Group in 10.10.0.0/24, he won’t be able to contact on port 8080 as it allows access from only web security group.”
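The quoted advice as a CLI sketch: the rule references the web tier's security group instead of a CIDR range. Both group IDs are placeholders, and the `run` helper is hypothetical (prints unless `DRY_RUN=0`).

```bash
#!/bin/bash
run() { if [[ "${DRY_RUN:-1}" == 1 ]]; then echo "$*"; else "$@"; fi; }

# Allow a port only from instances that belong to another security group.
allow_from_group() {
  local target_sg="$1" source_sg="$2" port="$3"
  run aws ec2 authorize-security-group-ingress --group-id "$target_sg" \
    --protocol tcp --port "$port" --source-group "$source_sg"
}

# App tier accepts 8080 only from the web tier (placeholder IDs).
allow_from_group sg-0aaaa1111bbbb2222 sg-0cccc3333dddd4444 8080
```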
FTP
Bastion Host
Bastion hosts are used to limit exposure to the internet, to enable sysadmins to SSH into machines.
In many setups, bastion hosts are the only servers allowed access from the public internet.
Windows users use the PuTTY program from:
http://the.earth.li/~sgtatham/putty/latest/x86/putty.exe
Public and private keys:
* Windows users download the PPK.
* Linux/Mac users download the PEM.
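For Linux/Mac users, a sketch of reaching a private instance through the bastion with OpenSSH's `-J` (ProxyJump, OpenSSH 7.3+). It assumes an `ssh-agent` is running; the key file, user, and host names are hypothetical, and the `run` helper prints unless `DRY_RUN=0`.

```bash
#!/bin/bash
run() { if [[ "${DRY_RUN:-1}" == 1 ]]; then echo "$*"; else "$@"; fi; }

KEY="lab-key.pem"
BASTION="ec2-user@bastion.example.com"
PRIVATE_HOST="ec2-user@10.10.1.25"

connect_via_bastion() {
  run chmod 400 "$KEY"                    # ssh ignores private keys others can read
  run ssh-add "$KEY"                      # load the key into ssh-agent for both hops
  run ssh -J "$BASTION" "$PRIVATE_HOST"   # hop through the bastion to the private host
}

connect_via_bastion
```

Using the agent means the private key never has to be copied onto the bastion host.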
S3 Template URL
For environments routinely processing more than 100 requests per second (for example, image GETs),
because S3 stores key names lexicographically (alphabetically),
S3 GETs can be faster if file names are prefixed with a random string (as in a GUID)
or if the key name string is reversed.
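A sketch of both key-naming tricks in bash. Instead of a random GUID, it uses the first 4 hex characters of the key's MD5 hash, so the prefix can be recomputed when reading; `md5sum` is assumed to be GNU coreutils (macOS has `md5` instead). The keys are hypothetical.

```bash
#!/bin/bash
# 1) Prefix with the first 4 hex characters of the key's MD5 hash.
hashed_key() {
  local prefix
  prefix=$(printf '%s' "$1" | md5sum | cut -c1-4)
  echo "$prefix/$1"
}

# 2) Reverse the key name, so sequential names differ in their first characters.
reversed_key() {
  local s="$1" out="" i
  for (( i=${#s}-1; i>=0; i-- )); do out+="${s:i:1}"; done
  echo "$out"
}

hashed_key "images/2016/04/img0001.jpg"
reversed_key "img0001.jpg"     # → gpj.1000gmi
```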
PDF: AWS Storage Options (https://media.amazonwebservices.com/AWS_Storage_Options.pdf)
Ways to protect data in S3:
- Bucket Policies.
- MFA Delete.
- Back up data in another bucket/account.
AWS offers several Gateways to store data:
- Gateway-cached volumes
- Gateway-Stored volumes
- Gateway-Virtual Tape Library (VTL)
EBS
Each EBS volume is attached to one instance at a time.
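An EBS volume cannot be moved directly to another Availability Zone; instead, create a snapshot of it and restore the snapshot as a new volume in the target Availability Zone.
EBS volumes are replicated across multiple servers in an AZ. But a backup is still needed because EBS is designed for an annual failure rate (AFR) of 0.1% to 0.2% (separately, the availability SLA is 99.95%).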
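A CLI sketch of the snapshot route between Availability Zones, plus the arithmetic behind the backup advice. The volume ID and target AZ are placeholders; the `run` helper is hypothetical and prints unless `DRY_RUN=0`, in which case a placeholder snapshot ID is used.

```bash
#!/bin/bash
run() { if [[ "${DRY_RUN:-1}" == 1 ]]; then echo "$*"; else "$@"; fi; }

# Move an EBS volume to another Availability Zone by way of a snapshot.
move_volume() {
  local volume_id="$1" target_az="$2" snapshot_id
  snapshot_id=$(run aws ec2 create-snapshot --volume-id "$volume_id" \
    --description "copy for $target_az" --query SnapshotId --output text)
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    echo "$snapshot_id"              # in dry-run mode this is the printed command
    snapshot_id="snap-PLACEHOLDER"
  fi
  run aws ec2 wait snapshot-completed --snapshot-ids "$snapshot_id"
  run aws ec2 create-volume --snapshot-id "$snapshot_id" --availability-zone "$target_az"
}

# Expected failed volumes per year for a fleet size and an AFR given in percent.
expected_failures() { awk -v n="$1" -v afr="$2" 'BEGIN { printf "%.1f\n", n * afr / 100 }'; }

move_volume vol-0123456789abcdef0 us-west-2b
expected_failures 1000 0.2    # → 2.0
```

At a 0.2% AFR, a fleet of 1,000 volumes should expect about two failures a year, which is why snapshots still matter.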
Glacier
AWS allows up to 1,000 vaults per account.
Individual archives can be up to 40 TB each.
Lifecycle retention policies.
Google Nearline Storage
DNS (Domain Name Service) Route 53
Customers and advertisers are given a domain name such as amazon.com, acme.com, microsoft.com, etc.
Visitors specifying the domain name query the DNS server configured on their machine.
That DNS server obtains the record (in Route 53, an alias A record) from AWS Route 53, the authoritative name server for the domain.
The enterprise approach is to have DNS distribute traffic between two external-facing load balancers, to avoid any single point of failure, however unlikely. Secondary DNS can be operated by an alternative vendor:
- dyn.com
- google.com
- https://nsone.net/
AWS built Route 53 from the ground up rather than using open source code. AWS added features such as health checks. Inside VPCs, Route 53 resolves custom domain names only through private hosted zones.
DNS distributes load among load balancers in a round-robin fashion.
ELB (Elastic Load Balancer)
Load balancers distribute traffic among individual nodes in a cluster.
Clients reach the load balancer via a VIP (Virtual IP) address.
EC2-WatchDog is a simple (bash) script for Amazon EC2 to monitor another node for HA and take over a Virtual IP (VIP) if the service on the other node fails. http://danilop.github.io/ec2-watchdog
PROTIP: The response AWS expects from the Ping Path resource (typically /index.html) is an HTTP 200 response.
So AWS may consider a server up even when the web server is down, if the
container service responds with a formatted “Please try again later” page and a 200 status.
Some use an address outside AWS to distribute load to other clouds (servers in private locations, in Azure, etc.)
- Response Timeout: 5 seconds.
- Health Check Interval: the number of seconds AWS waits between health checks.
- Unhealthy Threshold: the number of consecutive failed health checks before AWS considers a server unhealthy.
- Healthy Threshold: the number of consecutive successful health checks before AWS considers a server healthy enough to use.
PROTIP: Set Healthy Threshold to 2 checks; a recovered server is then back in use after about 2 multiplied by the Health Check Interval.
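A sketch for a Classic Load Balancer: the threshold-times-interval arithmetic, and the CLI call that sets these values. The load balancer name is hypothetical, and the sketch uses a 10-second interval on the assumption that the Classic ELB timeout must be shorter than the interval. The `run` helper prints unless `DRY_RUN=0`.

```bash
#!/bin/bash
run() { if [[ "${DRY_RUN:-1}" == 1 ]]; then echo "$*"; else "$@"; fi; }

# Approximate seconds for a state change: consecutive checks needed × seconds between checks.
threshold_seconds() { echo $(( $1 * $2 )); }

configure_lab_health_check() {
  run aws elb configure-health-check --load-balancer-name lab-elb \
    --health-check Target=HTTP:80/index.html,Interval=10,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=2
}

configure_lab_health_check
echo "healthy after ~$(threshold_seconds 2 10)s, unhealthy after ~$(threshold_seconds 2 10)s"
```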
Elastic IPs
ACL
Access Control Lists can provide more fine-grained control.
Each ACL has separate rules for ingress (in) and egress (out).
ACL use degrades performance because every packet is inspected.
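Because network ACLs are stateless, allowing a service takes both an ingress and an egress rule. A CLI sketch for inbound HTTPS plus the replies to clients' ephemeral ports; the ACL ID and rule numbers are placeholders, and the `run` helper prints unless `DRY_RUN=0`.

```bash
#!/bin/bash
run() { if [[ "${DRY_RUN:-1}" == 1 ]]; then echo "$*"; else "$@"; fi; }

NACL_ID="acl-0123456789abcdef0"   # placeholder

add_https_rules() {
  # Inbound HTTPS from anywhere.
  run aws ec2 create-network-acl-entry --network-acl-id "$NACL_ID" --ingress \
    --rule-number 100 --protocol tcp --port-range From=443,To=443 \
    --cidr-block 0.0.0.0/0 --rule-action allow
  # Outbound replies to client ephemeral ports.
  run aws ec2 create-network-acl-entry --network-acl-id "$NACL_ID" --egress \
    --rule-number 100 --protocol tcp --port-range From=1024,To=65535 \
    --cidr-block 0.0.0.0/0 --rule-action allow
}

add_https_rules
```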
NAT
Instances launched into an AWS VPC aren’t directly accessible from internet, so by default are more secure.
To expose nodes to the public internet, configure NAT rules for use with a public VPC.
Network Address Translation enables servers in private subnets to communicate with the public Internet outside the farm.
An example of how NAT is configured in a CF JSON file:
The “m1.small” is defined in Mappings.
In the CF JSON Resources section:
The “EIP” is an Elastic IP to the public.
NOTE: NAT servers act as appliances, so no SSH key pair is specified for logging into the instance.
NAT instances are a Single Point of Failure (SPOF), so monitoring and automated replacement is needed.
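A CLI sketch of the two settings a NAT instance depends on; a replacement script would repeat them for the new instance. The instance and route table IDs are placeholders, and the `run` helper prints unless `DRY_RUN=0`.

```bash
#!/bin/bash
run() { if [[ "${DRY_RUN:-1}" == 1 ]]; then echo "$*"; else "$@"; fi; }

setup_nat_instance() {
  local nat_instance_id="$1" private_route_table_id="$2"
  # A NAT instance forwards traffic not addressed to itself, so disable the check.
  run aws ec2 modify-instance-attribute --instance-id "$nat_instance_id" --no-source-dest-check
  # Send the private subnet's internet-bound traffic through the NAT instance.
  run aws ec2 create-route --route-table-id "$private_route_table_id" \
    --destination-cidr-block 0.0.0.0/0 --instance-id "$nat_instance_id"
}

setup_nat_instance i-0123456789abcdef0 rtb-0123456789abcdef0
```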
VPN (Virtual Private Network)
VPN is not an Amazon invention.
VPN secures the network between data centers, so VPNs are said to extend data centers.
The CorporateCidrIP parameter in the CF JSON Parameters section (at the top) is the IP address range allowed inbound access to the VPC.
CIDR (Classless Inter-Domain Routing) is also called “supernetting” because it allows more flexible allocation of Internet Protocol (IP) addresses. Whatever.
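To see what a value such as CorporateCidrIP covers, a small bash calculator helps (the helper names are hypothetical): a /N prefix leaves 32 − N host bits, so the block holds 2^(32−N) addresses.

```bash
#!/bin/bash
ip_to_int() { local IFS=.; read -r a b c d <<< "$1"; echo $(( (a<<24) | (b<<16) | (c<<8) | d )); }
int_to_ip() { echo "$(( ($1>>24)&255 )).$(( ($1>>16)&255 )).$(( ($1>>8)&255 )).$(( $1&255 ))"; }

# Size, first address, and last address of an IPv4 CIDR block.
cidr_range() {
  local ip="${1%/*}" prefix="${1#*/}"
  local size=$(( 1 << (32 - prefix) ))
  local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  local first=$(( $(ip_to_int "$ip") & mask ))
  echo "$size addresses: $(int_to_ip "$first") - $(int_to_ip $(( first + size - 1 )))"
}

cidr_range 10.10.0.0/24     # → 256 addresses: 10.10.0.0 - 10.10.0.255
cidr_range 10.10.0.0/16     # → 65536 addresses: 10.10.0.0 - 10.10.255.255
```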
Auto-Scale
Amazon charges by the hour, even if the server was used only a few minutes.
The default cooldown period before removing servers is 5 minutes (5 × 60 = 300 seconds).
Auto-Scaling Config
Auto scaling launches specific AMI instances.
Create Auto Scaling Group
PROTIP: In this lab, the group being scaled has a maximum of 5 instances.
The response is:
```
OK-Created AutoScalingGroup
```
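The command that produced this response is not shown above. With the current AWS CLI, a comparable call might look like the sketch below; the launch configuration, load balancer, and AZ names are hypothetical. As far as I know, this command prints nothing on success, unlike the legacy tool's response. The `run` helper prints unless `DRY_RUN=0`.

```bash
#!/bin/bash
run() { if [[ "${DRY_RUN:-1}" == 1 ]]; then echo "$*"; else "$@"; fi; }

create_lab_asg() {
  run aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name lab-as-group \
    --launch-configuration-name lab-lc \
    --min-size 1 --max-size 5 --default-cooldown 300 \
    --availability-zones us-west-2a us-west-2b \
    --load-balancer-names lab-elb
}

create_lab_asg
```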
Verify Auto Scaling
- In the AWS Management Console, open Services, EC2, Instances.
- Select the instance without a name. Refresh.
- Status Checks should say “2/2”.
- Copy the Public DNS and paste it in a browser.
It should say CPU Load: 0%.
- Click Generate Load.
To test auto-scaling, terminate one of the instances.
Identify the amount of time from when the condition occurs to when the new instance can accept and process requests.
Consider the time sequence:
- CloudWatch aggregation makes data available (60 seconds)
- Auto Scaling Trigger is invoked (polling every 60 seconds)
- New instance is populated (several minutes)
- New instance is placed in Load Balancer.
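Adding up that sequence gives a rough reaction time. The first two durations come from the list above; the boot and load-balancer durations are assumptions to replace with your own measurements.

```bash
#!/bin/bash
CLOUDWATCH_AGGREGATION=60   # seconds, from the list above
TRIGGER_POLLING=60          # seconds, from the list above (worst case)
INSTANCE_BOOT=240           # assumed: "several minutes" to launch and bootstrap
ELB_REGISTRATION=20         # assumed: e.g. HealthyThreshold 2 × Interval 10

total=$(( CLOUDWATCH_AGGREGATION + TRIGGER_POLLING + INSTANCE_BOOT + ELB_REGISTRATION ))
echo "about $total seconds ($(( total / 60 )) min $(( total % 60 )) s)"
# → about 380 seconds (6 min 20 s)
```

Boot time usually dominates, which is why the CloudInit warning above matters for auto-scaling.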
Tag Auto Scaling Resources
- Click the gear icon to Show/Hide columns.
as-create-or-update-tags --tag "id=lab-as-group, t=auto-scaling-group, k=Name, v=AS-Web-Server, p=true"
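The `as-*` commands come from the legacy Auto Scaling command-line tools. A sketch of the same tag with the current AWS CLI; the `run` helper is hypothetical and prints unless `DRY_RUN=0`.

```bash
#!/bin/bash
run() { if [[ "${DRY_RUN:-1}" == 1 ]]; then echo "$*"; else "$@"; fi; }

# Tag the group; PropagateAtLaunch copies the tag to instances it launches.
tag_asg() {
  run aws autoscaling create-or-update-tags \
    --tags "ResourceId=$1,ResourceType=auto-scaling-group,Key=Name,Value=$2,PropagateAtLaunch=true"
}

tag_asg lab-as-group AS-Web-Server
```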
Create Auto Scaling Notifications
- This uses Amazon’s Simple Notification Service (SNS). Open SNS in the AWS Management Console.
- Click Get Started.
- Click Create Topic.
- For Topic Name, type “lab-as-topic”.
PROTIP: Define a naming convention for topics.
- Click Create Topic.
- Click Create Subscription.
- Click Email for Protocol.
- Type an email address in the Endpoint box.
PROTIP: Specify a group email.
- Click Create Subscription.
- Specify the Topic ARN.
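The same notification setup as a CLI sketch. The account ID and email address are placeholders, and the topic ARN is assembled in the standard `arn:aws:sns:region:account:name` form. The `run` helper prints unless `DRY_RUN=0`.

```bash
#!/bin/bash
run() { if [[ "${DRY_RUN:-1}" == 1 ]]; then echo "$*"; else "$@"; fi; }

REGION="us-west-2"; ACCOUNT_ID="123456789012"          # placeholders
TOPIC_ARN="arn:aws:sns:$REGION:$ACCOUNT_ID:lab-as-topic"

setup_notifications() {
  run aws sns create-topic --name lab-as-topic --region "$REGION"
  run aws sns subscribe --topic-arn "$TOPIC_ARN" --protocol email \
    --notification-endpoint ops-team@example.com --region "$REGION"
  # Email the group whenever Auto Scaling launches or terminates an instance.
  run aws autoscaling put-notification-configuration \
    --auto-scaling-group-name lab-as-group --topic-arn "$TOPIC_ARN" \
    --notification-types autoscaling:EC2_INSTANCE_LAUNCH autoscaling:EC2_INSTANCE_TERMINATE \
    --region "$REGION"
}

setup_notifications
```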
Auto Scaling Policies
To review scaling activities, such as a scale-down:
as-describe-scaling-activities --show-long
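A current-CLI sketch of a simple scale-down policy and the activity review. The policy name is hypothetical, and writing `--scaling-adjustment=-1` with `=` is a precaution so the negative value is not mistaken for an option. The `run` helper prints unless `DRY_RUN=0`.

```bash
#!/bin/bash
run() { if [[ "${DRY_RUN:-1}" == 1 ]]; then echo "$*"; else "$@"; fi; }

# Remove one instance each time the policy fires, then wait 300 s.
create_scale_down_policy() {
  run aws autoscaling put-scaling-policy --auto-scaling-group-name lab-as-group \
    --policy-name lab-scale-down --adjustment-type ChangeInCapacity \
    --scaling-adjustment=-1 --cooldown 300
}

# Current-CLI counterpart of as-describe-scaling-activities.
review_activities() {
  run aws autoscaling describe-scaling-activities --auto-scaling-group-name lab-as-group
}

create_scale_down_policy
review_activities
```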
Rolling Updates
The “immutable infrastructure” philosophy is that one doesn’t change running servers, not even to apply security patches. Instead, one replaces old server instances with new instances, using rolling updates of small batches.
For Auto Scaling Groups, CloudFormation does this with the UpdatePolicy attribute.
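A sketch of such a resource, written as a JSON fragment for a template's Resources section. The rolling update replaces one instance at a time, keeps at least one in service, and pauses 5 minutes (`PT5M`) between batches. `WebServerGroup`, `LaunchConfig`, the sizes, and the file name are hypothetical.

```bash
#!/bin/bash
# Write an Auto Scaling group fragment with a rolling-update policy.
FRAGMENT_FILE="${FRAGMENT_FILE:-asg-fragment.json}"

cat > "$FRAGMENT_FILE" <<'EOF'
{
  "WebServerGroup": {
    "Type": "AWS::AutoScaling::AutoScalingGroup",
    "UpdatePolicy": {
      "AutoScalingRollingUpdate": {
        "MinInstancesInService": "1",
        "MaxBatchSize": "1",
        "PauseTime": "PT5M"
      }
    },
    "Properties": {
      "LaunchConfigurationName": { "Ref": "LaunchConfig" },
      "MinSize": "2",
      "MaxSize": "5",
      "AvailabilityZones": { "Fn::GetAZs": "" }
    }
  }
}
EOF
echo "wrote $FRAGMENT_FILE"
```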
Andreas Wittig (@andreaswittig from Stuttgart, Germany, on CodeMentor) and Michael show you:
- “Automating AWS with CloudFormation”, a Pluralsight 1 hour 19 minute video course released 5 Apr 2016, refers to a hypothetical admin (“called Adam”) who creates a bastion host as well as a WordPress app server using MySQL.
- Amazon Web Services in Action, a Manning book.
CloudWatch
CloudWatch is necessary to detect when instances need to be added or removed.
The conditions can be CPU utilization percentage over a period of time, or something more elaborate.
- In AWS Services, select CloudWatch.
- Click Alarms.
- Create Alarm.
- Search for By Auto Scaling Group on the CloudWatch Metrics by Category page.
- Select AutoScalingGroupName “lab-as-group”.
- Select Metric Name “CPUUtilization”. Next.
- Type High CPU Alarm in Name box. Description.
- In Whenever drop-down list, select exceeds or equals >=50 (50%) for one consecutive period.
- Click + AutoScaling Action.
PROTIP: A full set of triggers needs to include all the components, including memory, disk space, file handles, available ports, etc.
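The High CPU Alarm steps as a CLI sketch. The scaling policy ARN is a placeholder, and the 300-second period is an assumption (the steps only say one consecutive period). The `run` helper prints unless `DRY_RUN=0`.

```bash
#!/bin/bash
run() { if [[ "${DRY_RUN:-1}" == 1 ]]; then echo "$*"; else "$@"; fi; }

SCALE_UP_POLICY_ARN="arn:aws:autoscaling:us-west-2:123456789012:scalingPolicy:placeholder"

create_high_cpu_alarm() {
  # Average CPU of the group >= 50% for one 5-minute period triggers the policy.
  run aws cloudwatch put-metric-alarm --alarm-name "High CPU Alarm" \
    --namespace AWS/EC2 --metric-name CPUUtilization --statistic Average \
    --dimensions Name=AutoScalingGroupName,Value=lab-as-group \
    --period 300 --evaluation-periods 1 --threshold 50 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --alarm-actions "$SCALE_UP_POLICY_ARN"
}

create_high_cpu_alarm
```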
CloudFront
CloudFront is Amazon’s CDN (Content Delivery Network) where files in S3 (Simple Storage Service) are spread around the world.
Compared with Akamai, CloudFront has no minimum usage costs.
CloudFront is among green icons for Management Tools in the Services gallery.
CloudFront has one Resource Type: Distribution.
Direct connect
One can directly connect an on-premises network to an AWS VPC.
PROTIP: Direct connects have a set static bandwidth, so plan accordingly.
Resources
Shell script
-
First, let’s interactively run a command such as this (but change the region code):
aws ec2 describe-security-groups --region us-west-2
-
Use a text editor to begin defining a shell script.
We define environment variables with values, such as REGION needed for most aws CLI commands.
```bash
#!/bin/bash
# List security group rules that allow inbound traffic from anywhere (0.0.0.0/0).
REGION=us-west-2
SGOUT=$(mktemp)   # temporary file for the CLI output

aws ec2 describe-security-groups --region "$REGION" --output text > "$SGOUT"

IFS=$'\n'
cat "$SGOUT" | while read -r line
do
  case $line in
    SECURITYGROUPS*)
      GID=$(echo "$line" | awk -F"\t" '{print $3}')
      GNAME=$(echo "$line" | awk -F"\t" '{print $4}')
      ;;
    IPPERMISSIONSEGRESS*)   # must come before IPPERMISSIONS*, which also matches
      PROTO="EGRESS"
      ;;
    IPPERMISSIONS*)
      FROMPORT=$(echo "$line" | awk -F"\t" '{print $2}')
      PROTO=$(echo "$line" | awk -F"\t" '{print $3}')
      TOPORT=$(echo "$line" | awk -F"\t" '{print $4}')
      ;;
    IPRANGES*)
      CIDR=$(echo "$line" | awk -F"\t" '{print $2}')
      if [[ "$CIDR" = "0.0.0.0/0" && "$PROTO" != "EGRESS" ]]; then
        echo "$GNAME, $GID, $CIDR, $PROTO, $FROMPORT, $TOPORT"
      fi
      ;;
  esac
done

rm "$SGOUT"
```
Setting IFS (Internal Field Separator) to a newline makes read take each whole line, tabs included.
The cat command is piped to a while loop, which reads each line.
In awk, $3 is the third tab-separated field of the line, $4 the fourth, etc.
“fi” is the “end if” of bash scripts.
More on Amazon
This is one of a series on Amazon:
- AWS Cloud Services Comparisons
- AWS Well-Architected Cloud
- AWS Cloud Services
- AWS IAM
- AWS CLI
- AWS On-boarding (GUI, CLI, API)
- AWS Security
- AWS Data Tools
- AWS DevOps (CodeCommit, CodePipeline, CodeDeploy)
- AWS server deployment options
- AWS CDK
- Build load-balanced servers in AWS EC2
- AWS Networking
- AWS Xray
- IoT on AWS
- AWS Lambda