Enterprise-grade, secure Zero-Trust routing to replace East-West load balancing, using service names rather than static IP addresses. Enhance a Service Mesh with mTLS and health-based APIs in AWS, Azure, GCP, and other clouds running Kubernetes, as well as ECS, EKS, VMs, databases, and even mainframes outside Kubernetes
Overview
- Most Popular Websites about Consul
- Due to Microservices
- Legacy networking infrastructure mismatches
- Legacy mismatches solved by Consul Editions
- Free Open Source Software Features
- Security Frameworks
- Mitigation Actions
- Benefits of Adoption
- BOOK: Consul: Up and Running
- Ways to setup Consul with demo infra
- Demo apps
- Certification exam
- B. On HashiCorp’s Consul Cloud SaaS HCP (HashiCorp Cloud Platform)
- The Automated Way
- Create resources within AWS
- CTS for NIA
- Hashicorp Cloud Account
- Store secrets
- (optional) Configure kubectl
- Create a HashiCorp Virtual Network (HVN)
- Peer HVN to an AWS VPC
- Create a HCP Consul cluster
- Enable a public or private IP
- Configure L3 routing and security ???
- Configure Consul ACL Controller
- Run Consul clients within the provisioned AWS VPC
- Run a demo application on the chosen AWS runtime
- Destroy Consul
- Service Discovery Workflow
- C. On a macOS laptop using Docker
- One Agent as Client or Server
- Install HCDiag
- Install Consul Agent on Linux
- Install Consul Agent on macOS
- Install using Brew taps on MacOS
- Install by Download
- Consul CLI commands
- Ports used by Consul
- Environment Variables
- envconsul
- Consul Templates
- Start Consul Agent in foreground
- Start Consul Server in background (macOS)
- Leave (Stop) Consul gracefully
- Consul web GUI
- API
- Chaos Engineering
- Service Graph Intentions
- Services
- (Consul) Nodes (Health Checks)
- ACL (Access Control List) Operations
- D. In a single datacenter (with Kubernetes)
- Kubernetes with Consul
- Sidecar proxy injection
- Service Discovery Registry DNS Queries
- Assist or Replaces Kubernetes
- D. In a single datacenter using Kubernetes
- E. In a single 6-node datacenter (survive loss of an Availability Zone)
- F. For HA on multiple datacenters federated over WAN
- Enterprise configuration
- Manage from another Terminal
- Consul Tutorials from HashiCorp
- G. Integrations to legacy VMs, mainframes, etc.
- Customize HTTP Response Headers
- Collaborations
- Competitors
- References
- HashiCorp Corporate Social
- END
Here are notes while I’m learning about Consul, attempting to be succinct and logically sequenced. All without sales generalizations. All in this one single big page for easy search. This is not a replacement for you going through professionally developed trainings.
Consul is “a multi-cloud service networking platform to connect and secure any service across any runtime platform and public or private cloud”.
NOTE: Content here is my personal opinion, and not intended to represent any employer (past or present). “PROTIP:” flags information unique to this website, based on my personal research and experience.
Most Popular Websites about Consul
The most popular websites about Consul:
- The marketing home page for HashiCorp’s Consul: https://www.consul.io/
- Wikipedia entry: https://www.wikiwand.com/en/Consul_(software) - “Consul was initially released in 2014 as a service discovery platform. In addition to service discovery, it now provides a full-featured service mesh for secure service segmentation across any cloud or runtime environment, and distributed key-value storage for application configuration.[2] Registered services and nodes can be queried using a DNS interface or an HTTP interface.[1] Envoy proxy provides security, observability, and resilience for all application traffic.”
- Detailed technical documentation: https://www.consul.io/docs
- Tutorials from HashiCorp: https://learn.hashicorp.com/tutorials/consul/service-mesh
- Technical Discussions: https://discuss.hashicorp.com/c/consul/29
- Stack Overflow has highly technical questions & answers: https://stackoverflow.com/search?q=%23hashicorp-consul
- Reddit: https://www.reddit.com/search/?q=hashicorp%20consul
- Licensed Support from HashiCorp is conducted by those authorized to access HashiCorp’s ZenDesk system: https://hashicorp.freshservice.com/helpdesk/tickets
Due to Microservices
“Microservices is the most popular architectural approach today. It’s extremely effective. It’s the approach used by many of the most successful companies in the world, particularly the big web companies.” –Dave Farley
In hopes of building more reliable systems in the cloud faster and cheaper, enterprises create distributed microservices instead of monolithic architectures (which are more difficult to evolve).
Microservices seem like a good idea because they promise:
- Ephemeral services that can move and scale independently (reducing dev teams waiting for each other)
- Simplified unit testing of individual services
- Increased agility
- Greater operational efficiency
Legacy networking infrastructure mismatches
However, each new paradigm comes with new problems.
The concerns that Consul solves are commonly categorized into three technical areas:
“Consul is a datacenter runtime that provides 1) service discovery, 2) configuration, and 3) orchestration.”
Implementation of microservices within legacy infrastructure and “fortress with a moat” mindset (rather than “Zero Trust” and other security principles) creates these concerns:
Orchestration
A. When traffic is routed based on IP addresses, traffic is sent blindly without identity authentication (a violation of “Zero Trust” mandates).
B. Traffic routing mechanisms (such as IPTables) were designed to manage external traffic, not traffic internally between services.
Service Discovery
C. Mechanisms intended to secure external traffic (such as IPTables) are drafted into securing internal traffic among app services. Such mechanisms are usually owned and managed for the whole enterprise by the Networking department, so developers spend too much time requesting permission to access IP addresses, and Network departments now spend too much time connecting internal static IP addresses for communication among services, when many don’t consider it part of their job.
D. Due to the lack of authentication (when routing by IP addresses), current routing has no mechanism for fine-grained permission policies that limit which operations (such as Read, Write, Update, Delete, etc.) are allowed, which is what implements “Least Privilege” principles.
E. Also due to lack of authentication, current routing does not have the metadata to segment traffic in order to split a percentage of traffic to different targets for various types of testing.
DEFINITION: "Micro segmentation" is the logical division of the internal network into distinct security segments at the service/API level. Its use enables granular access control to, and visiblity of, discrete service interface points. Reference: <a target="_blank" href="https://dodcio.defense.gov/Portals/0/Documents/Library/CNAP_RefDesign_v1.0.pdf">PDF: "US Department of Defense (DoD) Cloud Native Access Point (CNAP) Reference Design (RD)"</a>
The segmentation that "East-West" (internal) Load Balancers with advanced "ISO Level 7" capability (such as F5) can perform is more limited that what Consul can do with its more granualar metadata about each service.
Not only that, Load Balancers are <strong>a single point of failure</strong>. So an alternative is needed which has been architected for resilience and high availability to failures in individual nodes, Availability Zones, and whole Regions.
F. In an effort mitigate the network features lacking, many developers now spend too much time coding network-related communication logic into each application program (for retries, tracing, secure TLS, etc.).
Kubernetes a partial solution
Kubernetes (largely from Google) has been popular as “orchestrator” to replace instances of pods (holding Containers) when any of them go offline.
NOTE: Kubernetes is currently not mature when it comes to adding more pods (to scale up) or removing pods (to scale down).
However, core Kubernetes currently still has these deficiencies:
G. Kubernetes does not check if a service is healthy before trying to communicate with it. This leads to the need for applications to be coded with time-outs, which is a distraction and usually not a skill most business application coders have.
H. Kubernetes does not encrypt communications between services.
I. Kubernetes does not provide a way to communicate with components and cloud services outside Kubernetes such as databases, ECS, other EKS clusters, Serverless, Observability platforms, etc. Thus, Kubernetes by itself does not enable deep transaction tracing.
References:
Legacy mismatches solved by Consul Editions
Consul provides a mechanism for connecting dynamic microservices with legacy networking infrastructure.
The list below sends you to how each edition of Consul solves the mismatches described above.
- Free Open Source
- Paid Enterprise for self-installed/managed on-prem or in private clouds
- SaaS in the HCP (HashiCorp Platform) in the cloud
Free Open Source Software Features
The main component of the Consul product – the Consul Agent executable “consul” – can be controlled using CLI commands without licensing, as FOSS (free open-source software), using code open-sourced at:
-
https://github.com/hashicorp/consul
Consul is written in the Go programming language. The GUI is in JavaScript with Handlebars templating, SCSS, and Gherkin.
Initiated in 2014, this repo has garnered nearly 25,000 stars, with over a million downloads monthly.
References:
- VIDEO: “Consul eliminates load balancers”
- VIDEO: “Using Consul for Network Observability & Health Monitoring” referencing this repo
PROTIP: Here are Agile-style stories requesting use of HashiCorp Consul (written by me):
Consul Concepts in UI Menu
The Consul Enterprise edition menu can serve as a list of concepts about Consul:
“dc1” is the name of a Consul “datacenter” – a cluster of Consul servers within a single region.
Multiple “Admin Partitions” and “Namespaces” are Consul Enterprise features.
Consul manages applications made available as Services on the network.
Nodes are Consul servers which manage network traffic. They can be installed separately from application infrastructure.
-
Rather than A. blindly routing traffic based on IP addresses, which have no basis for authentication (a violation of “Zero Trust” mandates), Consul routes traffic based on named entities (such as “C can talk to A” or “C cannot talk to A.”).
Consul Enterprise can authenticate using several Authentication Methods
-
Rather than B. routing based on IPTables designed to manage external traffic, Consul routes from its list of “Intentions” which define which other entities each entity (by name) can access (see the intentions sketch after this list).
Consul does an authentication handshake with each service before sending it data. A rogue service cannot pretend to be another legitimate service unless it holds a legitimate encryption certificate assigned by Consul. And each certificate expires, which Consul handles by rotating certificates automatically.
-
Rather than C. manually creating a ticket for manual action by Networking people connecting internal static IP addresses, Consul discovers the network metadata (such as IP addresses) of each application service when it comes online, based on the configuration defined for each service. This also means that Network people spend less time plumbing internal communications, freeing them up for analysis, debugging, and other tasks.
Roles and Policies
-
Consul’s Key/Value store holds a “service registry” containing ACL (Access Control List) policy entries which define which operations (such as Read, Write, Update, Delete, etc.) are allowed or denied for each role assigned to each named entity. This adds the fine-grained security functionality needed for “Zero Trust”.
As Consul redirects traffic, it secures the traffic by generating certificates used to encrypt traffic on both ends of communication, taking care of automatic key rotation hassles, too. BTW This mechanism is called “mTLS” (mutual Transport Layer Security).
-
Instead of E. requiring a Load Balancer or application coding to split a percentage of traffic to different targets for various types of testing, Consul can segment traffic based on attributes associated with each entity. This enables more sophisticated logic than what traditional Load Balancers offer.
Consul can route based on various algorithms (like F5 does): “Round-robin”, “Least-connections”, etc.
That means Consul can, in many cases, replace “East-West” load balancers, removing load balancers (in front of each type of service) as a single-point-of-failure risk.
-
With Consul, instead of F. developers spending too much time coding network communication logic in each program (for retries, tracing, secure TLS, etc.), networking logic can be managed in a GUI.
Since Consul is added as additional servers running in parallel within the same infrastructure, changes usually involve configuration rather than app code changes. Thus, Consul can connect/integrate services running both on on-prem servers and in clouds, inside and outside Kubernetes.
- Within the system, Consul obtains the health status of each app server so that traffic is routed only to healthy app services, which provides a more aware approach than load balancers blindly routing (by Round-Robin).
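As a minimal sketch of what an intention looks like in practice (assuming a local Consul agent on the default 127.0.0.1:8500 and two hypothetical services named “web” and “db”; the file name and service names are illustrative only), an intention allowing “web” to call “db” can be written as a service-intentions config entry and loaded with the CLI:

```sh
# Hypothetical example: allow service "web" to call service "db".
cat > db-intentions.hcl <<'EOF'
Kind = "service-intentions"
Name = "db"            # destination service
Sources = [
  {
    Name   = "web"     # source service allowed to connect
    Action = "allow"
  }
]
EOF

consul config write db-intentions.hcl   # load the intention into Consul
consul intention list                   # confirm what is allowed/denied
```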
Partial Kubernetes Remediation using Service Mesh
References:
To overcome G. Kubernetes not checking if a service is healthy before trying to communicate, many are adding a “Service Mesh” to Kubernetes. Although several vendors offer the addition, “Service Mesh” generally means the installation of a network proxy agent (a “sidecar”) installed within each pod alongside app containers.
“Envoy” is currently the most popular Sidecar proxy. There are alternatives.
When app developers allow all communication in and out of their app through a Sidecar proxy, they can focus more on business logic rather than the intricacies of retries after network failure, traffic encryption, transaction tracing, etc.
Because Consul performs health checks and maintains the status of each service, Consul never routes traffic to known unhealthy pods, so apps don’t need to be coded with complex timeouts and retry logic. And although H. Kubernetes does not encrypt communications between services, Consul’s service mesh encrypts traffic between sidecar proxies using mTLS.
Although I. Kubernetes does not provide a way to communicate with components and cloud services outside Kubernetes, Consul can dynamically configure sidecars such as Envoy to route or duplicate traffic to “Observability” platforms such as Datadog, Prometheus, Splunk, New Relic, etc., which perform analytics displayed on dashboards created using Grafana and other tools.
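As a sketch of what sidecar injection looks like on Kubernetes (assuming Consul was installed on the cluster with connect injection enabled via the official Helm chart; the “web” Deployment name and image are placeholders, and newer consul-k8s releases may also require a matching Kubernetes Service):

```sh
# Hypothetical Deployment showing the annotation that asks Consul to inject an Envoy sidecar.
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
      annotations:
        consul.hashicorp.com/connect-inject: "true"   # inject the Connect sidecar proxy
    spec:
      containers:
        - name: web
          image: hashicorp/http-echo    # placeholder application image
          args: ["-text=hello"]
EOF
```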
Paid Enterprise Features
Additional (teamwork and security) features are unlocked with licensing of Consul Enterprise, installed and managed by the customer’s own (self-managed) organization.
- On the Amazon Marketplace at $8,000 per year for up to 50 nodes and bronze support.
Features:
Tokens
A. Authenticate using a variety of methods. In addition to ACL Tokens, use enterprise-level identity providers (such as Okta and GitHub, Kerberos with Windows, etc.) for SSO (Single Sign-On) based on identity information maintained in email systems, so that additions and deletions of email accounts get reflected in applications immediately.
B. Automatic Upgrades (“Autopilot” feature) of a whole set of nodes at once – this avoids manual effort and eliminates windows when different versions exist at the same time.
C. Enhanced Audit logging – to better understand access and API usage patterns. A full set of audit logs makes Consul a fully enterprise-worthy utility.
D. Multi-Tenancy using “Admin Partitions” and “Namespaces” to segment data for different teams within a single Consul datacenter – a key “Zero Trust” principle to diminish the “blast radius” from potential compromise of credentials to a specific partition.
- https://learn.hashicorp.com/tutorials/consul/amazon-ecs-admin-partitions
- Consul on ECS & Admin Partitions Learn Guide
E. Consul can take automatic action when its metadata changes, such as notifying apps and firewalls, to keep security rules current (using NIA CTS).
The “consul-terraform-sync” (CTS) module broadcasts recognized changes, which can be used to update Terraform code dynamically for automatic resource reconfiguration. This decreases the possibility of human error in manually editing configuration files and decreases the time to propagate configuration changes to networks.
F. Policy enforcement using Sentinel extends the ACL system in Consul beyond the static “read”, “write”, and “deny” policies to support full conditional logic during writes to the KV store. It also integrates with external systems.
G. Better Resiliency from scheduled Backups of Consul state to snapshot files – backups happen without anyone needing to remember to take them manually.
H. Consul is designed for additional Consul servers to be added to a Consul Cluster to achieve enterprise-scale scalability. The performance scaling mechanism involves adding “Redundancy Zones” which only read metadata (as “non-voting” nodes).
- Large enterprises have up to 4,000 microservices running at the same time.
- “Performance begins to degrade after 7 voting nodes due to server-to-server Raft protocol traffic, which is expensive on the network.”
- Global Consul Scale Benchmark tests (below) proved Consul’s enterprise scalability.
I. Consul Service Mesh (also called Enterprise “Consul Connect”) enables a Kubernetes cluster to securely communicate with services outside itself. Connect enables communication from a Sidecar proxy in Kubernetes to an API Gateway (which acts like a K8s Sidecar proxy) surrounding stand-alone databases, ECS, VMs, Serverless, even across different clouds.
As with HashiCorp’s Terraform, because the format of infrastructure configuration across multiple clouds (AWS, Azure, GCP, etc.) are similar in Consul, the learning necessary for people to work on different clouds is reduced, which yields faster implementations in case of mergers and acquisitions which require multiple cloud platforms to be integrated quickly. VIDEO
J. Consul can be set up for Disaster Recovery (DR) from failure of an entire cloud Region. Consul has a mechanism called “WAN Federation” which replicates service metadata across regions to enable multi-region capability.
Fail-over to a whole Region is typically set up to involve manual intervention. However, the use of Consul Service Mesh with Health Checks would enable automated failover within the context of a SOC (Security Operations Center) Governance Model.
References:
Multi-region redundancy using complex Network Topologies between Consul datacenters (with “pairwise federation”) – this provides the basis for disaster recovery in case an entire region disappears.
The above features enable a cluster of Consul servers for Enterprises to provide High Availability (fault tolerance) despite the failure of a whole Availability Zone, by running duplicate nodes and replicating metadata across availability zones and regions.
Within a single datacenter, Consul can be set up (using a combination of Service Mesh and Health Checks) to provide automatic failover for services by omitting failed service instances from DNS lookups and by providing service health information in APIs.
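A minimal sketch of that health-aware lookup, assuming a local agent exposing Consul DNS on the default port 8600 and a registered service named “web” (only instances passing their health checks are returned):

```sh
# DNS interface: returns only healthy instances of the "web" service.
dig @127.0.0.1 -p 8600 web.service.consul SRV

# HTTP API: the same data; passing=true filters out instances failing health checks.
curl "http://127.0.0.1:8500/v1/health/service/web?passing=true"
```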
References:
- https://hashicorp-services.github.io/enablement-consul-slides/markdown/architecture/#1
- Consul’s network coordinate subsystem
Security Frameworks
This section provides more context and detail about security features of Consul.
There are several frameworks which security professionals use to organize the controls they install to prevent ransomware, data leaks, and other potential security catastrophes. Here are the most well-known:
- Well-Architected Framework
- “Zero Trust” in CIA
- “Kill Chain”
- ATT&CK Enterprise Framework
- SOC2/ISO 27000 attestations
PROTIP: Within the description of each framework, links are provided here to specific features which Consul provides (as Security Mitigations).
Well-Architected Framework (WAF)
A “Well-Architected Framework” is referenced by all major cloud providers.
- https://wa.aws.amazon.com/wat.pillar.security.en.html
Security professionals refer to the “CIA Triad” for security:
- Confidentiality by limiting access
- Integrity of data that is trustworthy
- Availability for reliable access
Zero-trust applies to those three:
- Identity-driven authentication (by requester name instead of by IP address)
- Mutually authenticated – both server and client use a cryptographic certificate to prove their identity (mTLS)
- Encrypted in transit and at rest (baked into the app lifecycle via CI/CD automation)
- Each request is time-bounded (instead of long-lived static secrets waiting to be hacked)
- Audited & Logged (for the SOC to do forensics)
References:
- VIDEO: “The six pillars of Zero Trust”
- US NIST SP 800-207 defines “Zero Trust Architecture” (ZTA) at PDF: https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-207.pdf (50 pages)
- INSTRUQT: “Consul: Zero Trust Networking with Service Mesh”
The “Kill Chain” (created by Lockheed-Martin) organizes security work into seven stages of how malicious actors work.
Specific tools and techniques that adversaries use (on specific platforms) are organized within the 14 tactics of the “ATT&CK” Enterprise Matrix lifecycle (PDF) from Mitre Corporation (a US defense think-tank), first published in 2013.
A comparison between the above:
Kill Chain | Mitre ATT&CK | Mitigations |
---|---|---|
1. Reconnaissance (harvesting) | Reconnaissance, Resource Development | Authentication |
2. Weaponization (exploit of backdoor into a deliverable payload) | Initial Access, Execution | mTLS |
3. Delivery (into victim) | Persistence, Privilege Escalation | Audit logs & Alerts |
4. Exploitation (of vulnerability) | Defense Evasion (Access Token Manipulation) | ACL |
5. Installation (of malware) | Credential Access, Discovery (System Network Connections Discovery), Lateral Movement (Exploitation of Remote Services, Remote Service Session Hijacking), Collection (Man-in-the-Middle) | Authorization |
6. Command and Control (remote manipulation) | Command and Control (Application Layer Protocol, Web Service, Dynamic Resolution) | Segmentation |
7. Actions on Objectives | Exfiltration, Impact | DLP (Data Loss Prevention) |
Mitigation Actions
Part of a Cloud Operating Model suite
Consul is part of the HashiCorp “Cloud Operating Model” product line which provides modern mechanisms for better security and efficiency in access and communication processes:
These products are collectively referred to as “HashiStack”.
Consul, Vault, and Boundary together provide the technologies and workflows to achieve SOC2/ISO 27000 and “Zero Trust” mandates in commercial enterprises and within the U.S. federal government and its suppliers.
References:
- VIDEO Microservices with Terraform, Consul, and Vault
Zero Trust Maturity Model
HashiCorp’s HashiStack is used by many enterprises to transition from “Traditional” to “Optimal”, as detailed by the US CISA “Zero Trust Maturity Model” at https://www.cisa.gov/sites/default/files/publications/CISA%20Zero%20Trust%20Maturity%20Model_Draft.pdf (19 pages):
Categories of “Defense in Depth” techniques listed in PDF: Mitre’s map of defense to data sources:
- Password Policies
- Active Directory Configuration
- User Account Control
- Update Software
- Limit Access to Resources Over Network
- Audit (Logging)
- Operating System Configuration
- User Account Management
- Execution Prevention
- Privileged Account Management
- Disable or Remove Feature or Program
- Code Signing
- Exploit Protection
- Application Isolation and Sandboxing
- Antivirus/Antimalware
- Filter Network Traffic
- Network Segmentation
- User Training
- SSL/TLS Inspection
- Restrict Web-based Content
Additionally:
- To prevent Lateral Movement (Taint Shared Content): Immutable deployments (no live patching to “cattle”)
- IaC CI/CD Automation (processes have Security and Repeatability baked in, less toil)
- Change Management using source version control systems such as Git clients interacting with the GitHub cloud
Summary of Use Cases
In summary, use cases for Consul (listed at https://www.consul.io/):
- Consul on Kubernetes
- Control access with Consul API Gateway
- Discover Services with Consul
- Enforce Zero Trust Networking with Consul
- Load Balancing with Consul
- Manage Traffic with Consul
- Multi-Platform Service Mesh with Consul
- Network Infrastructure Automation with Consul
- Observability with Consul
Benefits of Adoption
Adoption of Consul aims to yield these benefits:
- Faster Time to Market and velocity of getting things done, from fewer manual mistakes
- Reduce cost via tools (operational efficiency through more visibility and automation)
- Reduce cost via people from improved availability (uptime)
- Reduce risk of downtime from better reliability
- Reduce risk of breach from better guardrails (using Sentinel & OPA)
- Compliance with regulatory demands (central source of truth, immutable, automated processes)
BOOK: Consul: Up and Running
Canadian Luke Kysow, Principal Engineer on Consul at HashiCorp, top contributor to hashicorp/consul-k8s, wrote in his BOOK: “Consul: Up and Running”:
“A small operations team can leverage Consul to impact security, reliability, observability, and application delivery across their entire stack —- all without requiring developers to modify their underlying microservices.”
Code for the book (which you need to copy and paste into your own GitHub repo) is organized according to the book’s chapters:
- Service Mesh 101
- Introduction to Consul
- Deploying Consul within K8s (in cloud or minikube for automatic port-forwarding) and on VMs
- Adding Services to the Mesh
- Ingress Gateways
- Security
- Observability
- Reliability
- Traffic Control
- Advanced Use Cases
(There is also a Discord server for the book.)
The above are used for showing Proof of Value (POV) from product/workflow adoption.
- https://www.consul.io/docs/intro
- https://learn.hashicorp.com/well-architected-framework
YouTube: “Getting into HashiCorp Consul”
VIDEO: Consul Roadmap – HashiConf Global 2021
Ways to setup Consul with demo infra
PROTIP: Become comfortable with the naming conventions used by the architecture, workflows, and automation by building several environments, in order of complexity:
By “use case” (Sales Plays):
A. There is a public demo instance of Consul online at:
https://demo.consul.io/ui/dc1/overview/server-status
B. On HashiCorp’s Consul SaaS on the HCP (HashiCorp Cloud Platform):
- QUESTION: You can use Consul this way with just a Chromebook laptop???
- Use this to learn about creating sample AWS services in a private VPC using Terraform, creating an HCP account, cloud peering connections across private networks to the HVN, and day-to-day workflows on https://cloud.hashicorp.com/products/consul
- On AWS or Azure
C. On a macOS laptop, install the Consul Agent with two nodes (to see recovery from loss of a single node):
- Use automation to install the Consul agent along with other utilities needed
- Use this to learn about basic CLI commands, starting/stopping the Agent, API calls, GUI menus using a single server within a Docker image
- Follow a multi-part video series on YouTube to install and configure 5 Consul nodes in 3 Availability Zones (AZs) within a single region, with app Gateways, Sidecar monitoring
E. In a single 6-node datacenter (with Nomad) to survive loss of an Availability Zone
- Use this to learn about manual backup and recovery using Snapshots and Enterprise Snapshot Agents,
- Conduct Chaos Engineering recovering failure of one Availability Zone
- Telemetry and Capacity proving to identify when to add additional Consul nodes
F. For multiple datacenters federated over WAN
- Use this to learn about configuring the Enterprise Autopilot feature for High Availability across multiple regions (which is a major differentiator of HashiCorp Consul), Chaos Engineering.
G. Integrations between K8s Service Mesh to outside database, ECS, VMs, mainframes, etc.
- Discovery to central service registry across several Kubernetes clusters
- Use this to learn about configuring HashiCorp Consul to work with a Payment processor, integrate with load balancers that aren’t Consul-aware, and span the entire Enterprise landscape of technologies (another major differentiator of HashiCorp Consul)
Other demos:
- https://www.hashicorp.com/resources/getting-started-with-managed-service-mesh-on-aws First Beta Demo of HCP Consul Service Mesh on AWS.
Demo apps
PROTIP: Adapt the samples and naming conventions here to use your own app after achieving confidence you have the base templates working.
- VIDEO: 12-Factor Apps and the HashiStack by Kelsey Hightower (Google)
https://medium.com/hashicorp-engineering/hashicorp-vault-performance-benchmark-13d0ea7b703f
https://cloud.hashicorp.com/docs/hcp/supported-env/aws
https://github.com/pglass/202205-consul-webinar-demo
-
HashiCorp-provided demo apps included in the practice environments are defined at:
https://github.com/hashicorp-demoapp/
“Hashicups” from https://github.com/hashicorp-demoapp/hashicups-setups comes with a Go library.
-
Consider the HashiCups datacenter which uses both ECS and EKS within AWS:
- Run front-end services tasks within an ECS (Elastic Container Service) cluster
- Run back-end services tasks within an EKS (Elastic Kubernetes Service) cluster
See VIDEO “Securely Modernize Application Development with Consul on AWS ECS” by Jairo Camacho (Marketing), Chris Thain, Paul Glass (Engineering)
-
Create the above environment by running Terraform ???
https://github.com/pglass/202205-consul-webinar-demo
https://github.com/hashicorp/terraform-aws-consul-ecs
-
Use HCP Consul for Service Mesh (without Kubernetes)
The Envoy proxy in Data Plane ???
Control Plane to Consul servers within HCP ???
Consul’s Layer 7 traffic management capabilities. ???
ACL Controller
The ACL (Access Control List) Controller is provided by HashiCorp for install within AWS.
To provide least-privilege access to Consul using Terraform and Vault: https://www.hashicorp.com/blog/managing-hashicorp-consul-access-control-lists-with-terraform-and-vault
Observability
REMEMBER: The Enterprise edition of Consul is a different binary than the OSS edition.
Terraform adds Datadog for Observability.
https://www.pagerduty.com/docs/guides/consul-integration-guide/ shows how to configure Consul-Alerts to trigger and resolve incidents in a PagerDuty service. PagerDuty is an alarm aggregation and dispatching service for system administrators and support teams. It collects alerts from monitoring tools, gives an overall view of all monitoring alarms, and alerts an on-duty engineer if there’s a problem. The Terraform PagerDuty provider is a plugin for Terraform that allows for the management of PagerDuty resources using HCL (HashiCorp Configuration Language).
Certification exam
Because this document aims to present concepts in a logical flow for learning, it has a different order than the topics for the Consul Associate one-hour proctored online $70 exam at: https://www.hashicorp.com/certification/consul-associate
1. Explain Consul architecture
1a. Identify the components of Consul datacenter, including agents and communication protocols
1b. Prepare Consul for high availability and performance
1c. Identify Consul’s core functionality
1d. Differentiate agent roles
2. Deploy a single datacenter
2a. Start and manage the Consul process
2b. Interpret a Consul agent configuration
2c. Configure Consul network addresses and ports
2d. Describe and configure agent join and leave behaviors
3. Register services and use Service Discovery [BK]
3a. Interpret a service registration
3b. Differentiate ways to register a single service
3c. Interpret a service configuration with health check
3d. Check the service catalog status from the output of the DNS/API interface or via the Consul UI
3e. Interpret a prepared query
3f. Use a prepared query
4. Access the Consul key/value (KV) even though it’s not a popular feature anymore
4a. Understand the capabilities and limitations of the KV store
4b. Interact with the KV store using both the Consul CLI and UI
4c. Monitor KV changes using watch
4d. Monitor KV changes using envconsul and consul-template
5. Back up and Restore [BK]
5a. Describe the content of a snapshot
5b. Back up and restore the datacenter
5c. [Enterprise] Describe the benefits of snapshot agent features
6. Use Consul Service Mesh
6a. Understand Consul Connect service mesh high-level architecture
6b. Describe configuration for registering a service proxy
6c. Describe intentions for Consul Connect service mesh
6d. Check intentions in both the Consul CLI and UI
7. Secure agent communication
7a. Understanding the Consul security/threat model
7b. Differentiate certificate types needed for TLS encryption
7c. Understand the different TLS encryption settings for a fully secure datacenter
8. Secure services with basic access control lists (ACL)
8a. Set up and configure a basic ACL system
8b. Create policies
8c. Manage token lifecycle: multiple policies, token revoking, ACL roles, service identities
8d. Perform a CLI request using a token
8e. Perform an API request using a token
9. Use Gossip encryption
9a. Understanding the Consul security/threat model
9b. Configure gossip encryption for the existing data center
9c. Manage the lifecycle of encryption keys
Bryan Krausen provides links to discount codes for his Udemy course “Getting Started with HashiCorp Consul 2022”, which has 8.5 hours of video recorded at Consul 1.7. It provides quizzes and a mind-map of each topic, and references https://github.com/btkrausen/hashicorp/tree/master/consul
Also from Bryan is “HashiCorp Certified: Consul Associate Practice Exam” – three full exams of 57 questions each.
B. On HashiCorp’s Consul Cloud SaaS HCP (HashiCorp Cloud Platform)
Perhaps the fastest and easiest way to begin using Consul is the HashiCorp-managed HashiCorp Cloud Platform (HCP) Consul. It provides a convenient clickable Web GUI rather than the CLI/API of FOSS (free open-source software).
HCP provides a fully managed “Service Mesh as a Service (SMaaS)” with operational conveniences not provided with the “self-managed” Enterprise edition. That means:
- Monitoring to ensure disk space, CPU, memory, etc. is already staffed
- Capacity testing to ensure configurations are made optimal by specialists
- No risk of security vulnerabilities introduced by inexperienced personnel
- Backups taken care of automatically
- Restores performed when needed
- Rest from on-going hassles of security patches and version upgrades
- Enable limited in-house IT personnel to focus on business needs.
- Faster time to value and time to market
On the other hand, as of this writing, HCP does not have all the features of Consul Enterprise.
References about HCP Consul:
- https://github.com/hashicorp/learn-hcp-consul
- https://github.com/hashicorp/learn-terraform-multicloud-kubernetes
-
Part 12: HCP Consul [2:18:49] Mar 17, 2022
- HashiCorp’s 7 tutorials on HCP Consul:
- https://www.hashicorp.com/products/consul/service-on-azure
- announced Sep 2020
-
VIDEO: “Introduction to HashiCorp Cloud Platform (HCP): Goals and Components”
- VIDEO: “Service Mesh - Beyond the Hype”
-
hashicorp/consul-snippets Private = Collection of Consul snippets. Configuration bits, scripts, configuration, small demos, etc.
- https://github.com/hashicorp/field-workshops-consul = Slide decks and Instruqt code for Consul Workshops
- https://github.com/hashicorp/demo-consul-101 = Tutorial code and binaries for the HashiCorp Consul beginner course.
-
https://github.com/hashicorp/learn-consul-docker = Docker Compose quick starts for Consul features.
-
https://github.com/hashicorp/terraform-aws-consul = A Terraform Module for how to run Consul on AWS using Terraform and Packer
-
https://github.com/hashicorp/hashicat-aws = A terraform built application for use in Hashicorp workshops
-
https://github.com/hashicorp/consul-template = Template rendering, notifier, and supervisor for @hashicorp Consul and Vault data.
-
https://github.com/hashicorp/consul-k8s = First-class support for Consul Service Mesh on Kubernetes, with binaries for download at https://releases.hashicorp.com/consul-k8s/
-
https://github.com/hashicorp/consul-replicate = Consul cross-DC KV replication daemon.
- hashicorp/learn-consul-kubernetes
-
https://github.com/hashicorp/learn-consul-service-mesh
-
https://github.com/hashicorp/consul-api-gateway = The Consul API Gateway is a dedicated ingress solution for intelligently routing traffic to applications running on a C…
-
https://github.com/hashicorp/consul-demo-traffic-splitting = Example application using Docker Compose to demonstrate Consul Service Mesh Traffic Splitting
-
hashicorp/consul-esm = External service monitoring for Consul
- https://github.com/hashicorp/terraform-aws-consul-starter = A Terraform module for creating an OSS Consul cluster as described by the HashiCorp reference architecture.
The Automated Way
- Obtain AWS account credentials with adequate permissions
- Create an AWS VPC and associated resources to be managed by additional Consul infra
- Identify your lb_ingress_ips used in the load balancer security groups, needed to limit access to the demo app.
- Configure kubectl
- Create a HashiCorp Platform (HCP) cloud account and organization
- Store secrets in a safe way
- Create a HashiCorp Virtual Network (HVN)
- Peer the AWS VPC with the HVN
- Create a HCP Consul cluster
- Configure Consul ACL Controller
- Run Consul clients within the provisioned AWS VPC
- Destroy Consul cluster and app infra under test
Obtain AWS account credentials
-
Obtain AWS credentials (AWS_*) and populate the file ~/.aws/config or environment variables:
export AWS_ACCESS_KEY_ID="your AWS access key ID"
export AWS_SECRET_ACCESS_KEY="your AWS secret access key"
export AWS_SESSION_TOKEN="your AWS session token"
Alternately, copy and paste credentials in the ~/.aws/credentials file that every AWS CLI command references.
BTW If you are a HashiCorp employee, credentials would be obtained from the “Doormat” website, which grants access to your laptop’s IP address for a limited time.
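A minimal sketch of storing those values with the AWS CLI instead of hand-editing the file (placeholder values shown; never commit real keys to Git):

```sh
# Writes the default profile into ~/.aws/credentials (values are placeholders):
aws configure set aws_access_key_id     AKIAXXXXXXXXXXXXXXXX
aws configure set aws_secret_access_key xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
aws configure set aws_session_token     "paste-session-token-here"

# Confirm the credentials resolve to an identity before running Terraform:
aws sts get-caller-identity
```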
Create resources within AWS
There are several ways to setup infrastructure in a cloud datacenter managed by Consul.
Instead of performing manual steps at https://learn.hashicorp.com/tutorials/cloud/consul-deploy, this describes use of Terraform to create a non-prod HCP Consul environment to manage an ECS cluster, and various AWS services:
-
Navigate to the folder where you download GitHub repos.
-
Do not specify --depth 1 when cloning (because we will check out a tag):
git clone git@github.com:hashicorp/learn-consul-terraform.git
cd learn-consul-terraform
-
Before switching, get a list of the tags:
git tag
git checkout v0.5
-
Navigate to the folder within the repo:
cd datacenter-deploy-ecs-hcp
TODO: Study the Terraform specifications:
- variables.tf - Parameter definitions used to customize unique user environment attributes.
- data.tf - Data sources that allow Terraform to use information defined outside of Terraform.
- providers.tf - AWS and HCP provider definitions for Terraform.
-
outputs.tf - Unique values output after Terraform successfully completes a deployment.
- ecs-clusters.tf - AWS ECS cluster deployment resources.
- ecs-services.tf - AWS ECS service deployment resources.
- load-balancer.tf - AWS Application Load Balancer (ALB) deployment resources.
- logging.tf - AWS Cloudwatch logging configuration.
- modules.tf - AWS ECS task application definitions.
- secrets-manager.tf - AWS Secrets Manager configuration.
- security-groups - AWS Security Group port management definitions.
-
vpc.tf - AWS Virtual Private Cloud (VPC) deployment resources.
- network-peering.tf - HCP and AWS network communication configuration.
- hvn.tf - HashiCorp Virtual Network (HVN) deployment resources.
- hcp-consul.tf - HCP Consul cluster deployment resources.
See https://learn.hashicorp.com/tutorials/consul/reference-architecture for Scaling considerations.
https://learn.hashicorp.com/tutorials/consul/production-checklist?in=consul/production-deploy
-
Identify your IPv4 address (based on the Wi-Fi you’re using):
curl ipinfo.io
{ "ip": "129.222.5.194",
-
terraform.tfvars.example
-
Configure Terraform variables in a .auto.tfvars (or terraform.tfvars) file with, for example:
lb_ingress_ips = "47.223.35.123"
region = "us-east-1"
suffix = "demo"
region - the AWS region where resources will be deployed. PROTIP: Must be one of the regions HCP supports for HCP Consul servers.
lb_ingress_ips - Your IP. This is used in the load balancer security groups to ensure only you can access the demo application.
suffix - a text value appended to the names of resources created. This needs to be changed on each run because, by default, secrets created by AWS Secrets Manager require 30 days before they can be deleted. If this tutorial is destroyed and recreated, a name conflict error will occur for these secrets.
-
Run using terraform init
VIDEO: Try it:
-
In the folder containing main.tf, run terraform to initialize:
terraform init
Example response:
Initializing modules...
Downloading registry.terraform.io/hashicorp/consul-ecs/aws 0.2.0 for acl_controller...
- acl_controller in .terraform/modules/acl_controller/modules/acl-controller
Downloading registry.terraform.io/hashicorp/consul-ecs/aws 0.2.0 for example_client_app...
- example_client_app in .terraform/modules/example_client_app/modules/mesh-task
Downloading registry.terraform.io/hashicorp/consul-ecs/aws 0.2.0 for example_server_app...
- example_server_app in .terraform/modules/example_server_app/modules/mesh-task
Downloading registry.terraform.io/terraform-aws-modules/vpc/aws 2.78.0 for vpc...
- vpc in .terraform/modules/vpc
Initializing the backend...
Initializing provider plugins...
- Finding hashicorp/hcp versions matching "~> 0.14.0"...
- Finding hashicorp/aws versions matching ">= 2.70.0, > 3.0.0"...
- Installing hashicorp/hcp v0.14.0...
- Installed hashicorp/hcp v0.14.0 (signed by HashiCorp)
- Installing hashicorp/aws v4.16.0...
- Installed hashicorp/aws v4.16.0 (signed by HashiCorp)
Terraform has created a lock file .terraform.lock.hcl to record the provider selections it made above. Include this file in your version control repository so that Terraform can guarantee to make the same selections by default when you run "terraform init" in the future.
Terraform has been successfully initialized!
You may now begin working with Terraform. Try running "terraform plan" to see any changes that are required for your infrastructure.
All Terraform commands should now work. If you ever set or change modules or backend configuration for Terraform, rerun this command to reinitialize your working directory. If you forget, other commands will detect it and remind you to do so if necessary.
-
In the folder containing main.tf, run terraform to plan:
time terraform plan
After many minutes (following the terraform apply below), the sample response ends with:
Apply complete! Resources: 64 added, 0 changed, 0 destroyed.
Outputs:
client_lb_address = "http://learn-hcp-example-client-app-1643813623.us-east-1.elb.amazonaws.com:9090/ui"
consul_ui_address = "https://dc1.consul.b17838e5-60d2-4e49-a43b-cef519b694a5.aws.hashicorp.cloud"
-
If Sentinel or TFSec was installed:
tfsec
-
In the folder containing main.tf, run terraform to instantiate in AWS:
time terraform apply
-
(optional) Configure kubectl
aws eks --region $(terraform output -raw region) update-kubeconfig --name $(terraform output -raw eks_cluster_name)
kubectl get pods -A
-
To access the Consul UI in HCP, print its URL and the bootstrap token. The bootstrap token can be used to log in to Consul.
terraform output consul_public_endpoint_url
terraform output consul_bootstrap_token
-
Access the demo application in ECS: print the URL for the demo application:
terraform output ecs_ingress_address
CTS for NIA
HashiCorp’s “Network Infrastructure Automation (NIA)” marketing page (consul.io/docs/nia) promises to scale better, decrease the possibility of human error when manually editing configuration files, and decrease overall time taken to push out configuration changes.
PROTIP: There are currently no competitors in the market for this feature.
LEARN: Network Infrastructure Automation with Consul-Terraform-Sync hands-on, which uses the sample counting service on port 9003 and dashboard service on port 9002, from https://github.com/hashicorp/demo-consul-101/releases
-
Intro (using terraform, Consul “consul-terraform-sync” CLI) 17 MIN
- Consul-Terraform-Sync Run Modes and Status Inspection task execution status using REST API. 9 MIN
- CTS and Terraform Enterprise/Cloud integration. 14 MIN
- Build a Custom CTS Module. 20 MIN
- Secure Consul-Terraform-Sync for Production. 13 MIN
- Partner Guide - Consul NIA, Terraform, and A10 ADC. 12 MIN
- Partner Guide - Consul NIA, Terraform, and F5 BIG-IP. 12 MIN
- Partner Guide - Consul NIA, CTS, and Palo Alto Networks. 12 MIN
References:
- https://www.consul.io/docs/nia/configuration
- https://www.consul.io/docs/nia/terraform-modules
- VIDEO by Kim Ngo & Melissa Kam.
- Part 13: Consul-Terraform-Sync
CTS (Consul-Terraform Sync) Agent is an executable binary (“consul-terraform-sync” daemon separate from Consul) installed on a server.
NOTE: HashiCorp also provides binaries for various back releases at
https://releases.hashicorp.com/consul-terraform-sync/
Notice the “+ent” for enterprise editions.
brew tap hashicorp/tap
brew install hashicorp/tap/consul-terraform-sync
consul-terraform-sync -h
When the daemon starts, it also starts up a Terraform CLI/API binary locally.
See https://www.consul.io/docs/nia/configuration
CTS interacts with the Consul Service Catalog in a publisher-subscriber paradigm.
CTS has Consul acting as the central broker: CTS subscribes to changes in Consul and triggers Terraform in response. CTS can respond to changes in the Service Registry, and can also watch for changes in Consul’s KV (Key-Value) store.
When CTS recognizes relevant changes requiring action, it dynamically generates files that invoke Terraform modules. Thus, CTS can interact with Terraform Cloud Driver’s Remote Workspaces. Advantages of this:
- Remote Terraform execution
- Concurrent runs within Terraform using secured variables
- State versions, audit logs, run history with triggers and notifications
- Option for Sentinel to enforce governance policies as code
CTS is how changes can trigger automatic dynamic update of network infrastructure devices such as applying firewall policies, updating load balancer member pools, etc.
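A minimal sketch of a CTS configuration, assuming a local Consul agent, a hypothetical CTS-compatible Terraform module at ./modules/update-firewall, and a watched service named “web”. Field names vary slightly between CTS versions (earlier releases used “source” instead of “module”), so treat this as illustrative:

```sh
# Illustrative consul-terraform-sync configuration.
cat > cts-config.hcl <<'EOF'
consul {
  address = "127.0.0.1:8500"     # the Consul agent CTS subscribes to
}

driver "terraform" {
  log = true                     # run Terraform locally (a Terraform Cloud driver also exists)
}

task {
  name     = "update-firewall"            # hypothetical task name
  module   = "./modules/update-firewall"  # hypothetical CTS-compatible Terraform module
  services = ["web"]                      # re-run the module when "web" instances change
}
EOF

consul-terraform-sync -config-file=cts-config.hcl
```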
- VIDEO: CTS can update network devices that are not Consul-aware (not F5 or NGINX, which are).
- VIDEO: Network Automation on Terraform Cloud With CTS
- CTS is used to keep configurations up-to-date on Fortinet physical and virtual NGFW (Next-Generation FireWall)
- VIDEO: “Future of Service Networking”
CTS v0.3 was announced Sep 2021
References:
- VIDEO “Integrating Terraform with Consul”
- https://learn.hashicorp.com/tutorials/cloud/consul-end-to-end-ecs
Each task consists of a runbook automation written as a CTS compatible Terraform module using resources and data sources for the underlying network infrastructure. The consul-terraform-sync daemon runs on the same node as a Consul agent.
Alternative repo:
Consul Global Scale Benchmark
The biggest-scale example is https://github.com/hashicorp/consul-global-scale-benchmark, used to prove that a Service Mesh Control Plane of 5 HashiCorp Consul Servers across 3 availability zones in us-east-1 is able to update 10,000 Consul/Nomad client nodes and 172,000+ services in under 1 second. Each Consul Server ran on a c5d.9xlarge EC2 instance type having 36 vCPUs and 72 Gigabytes of memory. It’s described by the White paper “Service Mesh at Global Scale” and a Podcast with creator Anubhav Mishra (Office of the CTO).
See also: https://github.com/hashicorp/consul-global-scale-benchmark = Terraform configurations and helper scripts for Consul Global Scale Benchmark
Identify Terraform repo in GitHub
To create the app infra which Consul works on, consider the
https://github.com/hashicorp-guides
Consistent workflows to provision, secure, connect, and run any infrastructure for any application.
* https://github.com/hashicorp-guides/hashistack
They reference 22 https://github.com/hashicorp-modules such as:
* https://github.com/hashicorp-modules/network-aws
Each module has an examples folder.
https://www.terraform.io/language/settings/backends/remote Terraform Remote State back-ends
https://github.com/hashicorp/field-workshops-consul/tree/master/instruqt-tracks/secure-service-networking-for-aws
a. https://learn.hashicorp.com/tutorials/cloud/terraform-hcp-consul-provider - it provisions resources that qualify under the AWS free-tier.
Files:
- consul.tf: describes the HCP Consul cluster you are going to create.
- vpc_peering.tf: describes the AWS VPC and the peering with the HVN.
- variables.tf: sets the variables for your deployment.
b. The following steps are based on https://learn.hashicorp.com/tutorials/cloud/consul-deploy referencing https://github.com/hashicorp/terraform-aws-hcp-consul which uses Terraform to do the below:
Among https://github.com/hashicorp/docker-consul = Official Docker images for Consul.
https://github.com/hashicorp/terraform-aws-hcp-consul is the Terraform module for connecting a HashiCorp Cloud Platform (HCP) Consul cluster to AWS. There are four examples containing default CIDRs for private and public subnets:
- existing-vpc
- hcp-ec2-demo
- hcp-ecs-demo
-
hcp-eks-demo
- hcp-ec2-client - [For Testing Only]: installs Consul and runs Consul clients with EC2 virtual machines.
- hcp-eks-client - [For Testing Only]: installs the Consul Helm chart on the provided Kubernetes cluster.
- k8s-demo-app - [For Testing Only]: installs a demo application onto the Kubernetes cluster, using the Consul service mesh.
https://github.com/hashicorp/terraform-azurerm-hcp-consul
Hashicorp Cloud Account
-
Sign into: https://cloud.hashicorp.com/products/consul
- Verify your email if it’s your first time, or type your email.
- The first time, select the Registration Name (such as “wilsonmar-org”), country to create a new org.
-
You get $50! You can skip giving out your credit card until you want a PRODUCTION instance or use larger size node servers. For development use, an extra-small (XS) cluster size is deployed by default to handle up to 50 service instances.
-
Select Consul on the left product menu. Bookmark the URL, which contains your account ID so you’ll go straight to it:
https://portal.cloud.hashicorp.com/services/consul?project_id=…
- Click “Access control (IAM)” menu.
-
Click “Service principals” from the menu and specify the 3 examples below (with your name) for each of 3 roles with ID such as wilsonmar-123456@12ae4567-f584-4f06-9a9e-240690e2088a
- Role “Admin” (as full access to all resources including the right to edit IAM, invite users, edit roles)
- Role “Contributor” (Can create and manage all types of resources but can’t grant access to others.)
- Role “Viewer” (Can only view existing resources.)
PROTIP: Once logged in, a cookie is saved in the browser so that you will be logged in again automatically.
-
For each service principal, click the blue “Create service principal key”.
-
Click the copy icon to save each generated value to your Clipboard (for example):
export HCP_CLIENT_ID=kdNNiD8IbU0FZH8juZ10CgkvE6OvLCZK
export HCP_CLIENT_SECRET=6BHGXSErAzsPjdaimnERGDrG9DXBYTGhdBQQ8HuOJaykG9Jhw_bJgDqp35OkYSoA
Alternately, copy-paste the values directly into provider config file:
provider "hcp" { client_id = "service-principal-key-client-id" client_secret = "service-principal-key-client-secret" }
CAUTION: The secret is not shown after you leave the screen.
Store secrets
-
In a file encrypted and away from GitHub, store secrets:
TODO: Use Vault to keep the above secrets secure (in a cloud).
For now, create a local config file (kept out of version control).
https://github.com/hashicorp/consul-guides = Example usage of HashiCorp Consul
(optional) Configure kubectl
aws eks --region $(terraform output -raw region) update-kubeconfig --name $(terraform output -raw eks_cluster_name)
kubectl get pods -A
Create a HashiCorp Virtual Network (HVN)
REMEMBER: Each resource in HCP can only be located in one HVN. You cannot span two different HVNs with a single product deployment, and product deployments cannot be moved from one HVN to another. Additionally, HVNs cannot be changed after they are deployed.
References:
- https://registry.terraform.io/providers/hashicorp/hcp/latest/docs/resources/hvn
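A minimal sketch of creating an HVN with the hcp Terraform provider (the HVN ID, region, and CIDR are example values; see the registry link above for the full argument list):

```sh
# Illustrative hvn.tf; assumes HCP_CLIENT_ID / HCP_CLIENT_SECRET are already exported.
cat > hvn.tf <<'EOF'
resource "hcp_hvn" "example" {
  hvn_id         = "hvn"             # example HVN name
  cloud_provider = "aws"
  region         = "us-east-1"
  cidr_block     = "172.25.16.0/20"  # must not overlap the AWS VPC you will peer with
}
EOF

terraform init && terraform plan
```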
Peer HVN to an AWS VPC
- https://registry.terraform.io/providers/hashicorp/hcp/latest/docs/resources/hvn
- In the HVN overview page, select the Peering connections tab, and click the Create peering connection link.
- Input the following information:
  - AWS account ID
  - VPC ID
  - VPC region
  - VPC CIDR (Classless Inter-Domain Routing) block
- Click the Create connection button to begin the peering process.
Peering status begins at “Creating”.
-
Accept the connection at the AWS console.
-
Navigate to the Peering Connections area of your AWS Console.
You should have an entry in the list with a status of Pending Acceptance.
-
Click Actions -> Accept Request to confirm acceptance.
Status should change to “active”.
-
Once the HVN is deployed, the status updates to “Stable” on the HVN overview tab.
-
You can return to this screen to delete the peering relationship. However, deleting this peering relationship means you will no longer be able to communicate with your HVN.
Create a HCP Consul cluster
- Enterprise Academy: Deploy a Consul Cluster (Configure, start, and validate high availability of a Consul Enterprise cluster).
-
Create Cluster (such as “consul-cluster-1”), Network ID (“hvn”), Region,
CIDR Block 172.25.16.0/20 is the default CIDR block value.
IPv4 CIDR ranges used to automatically create resources in your cloud network are delegated by the HVN. The CIDR range you use cannot overlap with the AWS VPC that you will be peering with later.
Enable a public or private IP
WARNING: A public IP makes the Consul UI and API conveniently available from anywhere in the public internet for development use. But it is not recommended for production because it is a less secure configuration.
Configure L3 routing and security ???
-
Configure L3 routing and security
-
Create a security group
-
Create a route
-
Define ingress and egress rules
https://learn.hashicorp.com/tutorials/cloud/terraform-hcp-consul-provider
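As a sketch of the kind of rules those steps create (the security group, route table, and peering connection IDs below are placeholders, and 172.25.16.0/20 is the example HVN CIDR used later; Consul’s gossip port 8301 is shown, with other ports covered in the “Ports used by Consul” section):

```sh
# Allow HCP Consul servers (in the HVN CIDR) to reach Consul agents in the AWS VPC.
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 8301 --cidr 172.25.16.0/20   # Serf LAN gossip (TCP)
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol udp --port 8301 --cidr 172.25.16.0/20   # Serf LAN gossip (UDP)

# Route traffic destined for the HVN through the peering connection.
aws ec2 create-route --route-table-id rtb-0123456789abcdef0 \
  --destination-cidr-block 172.25.16.0/20 \
  --vpc-peering-connection-id pcx-0123456789abcdef0
```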
Configure Consul ACL Controller
The Consul ACL Controller is added by Terraform code used to create other app VPC resources.
TODO: Auto-discovery?
Run Consul clients within the provisioned AWS VPC
-
Connect your AWS VPCs to the HVN so that the clients in your VPC can communicate with the HCP server after the next step.
-
Install Consul into those AWS VPCs.
This is not in Terraform code???
Run a demo application on the chosen AWS runtime
Destroy Consul
-
Destroy resources
TODO:
References about HVN (HashiCorp Virtual Network):
- https://cloud.hashicorp.com/docs/hcp/network
- https://learn.hashicorp.com/tutorials/cloud/consul-deploy
- https://learn.hashicorp.com/tutorials/cloud/terraform-hcp-consul-provider#hcp_consul_base
Service Discovery Workflow
- Instruqt: Consul F5 Service Discovery
- Enterprise Academy: Service Discovery (See how Consul’s Service Discovery feature works by connecting multiple services)
- Enterprise Academy: Service Discovery and Health Monitoring
HCP Consul Cloud Pricing
https://registry.terraform.io/providers/hashicorp/hcp/latest/docs
https://cloud.hashicorp.com/products/consul/pricing
https://cloud.hashicorp.com/docs/consul#features
Plan | Base | + per svc instance hr | Limits |
---|---|---|---|
Individual Development | 0.027/hr $20/mo | - | Up to 50 service instances. No uptime SLA. |
"Standard" prod. | $0.069/hr $49/mo | Small: $0.02/hr | SLA |
"Plus" prod. | $0.104/hr | - | SLA, multi-region |
PROTIP: Assume a 5:1 node to services ratio.
https://www.hashicorp.com/products/consul/pricing
C. On a macOS laptop using Docker
- https://learn.hashicorp.com/tutorials/consul/get-started-agent?in=consul/getting-started
One Agent as Client or Server
PROTIP: The Consul executable binary is designed to run either as a local long-running client daemon or in server mode.
CAUTION: Avoid the manual approach of downloading release binaries from GitHub.
So that you avoid the toil of configuring PATH, etc., see the install instructions below to use a package manager for each operating system (x86 and ARM):
* Homebrew (brew command) on macOS
* apt-get on Linux
* Chocolatey (choco command) on Windows
Work with the Consul Agent using:
- CLI (Command Line Interface) on Terminal sessions
- API calls from within a custom program (written in Go, etc.)
- GUI (Graphic User Interface) on an internet browser such as Google Chrome
The API at /connect/intentions/exact provides the most features to create Service Intentions.
REMEMBER: Normally, there is no reason to SSH directly into Consul servers.
The UI and API are intended to be consumed from remote systems, such as a user’s desktop or an application looking to discover a remote service with which it needs to establish connectivity.
Install HCDiag
-
Install for macOS from Homebrew:
brew install hcdiag
==> Downloading https://releases.hashicorp.com/hcdiag/0.2.0/hcdiag_0.2.0_darwin_amd64.zip ==> Installing hcdiag from hashicorp/tap ==> Caveats The darwin_arm64 architecture is not supported for this product at this time, however we do plan to support this in the future. The darwin_amd64 binary has been installed and may work in compatibility mode, but it is not fully supported. ==> Summary 🍺 /opt/homebrew/Cellar/hcdiag/0.2.0: 5 files, 7.2MB, built in 2 seconds ==> Running `brew cleanup hcdiag`... Disable this behaviour by setting HOMEBREW_NO_INSTALL_CLEANUP. Hide these hints with HOMEBREW_NO_ENV_HINTS (see `man brew`).
-
Verify installation by viewing the help:
hcdiag -h
Usage of hcdiag: -all DEPRECATED: Run all available product diagnostics -config string Path to HCL configuration file -consul Run Consul diagnostics -dest string Shorthand for -destination (default ".") -destination string Path to the directory the bundle should be written in (default ".") -dryrun Performing a dry run will display all commands without executing them -include-since 72h Alias for -since, will be overridden if -since is also provided, usage examples: 72h, `25m`, `45s`, `120h1m90s` (default 72h0m0s) -includes value files or directories to include (comma-separated, file-*-globbing available if 'wrapped-*-in-single-quotes') e.g. '/var/log/consul-*,/var/log/nomad-*' -nomad Run Nomad diagnostics -os string Override operating system detection (default "auto") -serial Run products in sequence rather than concurrently -since 72h Collect information within this time. Takes a 'go-formatted' duration, usage examples: 72h, `25m`, `45s`, `120h1m90s` (default 72h0m0s) -terraform-ent (Experimental) Run Terraform Enterprise diagnostics -vault Run Vault diagnostics -version Print the current version of hcdiag
-
Before submitting a Service ticket to HashiCorp, obtain diagnostics by running the hcdiag utility while a HashiCorp server is running:
hcdiag -dryrun
[INFO] hcdiag: Checking product availability [INFO] hcdiag: Gathering diagnostics [INFO] hcdiag: Running seekers for: product=host [INFO] hcdiag: would run: seeker=stats
-
Configure environment variables to provide the URL and tokens necessary, per this doc.
-
Specify a parameter for each product whose diagnostics are desired:
- hcdiag -consul for Consul
- hcdiag -vault for Vault
- hcdiag -nomad for Nomad
- hcdiag -terraform-ent for Terraform Enterprise.
Warning: The hcdiag tool makes no attempt to obscure secrets or sensitive information. So inspect the bundle to ensure it contains only information that is appropriate to share.
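For example, a minimal invocation (a sketch using flags from the help output above; the destination folder is arbitrary and should already exist) that collects only Consul diagnostics from the last 24 hours:
# Collect only Consul diagnostics, limited to the last 24 hours, into ./hcdiag-bundle:
hcdiag -consul -since 24h -dest ./hcdiag-bundle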
Install Consul Agent on Linux
apt-get update
# Install utilities curl, wget, jq:
apt-get -y install curl wget software-properties-common jq
curl -fsSL https://apt.releases.hashicorp.com/gpg | apt-key add -
# Get version:
lsb_release -cs
# Add the official HashiCorp Linux repository:
apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com \
   $(lsb_release -cs) main"
# Install Consul Enterprise on the node:
apt-get -y install consul-enterprise
Install Consul Agent on macOS
-
To set up your Mac for Consul, use the approach described in my blog:
https://wilsonmar.github.io/mac-setup
-
Notice there are two options to install the Consul Agent:
brew search consul
==> Formulae consul hashicorp/tap/consul ✔ hashicorp/tap/consul-template consul-backinator hashicorp/tap/consul-aws hashicorp/tap/consul-terraform-sync consul-template hashicorp/tap/consul-esm hashicorp/tap/envconsul envconsul hashicorp/tap/consul-k8s iconsur ==> Casks console
-
Use your mouse to triple-click zsh in the command below to highlight the line, then press command+C to copy it to your Clipboard:
zsh -c "$(curl -fsSL https://raw.githubusercontent.com/wilsonmar/mac-setup/main/mac-setup.zsh)" \ -v -I -U -consul
CAUTION: Do not click on the URL (starting with https) since the terminal program opens a browser to that URL.
-v specifies optional verbose log output.
-Golang specifies install of Go programming language development components
-I specifies -Install of utilities XCode CLI, Homebrew, git, jq, tree, Docker, and components in the HashiCorp ecosystem, including Terraform, Vault, Nomad, envconsul.
-U specifies -Update of utilities. Do not specify -I and -U after initial install (to save a few seconds).
Utilities for working with AWS, Azure, GCP, and other clouds require their own parameter to be specified in order to be installed.
-
Press command+Tab to switch to the Terminal.app.
-
Click anywhere in the Terminal window and Press command+V to paste the command from your Clipboard.
-
Press Return/Enter on your keyboard to begin execution.
Install using Brew taps on MacOS
In the script, the Consul Agent is installed using HashiCorp’s tap, as described at:
- https://learn.hashicorp.com/tutorials/consul/get-started-install?in=consul/getting-started
Instead of the usual:
brew install consul
or
brew tap hashicorp/tap
brew install hashicorp/tap/consul
Notice the response caveats from brew install consul:
The darwin_arm64 architecture is not supported for this product at this time, however we do plan to support this in the future. The darwin_amd64 binary has been installed and may work in compatibility mode, but it is not fully supported. To start hashicorp/tap/consul now and restart at login: brew services start hashicorp/tap/consul Or, if you don't want/need a background service you can just run: consul agent -dev -bind 127.0.0.1 ==> Summary 🍺 /opt/homebrew/Cellar/consul/1.12.0: 4 files, 117.1MB, built in 3 seconds
-bind is the interface that Consul agent itself uses.
-advertise is the interface that Consul agent asks others use to connect to it. Useful when the agent has multiple interfaces or the IP of a NAT device to reach through.
Install by Download
PROTIP: Download Enterprise binaries with name ending with “+ent” from Fastly servers at:
https://releases.hashicorp.com/consul/
File names containing “SHA256SUMS” are for verifying whether the download was complete.
Download “darwin_amd64” files for older Intel MacOS.
Download “darwin_arm64” files for newer M1/M2 MacOS with Apple Silicon. - https://learn.hashicorp.com/tutorials/consul/get-started-install?in=consul/getting-started
- Unzip
- Verify using check sum.
-
Add to $PATH.
Consul CLI commands
Option A: Run Consul in background, which restarts automatically at login:
brew services start hashicorp/tap/consul
Option B: Run Consul in foreground, which occupies the Terminal and does not start again at login:
consul agent -dev -bind 127.0.0.1 -node machine
[DEBUG] agent.router.manager: Rebalanced servers, new active server: number_of_servers=1 active_server="wilsonmar-N2NYQJN46F (Addr: tcp/127.0.0.1:8300) (DC: dc1)"
Alternately,
consul agent -dev -datacenter="aws-1234567890" \ -data-dir=/opt/consul -encrypt="key" \ -join="10.0.10.11,10.1.2.3" \ -bind="127.0.0.1" -node machine
-join will fail if the agents at the specified IP addresses (IPv4 or IPv6) are not up.
PROTIP: In production, use configuration file to auto-join:
{ "bootstrap": false, "boostrap_expect": 3, "server": true, "retry_join": ["10.0.10.11,"10.1.2.3"] }
-
TODO: Setup compatibility mode?
-
Verify install:
consul version
Example response:
Consul v1.12.0 Revision 09a8cdb4 Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)
-
Obtain the menu of 31 command keywords:
consul
Usage: consul [--version] [--help] <command> [<args>]
Available commands are:
    acl            Interact with Consul's ACLs
    agent          Runs a Consul agent
    catalog        Interact with the catalog
    config         Interact with Consul's Centralized Configurations
    connect        Interact with Consul Connect
    debug          Records a debugging archive for operators
    event          Fire a new event
    exec           Executes a command on Consul nodes
    force-leave    Forces a member of the cluster to enter the "left" state
    info           Provides debugging information for operators.
    intention      Interact with Connect service intentions
    join           Tell Consul agent to join cluster
    keygen         Generates a new encryption key
    keyring        Manages gossip layer encryption keys
    kv             Interact with the key-value store
    leave          Gracefully leaves the Consul cluster and shuts down
    lock           Execute a command holding a lock
    login          Login to Consul using an auth method
    logout         Destroy a Consul token created with login
    maint          Controls node or service maintenance mode
    members        Lists the members of a Consul cluster
    monitor        Stream logs from a Consul agent
    operator       Provides cluster-level tools for Consul operators
    reload         Triggers the agent to reload configuration files
    rtt            Estimates network round trip time between nodes
    services       Interact with services
    snapshot       Saves, restores and inspects snapshots of Consul server state
    tls            Builtin helpers for creating CAs and certificates
    validate       Validate config files/directories
    version        Prints the Consul version
    watch          Watch for changes in Consul
Links have been added above.
CLI commands are used to start and stop the Consul Agent.
Ports used by Consul
The default ports, which some organizations change in hope of better security through obfuscation:
-
8300 TCP for RPC (Remote Procedure Call) by all Consul server agents to handle incoming requests from other Consul agents to discover services and make Value requests for Consul KV
- 8301 TCP/UDP for Serf LAN Gossip within the same datacenter cluster, for membership and failure detection among agents (consensus on adding data to the data store and replication of data occur over RPC port 8300)
-
8302 TCP/UDP for Serf WAN Gossip across regions
- 8500 & 8501 TCP-only for localhost API and UI
- 8502 TCP-only for Envoy sidecar proxy xDS gRPC API (disabled by default)
-
8558 - Consul-Terraform-Sync daemon
-
8600 TCP/UDP for DNS queries
- 21000 - 21255 TCP (automatically assigned) for Sidecar proxy registrations
For bootstrapping and configuration of agent.hcl, see https://learn.hashicorp.com/tutorials/consul/access-control-setup-production
Environment Variables
The shell script I wrote makes use of several custom environment variables, which minimizes mistakes when several commands use the same values. When applicable, my script also captures values output from one step to use in subsequent commands, to avoid the toil and mistakes from manual copy and pasting.
Use of environment variables also enable the same command call to be made for both DEV and PROD use, further avoiding mistakes.
-
DATACENTER1_ID, which is obtained from my laptop’s $(hostname)
-
CONSUL_AGENT_TOKEN
envconsul
- https://www.consul.io/docs/intro/vs
- https://github.com/hashicorp/envconsul
The envconsul utility reads and sets environmental variables from data within the Consul Agent. It is installed when the Consul Agent is created.
-
To launch a subprocess with environment variables using data from @hashicorp Consul and Vault.
envconsul
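A minimal sketch of that workflow, assuming a local dev agent and a hypothetical KV prefix my-app/config:
# Write a sample key into the Consul KV store:
consul kv put my-app/config/DB_HOST 10.1.2.3
# Launch the `env` command as a subprocess, with every key under the prefix
# exported as an environment variable (here: DB_HOST=10.1.2.3):
envconsul -prefix my-app/config env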
Consul Templates
Consul-template is a separate binary which reads a template file to substitute variables defined between “mustache” braces, replacing each with its value. An example:
[client]
host=
port=
user=
password=
# Lease:
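A minimal consul-template sketch (assuming a hypothetical KV key db/config/host has already been written); the {{ key }} function and the -template "in:out" flag are standard consul-template usage:
# db.tpl contains a mustache placeholder such as:
#   host={{ key "db/config/host" }}
# Render it once to db.conf and exit (omit -once to keep watching for changes):
consul-template -template "db.tpl:db.conf" -once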
Start Consul Agent in foreground
-
Use a text editor to create a systemd unit file (in .ini format) which starts the agent with the configuration directory /etc/consul.d:
[Unit]
Description=Consul
Requires=network-online.target
After=network-online.target
[Service]
Restart=on-failure
ExecStart=/usr/local/bin/consul agent -config-dir="/etc/consul.d"
User=consul
-
If your Consul Agent is running locally:
consul agent -dev -node "$(hostname)" -config-dir="/etc/consul.d"
-node “$(hostname)” is specified for macOS users: Consul uses your hostname as the default node name. If your hostname contains periods, DNS queries to that node will not work with Consul. To avoid this, explicitly set the name of your node with the -node flag.
Start Consul Server in background (macOS)
Alternately, referencing the environment created:
Because HashiCorp’s Homebrew tap was used to install:
brew services start hashicorp/tap/consul
Alternately, on Linux:
/bin/start_consul.sh
Sample response:
Starting HashiCorp Consul in Server Mode... CMD: nohup consul agent -config-dir=/consul/config/ > /consul.out & Log output will appear in consul.out... nohup: redirecting stderr to stdout Consul server startup complete.
-
Start Consul Server:
systemctl start consul
No message is returned unless there is an error.
Leave (Stop) Consul gracefully
CAUTION: When operating as a server, a graceful leave is important to avoid causing a potential availability outage affecting the consensus protocol.
-
Gracefully stop the Consul by making it leave the Consul datacenter and shut down:
consul leave
QUESTION: No need to specify the node (like in start) because Gossip is supposed to propagate updated membership state across the cluster. That’s “Discovery” at work.
CAUTION: Leaving a server affects the Raft peer-set, which results in auto-reconfiguration of the cluster to have fewer servers.
The command notifies other members that the agent left the datacenter. When an agent leaves, its local services running on the same node and their checks are removed from the catalog, and Consul doesn’t try to contact that node again.
Log entries in a sample response (without date/time stamps):
[INFO] agent.server: server starting leave [INFO] agent.server.serf.wan: serf: EventMemberLeave: wilsonmar-N2NYQJN46F.dc1 127.0.0.1 [INFO] agent.server: Handled event for server in area: event=member-leave server=wilsonmar-N2NYQJN46F.dc1 area=wan [INFO] agent.router.manager: shutting down [INFO] agent.server.serf.lan: serf: EventMemberLeave: wilsonmar-N2NYQJN46F 127.0.0.1 [INFO] agent.server: Removing LAN server: server="wilsonmar-N2NYQJN46F (Addr: tcp/127.0.0.1:8300) (DC: dc1)" [WARN] agent.server: deregistering self should be done by follower: name=wilsonmar-N2NYQJN46F partition=default [DEBUG] agent.server.autopilot: will not remove server as a removal of a majority of servers is not safe: id=40fee474-cf41-1063-2790-c8ff2b14d4af [INFO] agent.server: Waiting to drain RPC traffic: drain_time=5s [INFO] agent: Requesting shutdown [INFO] agent.server: shutting down server [DEBUG] agent.server.usage_metrics: usage metrics reporter shutting down [INFO] agent.leader: stopping routine: routine="federation state anti-entropy" [INFO] agent.leader: stopping routine: routine="federation state pruning" [INFO] agent.leader: stopping routine: routine="intermediate cert renew watch" [INFO] agent.leader: stopping routine: routine="CA root pruning" [INFO] agent.leader: stopping routine: routine="CA root expiration metric" [INFO] agent.leader: stopping routine: routine="CA signing expiration metric" [INFO] agent.leader: stopped routine: routine="intermediate cert renew watch" [INFO] agent.leader: stopped routine: routine="CA root expiration metric" [INFO] agent.leader: stopped routine: routine="CA signing expiration metric" [ERROR] agent.server: error performing anti-entropy sync of federation state: error="context canceled" [INFO] agent.leader: stopped routine: routine="federation state anti-entropy" [DEBUG] agent.server.autopilot: state update routine is now stopped [INFO] agent.leader: stopped routine: routine="CA root pruning" [DEBUG] agent.server.autopilot: autopilot is now stopped [INFO] agent.leader: stopping routine: routine="federation state pruning" [INFO] agent.leader: stopped routine: routine="federation state pruning" [INFO] agent.server.autopilot: reconciliation now disabled [INFO] agent.router.manager: shutting down [INFO] agent: consul server down [INFO] agent: shutdown complete [DEBUG] agent.http: Request finished: method=PUT url=/v1/agent/leave from=127.0.0.1:62886 latency=11.017448542s [INFO] agent: Stopping server: protocol=DNS address=127.0.0.1:8600 network=tcp [INFO] agent: Stopping server: protocol=DNS address=127.0.0.1:8600 network=udp [INFO] agent: Stopping server: address=127.0.0.1:8500 network=tcp protocol=http [INFO] agent: Waiting for endpoints to shut down [INFO] agent: Endpoints down [INFO] agent: Exit code: code=0
Consul automatically tries to reconnect to a failed node, assuming that it may be unavailable because of a network partition, and that it may be coming back.
Consul web GUI
-
When the Consul server is invoked:
open "http://localhost:8080/ui/${DATACENTER1_ID}/services"
The Consul GUI provides a mouse-clickable way for you to conveniently work with these (explained below):
-
Services (in the Service Catalog)
-
Nodes is the number of Consul instances
-
Key/Value datastore of IP address generated
-
ACL (Access Control List) entries which block or allow network access based on port number
-
Intentions to allow or deny connections between specific services by name (instead of IP addresses) in the Service Graph
-
API
-
Custom programs (written in Go, etc.) can communicate with Consul using HTTP API calls defined in the Consul API docs.
-
To list nodes in JSON using API:
curl "http://localhost:8500/v1/catalog/nodes"
[ { "ID": "019063f6-9215-6f2c-c930-9e84600029da", "Node": "Judiths-MBP", "Address": "127.0.0.1", "Datacenter": "dc1", "TaggedAddresses": { "lan": "127.0.0.1", "wan": "127.0.0.1" }, "Meta": { "consul-network-segment": "" }, "CreateIndex": 9, "ModifyIndex": 10 } ]
TODO: DNS -consul specifies installation of HashiCorp Consul agent.
Prepared Queries
- https://www.consul.io/api-docs/query
BLAH: This feature is only available when using API calls (not CLI).
More complex queries can be made using API calls than through the limited entry points exposed by DNS.
To get a set of healthy nodes which provide a given service:
-
Edit a prepared query template file in this format:
{ "Template": { "Type": "name_prefix_match", "Regexp": "^geo-db-(.*?)-([^\\-]+?)$", "RemoveEmptyTags": false } }
-
Register a query template (named, for example “banking-app”) using in-line:
curl "${CONSUL_URL_WITH_PORT_VER}/query" \ --request POST \ --data @- << EOF { "Name": "banking-app", "Service": { "Service": "banking-app", "Tags": ["v1.2.3"], "Failover": { "Datacenters": ["dc2", "dc3"] } } } EOF
Alternately, instead of EOF, create a file:
CONSUL_QUERY_FILENAME="payload.json"
-
Make the request by providing a valid Token:
curl --request PUT \ --data "@${CONSUL_QUERY_FILENAME}" \ "${CONSUL_URL_WITH_PORT_VER}/query/${CONSUL_AGENT_TOKEN}"
Prepared queries are also subject to ACLs:
Query execution is subject to node/node_prefix and service/service_prefix policies.
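Once registered, a prepared query can also be executed through the DNS interface using the .query.consul suffix; a sketch assuming the banking-app query created above:
dig @127.0.0.1 -p 8600 banking-app.query.consul SRV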
Chaos Engineering
Practicing use of the above should be part of your pre-production Chaos Engineering/Incident Management process.
Failure modes:
-
Failure of single app node (Consul should notice and send alert)
- Failure of a Consul Non-Voting server (if setup for performance)
- Failure of a Consul Follower server (triggers replacement)
-
Failure of the Consul Leader server (triggering an election)
- Failure of an entire Consul cluster Availability Zone
- Failure of an entire Consul cluster Region
Degraded modes:
-
Under-performing app node
- Under-performing Consul Leader server
- Under-performing Consul Follower server
-
Under-performing Consul Non-voting server
- Under-performing transfer between Consul Availability Zones
- Under-performing WAN Gossip protocol transfer between Consul Regions
Down for maintenance
-
To take a node’s service (such as redis) offline, enable maintenance mode:
consul maint -enable -service redis -reason "Server patching"
This action is logged, which should trigger an alert to the SOC.
-
To bring it back online, disable maintenance mode:
consul maint -disable -service redis
Backup Consul data to Snapshots
- https://www.consul.io/commands/snapshot
- https://www.consul.io/api-docs/snapshot
- Enterprise Academy: Backup and Restore
- BK on Udemy
Consul keeps its data in memory (rather than in a database on a hard drive).
So data in a Consul agent has to be captured in complete point-in-time snapshots (a gzipped tar file) of Consul’s committed state. Other data in the snapshot includes:
- Sessions
- Prepared queries
-
Specify the ACL Token (such as “12345678-1234-abcd-5678-1234567890ab”) (also used for UI login):
export CONSUL_HTTP_TOKEN="${CONSUL_ACL_TOKEN}"
-
PROTIP: Name files with a timestamp in UTC time zone, such as 2022-05-16T03:10:15.386UTC.tgz
brew install coreutils CONSUL_BACKUP_FILENAME="$( gdate -u +'%Y-%m-%dT%H:%M:%S.%3N%Z' ).tgz"
Snapshots are typically performed on the LEADER node, but when the Cluster has no Leader, a FOLLOWER can take it if the --stale flag is specified.
-
Create the snapshot manually using the CLI, API,
consul snapshot save "${CONSUL_BACKUP_FILENAME}"
curl --header "X-Consul-Token: "${CONSUL_ACL_TOKEN}" \ "${CONSUL_URL_WITH_PORT_VER}/snapshot -o ${CONSUL_BACKUP_FILENAME}"
-
Inspect a snapshot file on the local filesystem:
consul snapshot inspect "${CONSUL_BACKUP_FILENAME}"
-
PROTIP: It’s more secure to transfer snapshots offsite, held under an account separate from day-to-day operations.
- Amazon S3
- Azure Blob Storage
- Google Cloud Storage
For example, define an S3 bucket. PROTIP: Use different cloud service account to write and another to receive snapshots.
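A hedged sketch of copying a snapshot offsite with the AWS CLI (the bucket name below is hypothetical, and the credentials used should belong to the write-only account mentioned above):
# Copy the local snapshot file to the offsite bucket:
aws s3 cp "${CONSUL_BACKUP_FILENAME}" "s3://my-consul-snapshots-bucket/${CONSUL_BACKUP_FILENAME}"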
Enterprise Snapshot Agent
Enterprise-licensed users can run the Consul Snapshot Agent service to automatically collect snapshots periodically.
-
Ensure that an enterprise license is configured.
-
Define the configuration file, such as this sample consul-snapshot.d file to take a snapshot every 30 minutes:
{ "snapshot_agent": { "http_addr": "127.0.0.1:8500", "token": "12345678-1234-abcd-5678-1234567890ab", "datacenter": "dc1", "snapshot": { "interval": "30m", "retain": 336, "deregister_after": "8h" }, "aws_storage": { "s3_region": "us-east-1", "s3_bucket": "my-consul-snapshots-bucket" } } }
In PRODUCTION, ACLs are enabled, so a token needs to be generated and included in the file.
336 snapshots are retained, with the oldest automatically discarded.
The snapshot agent service is de-registered if it has been dead for over 8 hours.
-
Run:
consul snapshot agent -config-dir=/etc/consul-snapshot.d
Registration is done automatically.
https://www.consul.io/commands/snapshot/agent
Service file
A systemd agent configuration file in Linux, such as:
/etc/systemd/system/snapshot.service
[unit] Description="HashiCorp Consul Snapshot Agent" Documentation=https://www.consul.io/ Requires=network-online.target After=consul.service ConditionFileNotEmpty=/etc/snapshot.d/shapshot.json [Service] User=consul Group=consul ExecStart=/usr/local/bin/consul snapshot agent -config-dir=/etc/snapshot.d/ KillMode=process Restart=on-failure LimitNOFILE=65535 [Install] WantedBy=multi-user.target
- https://unix.stackexchange.com/questions/506347/why-do-most-systemd-examples-contain-wantedby-multi-user-target
Restore from Snapshot
Snapshots are intended for full Disaster Recovery, not for selective restore back to a specific point in the past (like GitHub can do).
-
To restore to a fresh set of Consul servers:
consul snapshot restore "${CONSUL_BACKUP_FILENAME}"
CAUTION: A Consul server stops processing while performing a restore. You don’t want it working anyway.
Alternately, using API:
curl --header "X-Consul-Token: "${CONSUL_ACL_TOKEN}" \ --request PUT \ --data-binary "@${CONSUL_BACKUP_FILENAME}" \ "${CONSUL_URL_WITH_PORT_VER}/snapshot
PROTIP: There is no selective restore of data.
-
After each configuration change, make a backup copy of the peers seed file used to re-establish quorum during manual recovery, at:
raft/peers.json
That file contains information needed for manual Recovery:
[ { "id": "12345678-1234-abcd-5678-1234567890ab", "address": "10.1.0.1:8300", "non-voter": false } ... ]
See https://learning.oreilly.com/library/view/consul-up-and/9781098106133/ch02.html#building-consensus-raft
PROTIP: As per CAP Theorem, Raft emphasizes Consistency (every read receives the most recent write value) over Availability.
Service Graph Intentions
The Consul GUI enables searching for services by name (instead of IP addresses) as well as specifying allowed connections between specific services by name:
PROTIP: Working with service names using a GUI not only reduces hassle but also minimizes mistakes, which have dire Security consequences.
-
On the CLI, Deny the web server from talking to anything:
consul intention create -deny web '*'
-
On the CLI, Allow the web server to talk to db (the database):
consul intention create -allow web db
Rules are set on the service itself, not on where they are implemented.
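To verify how the intentions above would be evaluated for a given source and destination, the intention check subcommand can be used:
consul intention check web db
# Prints "Allowed" or "Denied" based on the intentions currently in effect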
Services
- https://www.consul.io/docs/discovery/services
Consul discovers services which are setup to be discovered with a file on the service machine.
-
Edit the file:
{ "service": { "id": "unique-server-01", "name": "retail-web-1234567890", "token": "12345678-1234-abcd-5678-1234567890ab", "tags": ["v1.02","production"], "address": "10.1.2.2", "port": 80, "checks": [ { "args": ["/usr/local/bin/check_mem.py"], "interval": "30s" } ], } }
A check is needed for memory (“mem”) because it’s internal to the app’s process.
https://www.consul.io/docs/discovery/checks
-
Construct the file CONSUL_SVC_REGIS_FILE such as /etc/consul.d/redis.json (or hcl):
{ "service": { "name": "retail-web", "token": "12345678-1234-abcd-5678-1234567890ab", "port": 80, "check": { "id": "http", "name": "web check", "tcp": "localhost:80", "interval": "5s", "timeout": "3s } } }
-
A service instance is defined by a service name + service ID.
QUESTION: “web check”?
-
PROTIP: Provide Consul read permissions on the directory/file used above, and reference it as a variable so the same CLI command can be used in dev & prod (for fewer mistakes):
CONSUL_SVC_REGIS_FILE="redis.hcl"
-
Define the Consul Registration Service:
CONSUL_SVC_REGIS_FRONT="http://localhost:8500"
Alternately, in production (for example):
CONSUL_SVC_REGIS_FRONT="https://consul.example.com:8500}"
-
Register the service:
consul services register redis.hcl
Alternately, make an API call specifying -config-file name:
curl -X PUT --data "@${CONSUL_SVC_REGIS_FILE}" \
   "${CONSUL_SVC_REGIS_FRONT}/v1/agent/service/register"
-
Consul does not watch that file after loading, so changes to it after load must be reloaded using:
consul reload
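To confirm the registration took effect, list services in the catalog (or query the service directly); a sketch assuming the local dev agent on port 8500:
consul catalog services
# or, via the API:
curl "http://localhost:8500/v1/catalog/service/retail-web"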
-
“Service discovery” finds available service instance addresses and ports.
-
TODO: Define default connection limits, for better security.
-
Consul API Gateway
- https://www.youtube.com/watch?v=JtVDliGL3mE Video for Consul API Gateway with Jeff Apple, PM of API Gateway
- https://www.hashicorp.com/blog/announcing-hashicorp-consul-api-gateway
- https://learn.hashicorp.com/tutorials/consul/kubernetes-api-gateway?in=consul/developer-mesh
- https://www.hashicorp.com/blog/consul-api-gateway-now-generally-available Feb 24 2022
-
QUESTION: Linux Security Model integrated into operating system, such as AppArmor, SELinux, Seccomp.
See https://www.consul.io/docs/security/security-models/core
-
Consul load balances across instances.
-
Define memory variable:
CONSUL_CONFIG_KIND="extra-config"
-
Define a CONSUL_CONFIG_FILE
config_entries {
  bootstrap {
    kind = "proxy-defaults"
    name = "global"
    config {
      local_connect_timeout_ms = 1000
      handshake_timeout_ms = 1000
    }
  }
  bootstrap {
    kind = "service-defaults"
    name = "web"
    namespace = "default"
    protocol = "http"
  }
}
-
consul config write "${CONSUL_CONFIG_FILE}"
-
Read back
consul config read -kind service-defaults -name web
-
“Discover” nodes using DNS interface dig command to the Consul agent’s DNS server, which runs on port 8600 by default:
REMEMBER: Only healthy instances are returned.
If running within Docker image “hashicorp/counting-service:0.0.2”
dig @127.0.0.1 -p 8600 "counting.service.consul"
Alternately, discover apps using dig appb.service.consul
If running locally:
dig @127.0.0.1 -p 8600 "$(hostname).node.consul"
; <<>> DiG 9.10.6 <<>> @127.0.0.1 -p 8600 wilsonmar-N2NYQJN46F.node.consul ; (1 server found) ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 16775 ;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 2 ;; WARNING: recursion requested but not available ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags:; udp: 4096 ;; QUESTION SECTION: ;wilsonmar-N2NYQJN46F.node.consul. IN A ;; ANSWER SECTION: wilsonmar-N2NYQJN46F.node.consul. 0 IN A 127.0.0.1 ;; ADDITIONAL SECTION: wilsonmar-N2NYQJN46F.node.consul. 0 IN TXT "consul-network-segment=" ;; Query time: 2 msec ;; SERVER: 127.0.0.1#8600(127.0.0.1) ;; WHEN: Sun May 08 22:35:21 MDT 2022 ;; MSG SIZE rcvd: 113
QUESTION: SRV lookups
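A sketch of an SRV lookup against the agent's DNS port, which returns the port number as well as the address of each healthy instance:
dig @127.0.0.1 -p 8600 counting.service.consul SRV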
-
Connect
NOTE: Unhealthy nodes are filtered out.
TODO: This approach enables automatic load balancing. Decentralizes DNS.
(Consul) Nodes (Health Checks)
Red x’s identify Consul nodes which failed health checks.
Moreover, Consul servers Gossip with each other about state changes.
Consul can use several techniques to obtain health info: Docker, gRPC, TCP, TTL heartbeats, and Nagios-compatible scripts.
-
To perform a health check manually using an API call:
curl http://127.0.0.1:8500/v1/health/checks/my-service
Parse the JSON response:
[ { "Node": "foobar", "CheckID": "service:redis", "Name": "Service 'redis' check", "Status": "passing", "Notes": "", "Output": "", "ServiceID": "redis", "ServiceName": "redis", "ServiceTags": ["primary"] } ]
Consul External Services Monitor (ESM)
- https://github.com/hashicorp/consul-esm
- https://learn.hashicorp.com/tutorials/consul/service-registration-external-services
When a local Consul agent cannot be installed (such as on cloud-managed services or incompatible hardware), keep Consul’s service catalog up to date by installing the Consul ESM on ___ to periodically poll those external services. Such a health check is added to the service registration like this:
token "12345678-1234-abcd-5678-1234567890ab", check { id = "some-check" http = "http://localhost:9002/health", method = "GET", interval = "1s", timeout = "1s" }
ACL (Access Control List) Operations
- https://www.udemy.com/course/hashicorp-consul/learn/lecture/24724816#questions/17665170/
ACLs define access granted through specific ports through firewalls (on Enterprise network traffic in “L3” segments).
ACLs are used to:
- Add & Remove nodes to the datacenter
- Add & Remove services
- Discover services
- Consul KV (CRUD) transactions
- API/CLI operations to interact with the datacenter
- Block Catalog Access
Vault works the same way: an ACL Token encapsulates multiple policies, with each policy aggregating one or more rules.
SECURITY PROTIP: To reduce the “blast radius”, create a rules.hcl file for each node. For each node, specifically name the node within each node’s rules.hcl file.
TODO: Use a templating utility to create a rules.hcl file containing a different node name for each node.
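One hedged way to do that with plain shell (the node name, file name, and rules below are illustrative):
# Generate a per-node policy file whose node rule matches only this host:
NODE_NAME="$(hostname)"
cat > "rules-${NODE_NAME}.hcl" <<EOF
node "${NODE_NAME}" {
  policy = "write"
}
service_prefix "" {
  policy = "read"
}
EOF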
-
Environment Variable names I use in scripts involving ACL:
ACL_POLICY_FILE_NAME="some-service-policy.hcl"
ACL_POLICY_NAME="some-service-policy"
ACL_POLICY_DESC="Token"
Create the file defined in ACL_POLICY_FILE_NAME:
# Policy A service "web" { policy = "read" } key-prefix "foo-path/" { policy = "write" }
# Policy B service "db" { policy = "deny" } node "" { policy = "read" }
Policy dispositions in rules include "read", "write", "deny", and "list".
TODO: To define according to “Least Privilege” principles, provide “remove” permissions to a separate account than the account which performs “add”.
-
Initiate the policy using the policy file:
consul acl policy create -name "${ACL_POLICY_NAME}" \ -rules @"${ACL_POLICY_FILE_NAME}"
-
Create the Token GUID from the policy created:
ACL_TOKEN=$( consul acl token create -description "${ACL_POLICY_DESC}" \
   -policy-name "${ACL_POLICY_NAME}" )
-
Add ACL_TOKEN value
service { name = "dashboard", port = 9002, token = "12345678-1234-abcd-5678-1234567890ab", }
D. In a single datacenter (with Kubernetes)
In HashiCorp’s YouTube channel covering all their 8 products:
Rosemary Wang (joatmon08.github.io, Developer Advocate) with J. Cole Morrison hold fun hashicorplive Twitch parties [about two hours each] to show how to learn Consul “the hard way” by setting it up from scratch, using code from github.com/jcolemorrison/getting-into-consul
Consul offers three types of Gateways in the data path to validate authenticity and traffic flows to enforce intentions between services: Enterprise Academy:
- Service Mesh Gateway
- Enterprise Academy: Ingress Gateways
-
Terminating Gateways
- DOC: Transit gateway
( https://play.instruqt.com/hashicorp/tracks/vault-advanced-data-protection-with-transform)
- Enterprise Academy: Deploy Consul Ingress Gateways (Deploy an Ingress Gateway for Inbound Mesh Connectivity)
Kubernetes with Consul
- Enterprise Academy: Running Consul on Kubernetes (Learn how to install Consul on Kubernetes)
Kubernetes with Service Mesh and Consul
- VIDEO: “How Consul and Kubernetes work together”
- https://www.consul.io/docs/connect
- https://www.udemy.com/course/hashicorp-consul/learn/lecture/24649092#questions
- VIDEO: “Zero Trust Security for Legacy Apps with Service Mesh”
- VIDEO: “Consul Service Mesh: Deep Dive”
This Consul feature is called "Consul Connect". VIDEO
Envoy install
To ensure the specific Envoy version tested with the tutorial is installed, instead of using brew install envoy:
-
Install Envoy proxy (specifically version 1.20.1) using https://func-e.io/:
curl https://func-e.io/install.sh | bash -s -- -b /usr/local/bin
% Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 9791 100 9791 0 0 17341 0 --:--:-- --:--:-- --:--:-- 17421 tetratelabs/func-e info checking GitHub for latest tag tetratelabs/func-e info found version: 1.1.3 for v1.1.3/darwin/arm64 tetratelabs/func-e info installed /usr/local/bin/func-e
If using ARM:
export FUNC_E_PLATFORM=darwin/amd64
func-e use 1.20.1
downloading https://archive.tetratelabs.io/envoy/download/v1.20.1/envoy-v1.20.1-darwin-amd64.tar.xz
-
Move Envoy from the .func-e folder to a path common in $PATH:
sudo mv ~/.func-e/versions/1.20.1/bin/envoy /usr/local/bin/
-
Verify if can be found in PATH:
envoy --version
envoy version: ea23f47b27464794980c05ab290a3b73d801405e/1.20.1/Modified/RELEASE/BoringSSL
NOTE: brew install envoy installs version 1.22.2 (at time of writing).
Recordings
A series of recordings live on Twitch.tv by Developer Evangelists Rosemary Wang and J. Cole Morrison:
-
Part 3: Scaling, Outage Recovery, and Metrics for Consul on AWS
-
Part 4: Security, Traffic Encryption, and ACLs
- secure Gossip communication between Consul agents, encrypt RPC calls between client and server with TLS, and begin setting up ACLs.
- Generate a 32-byte encryption key. Apply key to agents. Rotate keys.
-
Part 9: Service Mesh Proxy Metrics
- Install/config. prometheus.io static & dynamic scrape, exposing Envoy
- Part 10: Terminating & Ingress Gateways
- https://play.instruqt.com/HashiCorp-EA/tracks/consul-ingress-gateways-deployment
-
Coming soon after re-edits.
-
Part 13: Consul-Terraform-Sync
- https://consul.io/docs/nia/configuration
Sidecar proxy injection
Consul comes with a built-in sidecar proxy, but also supports the Envoy proxy (originally from Lyft) commonly used with Kubernetes. (QUESTION: This means that migration to Consul can occur gradually?)
You can use Helm, but the consul-k8s CLI is now the recommended way because it validates your environment, gives you much better error messages, and helps with a clean installation.
-
To register (inject) Consul as a Sidecar proxy, add this annotation in a Helm chart:
apiVersion: v1
kind: Pod
metadata:
  name: cats
  annotations:
    "consul.hashicorp.com/connect-inject": "true"
spec:
  containers:
    - name: cats
      image: grove-mountain/cats:1.0.1
      ports:
        - containerPort: 8000
          name: http
-
Yaml file:
- helm-consul-values.yaml changes the default settings to give a name to the datacenter, specify the number of replicas, and enable Injection
- consul-helm
- counting.yaml
- dashboard.yaml
-
As instructed, install Helm:
brew install helm
-
Ensure you have access to the Consul Helm chart and you see the latest chart version listed. If you have previously added the HashiCorp Helm repository, run helm repo update.
helm repo add hashicorp https://helm.releases.hashicorp.com
helm search repo hashicorp/consul
NAME              CHART VERSION  APP VERSION  DESCRIPTION
hashicorp/consul  0.35.0         1.10.3       Official HashiCorp Consul Chart
-
Install Consul with the default configuration, which creates a dedicated Consul Kubernetes namespace (if not already present) and installs Consul into it:
helm install consul hashicorp/consul --set global.name=consul --create-namespace -n consul
NAME: consul
Alternately:
helm install consul -f helm-consul-values.yaml ./consul-helm
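To confirm the pods come up (assuming the dedicated consul namespace from the first form above):
kubectl get pods --namespace consul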
-
On a new Terminal window:
kubectl port-forward svc/consul-ui 8080:80
Forwarding from 127.0.0.1:8080 -> 8500 Forwarding from [::1]:8080 -> 8500
-
Register with Consul agent (which doesn’t start the Sidecar proxy):
{ "service": { "name": "front-end-sidecar", "port": "8080", "connect": { "sidecar_service": {} } } }
-
Registering a Service Proxy:
{ "service": { "id": "someweb-01", "name": "front-end-sidecar", "tags": ["v1.02","production"], "address": "", "port": 80, "checks": [ { "sidecar_service": { "proxy": { "upstreams": [{ "destination_name": "db01" } ] } } } }
CAUTION: Even though it’s a “name”, its value is used to match to register the service.
https://www.udemy.com/course/hashicorp-consul/learn/lecture/24649144#questions
-
Start the Sidecar proxy process.
???
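A minimal sketch, assuming the front-end-sidecar registration above and the Envoy binary in PATH; consul connect envoy -sidecar-for launches Envoy configured as that service's sidecar:
consul connect envoy -sidecar-for front-end-sidecar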
-
View the Consul dashboard:
http://localhost:8080/ui/datacenter/services
References about Kubernetes with Consul:
- https://github.com/hashicorp/consul-k8s
- https://learn.hashicorp.com/tutorials/consul/kubernetes-reference-architecture?in=consul/kubernetes-production
- VIDEO: Introduction to HashiCorp Consul
-
VIDEO: “What is the Crawl, Walk, Run Journey of Adopting Consul”
-
VIDEO: “HashiCorp Consul Introduction: What is a Service Mesh?” by (former) Developer Advocate Nicole Hubbard showing use of Shipyard and K3s.
- VIDEO: How does Consul work with Kubernetes and other workloads?
- https://platform9.com/blog/understanding-kubernetes-loadbalancer-vs-nodeport-vs-ingress/
- https://learn.hashicorp.com/tutorials/terraform/multicloud-kubernetes?in=consul/kubernetes
Service Discovery Registry DNS Queries
LEARN: In environments where Infosec limits DNS traffic to the default UDP port 53, we set up dnsmasq or BIND forwarding from port 53 to 8600, because we don’t want to use the root privileges required to bind to ports below 1024.
Consul servers maintain a DNS “Services Registry”
-
Each service (such as Redis cache in this example) is registered:
service { name = "web", port = 9090, token = "12345678-1234-abcd-5678-1234567890ab", connect:{ sidecar_service { port = 20000 proxy { upstreams { destination_name = "payments" local_bind_address = "127.0.0.1" local_bind_port = 9091 } } } } }
- Proxy Defaults to control proxy configuration
- Service Defaults configures defaults for all instances of a service
Discovery: Service Router -> Service Splitter -> Service Resolver
- Service Router defines where to send Layer 7 traffic
- Service Splitter defines how to divide traffic for a single HTTP route
- Service Resolver matches service instances with Consul upstreams
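As a hedged illustration of one of these config entries, a service-splitter that sends a small share of the web service's traffic to a v2 subset (the subset names are hypothetical and would be defined in a matching service-resolver):
cat > web-splitter.hcl <<EOF
Kind = "service-splitter"
Name = "web"
Splits = [
  {
    Weight        = 90
    ServiceSubset = "v1"
  },
  {
    Weight        = 10
    ServiceSubset = "v2"
  },
]
EOF
consul config write web-splitter.hcl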
PROTIP: Include a health check stanza in the service registration, such as:
service { ... "check": { "id": "mem-util", "name": "Memory utilitization", "script": "/usr/local/bin/check_mem.py", "interval": "10s" } }
Once registered, a service should appear as available within the Consul service registry.
Centralized ???
-
Discover DNS SRV record
- https://www.wikiwand.com/en/SRV_record
curl \ http://localhost:8500/v1/catalog/services/redis
PROTIP: Consul clients return only healthy nodes and services because the agent maintains health status.
-
Each local Consul caches lookups for 3 days.
Each entry can be tagged, such as
tag.service.service.datacenter.domain
tag.service.service.datacenter.${DNS_TLD}
db.redis.service.dc1.consul
PROTIP: Consul is the #1 discovery tool with AWS Route53 (via delegation from resolver)
Traditional DNS servers (BIND, dnsmasq) or iptables rules can be configured to forward requests with the DNS_TLD suffix (“consul”) to Consul:
-
NOTE: Consul can also receive forwarded DNS requests, such as from dnsmasq configured with the line below:
server=/consul/127.0.0.1#8600
-
To configure a BIND server:
zone "consul" IN {
  type forward;
  forward only;
  forwarders { 127.0.0.1 port 8600; };
};
-
To configure iptables in Linux servers:
iptables -t nat -A PREROUTING -p tcp -m tcp --dport 53 -j REDIRECT --to-ports 8600
iptables -t nat -A PREROUTING -p udp -m udp --dport 53 -j REDIRECT --to-ports 8600
iptables -t nat -A OUTPUT -d localhost -p tcp -m tcp --dport 53 -j REDIRECT --to-ports 8600
iptables -t nat -A OUTPUT -d localhost -p udp -m udp --dport 53 -j REDIRECT --to-ports 8600
These rules redirect port 53 DNS traffic to Consul’s DNS port 8600.
References about templating/generating JSON & YAML:
- https://learnk8s.io/templating-yaml-with-code
- Jsonnet
- https://golangexample.com/a-tool-to-apply-variables-from-cli-env-json-toml-yaml-files-to-templates/
- https://github.com/krakozaure/tmpl?ref=golangexample.com
- https://wryun.github.io/rjsone/
Consul workflows beyond Kubernetes
-
Service Discovery: (kube-dns, kube-proxy) to identify and connect any service on any cloud or runtime, with Consul DNS
-
Service Configuration: (K8s Configmaps) but Consul also updates F5 and other load balancer rules, for dynamic configuration across distributed services (in milliseconds)
-
Segmentation: (Network Policy + Controller), providing network infrastructure automation
Service Discovery With Consul on Kubernetes
Service Mesh
Multi-service Service Mesh: secure service-to-service traffic with Mutual TLS certificates, plus enable progressive application delivery practices.
- Application networking and security with identity-based authorization
- L7 traffic management
- Service-to-service encryption
- Health checking to automatically remove services that fail health checks
Consul Enterprise Academy: Service Mesh
Deploying a Service Mesh at Enterprise Scale With Consul - HashiConf Global 2021
Beyond:
- Access Control
- Billing
- Networking
- Identity
- Resource Management
Mutual TLS
- https://www.consul.io/docs/security/encryption#rpc-encryption-with-tls
- https://www.udemy.com/course/hashicorp-consul/learn/lecture/24723260#questions
To encrypt traffic between nodes, each asset is given an encrypted identity in the form of a TLS certificate (in X.509, SPIFFE-compatible format). Consul also provides a proxy to enforce communications between nodes using “Mutual TLS”, where each party exchanges certificates with the other.
Consul’s auto-join provider enables nodes running outside of Kubernetes to join a Consul cluster running on Kubernetes.
Consul can auto-inject certificates into Kubernetes Envoy sidecars to secure communication traffic (within the Service Mesh).
RECOMMENDED: Have Consul use HashiCorp Vault to generate dynamic x.509 certificates.
Consul Connect (Service Mesh)
- VIDEO: “Introduction to HashiCorp Consul Connect”
- Instruqt: Getting started with Consul Connect
- A10 & HashiCorp Network Infrastructure Automation with Consul-Terraform-Sync
- Observability with HashiCorp Consul Connect (Service Mesh)
- “Combining DevOps with PKI Compliance Using HashiCorp Vault & Consul”
Integration between Consul and Kubernetes is achieved by running Consul Service Mesh (aka Consul Connect) on Kubernetes:
Catalog Sync: Sync Consul services into first-class Kubernetes services and vice versa. This enables Kubernetes to easily access external services and for non-Kubernetes nodes to easily discover and access Kubernetes services.
-
Have Vault act as the Certificate Authority (CA) for Consul Connect. On an already configured Vault, enable:
vault secrets enable pki
vault secrets enable consul
-
A sample Consul configuration to use Vault for Connect:
connect {
  enabled = true
  ca_provider = "vault"
  ca_config {
    address = "https://vault.example.com:8200"
    token = "s.1234567890abcdef12"
    root_pki_path = "connect_root"
    intermediate_pki_path = "connect_inter"
    leaf_cert_ttl = "24h"
    rotation_period = "2160h"
    intermediate_cert_ttl = "8760h"
    private_key_type = "rsa"
    private_key_bits = 2048
  }
}
-
Configure access to Consul to create tokens (using the admin token):
vault write consul/config/access \
   address=https://consul:8501 \
   token=12345678-1234-abcd-5678-1234567890ab
-
Create a role for each permission set:
vault write consul/roles/my-role policies=readonly
-
Generate credentials (lease-id, lease_duration 768h, lease_renewable true, token):
vault read consul/creds/my-role
-
For each access, human users generate a new ACL token from Vault.
Assist or Replaces Kubernetes
- https://learn.hashicorp.com/tutorials/nomad/consul-service-mesh
- https://www.consul.io/docs/k8s/installation/install
Consul combines with Nomad, Vault, and Terraform to provide a full alternative to Kubernetes for Docker container orchestration:
Nomad, by itself, is a cluster manager and task scheduler.
Nomad, like Kubernetes, orchestrates Docker containers. But Nomad also orchestrates non-containerized apps. Nomad demonstrated its scalability in its “C2M Challenge”, which showed it is versatile and lightweight enough to support over 2,000,000 tasks.
The smallest units of deployment in Nomad are called “Tasks” – the equivalent to “Pods” in Kubernetes.
Kubernetes (as of publishing date) claims to support clusters up to 5,000 nodes, with 300,000 total containers, and no more than 150,000 pods.
Nomad was originally launched in 2015. It is part of Cloudflare’s development environment [transcript] (a company which routes 10% of the world’s internet traffic) and a cornerstone of Roblox’s and Pandora’s scaling.
Nomad may not be as commonly used as Kubernetes, but it already has a tremendous influence.
D. In a single datacenter using Kubernetes
-
The repo for using Consul on Kubernetes is at
https://github.com/hashicorp/consul-k8s
-
Get the official Helm chart:
git clone https://github.com/hashicorp/consul-k8s.git
The chart is in the charts/consul folder of that repo.
(previously https://github.com/hashicorp/consul-helm.git)
-
Customize file values.yaml such as:
global:
  enabled: true
  image: "consul:1.5.1"
  imagek8: "hashicorp/consul-k8s:0.8.1"
  domain: consul
  datacenter: primarydc
server:
  enabled: true
  replicas: 3
  bootstrapExpect: 3
See https://www.consul.io/docs/k8s/helm
-
Identify the latest release for image: "consul" at:
https://github.com/hashicorp/consul/releases
which was v1.12.0 on April 20, 2022.
-
STAR: Identify the latest release of imagek8: "hashicorp/consul-k8s" at:
https://github.com/hashicorp/consul-k8s/releases
which, at time of writing, was v0.44.0 (May 17, 2022).
This is reflected at: https://artifacthub.io/packages/helm/hashicorp/consul
See https://www.consul.io/docs/k8s/installation/install
-
Deploy using Helm:
helm install consul -f values.yaml ./consul-k8s/charts/consul
E. In a single 6-node datacenter (survive loss of an Availability Zone)
HA (High Availability)
In order for a datacenter to withstand the sudden loss of a server within a single Availability Zone or the loss of an entire Availability Zone, set up 6 servers for best resilience plus performance under load:
The yellow star in the diagram above marks the LEADER Consul server. The leader is responsible for ingesting new log entries of cluster changes, writing that to durable storage, and replicating to followers.
PROTIP: Only the LEADER processes requests. FOLLOWERs do not respond to requests, as their job is just to receive replication data (enjoy the food and stand by like a Prince). This architecture is similar to Vault’s.
IMPORTANT: For better scalability, use Consul’s Enterprise “Autopilot” mechanism to setup “NON-VOTER” Consul server nodes to handle additional processing for higher performance under load. See https://play.instruqt.com/HashiCorp-EA/tracks/consul-autopilot
The NON-VOTER is in Zone 2 because leadership may switch to different FOLLOWER servers over time.
So keep the above in mind when using this page to describe the Small and Large server type in each cloud.
PROTIP: The recommended maximum number of Consul client nodes for a single datacenter is 5,000.
CAUTION: A Consul cluster confined to a single Availability Zone cannot survive the loss of that zone.
Actually, HashiCorp’s Consul Enterprise Reference Architecture for a single cluster is 5 Consul server nodes across 3 availability zones.
Within an Availability Zone, if a voting FOLLOWER becomes unavailable, a non-voting member in the same Availability Zone is promoted to a voting member:
Raft consensus algorithm
Consider these dynamic illustrations about how the Raft mechanism works:
- http://thesecretlivesofdata.com/raft/ provides a visualization
- https://raft.github.io/
To ensure data consistency among nodes across Availability Zones, the Raft consensus algorithm (a simpler alternative to Paxos) maintains consistent state storage for updating catalog, session, prepared query, ACL, and KV state.
Each transaction is considered “committed” when more than half of the servers register it.
If the LEADER server fails, an election is automatically held among a quorum (adequate number of) FOLLOWERs to elect a new LEADER from among candidates.
Serf LAN & WAN Gossip
- https://learn.hashicorp.com/tutorials/consul/federation-gossip-wan
- https://www.consul.io/docs/intro/vs/serf
To ensure that data is distributed without assuming reliable communication, Consul uses the Gossip protocol powered by the multi-platform Serf library open-sourced by HashiCorp at https://github.com/hashicorp/serf (written in Go). Serf implements a modified version of the SWIM (Scalable Weakly-consistent Infection-style Process Group Membership) protocol.
Serf provides for:
-
Events broadcasting to perform cross-datacenter requests based on Membership information
-
Failure detection to gracefully handle loss of connectivity
No Vault - Hard Way
If Vault is not used, do it the hard way:
-
Generate Gossip encryption key (a 32-byte AES GCM symmetric key that’s base64-encoded).
-
Arrange for regular key rotation (using the Keyring built in Consul)
-
Install encryption key on each agent.
-
Review Gossip Telemetry output.
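A sketch of the first two steps: consul keygen produces the base64-encoded key, which goes into each agent's encrypt setting, and the consul keyring command handles rotation (the key placeholders below are illustrative):
# Generate a new gossip encryption key (output is base64 of 32 random bytes):
consul keygen
# Place the output in each agent's configuration, e.g.  encrypt = "<generated key>"
# Later, rotate by installing a new key, switching to it, then removing the old one:
consul keyring -install="<new key>"
consul keyring -use="<new key>"
consul keyring -remove="<old key>"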
NOTE: To manage membership and broadcast messages to the cluster, refer to the Serf documentation.
F. For HA on multiple datacenters federated over WAN
REMEMBER: Like Vault, Consul Datacenter federation is not a solution for data replication. There is no built-in replication between datacenters. consul-replicate is what replicates KV between datacenters.
- Enterprise Academy: Federate Multiple Datacenters (Securely connect multiple Consul datacenters with ACL replication)
- https://github.com/hashicorp/consul-k8s-wan-fed-vault-backend
The Enterprise edition of Consul enables communication across datacenters using Federate Multiple Datacenters coordinated using WAN Gossip protocol.
- https://learn.hashicorp.com/tutorials/consul/federation-gossip-wan?in=consul/networking
Setup Network Areas
Create compatible areas in each datacenter:
-
Define DATACENTER IDs
DATACENTER1_ID="dc1" DATACENTER2_ID="dc2"
-
Repeat for each DATACENTER ID value:
consul operator area create \ -peer-datacenter="${DATACENTER1_ID}"
consul operator area create \ -peer-datacenter="${DATACENTER2_ID}"
-
Run for the first datacenter with its DATACENTER_IP value:
consul operator area join \ -peer-datacenter="${DATACENTER1_ID}" "${DATACENTER_IP}"
This establishes the handshake.
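To confirm the areas and their members (a hedged sketch; these are Enterprise-only operator subcommands):
consul operator area list
consul operator area members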
consul-replicate
-
To perform cross-datacenter Consul K/V replication, install a specific tag of the consul-replicate daemon to run continuously:
https://github.com/hashicorp/consul-replicate/tags
The daemon consul-replicate integrates with Consul to manage application configuration from a central data center, with low-latency asynchronous replication to other data centers, thus avoiding the need for smart clients that would need to write to all data centers and queue writes to handle network failures.
QUESTION: No changes since 2017, so doesn’t work with TLS1.3, arm64, new Docker versions. Developer Seth Vargo is now at Google.
https://learn.hashicorp.com/tutorials/consul/federation-gossip-wan?in=consul/networking
Replicate ACL entries
Cache ACLs for them to “ride out partitions”.
-
Configure primary datacenter servers and clients
{ "datacenter": "dc1" "primary_datacenter": "dc1" "acl": { "enabled": true, "default_policy": "deny", "enable_token_persistence": true } }
-
Create ACL policy
acl = "write" operator = "write" service_prefix "" { policy = "read" intentions = "read" }
REMEMBER: Intentions follow a top-down ruleset using Allow or Deny intentions. More specific rules are evaluated first.
-
Create ACL replication token
consul acl token create \
   -description "ACL replication token" \
   -policy-name acl-replication
Sample response:
AccessorID:
SecretID:
Description:
Local: false
Create Time:
Policies:
-
Configure secondary datacenter agents (servers and clients):
{ "datacenter": "dc2" "primary_datacenter": "dc1" "acl": { "enabled": true, "default_policy": "deny", "enable_token_persistence": true, "enable_token_replication": true } }
-
Apply replication token to servers in secondary datacenter:
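A hedged sketch of applying it, assuming REPLICATION_TOKEN holds the SecretID returned above:
consul acl set-agent-token replication "${REPLICATION_TOKEN}"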
Enterprise configuration
From v1.10.0 on, a full license file must be defined in the server config file before installation:
log_level = "INFO" server = true ui = true datacenter = "us-east-1" license_path = "/opt/consul/consul.hclic" client_addr = "0.0.0.0" bind_addr = "10.1.4.11" advertise_addr = "10.1.4.11" advertise_addr_wan = "10.1.4.11"
Within CLI:
license_path = "/etc/consul.d/consul.hclic"
advertise_addr and advertise_addr_wan specify addresses that are reachable from outside the node; the WAN address must be reachable from outside the datacenter.
Agent configurations have a different IP address and these settings to auto-join based on cloud (AWS) tags:
data_dir = "/opt/consul/data" bootstrap_expect = 5 retry_join = ["provider=aws region=us-east-1 tag_key=consul tag_value=true"] retry_join_wan = ["10.1.2.3","10.1.2.4"] connect = { enabled = true } performance = { raft_multiplier = 1 }
license_path - PROTIP: some use “.txt” or “.hcl” instead of “.hclic” to avoid the need to change text editor preferences based on file extension.
retry_join specifies the cloud provider and other metadata for auto-discovery by other Consul agents.
retry_join_wan specifies the IP address of each datacenter ingress.
WAN encryption has its own encryption key.
connect refers to Consul Connect (disabled by default for security).
raft_multiplier = 1 overrides for high-performance production usage the default value 5 for dev usage. This setting multiplies the time between failed leader detection and new leader election. Higher numbers extends the time (slower) to reduce leadership churn and associated unavailability.
TLS configuration
Consul has root and intermediate CA capability built-in to create certificates.
Vault can also be used.
A CA is named “server.datacenter.domain”.
-
Generate TLS .pem files.
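A sketch using Consul's built-in TLS helpers, which produce files with the same names referenced in the settings below:
# Create a CA (writes consul-agent-ca.pem and consul-agent-ca-key.pem):
consul tls ca create
# Create a server certificate for datacenter dc1 (writes dc1-server-consul-0.pem and its key):
consul tls cert create -server -dc dc1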
-
Add “verify_” TLS encryption settings to the Consul Agent config file:
...
verify_incoming = true
verify_outgoing = true
verify_server_hostname = true
ca_file = "consul-agent-ca.pem"
cert_file = "dc1-server-consul-0.pem"
key_file = "dc1-server-consul-0-key.pem"
encrypt = "xxxxxxxx"
Enterprise Autopilot CLI Commands
- Enterprise Academy: Autopilot Upgrades (Automate Upgrades with Consul Enterprise)
- Enterprise Academy: Federate Multiple Datacenters (Securely connect multiple Consul datacenters with ACL replication)
For write redundancy through automatic replication across several zones, add a tag “az” for “availability zone” to invoke the Enterprise feature “Consul Autopilot”:
autopilot = {
  redundancy_zone_tag = "az"
  min_quorum = 5
}
node_meta = {
  az = "Zone1"
}
The Enterprise Autopilot feature performs automatic, operator-friendly management of Consul servers, including cleanup of dead servers, monitoring the state of the Raft cluster, automated upgrades, and stable server introduction.
Autopilot enables Enterprise Redundancy Zones to improve resiliency and scaling of a Consul cluster. It can add “non-voting” servers which will be promoted to voting status in case of a voting server failure. Except during such a failure, redundancy-zone servers do not participate in quorum, including leader election.
-
To get Autopilot configuration settings:
consul operator autopilot get-config
Sample response:
CleanupDeadServers = true
LastContactThreshold = 200ms
MaxTrailingLogs = 250
MinQuorum = 0
ServerStabilizationTime = 10s
RedundancyZoneTag = ""
DisableUpgradeMigration = false
UpgradeVersionTag = ""
Alternately, make an API call for JSON response:
curl http://127.0.0.1:8500/v1/operator/autopilot/configuration
{ "CleanupDeadServers": true, "LastContactThreshold": "200ms", "MaxTrailingLogs": 250, "MinQuorum": 0, "ServerStabilizationTime": "10s", "RedundancyZoneTag": "", "DisableUpgradeMigration": false, "UpgradeVersionTag": "", "CreateIndex": 5, "ModifyIndex": 5 }
- Start a Consul server
-
See which Consul servers joined:
consul operator raft list-peers
Node             ID                                    Address            State   Voter  RaftProtocol
consul-server-1  12345678-1234-abcd-5678-1234567890ab  10.132.1.194:8300  leader  true   3
After a quorum of servers is started (i.e., the third new server), Autopilot detects an equal number of old vs. new nodes and promotes the new servers to voters. This triggers a new leader election and demotes the old nodes to non-voting members.
Mesh Gateway
When performing cross-cloud service communication:
services avoid exposing themselves on public networks by using Mesh Gateways (built upon Envoy) which sit on the public internet to accept L4 traffic with mTLS. Mesh Gateways perform NAT (Network Address Translation) to route traffic to endpoints in the private network.
Consul provides an easy SPOC (Single Point of Contact) to specify rules for communication, instead of requesting that the Networking team manually configure a rule in the firewall.
-
Generate GATEWAY_TOKEN value
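A sketch of one way to generate it, assuming an ACL policy named “mesh-gateway-policy” (hypothetical name) already grants the gateway the service:write and node:read permissions it needs, and that jq is installed:
GATEWAY_TOKEN=$(consul acl token create \
  -description "mesh gateway token" \
  -policy-name "mesh-gateway-policy" \
  -format=json | jq -r '.SecretID')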
-
Start the Mesh Gateway:
consul connect envoy \
  -gateway mesh -register \
  -service "mesh-gateway" \
  -address "${MESH_PRIVATE_ADDRESS}" \
  -wan-address "${MESH_WAN_ADDRESS}" \
  -admin-bind 127.0.0.1:0 \
  -token="${GATEWAY_TOKEN}"
-
Configure one Consul client with access to each datacenter WAN link:
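Not shown in the source steps, but a related sketch: a proxy-defaults config entry (applied with consul config write) that routes all cross-datacenter upstream traffic through the local Mesh Gateway. The “global” name is required for proxy-defaults:
Kind = "proxy-defaults"
Name = "global"
MeshGateway {
  Mode = "local"   # use the local datacenter's mesh gateway for upstreams in other datacenters
}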
-
Envoy
-
Enable gRPC
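Envoy proxies receive their configuration from the local Consul agent over gRPC, so the gRPC port must be enabled in the client agent config; a minimal sketch:
ports = {
  grpc = 8502   # Consul's conventional gRPC port; gRPC stays disabled unless a port is set
}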
Telemetry and capacity tests
Adequate reserve capacity for each component is necessary to absorb sudden increases in activity.
Alerts are necessary to request manual or automated intervention.
Those alerts are based on metrics for each component described at https://www.consul.io/docs/agent/telemetry
Artificial loads need to be applied to ensure that alerts and interventions will actually occur when appropriate. Load testing exposes the correlation of metric values at various levels of load. All this is part of a robust Chaos Engineering needed for pre-production.
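As a sketch (settings are illustrative, not recommendations from the source), metrics can be exposed to a collector such as Prometheus through the agent's telemetry stanza:
telemetry = {
  prometheus_retention_time = "60s"   # enables /v1/agent/metrics?format=prometheus
  disable_hostname = true             # drop the hostname prefix from metric names
}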
-
At scale, customers need to optimize for stability at the Gossip layer.*
Manage from another Terminal
-
At the Terminal within a Consul agent instance, create another Terminal shell instance to interact with the running Consul agent:
consul members
A sample successful response:
Node         Address         Status  Type    Build   Protocol  DC   Partition  Segment
Judiths-MBP  127.0.0.1:8301  alive   server  1.12.0  2         dc1  default    <all>
PROTIP: The above command is only needed once to join a cluster. After that, agents Gossip with each other to propagate membership information with each other.
This error response reflects that CLI commands are a wrapper for API calls:
Error retrieving members: Get "http://127.0.0.1:8500/v1/agent/members?segment=_all": dial tcp 127.0.0.1:8500: connect: connection refused
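When an agent is running, the same data can be fetched directly from the HTTP API that the CLI wraps:
curl http://127.0.0.1:8500/v1/agent/members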
BTW, to list members of the WAN gossip pool (the servers federated across datacenters), it’s
consul members -wan
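To actually join a server to the WAN pool (a sketch, assuming 10.1.2.3 is a server in the other datacenter):
consul join -wan 10.1.2.3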
-
For more detail about Tags:
consul members -detailed
Sample response:
Node                  Address         Status  Tags
wilsonmar-N2NYQJN46F  127.0.0.1:8301  alive   acls=0,ap=default,build=1.12.0:09a8cdb4,dc=dc1,ft_fs=1,ft_si=1,id=40fee474-cf41-1063-2790-c8ff2b14d4af,port=8300,raft_vsn=3,role=consul,segment=<all>,vsn=2,vsn_max=3,vsn_min=2,wan_join_port=8302
Rejoin existing server
If a Consul server fails in a multi-server cluster, bring the server back online using the same IP address.
consul agent -bootstrap-expect=3 \
  -bind=192.172.2.4 -retry-join=192.172.2.3
Consul Tutorials from HashiCorp
https://learn.hashicorp.com/consul
https://cloud.hashicorp.com/docs/consul/specifications
Leader/Follower (instead of Master/Slave)
https://learn.hashicorp.com/tutorials/cloud/get-started-consul?in=consul/cloud-get-started
G. Integrations to legacy VMs, mainframes, etc.
- https://medium.com/hashicorp-engineering/supercomputing-with-hashicorp-5c827dcb2db8
Use this to learn about configuring HashiCorp Consul to work across the entire enterprise landscape of technologies (another major differentiator of HashiCorp Consul).
Multi-platform (VMWare, mainframe)
VIDEO: Many enterprises also have legacy applications running VMware or still in a mainframe.
That’s where HashiCorp Consul comes in, with multi-platform/multi-cloud support.
VIDEO: Kubernetes was designed with features to address each, but Consul synchronizes across several Kubernetes instances – in different clouds – and also synchronizes with Serverless, Cloud Foundry, OpenShift, legacy VMs, even mainframes.
Consul provides better security along with less toil (productivity) for both Kubernetes and legacy platforms, across several clouds.
That’s full enterprise capabilities.
“Multi-platform and multi-cloud choose you, due to corporate mergers and acquisitions and capacity limits in some cloud regions”
You can see how Consul behaves on Power9 (PPC) and IBM Z (s390x) “mainframe supercomputers” without the expense by emulating them with Hercules or QEMU on a plain x86_64 Windows PC or Xeon Linux workstation with KVM, or even on a Mac. Power9 ended up being much simpler to emulate than s390x.
Using Vagrant
-
VIDEO: Based on a Kubernetes 5-node cluster created using this Helm chart:
-
Install Vagrant and download the Vagrantfile
brew install vagrant   # Vagrant 2.2.19
curl -O https://raw.githubusercontent.com/hashicorp/consul/master/demo/vagrant-cluster/Vagrantfile   # raw URL so curl fetches the file rather than the HTML page
CAUTION: As of this writing, Vagrant does not work on Apple M (ARM) chipset on new macOS laptops.
vagrant up
SSH into each server: vagrant ssh n1
helm install ./consul-helm -f ./consul-helm/demo.values.yaml --name consul
- Install Consul binary
- Add Consul Connect to a Kube app
- Integrate legacy apps with Kubernetes
Kubernetes runs a sample “emojify” app, which runs an NGINX website calling the “facebox” service API running a machine-learning model to add emoji images on the faces of people in input photos (from Honeycomb.io).
“502 Bad Gateway” appears during deployment.
Connect to a Payment service outside Kubernetes.
Customize HTTP Response Headers
- Ask whether your app should have additional security headers, such as X-XSS-Protection, for API responses.
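Consul itself can add headers to all of its API and UI responses through the agent's http_config stanza; a sketch (the header values are illustrative assumptions, not recommendations):
http_config = {
  response_headers = {
    "X-XSS-Protection" = "1; mode=block"
    "X-Frame-Options" = "DENY"
    "Strict-Transport-Security" = "max-age=31536000; includeSubDomains"
  }
}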
Collaborations
Ambassador’s Edge Stack (AES) for service discovery.
Competitors
See https://www.consul.io/docs/intro/vs
“[23:07] Consul Connect is probably the most mature simply because of Consul. Consul is a decade of polished technology, battle-tested in each production environment. It’s a safe choice in terms of stability and features.” – The Best Service Mesh: Linkerd vs Kuma vs Istio vs Consul Connect comparison + Cilium and OSM on top
Service Discovery: Hystrix, Apache, Eureka, SkyDNS
CASE STUDY: Self-Service Service Mesh With HCP Consul Tide abandoned its adoption of AWS AppMesh in favor of HashiCorp Consul, making the transition in only 6 weeks with no downtime and no big-bang migration.
Istio
GitLab
https://konghq.com/kong-mesh
Cisco
H3C
ManageEngine OpManager
Extreme Networks, Inc
Arista Networks
Big Cloud Fabric
Equinix Performance Hub
HPE Synergy
NSX for Horizon
OpenManage Network Manager
CenturyLink
Huawei Cloud Fabric
Aricent
Cloudscaling
Cumulus
HostDime
ArgoCD
Compare against these reference architecture diagrams:
- Architecture for Gateway Load Balancer – East/West Inspection Use Gateway Load Balancer and Transit Gateway to create a highly available and scalable bump-in-the-wire solution for East/West inspection.
-
https://learn.hashicorp.com/tutorials/cloud/amazon-transit-gateway
-
Architecture for Gateway Load Balancer – Centralized Egress Inspection Use Gateway Load Balancer to build highly available and scalable centralized egress environments with traffic inspection.
- Workload Discovery on AWS is a tool to visualize AWS Cloud workloads. Use Workload Discovery on AWS to build, customize, and share detailed architecture diagrams of your workloads based on live data from AWS. https://www.cloudcraft.co/ or https://www.lucidchart.com/blog/how-to-build-aws-architecture-diagrams
References
https://www.hashicorp.com/blog/consul-1-12-hardens-security-on-kubernetes-with-vault
https://www.pagerduty.com/docs/guides/consul-integration-guide
Simplifying Infrastructure and Network Automation with HashiCorp (Consul and Nomad) and Traefik
VIDEO: “Community Office Hours: HashiCorp Consul on AWS ECS” by Rosemary Wong and Luke Kysow
VIDEO: “Service Mesh and Your Legacy Apps: Connecting to Kubernetes with Consul” by Marc LeBlanc (with Arctiq)
“A Practical Guide to HashiCorp Consul — Part 1 “ by Velotio Technologies
https://thenewstack.io/3-consul-service-mesh-myths-busted/
https://www.youtube.com/watch?v=UHwoEGSfDlc&list=PL81sUbsFNc5ZgO3FpSLKNRIIvCBvqm-JA&index=33 The Meshery Adapter for HashiCorp Consul
https://webinars.devops.com/getting-hashicorp-terraform-into-production (on Azure) by Mike Tharpe with TechStrong
https://github.com/alvin-huang/consul-kv-github-action GitHub Action to pull a value from Consul KV
https://www.hashicorp.com/resources/unboxing-service-mesh-interface-smi-spec-consul-kubernetes
BOOK: “HashiCorp Infrastructure Automation Certification Guide” by Ravi Mishra
Packt BOOK: “Full Stack Development with JHipster - Second Edition” has a section on management of a full-featured sample Java Spring app using Consul instead of the default Eureka (JHipster Registry), which only supports Spring Boot. The author says the main advantages of using Consul are:
- It has a lower memory footprint.
- It can be used with services that are written in any programming language.
- It focuses on consistency rather than availability.
“Consul also provides service discovery, failure detection, multi-datacenter configuration, and key-value storage.”
HashiCorp Corporate Social
Twitter: @hashicorp
Ambassadors (first announced March, 2020)
LinkedIn: https://www.linkedin.com/company/hashicorp
Facebook: https://www.facebook.com/HashiCorp