
Wilson Mar


Enterprise-grade secure Zero-Trust routing to replace East-West load-balancing, using service names rather than static IP addresses. Enhance Service Mesh with mTLS and health-based APIs in AWS, Azure, GCP, and other clouds, both on Kubernetes (including EKS) and outside it: ECS, VMs, databases, even mainframes.


Overview

Here are notes while I’m learning about Consul, attempting to be succinct and logically sequenced. All without sales generalizations. All in this one single big page for easy search. This is not a replacement for you going through professionally developed trainings.

Consul is “a multi-cloud service networking platform to connect and secure any service across any runtime platform and public or private cloud”.

NOTE: Content here reflects my personal opinions, and is not intended to represent any employer (past or present). “PROTIP:” flags information unique to this website, based on my personal research and experience.

The most popular websites about Consul:

  1. The marketing home page for HashiCorp’s Consul:
    https://www.consul.io/

  2. Wikipedia entry:
    https://www.wikiwand.com/en/Consul_(software)

    “Consul was initially released in 2014 as a service discovery platform. In addition to service discovery, it now provides a full-featured service mesh for secure service segmentation across any cloud or runtime environment, and distributed key-value storage for application configuration.[2]

    Registered services and nodes can be queried using a DNS interface or an HTTP interface.[1] Envoy proxy provides security, observability, and resilience for all application traffic.”

  3. Detailed technical documentation:
    https://www.consul.io/docs

  4. Tutorials from HashiCorp:
    https://learn.hashicorp.com/tutorials/consul/service-mesh

  5. Technical Discussions:
    https://discuss.hashicorp.com/c/consul/29

  6. Stackoverflow has highly technical questions & answers:
    https://stackoverflow.com/search?q=%23hashicorp-consul

  7. Reddit:
    https://www.reddit.com/search/?q=hashicorp%20consul

  8. Licensed Support from HashiCorp is available to those authorized to access HashiCorp’s support ticketing system:
    https://hashicorp.freshservice.com/helpdesk/tickets


Due to Microservices

“Microservices is the most popular architectural approach today. It’s extremely effective. It’s the approach used by many of the most successful companies in the world, particularly the big web companies.” –Dave Farley

In hopes of building more reliable systems in the cloud faster and cheaper, enterprises create distributed microservices instead of monolithic architectures (which are more difficult to evolve).

Microservices seem like a good idea because they promise:

  • Ephemeral services that can each move and scale independently (so dev teams wait less on each other)
  • Simpler unit testing of individual services
  • Increased agility
  • Greater operational efficiency

Legacy networking infrastructure mismatches

However, each new paradigm comes with new problems.

A common explanation of what Consul does references three technical categories:

“Consul is a datacenter runtime that provides 1) service discovery, 2) configuration, and 3) orchestration.”

Implementation of microservices within legacy infrastructure and “fortress with a moat” mindset (rather than “Zero Trust” and other security principles) creates these concerns:

Orchestration

A. When traffic is routed based on IP addresses, traffic is sent blindly without identity authentication (a violation of “Zero Trust” mandates).

B. Traffic routing mechanisms (such as IPTables) were designed to manage external traffic, not traffic internally between services.

Service Discovery

C. So mechanisms intended to secure external traffic (such as IPTables) are pressed into service to secure internal traffic among app services. Such mechanisms are usually owned and managed for the whole enterprise by the Networking department. So developers spend too much time requesting permission to access IP addresses, and Networking departments now spend too much time connecting internal static IP addresses for communications among services, work that many of them don’t consider part of their job.

D. Due to lack of authentication (using IP addresses), current routing has no mechanism for fine-grained permission policies that limit which operations (such as Read, Write, Update, Delete, etc.) are allowed. Such policies are how “Least Privilege” principles are implemented.

E. Also due to lack of authentication, current routing does not have the metadata to segment traffic in order to split a percentage of traffic to different targets for various types of testing.

  DEFINITION: "Micro segmentation" is the logical division of the internal network into distinct security segments at the service/API level. Its use enables granular access control to, and visibility of, discrete service interface points. Reference: PDF: "US Department of Defense (DoD) Cloud Native Access Point (CNAP) Reference Design (RD)" at https://dodcio.defense.gov/Portals/0/Documents/Library/CNAP_RefDesign_v1.0.pdf

  The segmentation that "East-West" (internal) load balancers with advanced "OSI Layer 7" capability (such as F5) can perform is more limited than what Consul can do with its more granular metadata about each service.

  Not only that, load balancers are a single point of failure. So an alternative is needed which has been architected for resilience and high availability despite failures of individual nodes, Availability Zones, and whole Regions.

F. In an effort to make up for the missing network features, many developers now spend too much time coding network-related communication logic into each application program (for retries, tracing, secure TLS, etc.).
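To make the percentage-based split described in E concrete, here is a minimal Python sketch. It assumes a hypothetical split table shaped loosely like the `Splits` list of a Consul `service-splitter` config entry (each entry has a `ServiceSubset` and a `Weight`, with weights totaling 100). In a real mesh, the Envoy sidecars perform the weighted selection; the hashing scheme here is an illustrative choice, not Consul's algorithm.

```python
import hashlib

# Hypothetical split table, loosely modeled on a Consul service-splitter
# config entry: 90% of requests go to subset "v1", 10% to subset "v2".
SPLITS = [
    {"ServiceSubset": "v1", "Weight": 90},
    {"ServiceSubset": "v2", "Weight": 10},
]


def validate(splits):
    """Reject split tables whose weights do not total 100 percent."""
    total = sum(s["Weight"] for s in splits)
    if abs(total - 100) > 1e-9:
        raise ValueError(f"weights sum to {total}, expected 100")


def pick_subset(request_id, splits=SPLITS):
    """Map a request ID deterministically onto a subset according to weight.

    Hashing the ID yields a stable bucket in [0, 100), so the same request ID
    always lands on the same subset (an assumption made by this sketch).
    """
    validate(splits)
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10000 / 100  # 0.00 .. 99.99
    cumulative = 0.0
    for split in splits:
        cumulative += split["Weight"]
        if bucket < cumulative:
            return split["ServiceSubset"]
    return splits[-1]["ServiceSubset"]  # guard against float rounding


if __name__ == "__main__":
    counts = {"v1": 0, "v2": 0}
    for i in range(10000):
        counts[pick_subset(f"req-{i}")] += 1
    print(counts)  # roughly 9000 for v1 and 1000 for v2
```

Because the split lives in data rather than in application code, shifting from 90/10 to 50/50 is a configuration change, which is the point made in E.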

Kubernetes a partial solution

Kubernetes (largely from Google) has become popular as an “orchestrator” that replaces instances of pods (holding Containers) when any of them go offline.

NOTE: Kubernetes is currently not mature when it comes to adding more pods (to scale up) or removing pods (to scale down).

However, core Kubernetes currently still has these deficiencies:

G. Kubernetes does not check if a service is healthy before trying to communicate with it. This leads to the need for coding applications to perform time-outs, which is a distraction and not a skill most business application coders have.

H. Kubernetes does not encrypt communications between services.

I. Kubernetes does not provide a way to communicate with components and cloud services outside Kubernetes, such as databases, ECS, other EKS clusters, Serverless, Observability platforms, etc. Thus, Kubernetes by itself does not enable deep transaction tracing.



Legacy mismatches solved by Consul Editions

Consul provides a mechanism for connecting dynamic microservices with legacy networking infrastructure.

The sections below describe how each edition of Consul solves the mismatches described above.


Free Open Source Software Features

The main component of the Consul product, the Consul Agent executable “consul”, is FOSS (Free Open-Source Software): it can be controlled using CLI commands without a license, using code open-sourced at:


PROTIP: Here are Agile-style stories requesting use of HashiCorp Consul (written by me):

Consul Concepts in UI Menu

The Consul Enterprise edition menu can serve as a list of concepts about Consul:

Consul Enterprise menu at v1.1.5

“dc1” is the name of a Consul “datacenter” – a cluster of Consul servers within a single region.

Multiple “Admin Partitions” and “Namespaces” are Consul Enterprise features.

Consul manages applications made available as Services on the network.

Nodes are the machines running a Consul agent; the Consul servers among them manage network traffic. Consul servers can be installed separately from application infrastructure.
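As a sketch of how a service becomes known to Consul, the Python snippet below builds a registration body for the local agent's HTTP endpoint `PUT /v1/agent/service/register`, including an HTTP health check. The service name `web`, ID `web-1`, port 8080, and `/health` path are illustrative assumptions, as is the default local agent address `http://127.0.0.1:8500`.

```python
import json
import urllib.request

CONSUL_ADDR = "http://127.0.0.1:8500"  # assumed default local agent address


def registration_payload(name, service_id, port, health_path):
    """Build a service definition with an HTTP health check.

    Field names (Name, ID, Port, Check.HTTP, Check.Interval, Check.Timeout)
    follow the agent service registration API; the values are illustrative.
    """
    return {
        "Name": name,
        "ID": service_id,
        "Port": port,
        "Check": {
            "HTTP": f"http://127.0.0.1:{port}{health_path}",
            "Interval": "10s",  # how often the agent probes the service
            "Timeout": "2s",    # a slow probe counts as a failure
        },
    }


def register(payload, token=None):
    """Send the registration to the local Consul agent (needs a running agent)."""
    req = urllib.request.Request(
        f"{CONSUL_ADDR}/v1/agent/service/register",
        data=json.dumps(payload).encode(),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    if token:
        req.add_header("X-Consul-Token", token)  # ACL token, if ACLs are enabled
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    print(json.dumps(registration_payload("web", "web-1", 8080, "/health"), indent=2))
```

The same definition can also be written as an agent configuration file; the HTTP route is shown here only because it keeps the example self-contained.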

  1. Rather than A. blindly routing traffic based on IP addresses, which have no basis for authentication (a violation of “Zero Trust” mandates), Consul routes traffic based on named entities (such as “C can talk to A” or “C cannot talk to A.”).

    Consul Enterprise can authenticate using several Authentication Methods

  2. Rather than B. routing based on IPTables designed to manage external traffic, Consul routes from its list of “Intentions” which define which other entities each entity (name) can access.

    Consul does an authentication hand-shake with each service before sending it data. A rogue service cannot pretend to be another legitimate service unless it holds a legitimate encryption certificate assigned by Consul. And each certificate expires, which Consul works to rotate.

  3. Rather than C. filing tickets for Networking staff to manually connect internal static IP addresses, Consul discovers the network metadata (such as IP addresses) of each application service when it comes online, based on the configuration defined for each service. This also means Networking staff spend less time on internal communications, freeing them up for analysis, debugging, and other tasks.

    Roles and Policies

  4. Rather than D. routing without fine-grained permissions, Consul holds a “service registry” along with ACL (Access Control List) policy entries which define what operations (such as Read, Write, Update, Delete, etc.) are allowed or denied for each role assigned to each named entity. This adds fine-grained security functionality needed for “Zero Trust”.

    As Consul redirects traffic, it secures the traffic by generating certificates used to encrypt traffic on both ends of communication, and it takes care of automatic key rotation, too. This mechanism is called “mTLS” (mutual Transport Layer Security).

  5. Instead of E. requiring a Load Balancer or application coding to split a percentage of traffic to different targets for various types of testing, Consul can segment traffic based on attributes associated with each entity. This enables more sophisticated logic than traditional load balancers offer.

    Like F5, Consul can balance load using various algorithms, such as “round robin” and “least request”.

    That means Consul can, in many cases, replace “East-West” load balancers, removing the load balancer (in front of each type of service) as a single-point-of-failure risk.

  6. Rather than F. developers spending too much time coding network communication logic in each program (for retries, tracing, secure TLS, etc.), networking logic can be managed centrally in a GUI.

    Since Consul is added as additional servers in parallel within the same infrastructure, changes usually involve configuration rather than app code. Thus, Consul can connect and integrate services running both on-prem and in clouds, inside and outside Kubernetes.

  7. Within the system, Consul obtains the health status of each app service so that traffic is routed only to healthy instances, a more informed approach than load balancers routing blindly (by round robin).
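Items 1 and 2 above describe routing decisions made by service name through “Intentions”. The Python sketch below is a simplified, hypothetical model of that idea, not Consul's code: each rule names a source, a destination (either may be the wildcard `*`), and an action, and the most specific matching rule wins, with an exact destination outranking an exact source. Returning `deny` when nothing matches is an assumption that mirrors a default-deny ACL setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intention:
    source: str       # service name, or "*" for any source
    destination: str  # service name, or "*" for any destination
    action: str       # "allow" or "deny"


def _specificity(intention):
    """Score a rule: exact names beat wildcards; destination counts more than source."""
    return (intention.destination != "*") * 2 + (intention.source != "*")


def authorize(source, destination, intentions, default="deny"):
    """Decide whether `source` may call `destination` under simplified intention rules."""
    matches = [
        i for i in intentions
        if i.source in (source, "*") and i.destination in (destination, "*")
    ]
    if not matches:
        return default  # assumed default-deny when no intention matches
    return max(matches, key=_specificity).action


if __name__ == "__main__":
    rules = [
        Intention("*", "db", "deny"),     # nobody talks to db ...
        Intention("web", "db", "allow"),  # ... except web
    ]
    print(authorize("web", "db", rules))    # allow
    print(authorize("batch", "db", rules))  # deny
```

Note that no IP address appears anywhere: identity comes from the service name, which the mesh proves with certificates rather than trusting the network location.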

Partial Kubernetes Remediation using Service Mesh


To overcome G. Kubernetes not checking if a service is healthy before trying to communicate, many are adding a “Service Mesh” to Kubernetes. Although several vendors offer the addition, “Service Mesh” generally means the installation of a network proxy agent (a “sidecar”) installed within each pod alongside app containers.

“Envoy” is currently the most popular Sidecar proxy. There are alternatives.

Consul Service Mesh

When app developers allow all communication in and out of their app through a Sidecar proxy, they can focus more on business logic rather than the intricacies of retries after network failure, traffic encryption, transaction tracing, etc.

Due to H. Kubernetes and Sidecars not encrypting communications between services, Consul is becoming a popular add-on to Kubernetes Service Mesh because it can add mTLS (mutual TLS certificates used to encrypt transmissions on both servers and clients) without changes to application code.
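In the mesh, the Envoy sidecars handle mTLS using certificates that Consul's built-in CA issues and rotates, so application code does not change. Purely to illustrate what “mutual” means, here is a sketch using Python's standard `ssl` module: the server context refuses clients that lack a trusted certificate, and the client context presents its own certificate while verifying the server. The certificate, key, and CA file paths are hypothetical placeholders.

```python
import ssl


def mtls_server_context(cert_file=None, key_file=None, ca_file=None):
    """TLS server context that refuses clients lacking a trusted certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # the "mutual" part: clients must prove identity
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # server's own identity
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # trust only this (hypothetical) mesh CA
    return ctx


def mtls_client_context(cert_file=None, key_file=None, ca_file=None):
    """TLS client context that verifies the server and presents its own certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)  # verifies server by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # client's identity
    return ctx


if __name__ == "__main__":
    server = mtls_server_context()
    print(server.verify_mode == ssl.CERT_REQUIRED)
```

Doing this by hand in every service (plus rotating the certificates before they expire) is exactly the burden a sidecar-based mesh removes.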

Although G. Kubernetes does not check if a service is healthy before trying to communicate with it, Consul performs health checks and maintains the status of each service. Thus, Consul never routes traffic to known unhealthy pods, and apps don’t need to be coded with complex timeout and retry logic.
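As a sketch of consuming that health status directly, the Python snippet below filters a response shaped like Consul's `GET /v1/health/service/<name>` output (a list of entries, each with `Node`, `Service`, and `Checks`) down to instances whose checks are all passing, then rotates through them round-robin. The sample addresses are made up. Adding the `?passing` query parameter asks Consul to do this filtering server-side; the client-side version is shown only to make the logic visible.

```python
import itertools


def healthy_instances(entries):
    """Return (address, port) for instances whose health checks are all passing."""
    healthy = []
    for entry in entries:
        if all(check["Status"] == "passing" for check in entry["Checks"]):
            svc = entry["Service"]
            # An empty Service.Address means "use the node's address".
            address = svc.get("Address") or entry["Node"]["Address"]
            healthy.append((address, svc["Port"]))
    return healthy


# Made-up sample in the shape of a /v1/health/service/web response.
SAMPLE = [
    {"Node": {"Address": "10.0.0.11"},
     "Service": {"Address": "", "Port": 8080},
     "Checks": [{"Status": "passing"}, {"Status": "passing"}]},
    {"Node": {"Address": "10.0.0.12"},
     "Service": {"Address": "10.0.0.12", "Port": 8080},
     "Checks": [{"Status": "passing"}, {"Status": "critical"}]},  # skipped
    {"Node": {"Address": "10.0.0.13"},
     "Service": {"Address": "10.0.0.13", "Port": 8080},
     "Checks": [{"Status": "passing"}]},
]

if __name__ == "__main__":
    rotation = itertools.cycle(healthy_instances(SAMPLE))  # simple round robin
    for _ in range(4):
        print(next(rotation))
```

This prints `('10.0.0.11', 8080)` and `('10.0.0.13', 8080)` alternately; the instance with a critical check never receives traffic.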

Although I. Kubernetes does not provide a way to communicate with components and cloud services outside Kubernetes, Consul can configure sidecars such as Envoy to dynamically route or duplicate traffic to “Observability” platforms such as Datadog, Prometheus, Splunk, New Relic, etc., which perform analytics and display the results on dashboards created using Grafana and other tools.


Additional (teamwork and security) features are unlocked by licensing Consul Enterprise, installed and managed by customer organizations themselves (self-managed).

Features:

Tokens

A. Authenticate using a variety of methods. In addition to ACL Tokens, use enterprise-level identity providers (such as Okta and GitHub, Kerberos with Windows, etc.) for SSO (Single Sign-On) based on identity information maintained in email systems, so that additions and deletions of email accounts are reflected in applications immediately.
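Whichever login method is used, a client ends up presenting an ACL token on each request. As a sketch, assuming a local agent at `http://127.0.0.1:8500` and a hypothetical key `app/config/db_host`, this Python snippet attaches the token in the `X-Consul-Token` header and decodes the base64-encoded `Value` fields that Consul's KV endpoint returns. The token value is a placeholder.

```python
import base64
import json
import urllib.request

CONSUL_ADDR = "http://127.0.0.1:8500"  # assumed local agent address


def kv_request(key, token):
    """Build a GET for a KV key, authenticated with an ACL token header."""
    req = urllib.request.Request(f"{CONSUL_ADDR}/v1/kv/{key}")
    req.add_header("X-Consul-Token", token)
    return req


def decode_kv(body):
    """Consul returns a JSON list of entries whose "Value" is base64 (or null)."""
    entries = json.loads(body)
    return {
        e["Key"]: base64.b64decode(e["Value"]).decode() if e["Value"] else None
        for e in entries
    }


if __name__ == "__main__":
    # Requires a running agent and a token allowed to read the key:
    req = kv_request("app/config/db_host", "00000000-placeholder-token")
    with urllib.request.urlopen(req) as resp:
        print(decode_kv(resp.read()))
```

Without a valid token (when ACLs default to deny), the same request is rejected, which is how token-based authentication ties into the “Zero Trust” goals above.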

B. Automatic Upgrades (“Autopilot” feature) of a whole set of nodes at once – this avoids manual effort and eliminates periods when different versions run at the same time.

C. Enhanced Audit logging – to better understand access and API usage patterns. A full set of audit logs makes Consul a fully enterprise-worthy utility.

D. Multi-Tenancy using “Admin Partitions” and “Namespaces” to segment data for separate teams within a single Consul datacenter. This is a key “Zero Trust” principle: it limits the “blast radius” of potentially compromised credentials to a specific partition.

E. Consul can take automatic action when its metadata changes, such as notifying apps and firewalls, to keep security rules current (using NIA CTS).

The “consul-terraform-sync” (CTS) daemon watches for changes recorded in Consul and uses them to run Terraform dynamically for automatic reconfiguration of resources. This decreases the possibility of human error in manually editing configuration files and decreases the time to propagate configuration changes to networks.

F. Policy enforcement using Sentinel extends the ACL system in Consul beyond the static “read”, “write”, and “deny” policies to support full conditional logic during writes to the KV store. It also integrates with external systems.
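The static policies that Sentinel extends are written as rules such as `key_prefix "app/" { policy = "write" }`. Consul's documentation describes exact `key` rules taking precedence over `key_prefix` rules, with the longest matching prefix winning. The Python sketch below is a simplified model of that resolution for KV keys only; the rule sets are hypothetical, and Sentinel's conditional logic is not modeled.

```python
# Hypothetical rule sets, written as the HCL they stand in for:
#   key "app/secret"  { policy = "deny" }
#   key_prefix ""     { policy = "read" }
#   key_prefix "app/" { policy = "write" }
EXACT = {"app/secret": "deny"}
PREFIX = {"": "read", "app/": "write"}


def key_policy(key, exact_rules, prefix_rules, default="deny"):
    """Resolve the access level for a KV key under simplified ACL rules."""
    if key in exact_rules:
        return exact_rules[key]  # an exact rule wins outright
    matches = [p for p in prefix_rules if key.startswith(p)]
    if matches:
        return prefix_rules[max(matches, key=len)]  # longest prefix wins
    return default  # assumed default-deny when no rule matches


if __name__ == "__main__":
    for k in ("app/config", "app/secret", "other/x"):
        print(k, key_policy(k, EXACT, PREFIX))
```

These rules can only say read, write, or deny for a path; a Sentinel policy could additionally inspect the value being written, which is the gap F describes.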

G. Better Resilience from scheduled Backups of Consul state to snapshot files, so backups happen without anyone needing to remember to take them manually.

H. Consul is designed for additional Consul servers to be added to a Consul Cluster to achieve enterprise-scale scalability. The performance scaling mechanism involves adding non-voting servers (Enterprise “read replicas”) which serve reads without participating in Raft voting; “Redundancy Zones” similarly keep non-voting servers on standby.

  • Large enterprises have up to 4,000 microservices running at the same time.
  • “Performance begins to degrade after 7 voting nodes due to server-to-server Raft protocol traffic, which is expensive on the network.”
  • Global Consul Scale Benchmark tests (below) proved Consul’s enterprise scalability.
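The “7 voting nodes” guidance follows from Raft's majority rule: every write must be acknowledged by a quorum of voters, so each added voter adds replication traffic, while adding a voter to reach an even count adds traffic without adding fault tolerance. A short worked example of that arithmetic in Python (non-voting servers are excluded from both counts):

```python
def raft_quorum(voters: int) -> int:
    """Votes needed to commit a write: a strict majority of voting servers."""
    return voters // 2 + 1


def failure_tolerance(voters: int) -> int:
    """How many voting servers can fail while a majority still survives."""
    return (voters - 1) // 2


if __name__ == "__main__":
    for n in (1, 3, 4, 5, 7):
        print(f"{n} voters: quorum {raft_quorum(n)}, "
              f"tolerates {failure_tolerance(n)} failure(s)")
    # 1 voters: quorum 1, tolerates 0 failure(s)
    # 3 voters: quorum 2, tolerates 1 failure(s)
    # 4 voters: quorum 3, tolerates 1 failure(s)
    # 5 voters: quorum 3, tolerates 2 failure(s)
    # 7 voters: quorum 4, tolerates 3 failure(s)
```

This is why clusters are typically sized at 3 or 5 voters, with non-voting servers added for read capacity.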

I. Consul Service Mesh (also called “Consul Connect”) enables a Kubernetes cluster to securely communicate with services outside itself. Connect enables a Sidecar proxy in Kubernetes to reach an API Gateway (which acts like a K8s Sidecar proxy) in front of stand-alone databases, ECS, VMs, Serverless, even across different clouds.

As with HashiCorp’s Terraform, because the format of infrastructure configuration in Consul is similar across multiple clouds (AWS, Azure, GCP, etc.), people need less learning to work on different clouds. This yields faster implementations after mergers and acquisitions which require multiple cloud platforms to be integrated quickly. VIDEO

J. Consul can be set up for Disaster Recovery (DR) from failure of an entire cloud Region. Consul has a mechanism called “WAN Federation” which replicates service metadata across regions to enable multi-region capability.

Consul Multi-Region setup with 5 nodes each

Fail-over of a whole Region is typically set up to involve manual intervention. However, the use of Consul Service Mesh with Health Checks would enable automated failover within the context of a SOC (Security Operations Center) Governance Model.



Multi-region redundancy using complex Network Topologies between Consul datacenters (with “pairwise federation”) – this provides the basis for disaster recovery in case an entire region disappears.

The above features enable a cluster of Consul servers for Enterprises to provide High Availability (fault tolerance) against whole Availability Zone failures, with duplicate nodes created by replicating metadata across availability zones and regions.

Within a single datacenter, Consul can be set up (using a combination of Service Mesh and Health Checks) to provide automatic failover for services by omitting failed service instances from DNS lookups and by providing service health information in APIs.
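Consul's DNS interface answers names of the form `[tag.]<service>.service[.<datacenter>].consul`, and instances whose health checks are critical are left out of the answers. The small helper below is hypothetical (not part of any Consul library) and only shows how those names are composed; the service `web`, tag `v2`, and datacenter `dc1` are example values.

```python
def consul_dns_name(service, tag=None, datacenter=None, domain="consul"):
    """Compose a Consul DNS service name: [tag.]<service>.service[.<dc>].<domain>."""
    labels = [tag] if tag else []
    labels += [service, "service"]
    if datacenter:
        labels.append(datacenter)  # omit to query the local datacenter
    labels.append(domain)
    return ".".join(labels)


if __name__ == "__main__":
    print(consul_dns_name("web"))                            # web.service.consul
    print(consul_dns_name("web", tag="v2", datacenter="dc1"))  # v2.web.service.dc1.consul
```

With an agent running on its default DNS port, such a name can be queried with `dig @127.0.0.1 -p 8600 web.service.consul SRV`, which returns the port along with the address of each healthy instance.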



Security Frameworks

This section provides more context and detail about security features of Consul.

There are several frameworks which security professionals use to organize controls they install to prevent ransomware, data leaks, and other potential security catastrophes. Here are the most well-known:

PROTIP: Within the description of each framework, links are provided here to specific features which Consul provides (as Security Mitigations).

Well-Architected Framework (WAF)

A “Well-Architected Framework” is referenced by all major cloud providers. For example, the security pillar of AWS’s framework is at:

  • https://wa.aws.amazon.com/wat.pillar.security.en.html

Security professionals refer to the “CIA Triad” for security:

  1. Confidentiality by limiting access
  2. Integrity of data that is trustworthy
  3. Availability for reliable access

Zero-trust applies to those three:

Zero-Trust CIA Triad

  • Identity-driven authentication (by requester name instead of by IP address)

  • Mutually authenticated – both server and client use a cryptographic certificate to prove their identity

  • Encrypt for transit and at rest (baked into app lifecycle via CI/CD automation)

  • Each request is time-bounded (instead of long-lived static secrets to be hacked)

  • Audited & Logged (for SOC to do forensics)


The “Kill Chain” (created by Lockheed Martin) organizes security work into the 7 stages of how malicious actors work.

Specific tools and techniques that adversaries use (on specific platforms) are organized into 14 goals (tactics) in the “ATT&CK” Enterprise Matrix lifecycle (PDF), begun in 2013 by the Mitre Corporation (a US defense think-tank).

A comparison between the above:

Kill Chain stage, with the corresponding Mitre ATT&CK tactics and mitigations:

  1. Reconnaissance (harvesting)
    ATT&CK: Reconnaissance, Resource Development
    Mitigation: Authentication

  2. Weaponization (exploit of backdoor into a deliverable payload)
    ATT&CK: Initial Access, Execution
    Mitigation: mTLS

  3. Delivery (into victim)
    ATT&CK: Persistence, Privilege Escalation
    Mitigation: Audit logs & Alerts

  4. Exploitation (of vulnerability)
    ATT&CK: Defense Evasion (Access Token Manipulation)
    Mitigation: ACL

  5. Installation (of malware)
    ATT&CK: Credential Access, Discovery (System Network Connections Discovery), Lateral Movement (Exploitation of Remote Services, Remote Service Session Hijacking), Collection (Man-in-the-Middle)
    Mitigation: Authorization

  6. Command and Control (remote manipulation)
    ATT&CK: Command and Control (Application Layer Protocol, Web Service, Dynamic Resolution)
    Mitigation: Segmentation

  7. Actions on Objectives
    ATT&CK: Exfiltration, Impact
    Mitigation: DLP (Data Loss Prevention)

Mitigation Actions

Part of a Cloud Operating Model suite

Consul is part of the HashiCorp “Cloud Operating Model” product line which provides modern mechanisms for better security and efficiency in access and communication processes:


These products are collectively referred to as “HashiStack”.

Consul, Vault, and Boundary together provide the technologies and workflows to achieve SOC2/ISO27000 and “Zero Trust” mandates in commercial enterprises and within the U.S. federal government and its suppliers.

References:

  • VIDEO Microservices with Terraform, Consul, and Vault

Zero Trust Maturity Model

HashiCorp’s HashiStack is used by many enterprises to transition from “Traditional” to “Optimal”, as detailed by the US CISA “Zero Trust Maturity Model” at https://www.cisa.gov/sites/default/files/publications/CISA%20Zero%20Trust%20Maturity%20Model_Draft.pdf (19 pages): Zero Trust Maturity

Categories of “Defense in Depth” techniques listed in PDF: Mitre’s map of defense to data sources:

  • Password Policies
  • Active Directory Configuration
  • User Account Control
  • Update Software
  • Limit Access to Resources Over Network
  • Audit (Logging)
  • Operating System Configuration
  • User Account Management
  • Execution Prevention
  • Privileged Account Management
  • Disable or Remove Feature or Program
  • Code Signing
  • Exploit Protection
  • Application Isolation and Sandboxing
  • Antivirus/Antimalware
  • Filter Network Traffic
  • Network Segmentation
  • User Training
  • SSL/TLS Inspection
  • Restrict Web-based Content

Additionally:

  • To prevent Lateral Movement (Taint Shared Content): Immutable deployments (no live patching to “cattle”)

  • IaC CI/CD Automation (processes have Security and Repeatability baked-in, less toil)

  • Change Management using source version control systems such as Git clients interacting with the GitHub cloud

Summary of Use Cases

In summary, use cases for Consul (listed at https://www.consul.io/):

  • Consul on Kubernetes
  • Control access with Consul API Gateway
  • Discover Services with Consul
  • Enforce Zero Trust Networking with Consul
  • Load Balancing with Consul
  • Manage Traffic with Consul
  • Multi-Platform Service Mesh with Consul
  • Network Infrastructure Automation with Consul
  • Observability with Consul

Benefits of Adoption

Adoption of Consul aims to yield these benefits:

  • Faster Time to Market and velocity of getting things done from fewer manual mistakes
  • Reduce cost via tools (operational efficiency through more visibility and automation)
  • Reduce cost via people from improved availability (uptime)
  • Reduce risk of downtime from better reliability
  • Reduce risk of breach from better guardrails (using Sentinel & OPA)
  • Compliance with regulatory demands (central source of truth, immutable, automated processes)


BOOK: Consul: Up and Running

Canadian Luke Kysow, Principal Engineer on Consul at HashiCorp, top contributor to hashicorp/consul-k8s, wrote in his BOOK: “Consul: Up and Running”:

“A small operations team can leverage Consul to impact security, reliability, observability, and application delivery across their entire stack —- all without requiring developers to modify their underlying microservices.”

Code for the book (which you need to copy and paste into your own GitHub repo) is organized according to the book’s chapters:

  1. Service Mesh 101
  2. Introduction to Consul
  3. Deploying Consul within K8s (in cloud or minikube for automatic port-forwarding) and on VMs
  4. Adding Services to the Mesh
  5. Ingress Gateways
  6. Security
  7. Observability
  8. Reliability
  9. Traffic Control
  10. Advanced Use Cases

(See the book’s “birdwatcher” app and Discord server.)

The above are used for showing Proof of Value (POV) from product/workflow adoption.

  • https://www.consul.io/docs/intro
  • https://learn.hashicorp.com/well-architected-framework


YouTube: “Getting into HashiCorp Consul”

VIDEO: Consul Roadmap – HashiConf Global 2021


Ways to set up Consul with demo infra

PROTIP: Become comfortable with the naming conventions used by the architecture, workflows, and automation by building several environments, in order of complexity:

By “use case” (Sales Plays):

A. There is a public demo instance of Consul online at:

https://demo.consul.io/ui/dc1/overview/server-status

B. On HashiCorp’s Consul SaaS on the HCP (HashiCorp Cloud Platform):

  • QUESTION: You can use Consul this way with just a Chromebook laptop???
  • Use this to learn about creating sample AWS services in a private VPC using Terraform, creating an HCP account, setting up peering connections across private networks to the HVN, and day-to-day workflows on https://cloud.hashicorp.com/products/consul
  • On AWS or Azure

C. On a macOS laptop install to learn Consul Agent with two nodes (to see recovery of loss from a single node):

  • Use automation to install the Consul agent along with other utilities needed
  • Use this to learn about basic CLI commands, starting/stopping the Agent, API calls, GUI menus using a single server within a Docker image

D. Install the Reference Architecture (with Kubernetes and database) which can survive loss of a single node

  • Follow a multi-part video series on YouTube to install and configure 5 Consul nodes in 3 Availability Zones (AZs) within a single region, with app Gateways, Sidecar monitoring

E. In a single 6-node datacenter (with Nomad) to survive loss of an Availability Zone

  • Use this to learn about manual backup and recovery using Snapshots and Enterprise Snapshot Agents
  • Conduct Chaos Engineering recovering failure of one Availability Zone
  • Telemetry and Capacity proving to identify when to add additional Consul nodes

F. For multiple datacenters federated over WAN

G. Integrations between K8s Service Mesh to outside database, ECS, VMs, mainframes, etc.

Other demos:

  • https://www.hashicorp.com/resources/getting-started-with-managed-service-mesh-on-aws First Beta Demo of HCP Consul Service Mesh on AWS.

Demo apps

PROTIP: Adapt the samples and naming conventions here to use your own app after achieving confidence you have the base templates working.

https://medium.com/hashicorp-engineering/hashicorp-vault-performance-benchmark-13d0ea7b703f

https://cloud.hashicorp.com/docs/hcp/supported-env/aws

https://github.com/pglass/202205-consul-webinar-demo

  1. HashiCorp-provided demo apps included in the practice environments are defined at:

    https://github.com/hashicorp-demoapp/

    “Hashicups” from https://github.com/hashicorp-demoapp/hashicups-setups comes with a Go library.

  2. Consider the HashiCups datacenter which uses both ECS and EKS within AWS:

    • Run front-end services task within an ECS (Elastic Container Service) cluster
    • Run back-end services task within an EKS (Elastic Kubernetes Service) cluster

    See VIDEO “Securely Modernize Application Development with Consul on AWS ECS” by Jairo Camacho (Marketing), Chris Thain, Paul Glass (Engineering)

    Consul ECS and EKS with HashiCups

  3. Create the above environment by running Terraform ???

    https://github.com/pglass/202205-consul-webinar-demo

    https://github.com/hashicorp/terraform-aws-consul-ecs

    Consul ECS

  4. Use HCP Consul for Service Mesh (without Kubernetes)

    Consul HCP Service Mesh

    The Envoy proxy in Data Plane ???

    Control Plane to Consul servers within HCP ???

    Consul’s Layer 7 traffic management capabilities. ???

    ACL Controller

    The ACL (Access Control List) Controller is provided by HashiCorp for install within AWS.

    To provide least-privilege access to Consul using Terraform and Vault: https://www.hashicorp.com/blog/managing-hashicorp-consul-access-control-lists-with-terraform-and-vault

    Consul TF Vault

    Observability

    REMEMBER: The Enterprise edition of Consul is a different binary than the OSS edition.

    Terraform adds Datadog for Observability.

    https://www.pagerduty.com/docs/guides/consul-integration-guide/ shows how to configure Consul-Alerts to trigger and resolve incidents in a PagerDuty service. PagerDuty is an alarm aggregation and dispatching service for system administrators and support teams. It collects alerts from monitoring tools, gives an overall view of all monitoring alarms, and alerts an on-duty engineer if there’s a problem. The Terraform PagerDuty provider is a plugin for Terraform that allows for the management of PagerDuty resources using HCL (HashiCorp Configuration Language).


Certification exam

Because this document aims to present concepts in a logical flow for learning, it has a different order than the topics for the Consul Associate one-hour, proctored, online $70 exam at: https://www.hashicorp.com/certification/consul-associate

  1. Explain Consul architecture
    1a. Identify the components of Consul datacenter, including agents and communication protocols
    1b. Prepare Consul for high availability and performance
    1c. Identify Consul’s core functionality
    1d. Differentiate agent roles

  2. Deploy a single datacenter
    2a. Start and manage the Consul process
    2b. Interpret a Consul agent configuration
    2c. Configure Consul network addresses and ports
    2d. Describe and configure agent join and leave behaviors

  3. Register services and use Service Discovery [BK]
    3a. Interpret a service registration
    3b. Differentiate ways to register a single service
    3c. Interpret a service configuration with health check
    3d. Check the service catalog status from the output of the DNS/API interface or via the Consul UI
    3e. Interpret a prepared query
    3f. Use a prepared query

  4. Access the Consul key/value (KV) store (even though it’s not a popular feature anymore)
    4a. Understand the capabilities and limitations of the KV store
    4b. Interact with the KV store using both the Consul CLI and UI
    4c. Monitor KV changes using watch
    4d. Monitor KV changes using envconsul and consul-template

  5. Back up and Restore [BK]
    5a. Describe the content of a snapshot
    5b. Back up and restore the datacenter
    5c. [Enterprise] Describe the benefits of snapshot agent features

  6. Use Consul Service Mesh
    6a. Understand Consul Connect service mesh high-level architecture
    6b. Describe configuration for registering a service proxy
    6c. Describe intentions for Consul Connect service mesh
    6d. Check intentions in both the Consul CLI and UI

  7. Secure agent communication
    7a. Understanding Consul security/threat model
    7b. Differentiate certificate types needed for TLS encryption
    7c. Understand the different TLS encryption settings for a fully secure datacenter

  8. Secure services with basic access control lists (ACL)
    8a. Set up and configure a basic ACL system
    8b. Create policies
    8c. Manage token lifecycle: multiple policies, token revoking, ACL roles, service identities
    8d. Perform a CLI request using a token
    8e. Perform an API request using a token

  9. Use Gossip encryption
    9a. Understanding the Consul security/threat model
    9b. Configure gossip encryption for the existing data center
    9c. Manage the lifecycle of encryption keys

Bryan Krausen provides links to discount codes for his Udemy course, “Getting Started with HashiCorp Consul 2022”, which has 8.5 hours of video recorded at Consul 1.7. It provides quizzes and a mind-map of each topic, and references https://github.com/btkrausen/hashicorp/tree/master/consul

Also from Bryan is “HashiCorp Certified: Consul Associate Practice Exam”, which has three full exams of 57 questions each.


B. On HashiCorp’s Consul Cloud SaaS HCP (HashiCorp Cloud Platform)

Perhaps the fastest and easiest way to begin using Consul is the HashiCorp-managed HashiCorp Cloud Platform (HCP) Consul Cloud. It provides a convenient clickable Web GUI rather than the CLI/API of FOSS (free open-source software).

HCP provides Consul as a fully managed “Service Mesh as a Service (SMaaS)”, with operational benefits not provided by the “self-managed” Enterprise edition. That means:

  • Monitoring of disk space, CPU, memory, etc. is already staffed
  • Capacity testing to ensure configurations are made optimal by specialists
  • No risk of security vulnerabilities introduced by inexperienced personnel
  • Backups taken care of automatically
  • Restores performed when needed

  • Rest from on-going hassles of security patches and version upgrades
  • Enable limited in-house IT personnel to focus on business needs.
  • Faster time to value and time to market

On the other hand, as of this writing, HCP does not have all the features of Consul Enterprise.

References about HCP Consul:

  • https://github.com/hashicorp/learn-hcp-consul
  • https://github.com/hashicorp/learn-terraform-multicloud-kubernetes
  • Part 12: HCP Consul [2:18:49] Mar 17, 2022

  • HashiCorp’s 7 tutorials on HCP Consul:
  • https://www.hashicorp.com/products/consul/service-on-azure
  • announced Sep 2020
  • VIDEO: “Introduction to HashiCorp Cloud Platform (HCP): Goals and Components”

  • VIDEO: “Service Mesh - Beyond the Hype”
  • hashicorp/consul-snippets Private = Collection of Consul snippets. Configuration bits, scripts, configuration, small demos, etc.

  • https://github.com/hashicorp/field-workshops-consul = Slide decks and Instruqt code for Consul Workshops
  • https://github.com/hashicorp/demo-consul-101 = Tutorial code and binaries for the HashiCorp Consul beginner course.
  • https://github.com/hashicorp/learn-consul-docker = Docker Compose quick starts for Consul features.

  • https://github.com/hashicorp/terraform-aws-consul = A Terraform Module for how to run Consul on AWS using Terraform and Packer

  • https://github.com/hashicorp/hashicat-aws = A terraform built application for use in Hashicorp workshops

  • https://github.com/hashicorp/consul-template = Template rendering, notifier, and supervisor for @hashicorp Consul and Vault data.

  • https://github.com/hashicorp/consul-k8s = First-class support for Consul Service Mesh on Kubernetes, with binaries for download at https://releases.hashicorp.com/consul-k8s/

  • https://github.com/hashicorp/consul-replicate = Consul cross-DC KV replication daemon.

  • hashicorp/learn-consul-kubernetes
  • https://github.com/hashicorp/learn-consul-service-mesh

  • https://github.com/hashicorp/consul-api-gateway = The Consul API Gateway is a dedicated ingress solution for intelligently routing traffic to applications running on a C…

  • https://github.com/hashicorp/consul-demo-traffic-splitting = Example application using Docker Compose to demonstrate Consul Service Mesh Traffic Splitting

  • hashicorp/consul-esm = External service monitoring for Consul

  • https://github.com/hashicorp/terraform-aws-consul-starter = A Terraform module for creating an OSS Consul cluster as described by the HashiCorp reference architecture.



The Automated Way

  1. Obtain AWS account credentials with adequate permissions
  2. Create an AWS VPC and associated resources to be managed by additional Consul infra
  3. Identify your lb_ingress_ips used in the load balancer security groups, needed to limit access to the demo app.
  4. Configure kubectl
  5. Create a HashiCorp Cloud Platform (HCP) account and organization
  6. Store secrets in a safe way
  7. Create a HashiCorp Virtual Network (HVN)
  8. Peer the AWS VPC with the HVN
  9. Create a HCP Consul cluster
  10. Configure Consul ACL Controller
  11. Run Consul clients within the provisioned AWS VPC
  12. Put load on the demo app within AWS

  13. Destroy Consul cluster and app infra under test
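Under the hood, the Terraform parts of these steps follow Terraform's usual lifecycle. Here is a minimal sketch, assuming you run it from the folder that holds the Terraform configuration. The function names and the plan file name tfplan are illustrative choices, not part of any HashiCorp repo.

```shell
#!/usr/bin/env bash
# Sketch of the Terraform lifecycle behind the automated steps: init, save a plan,
# apply exactly that plan, and destroy when testing is done.
# Assumption: run from the folder containing the Terraform configuration;
# tf_plan/tf_apply/tf_destroy and the file name tfplan are hypothetical choices.

tf_plan() {
  terraform init && terraform plan -out=tfplan
}

tf_apply() {
  terraform apply tfplan   # applies the saved, reviewed plan without re-planning
}

tf_destroy() {
  terraform destroy        # prompts for confirmation before deleting resources
}
```

Saving the plan to a file means that what gets applied is exactly what you reviewed, even if the configuration or remote state changes in between.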

Obtain AWS account credentials

  1. Obtain AWS credentials and export them as environment variables:

    export AWS_ACCESS_KEY_ID="your AWS access key ID"
    export AWS_SECRET_ACCESS_KEY="your AWS secret access key"
    export AWS_SESSION_TOKEN="your AWS session token"
    

    Alternatively, copy and paste credentials into the ~/.aws/credentials file that every AWS CLI command references.

    BTW If you are a HashiCorp employee, they would be obtained from the “Doormat” website, which grants access to your laptop’s IP address for a limited time.

    Create resources within AWS

    There are several ways to set up infrastructure in a cloud datacenter managed by Consul.

    Instead of performing manual steps at https://learn.hashicorp.com/tutorials/cloud/consul-deploy, this describes use of Terraform to create a non-prod HCP Consul environment to manage an ECS cluster, and various AWS services:

    Consul ECS HCP

  2. Navigate to the folder where you keep downloaded GitHub repos.

  3. Do not specify --depth 1 when cloning (because we will check out a tag):

    git clone git@github.com:hashicorp/learn-consul-terraform.git
    cd learn-consul-terraform
    
  4. Before switching to a tag, list the available tags:

    git tag
    
    git checkout v0.5
    
  5. Navigate to the folder within the repo:

    cd datacenter-deploy-ecs-hcp
    

    TODO: Study the Terraform specifications:

    • variables.tf - Parameter definitions used to customize unique user environment attributes.
    • data.tf - Data sources that allow Terraform to use information defined outside of Terraform.
    • providers.tf - AWS and HCP provider definitions for Terraform.
    • outputs.tf - Unique values output after Terraform successfully completes a deployment.

    • ecs-clusters.tf - AWS ECS cluster deployment resources.
    • ecs-services.tf - AWS ECS service deployment resources.
    • load-balancer.tf - AWS Application Load Balancer (ALB) deployment resources.
    • logging.tf - AWS Cloudwatch logging configuration.
    • modules.tf - AWS ECS task application definitions.
    • secrets-manager.tf - AWS Secrets Manager configuration.
    • security-groups.tf - AWS Security Group port management definitions.
    • vpc.tf - AWS Virtual Private Cloud (VPC) deployment resources.

    • network-peering.tf - HCP and AWS network communication configuration.
    • hvn.tf - HashiCorp Virtual Network (HVN) deployment resources.
    • hcp-consul.tf - HCP Consul cluster deployment resources.

    See https://learn.hashicorp.com/tutorials/consul/reference-architecture for Scaling considerations.

    https://learn.hashicorp.com/tutorials/consul/production-checklist?in=consul/production-deploy

  6. Identify your IPv4 address (based on the Wi-Fi you’re using):

    curl ipinfo.io
    
    {
      "ip": "129.222.5.194",
    
  7. Use the repo's terraform.tfvars.example file as a starting point.

  8. Configure Terraform variables in a .auto.tfvars (or terraform.tfvars) file with, for example:

    lb_ingress_ips = "47.223.35.123"
    region         = "us-east-1"
    suffix         = "demo"
    

    region - the AWS region where resources will be deployed. PROTIP: It must be one of the regions HCP supports for HCP Consul servers.

    lb_ingress_ips - Your IP. This is used in the load balancer security groups to ensure only you can access the demo application.

    suffix - a text value appended to the names of the resources created. Change it on each run because, by default, secrets deleted from AWS Secrets Manager stay in a 30-day recovery window before they are permanently deleted. If this tutorial is destroyed and recreated within that window, a name conflict error will occur for these secrets.

  9. Run using terraform init

    VIDEO: Try it:

  10. In the folder containing main.tf, run terraform to initialize:

    terraform init
    

    Example response:

    Initializing modules...
    Downloading registry.terraform.io/hashicorp/consul-ecs/aws 0.2.0 for acl_controller...
           - acl_controller in .terraform/modules/acl_controller/modules/acl-controller
    Downloading registry.terraform.io/hashicorp/consul-ecs/aws 0.2.0 for example_client_app...
           - example_client_app in .terraform/modules/example_client_app/modules/mesh-task
    Downloading registry.terraform.io/hashicorp/consul-ecs/aws 0.2.0 for example_server_app...
           - example_server_app in .terraform/modules/example_server_app/modules/mesh-task
    Downloading registry.terraform.io/terraform-aws-modules/vpc/aws 2.78.0 for vpc...
           - vpc in .terraform/modules/vpc
     
    Initializing the backend...
     
    Initializing provider plugins...
           - Finding hashicorp/hcp versions matching "~> 0.14.0"...
           - Finding hashicorp/aws versions matching ">= 2.70.0, > 3.0.0"...
           - Installing hashicorp/hcp v0.14.0...
           - Installed hashicorp/hcp v0.14.0 (signed by HashiCorp)
           - Installing hashicorp/aws v4.16.0...
           - Installed hashicorp/aws v4.16.0 (signed by HashiCorp)
     
    Terraform has created a lock file .terraform.lock.hcl to record the provider
    selections it made above. Include this file in your version control repository
    so that Terraform can guarantee to make the same selections by default when
    you run "terraform init" in the future.
     
    Terraform has been successfully initialized!
     
    You may now begin working with Terraform. Try running "terraform plan" to see
    any changes that are required for your infrastructure. All Terraform commands
    should now work.
     
    If you ever set or change modules or backend configuration for Terraform,
    rerun this command to reinitialize your working directory. If you forget, other
    commands will detect it and remind you to do so if necessary.
    
  11. In the folder containing main.tf, preview the changes Terraform plans to make:

    time terraform plan
    
  12. If TFSec is installed, scan the configuration for security issues (Sentinel policies, by contrast, are enforced within Terraform Cloud/Enterprise runs):

    tfsec
    
  13. In the folder containing main.tf, run terraform to instantiate the resources in AWS:

    time terraform apply
    

    After many minutes, a sample response ends with:

    Apply complete! Resources: 64 added, 0 changed, 0 destroyed.
     
    Outputs:
     
    client_lb_address = "http://learn-hcp-example-client-app-1643813623.us-east-1.elb.amazonaws.com:9090/ui"
    consul_ui_address = "https://dc1.consul.b17838e5-60d2-4e49-a43b-cef519b694a5.aws.hashicorp.cloud"
    
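  14. (optional) Configure kubectl. This assumes the configuration declares outputs named region and eks_cluster_name; terraform output can read only declared outputs, not locals:

    aws eks --region $(terraform output -raw region) update-kubeconfig --name $(terraform output -raw eks_cluster_name)
    kubectl get pods -A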
    
  15. To access the Consul UI in HCP, print the URL and the bootstrap token. The bootstrap token can be used to log in to Consul.

    terraform output consul_public_endpoint_url
    terraform output consul_bootstrap_token
    
  16. Access the demo application in ECS: print the URL for the demo application:

    terraform output ecs_ingress_address
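Steps 6 through 8 can be scripted so that each run gets a fresh suffix. This is a sketch that assumes the variable names lb_ingress_ips, region, and suffix shown in step 8, using the same single-IP string form. write_tfvars is a hypothetical helper, not part of the learn-consul-terraform repo. The suffix is kept short because some AWS names, such as load balancer names (32 characters maximum), have length limits.

```shell
#!/usr/bin/env bash
# Sketch of steps 6-8: write a tfvars file with your public IP and a per-run suffix.
# Assumptions: variable names match the step 8 example; write_tfvars is hypothetical.
# Usage: write_tfvars <ip> [region] [output-file]
write_tfvars() {
  local ip="$1" region="${2:-us-east-1}" out="${3:-terraform.auto.tfvars}"
  # A timestamp-based suffix (d + MMDDhhmm) avoids clashes with Secrets Manager
  # secrets from a previous run that are still in their deletion recovery window.
  local suffix="d$(date +%m%d%H%M)"
  cat > "$out" <<EOF
lb_ingress_ips = "${ip}"
region         = "${region}"
suffix         = "${suffix}"
EOF
  echo "wrote ${out} with suffix ${suffix}"
}
```

For example: write_tfvars "$(curl -s ipinfo.io/ip)" us-east-1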
    

CTS for NIA

HashiCorp’s “Network Infrastructure Automation (NIA)” marketing page (consul.io/docs/nia) promises to scale better, decrease the possibility of human error when manually editing configuration files, and decrease overall time taken to push out configuration changes.

PROTIP: There are currently no competitors in the market for this feature.

LEARN: Network Infrastructure Automation with Consul-Terraform-Sync hands-on, which uses the sample counting service at port 9003 and dashboard service in port 9002, from https://github.com/hashicorp/demo-consul-101/releases

Consul NIA CTA

  1. Intro (using terraform, Consul “consul-terraform-sync” CLI) 17 MIN

  2. Consul-Terraform-Sync Run Modes and Status Inspection: check task execution status using the REST API. 9 MIN
  3. CTS and Terraform Enterprise/Cloud integration. 14 MIN
  4. Build a Custom CTS Module. 20 MIN
  5. Secure Consul-Terraform-Sync for Production. 13 MIN
  6. Partner Guide - Consul NIA, Terraform, and A10 ADC. 12 MIN
  7. Partner Guide - Consul NIA, Terraform, and F5 BIG-IP. 12 MIN
  8. Partner Guide - Consul NIA, CTS, and Palo Alto Networks. 12 MIN


CTS flow

CTS (Consul-Terraform-Sync) Agent is an executable binary (the “consul-terraform-sync” daemon, separate from Consul) installed on a server.

NOTE: HashiCorp also provides binaries for past releases at
https://releases.hashicorp.com/consul-terraform-sync/

Notice the “+ent” for enterprise editions.

    brew tap hashicorp/tap
    brew install hashicorp/tap/consul-terraform-sync
    consul-terraform-sync -h
       

When the daemon starts, it also starts up a Terraform CLI/API binary locally.

See https://www.consul.io/docs/nia/configuration

CTS interacts with the Consul Service Catalog in a publisher-subscriber paradigm.

Consul acts as the central broker: CTS subscribes to changes in Consul's Service Registry, and it can also watch for changes in Consul's KV (Key-Value) store.

When CTS recognizes relevant changes requiring action, it dynamically generates files that invoke Terraform modules. CTS can also use its Terraform Cloud driver to run those modules in Terraform Cloud remote workspaces. Advantages of this:

  • Remote Terraform execution
  • Concurrent runs within Terraform using secured variables
  • State versions, audit logs, run history with triggers and notifications
  • Option for Sentinel to enforce governance policies as code

CTS is how changes can trigger automatic dynamic update of network infrastructure devices such as applying firewall policies, updating load balancer member pools, etc.

CTS v0.3 was announced Sep 2021


Each task consists of a runbook automation written as a CTS compatible Terraform module using resources and data sources for the underlying network infrastructure. The consul-terraform-sync daemon runs on the same node as a Consul agent.
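A minimal sketch of a CTS configuration with one task follows. It assumes the field names from recent CTS releases (module and a condition "services" block). Older releases such as v0.3 used source and services instead, so check the configuration docs for your version. The task name, service names, and module path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal CTS config that watches two services and runs one module.
# Assumptions: recent CTS field names (module, condition "services"); the task name,
# service names, and module path below are hypothetical placeholders.
cat > cts-config.hcl <<'EOF'
log_level = "INFO"
port      = 8558   # CTS's own API port, used for task status inspection

consul {
  address = "localhost:8500"   # the local Consul agent CTS subscribes to
}

driver "terraform" {
  log = true   # include Terraform output in CTS logs
}

task {
  name        = "update-firewall"
  description = "Re-apply firewall rules when web or api instances change"
  module      = "./modules/firewall-rules"
  condition "services" {
    names = ["web", "api"]
  }
}
EOF
```

In recent releases the daemon is then started with consul-terraform-sync start -config-file=cts-config.hcl, while older releases take -config-file without the start subcommand; consul-terraform-sync -h shows which form your version expects.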



Consul Global Scale Benchmark

The largest-scale example is https://github.com/hashicorp/consul-global-scale-benchmark, used to prove that a Service Mesh Control Plane of 5 HashiCorp Consul servers across 3 availability zones in us-east-1 is able to update 10,000 Consul/Nomad client nodes and 172,000+ services in under 1 second. Each Consul server runs on a c5d.9xlarge EC2 instance type, which has 36 vCPUs and 72 GB of memory. It is described in the white paper “Service Mesh at Global Scale” and in a podcast with its creator, Anubhav Mishra (Office of the CTO).

See also: https://github.com/hashicorp/consul-global-scale-benchmark = Terraform configurations and helper scripts for Consul Global Scale Benchmark

Identify Terraform repo in GitHub

To create the app infra which Consul works on, consider https://github.com/hashicorp-guides (consistent workflows to provision, secure, connect, and run any infrastructure for any application), such as:

  • https://github.com/hashicorp-guides/hashistack

They reference 22 https://github.com/hashicorp-modules such as:

  • https://github.com/hashicorp-modules/network-aws

Each module has an examples folder.

https://www.terraform.io/language/settings/backends/remote Terraform Remote State back-ends

https://github.com/hashicorp/field-workshops-consul/tree/master/instruqt-tracks/secure-service-networking-for-aws

a. https://learn.hashicorp.com/tutorials/cloud/terraform-hcp-consul-provider - it provisions resources that qualify under the AWS free-tier.

Files:

  • consul.tf: describes the HCP Consul cluster you are going to create.
  • vpc_peering.tf: describes the AWS VPC and the peering with the HVN.
  • variables.tf: sets the variables for your deployment.

b. The following steps are based on https://learn.hashicorp.com/tutorials/cloud/consul-deploy referencing https://github.com/hashicorp/terraform-aws-hcp-consul which uses Terraform to do the below:

Also see https://github.com/hashicorp/docker-consul = Official Docker images for Consul.

https://github.com/hashicorp/terraform-aws-hcp-consul is the Terraform module for connecting a HashiCorp Cloud Platform (HCP) Consul cluster to AWS. There are four examples containing default CIDRs for private and public subnets:

  • existing-vpc
  • hcp-ec2-demo
  • hcp-ecs-demo
  • hcp-eks-demo

  • hcp-ec2-client - [For Testing Only]: installs Consul and runs Consul clients with EC2 virtual machines.
  • hcp-eks-client - [For Testing Only]: installs the Consul Helm chart on the provided Kubernetes cluster.
  • k8s-demo-app - [For Testing Only]: installs a demo application onto the Kubernetes cluster, using the Consul service mesh.

https://github.com/hashicorp/terraform-azurerm-hcp-consul

Hashicorp Cloud Account

  1. Sign into: https://cloud.hashicorp.com/products/consul

  2. Verify your email if it’s your first time, or type your email.
  3. The first time, enter a Registration Name (such as “wilsonmar-org”) and country to create a new org.
  4. You get $50! You can skip giving out your credit card until you want a PRODUCTION instance or use larger size node servers. For development use, an extra-small (XS) cluster size is deployed by default to handle up to 50 service instances.

  5. Select Consul on the left product menu. Bookmark the URL, which contains your account ID so you’ll go straight to it:

    https://portal.cloud.hashicorp.com/services/consul?project_id=…

  6. Click “Access control (IAM)” menu.
  7. Click “Service principals” from the menu and create one service principal (named with your name) for each of the 3 roles below. Each gets an ID such as wilsonmar-123456@12ae4567-f584-4f06-9a9e-240690e2088a

    • Role “Admin” (as full access to all resources including the right to edit IAM, invite users, edit roles)
    • Role “Contributor” (Can create and manage all types of resources but can’t grant access to others.)
    • Role “Viewer” (Can only view existing resources.)

    PROTIP: Once logged in, a cookie is saved in the browser so that you will be logged in again automatically.

  8. For each service principal, click the blue “Create service principal key”.

    HCP Secrets

  9. Click the copy icon to save each generated value to your Clipboard (for example):

    export HCP_CLIENT_ID=your-service-principal-key-client-id
    export HCP_CLIENT_SECRET=your-service-principal-key-client-secret
    

    Alternately, copy-paste the values directly into provider config file:

    provider "hcp" {
      client_id     = "service-principal-key-client-id"
      client_secret = "service-principal-key-client-secret"
    }
    

    CAUTION: The secret is not shown after you leave the screen.

    Store secrets

  10. In a file encrypted and away from GitHub, store secrets:

    TODO: Use Vault to keep the above secrets secure (in a cloud).

    For now, create file config

    https://github.com/hashicorp/consul-guides = Example usage of HashiCorp Consul


    Create a HashiCorp Virtual Network (HVN)

    REMEMBER: Each resource in HCP can only be located in one HVN. You cannot span two different HVNs with a single product deployment, and product deployments cannot be moved from one HVN to another. Additionally, HVNs cannot be changed after they are deployed.

    References:

    • https://registry.terraform.io/providers/hashicorp/hcp/latest/docs/resources/hvn

    Peer HVN to an AWS VPC

  11. In the HVN overview page, select the Peering connections tab, and click the Create peering connection link.

  12. Input the following information:

    • AWS account ID

    • VPC ID

    • VPC region

    • VPC CIDR (Classless Inter-Domain Routing) block

  13. Click the Create connection button to begin the peering process.

    Peering status begins at “Creating”.

  14. Accept the connection at the AWS console.

  15. Navigate to the Peering Connections area of your AWS Console.

    You should have an entry in the list with a status of Pending Acceptance.

  16. Click Actions -> Accept Request to confirm acceptance.

    Status should change to “active”.

  17. Once the HVN is deployed, the status updates to “Stable” on the HVN overview tab.

  18. You can return to this screen to delete the peering relationship. However, deleting this peering relationship means you will no longer be able to communicate with your HVN.

    Create a HCP Consul cluster

  19. Click Create Cluster and specify a cluster ID (such as “consul-cluster-1”), Network ID (“hvn”), and Region.

    CIDR Block 172.25.16.0/20 is the default CIDR block value.

    The HVN's IPv4 CIDR range is delegated to HCP for automatically creating resources in your cloud network. This CIDR range cannot overlap with the AWS VPC that you will peer with later.

    Enable a public or private IP

    WARNING: A public IP makes the Consul UI and API conveniently available from anywhere in the public internet for development use. But it is not recommended for production because it is a less secure configuration.

    Consul Cluster list

    Configure L3 routing and security ???

  20. Configure L3 routing and security

  21. Create a security group

  22. Create a route

  23. Define ingress and egress rules

    https://learn.hashicorp.com/tutorials/cloud/terraform-hcp-consul-provider

    Configure Consul ACL Controller

    The Consul ACL Controller is added by Terraform code used to create other app VPC resources.

    TODO: Auto-discovery?

    Run Consul clients within the provisioned AWS VPC

  24. Connect your AWS VPCs to the HVN so that the clients in your VPC can communicate with the HCP server after the next step.

  25. Install Consul into those AWS VPCs.

    This is not in Terraform code???

    Run a demo application on the chosen AWS runtime

    Consul Global Scale Benchmark

    Destroy Consul

  26. Destroy resources

    TODO:

    References about HVN (HashiCorp Virtual Network):

    • https://cloud.hashicorp.com/docs/hcp/network
    • https://learn.hashicorp.com/tutorials/cloud/consul-deploy
    • https://learn.hashicorp.com/tutorials/cloud/terraform-hcp-consul-provider#hcp_consul_base
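The console steps above (HVN, cluster, peering request, acceptance, and HVN route) can also be expressed with the hcp and aws Terraform providers. This sketch writes them to hcp-consul.tf. It assumes an aws_vpc.main resource defined elsewhere, credentials exported as HCP_CLIENT_ID and HCP_CLIENT_SECRET (the hcp provider reads these from the environment), and placeholder IDs and regions. The AWS route table entry and security group rules (steps 21 to 23) are not included.

```shell
#!/usr/bin/env bash
# Sketch: the HVN, cluster, and peering console steps written as Terraform.
# Assumptions: aws_vpc.main exists elsewhere; HCP_CLIENT_ID/HCP_CLIENT_SECRET are exported;
# all IDs, names, and regions are placeholders.
cat > hcp-consul.tf <<'EOF'
provider "hcp" {}   # credentials come from HCP_CLIENT_ID / HCP_CLIENT_SECRET

resource "hcp_hvn" "main" {
  hvn_id         = "hvn"
  cloud_provider = "aws"
  region         = "us-west-2"
  cidr_block     = "172.25.16.0/20"   # must not overlap the peered VPC's CIDR
}

resource "hcp_consul_cluster" "main" {
  cluster_id      = "consul-cluster-1"
  hvn_id          = hcp_hvn.main.hvn_id
  tier            = "development"
  public_endpoint = true   # convenient for development; keep false in production
}

# HCP side of the peering request (the Create peering connection step).
resource "hcp_aws_network_peering" "main" {
  hvn_id          = hcp_hvn.main.hvn_id
  peering_id      = "hvn-to-vpc"
  peer_vpc_id     = aws_vpc.main.id
  peer_account_id = aws_vpc.main.owner_id
  peer_vpc_region = "us-west-2"
}

# AWS side: accept the request instead of clicking Actions -> Accept Request.
resource "aws_vpc_peering_connection_accepter" "main" {
  vpc_peering_connection_id = hcp_aws_network_peering.main.provider_peering_id
  auto_accept               = true
}

# Route HVN traffic destined for the VPC CIDR over the peering connection.
resource "hcp_hvn_route" "to_vpc" {
  hvn_link         = hcp_hvn.main.self_link
  hvn_route_id     = "hvn-to-vpc-route"
  destination_cidr = aws_vpc.main.cidr_block
  target_link      = hcp_aws_network_peering.main.self_link
}
EOF
```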

Service Discovery Workflow

HCP Consul Cloud Pricing

https://registry.terraform.io/providers/hashicorp/hcp/latest/docs

https://cloud.hashicorp.com/products/consul/pricing

https://cloud.hashicorp.com/docs/consul#features

Plan                     Base + per svc instance hr     Limits
Individual Development   $0.027/hr ($20/mo)             Up to 50 service instances. No uptime SLA.
"Standard" production    $0.069/hr ($49/mo)             SLA
                         Small: $0.02/hr
"Plus" production        $0.104/hr                      SLA, multi-region
PROTIP: Assume a 5:1 node to services ratio.
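As a worked check on the table, an hourly rate converts to a monthly figure by multiplying by about 730 hours (8,760 hours per year divided by 12). The sketch assumes that 730-hour month and ignores per-service-instance charges.

```shell
#!/usr/bin/env bash
# Worked arithmetic for the pricing table: monthly cost ~= hourly rate x 730 hours.
# Assumption: a 730-hour month (8,760 hours per year / 12); per-service-instance
# charges are not included, and published monthly prices may be rounded differently.
monthly_cost() {
  awk -v rate="$1" 'BEGIN { printf "%.2f\n", rate * 730 }'
}

for rate in 0.027 0.069 0.104; do
  echo "\$${rate}/hr -> \$$(monthly_cost "$rate")/mo"
done
```

This prints $19.71/mo, $50.37/mo, and $75.92/mo. The $0.027/hr rate lines up with the $20/mo listed, while $0.069/hr comes to slightly more than the $49/mo listed, so that published figure presumably uses a different hour count or rounding.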

https://www.hashicorp.com/products/consul/pricing


C. On a macOS laptop using Docker

  • https://learn.hashicorp.com/tutorials/consul/get-started-agent?in=consul/getting-started

Consul interactions

One Agent as Client or Server

PROTIP: The Consul executable binary is designed to run either as a local long-running client daemon or in server mode.

CAUTION: Avoid the manual approach of downloading release binaries from GitHub, which leaves you with the toil of configuring PATH, etc. Instead, see the install instructions below to use a package manager for each operating system (x86 and ARM):

  • Homebrew (brew command) on macOS
  • apt-get on Linux
  • Chocolatey (choco command) on Windows

Work with the Consul Agent using:

  • CLI (Command Line Interface) on Terminal sessions
  • API calls from within a custom program (written in Go, etc.)
  • GUI (Graphic User Interface) on an internet browser such as Google Chrome

The API at /v1/connect/intentions/exact provides the most features to create Service Intentions.
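Here is a sketch of calling that endpoint with curl. It assumes Consul 1.9 or later (where /v1/connect/intentions/exact accepts PUT), CONSUL_HTTP_ADDR and CONSUL_HTTP_TOKEN in the environment, and hypothetical service names web and db. upsert_intention is an illustrative helper name.

```shell
#!/usr/bin/env bash
# Sketch: create or update an intention between two services via the HTTP API.
# Assumptions: Consul 1.9+, a token with intention write permission in CONSUL_HTTP_TOKEN;
# upsert_intention and the service names are hypothetical.
CONSUL_HTTP_ADDR="${CONSUL_HTTP_ADDR:-http://127.0.0.1:8500}"

# Usage: upsert_intention <source> <destination> [allow|deny]
upsert_intention() {
  local source="$1" destination="$2" action="${3:-allow}"
  curl --silent --fail --request PUT \
    --header "X-Consul-Token: ${CONSUL_HTTP_TOKEN}" \
    --data "{\"SourceType\": \"consul\", \"Action\": \"${action}\"}" \
    "${CONSUL_HTTP_ADDR}/v1/connect/intentions/exact?source=${source}&destination=${destination}"
}
```

For example, upsert_intention web db allow, then confirm the result from the CLI with consul intention check web db (objective 6d in the exam outline).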

REMEMBER: Normally, there is no reason to SSH directly into Consul servers.

The UI and API are intended to be consumed from remote systems, such as a user’s desktop or an application looking to discover a remote service with which it needs to establish connectivity.

Install HCDiag

  1. Install for macOS from Homebrew:

    brew install hcdiag
    ==> Downloading https://releases.hashicorp.com/hcdiag/0.2.0/hcdiag_0.2.0_darwin_amd64.zip
    ==> Installing hcdiag from hashicorp/tap
    ==> Caveats
    The darwin_arm64 architecture is not supported for this product
    at this time, however we do plan to support this in the future. The
    darwin_amd64 binary has been installed and may work in
    compatibility mode, but it is not fully supported.
    ==> Summary
    🍺  /opt/homebrew/Cellar/hcdiag/0.2.0: 5 files, 7.2MB, built in 2 seconds
    ==> Running `brew cleanup hcdiag`...
    Disable this behaviour by setting HOMEBREW_NO_INSTALL_CLEANUP.
    Hide these hints with HOMEBREW_NO_ENV_HINTS (see `man brew`).
    
  2. Verify installation by viewing the help:

    hcdiag -h
    Usage of hcdiag:
      -all
     	DEPRECATED: Run all available product diagnostics
      -config string
     	Path to HCL configuration file
      -consul
     	Run Consul diagnostics
      -dest string
     	Shorthand for -destination (default ".")
      -destination string
     	Path to the directory the bundle should be written in (default ".")
      -dryrun
     	Performing a dry run will display all commands without executing them
      -include-since 72h
     	Alias for -since, will be overridden if -since is also provided, usage examples: 72h, `25m`, `45s`, `120h1m90s` (default 72h0m0s)
      -includes value
     	files or directories to include (comma-separated, file-*-globbing available if 'wrapped-*-in-single-quotes')
     	e.g. '/var/log/consul-*,/var/log/nomad-*'
      -nomad
     	Run Nomad diagnostics
      -os string
     	Override operating system detection (default "auto")
      -serial
     	Run products in sequence rather than concurrently
      -since 72h
     	Collect information within this time. Takes a 'go-formatted' duration, usage examples: 72h, `25m`, `45s`, `120h1m90s` (default 72h0m0s)
      -terraform-ent
     	(Experimental) Run Terraform Enterprise diagnostics
      -vault
     	Run Vault diagnostics
      -version
     	Print the current version of hcdiag
    
  3. Before submitting a Service ticket to HashiCorp, obtain diagnostics by running the HashiCorp hcdiag utility while a HashiCorp server is running:

    hcdiag -dryrun
    [INFO]  hcdiag: Checking product availability
    [INFO]  hcdiag: Gathering diagnostics
    [INFO]  hcdiag: Running seekers for: product=host
    [INFO]  hcdiag: would run: seeker=stats
    
  4. Configure environment variables to provide the URLs and tokens each product needs, per the hcdiag documentation.

  5. Specify the flag for each product whose data you want (such as -consul, -nomad, or -vault):

    Warning: The hcdiag tool makes no attempt to obscure secrets or sensitive information. So inspect the bundle to ensure it contains only information that is appropriate to share.
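Putting steps 3 through 5 together for Consul only, here is a sketch that uses just the flags listed in the hcdiag -h output above. It assumes the agent address and token are supplied through CONSUL_HTTP_ADDR and CONSUL_HTTP_TOKEN. collect_consul_diag and ./diag are illustrative names.

```shell
#!/usr/bin/env bash
# Sketch: collect a Consul-only hcdiag bundle for a support ticket.
# Assumptions: CONSUL_HTTP_ADDR/CONSUL_HTTP_TOKEN point at the agent to diagnose;
# collect_consul_diag and the ./diag folder are hypothetical choices.
collect_consul_diag() {
  export CONSUL_HTTP_ADDR="${CONSUL_HTTP_ADDR:-http://127.0.0.1:8500}"
  local dest="${1:-./diag}"
  mkdir -p "$dest"
  # Preview the commands first, then collect the last 24 hours of data.
  hcdiag -consul -dryrun &&
    hcdiag -consul -since 24h -destination "$dest"
}
```

Because the bundle is not scrubbed, open it and remove anything sensitive before attaching it to the ticket.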

    Install Consul Agent on Linux

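    On Debian or Ubuntu, install from HashiCorp's official apt repository (run as root, or prefix each command with sudo):

    apt-get update
    # Install utilities curl, gpg, lsb-release, jq:
    apt-get -y install curl gpg lsb-release jq
    # Add HashiCorp's signing key to a dedicated keyring (apt-key is deprecated):
    curl -fsSL https://apt.releases.hashicorp.com/gpg | gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
    # Add the official HashiCorp Linux repository for this release's codename (printed by lsb_release -cs):
    echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" \
      | tee /etc/apt/sources.list.d/hashicorp.list
    apt-get update
    # Install Consul Enterprise on the node (use package "consul" for the open-source edition):
    apt-get -y install consul-enterprise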
    

Install Consul Agent on macOS

  1. To set up your Mac for Consul, use the approach described in my blog:

    https://wilsonmar.github.io/mac-setup

  2. Notice there are two options to install the Consul Agent:

    brew search consul
    ==> Formulae
    consul                    hashicorp/tap/consul ✔             hashicorp/tap/consul-template
    consul-backinator         hashicorp/tap/consul-aws           hashicorp/tap/consul-terraform-sync
    consul-template           hashicorp/tap/consul-esm           hashicorp/tap/envconsul
    envconsul                 hashicorp/tap/consul-k8s           iconsur
     
    ==> Casks
    console
    
  3. Use your mouse to triple-click zsh in the command below to highlight the line, then press command+C to copy it to your Clipboard:

    zsh -c "$(curl -fsSL https://raw.githubusercontent.com/wilsonmar/mac-setup/main/mac-setup.zsh)" \
    -v -I -U -consul
    

    CAUTION: Do not click on the URL (starting with https), since the terminal program opens a browser to that URL.

    -v specifies optional verbose log output.

    -Golang specifies install of Go programming language development components

    -I specifies -Install of utilities XCode CLI, Homebrew, git, jq, tree, Docker, and components in the HashiCorp ecosystem, including Terraform, Vault, Nomad, envconsul.

    -U specifies -Update of utilities. Do not specify -I and -U after initial install (to save a few seconds).

    Utilities for working with AWS, Azure, GCP, and other clouds require their own parameter to be specified in order to be installed.

  4. Press command+Tab to switch to the Terminal.app.

  5. Click anywhere in the Terminal window and Press command+V to paste the command from your Clipboard.

  6. Press Return/Enter on your keyboard to begin execution.

    Install using Brew taps on MacOS

    In the script, the Consul Agent is installed using HashiCorp’s tap, as described at:

    • https://learn.hashicorp.com/tutorials/consul/get-started-install?in=consul/getting-started

    Instead of the usual:

    brew install consul

    or

    brew tap hashicorp/tap
    brew install hashicorp/tap/consul
    

    Notice the response caveats from brew install consul:

    The darwin_arm64 architecture is not supported for this product
    at this time, however we do plan to support this in the future. The
    darwin_amd64 binary has been installed and may work in
    compatibility mode, but it is not fully supported.
     
    To start hashicorp/tap/consul now and restart at login:
      brew services start hashicorp/tap/consul
    Or, if you don't want/need a background service you can just run:
      consul agent -dev -bind 127.0.0.1
    ==> Summary
    🍺  /opt/homebrew/Cellar/consul/1.12.0: 4 files, 117.1MB, built in 3 seconds
    

    -bind is the address of the interface that the Consul agent itself binds to for internal cluster communication.

    -advertise is the address the Consul agent asks other agents to use to connect to it. This is useful when the agent has multiple interfaces, or when others must reach it through the IP of a NAT device.

    Install by Download

    PROTIP: Download Enterprise binaries with name ending with “+ent” from Fastly servers at:
    https://releases.hashicorp.com/consul/

    File names containing “SHA256SUMS” are for verifying the integrity of the download (that it is complete and unaltered).

    Download “darwin_amd64” files for older Intel MacOS.
    Download “darwin_arm64” files for newer M1/M2 MacOS with Apple Silicon.

  7. Unzip
  8. Verify using check sum.
  9. Add to $PATH.

    Consul CLI commands

    Option A: Run Consul in background, which restarts automatically at login:

    brew services start hashicorp/tap/consul

    Option B: Run Consul in foreground, which occupies the Terminal and does not start again at login:

    consul agent -dev -bind 127.0.0.1 -node machine
    [DEBUG] agent.router.manager: Rebalanced servers, new active server: number_of_servers=1 active_server="wilsonmar-N2NYQJN46F (Addr: tcp/127.0.0.1:8300) (DC: dc1)"
    

    Alternately,

    consul agent -dev -datacenter="aws-1234567890" \
    -data-dir=/opt/consul  -encrypt="key" \
    -join="10.0.10.11,10.1.2.3" \
    -bind="127.0.0.1" -node machine

    With -join, agent startup fails if Consul cannot join any of the specified addresses (IPv4 or IPv6).

    PROTIP: In production, use a configuration file to auto-join:

    {
      "bootstrap": false,
      "bootstrap_expect": 3,
      "server": true,
      "retry_join": ["10.0.10.11", "10.1.2.3"]
    }
    
  10. TODO: Setup compatibility mode?

  11. Verify install:

    consul version

    Example response:

    Consul v1.12.0
    Revision 09a8cdb4
    Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)
    
  12. Obtain the menu of 31 command keywords:

    consul
    Usage: consul [--version] [--help] <command> [<args>]
     
    Available commands are:
     acl            Interact with Consul's ACLs
     agent          Runs a Consul agent
     catalog        Interact with the catalog
     config         Interact with Consul's Centralized Configurations
     connect        Interact with Consul Connect
     debug          Records a debugging archive for operators
     event          Fire a new event
     exec           Executes a command on Consul nodes
     force-leave    Forces a member of the cluster to enter the "left" state
     info           Provides debugging information for operators.
     intention      Interact with Connect service intentions
     join           Tell Consul agent to join cluster
     keygen         Generates a new encryption key
     keyring        Manages gossip layer encryption keys
     kv             Interact with the key-value store
     leave          Gracefully leaves the Consul cluster and shuts down
     lock           Execute a command holding a lock
     login          Login to Consul using an auth method
     logout         Destroy a Consul token created with login
     maint          Controls node or service maintenance mode
     members        Lists the members of a Consul cluster
     monitor        Stream logs from a Consul agent
     operator       Provides cluster-level tools for Consul operators
     reload         Triggers the agent to reload configuration files
     rtt            Estimates network round trip time between nodes
     services       Interact with services
     snapshot       Saves, restores and inspects snapshots of Consul server state
     tls            Builtin helpers for creating CAs and certificates
     validate       Validate config files/directories
     version        Prints the Consul version
     watch          Watch for changes in Consul
    

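The verify, unzip, and add-to-PATH steps (7 to 9) can be sketched in Python, the language of the check_mem.py health-check script used later. This is a minimal sketch, not HashiCorp tooling. It assumes that the zip and its SHA256SUMS file are already downloaded, that SHA256SUMS uses the usual `<hex digest>  <file name>` lines, and that the archive holds a top-level `consul` binary. It verifies before extracting, so a corrupt download is never unzipped. It does not check the GPG signature (the `.sig` file).

```python
import hashlib
import zipfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_sha256(sums_path: Path, filename: str) -> str:
    """Look up a file's digest in a SHA256SUMS file ("<hex>  <name>" per line)."""
    for line in sums_path.read_text().splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == filename:
            return parts[0]
    raise KeyError(f"{filename} not listed in {sums_path}")


def verify_and_unzip(zip_path: Path, sums_path: Path, dest_dir: Path) -> Path:
    """Verify the zip against SHA256SUMS, then extract the consul binary."""
    if sha256_of(zip_path) != expected_sha256(sums_path, zip_path.name):
        raise ValueError(f"checksum mismatch for {zip_path.name}")
    with zipfile.ZipFile(zip_path) as zf:
        zf.extract("consul", dest_dir)  # assumes a top-level "consul" entry
    binary = dest_dir / "consul"
    binary.chmod(0o755)  # make it executable; dest_dir should be on $PATH
    return binary
```

For example, `verify_and_unzip(Path("consul_1.12.0_darwin_arm64.zip"), Path("consul_1.12.0_SHA256SUMS"), Path.home() / "bin")`. These file names follow the releases-page pattern, so adjust them to the version you downloaded.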

CLI commands are used to start and stop the Consul Agent.

Ports used by Consul

The default ports, which some organizations change in hope of better security through obfuscation:

  • 8300 TCP for server RPC (Remote Procedure Call): Consul servers handle incoming requests from other Consul agents (for example, service discovery and Consul KV reads and writes). Raft consensus and replication between servers also run over this port.

  • 8301 TCP/UDP for Serf LAN gossip among all agents in the same datacenter (membership, failure detection, and event broadcasts)
  • 8302 TCP/UDP for Serf WAN gossip between servers across datacenters

  • 8500 TCP for the HTTP API and UI; 8501 TCP for HTTPS (disabled by default)
  • 8502 TCP-only for Envoy sidecar proxy xDS gRPC API (disabled by default)
  • 8558 - Consul-Terraform-Sync daemon

  • 8600 TCP/UDP for DNS queries

  • 21000 - 21255 TCP (automatically assigned) for Sidecar proxy registrations
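A short Python sketch can check which of these default ports accept connections on a machine by attempting TCP connections. It probes TCP only, because UDP gossip and DNS cannot be confirmed with a connect attempt. The port-to-purpose table restates the list above, and the function names are hypothetical.

```python
import socket

# Default Consul ports from the list above (descriptions are for reference only).
CONSUL_DEFAULT_PORTS = {
    8300: "server RPC (TCP)",
    8301: "Serf LAN gossip (TCP/UDP)",
    8302: "Serf WAN gossip (TCP/UDP)",
    8500: "HTTP API and UI (TCP)",
    8501: "HTTPS API (TCP)",
    8502: "gRPC / Envoy xDS (TCP)",
    8600: "DNS (TCP/UDP)",
}


def tcp_port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(host: str = "127.0.0.1") -> dict:
    """Map each default Consul port to whether it accepts TCP connections."""
    return {port: tcp_port_open(host, port) for port in CONSUL_DEFAULT_PORTS}
```

With a dev agent running, `report()` should show 8300, 8301, 8302, 8500, and 8600 as open. 8501 and 8502 stay closed unless they are enabled.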

For bootstrapping and configuration of agent.hcl, see https://learn.hashicorp.com/tutorials/consul/access-control-setup-production


Environment Variables

The shell script I wrote makes use of several custom environment variables, which minimizes mistakes when several commands use the same values. When applicable, my script also captures values output from one step to use in subsequent commands, to avoid the toil and mistakes from manual copy and pasting.

Use of environment variables also enables the same command to be used for both DEV and PROD, further avoiding mistakes.

  • DATACENTER1_ID, which is obtained from my laptop’s $(hostname)

  • CONSUL_AGENT_TOKEN

  • ACL variables

envconsul

  • https://www.consul.io/docs/intro/vs
  • https://github.com/hashicorp/envconsul

The envconsul utility is a separate binary (it is not bundled with Consul). It reads data from Consul KV (and secrets from Vault) and exposes them as environment variables to a subprocess.

  1. To launch a subprocess with environment variables populated from HashiCorp Consul and Vault data:

    envconsul
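The mechanics of envconsul can be approximated in a few lines of Python. The sketch reads every key under a KV prefix through the HTTP API (`/v1/kv/<prefix>?recurse=true`), base64-decodes the values, and starts a child process with them in its environment. This is a hypothetical sketch, not envconsul itself. The rule for turning keys into variable names (strip the prefix, replace `/` and `-` with `_`, upper-case) is an assumption, and the sketch reads no Vault secrets. The agent address defaults to a local agent on port 8500.

```python
import base64
import json
import os
import subprocess
import urllib.request


def kv_to_env(kv_entries: list, prefix: str) -> dict:
    """Turn Consul KV entries (from /v1/kv/<prefix>?recurse) into env vars.

    Values arrive base64-encoded; folder entries have a null Value and are skipped.
    """
    env = {}
    for entry in kv_entries:
        if entry.get("Value") is None:
            continue
        name = entry["Key"][len(prefix):].strip("/")
        name = name.replace("/", "_").replace("-", "_").upper()
        env[name] = base64.b64decode(entry["Value"]).decode("utf-8")
    return env


def run_with_consul_env(prefix: str, command: list,
                        consul: str = "http://127.0.0.1:8500") -> int:
    """Fetch a KV prefix and run command with those values added to its environment."""
    with urllib.request.urlopen(f"{consul}/v1/kv/{prefix}?recurse=true") as resp:
        entries = json.load(resp)
    child_env = {**os.environ, **kv_to_env(entries, prefix)}
    return subprocess.run(command, env=child_env).returncode
```

For example, `run_with_consul_env("my-app/config", ["env"])` prints the child's environment, where "my-app/config" is a hypothetical prefix. If the prefix holds no keys, Consul answers 404 and `urlopen` raises `HTTPError`.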

Consul Templates

Consul-template is a separate binary that reads a template file and replaces each expression between double curly braces (“mustache” syntax) with values from Consul or Vault. An example:

[client]
host=
port=

user=
password=
# Lease: 
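A toy Python renderer shows how the substitution works. It handles only consul-template's `{{ key "path" }}` function and looks up the paths in a plain dictionary instead of querying Consul. The key paths in the example are hypothetical. The real consul-template also watches Consul for changes, re-renders the file, and supports many more functions (services, Vault secrets, and so on).

```python
import re

# Matches consul-template style lookups such as {{ key "myapp/db/host" }}.
KEY_CALL = re.compile(r'\{\{\s*key\s+"([^"]+)"\s*\}\}')


def render(template: str, kv: dict) -> str:
    """Replace each {{ key "path" }} with kv[path]; raises KeyError if a path is missing."""
    return KEY_CALL.sub(lambda m: kv[m.group(1)], template)
```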

   

Start Consul Agent in foreground

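  1. Use a text editor to create the systemd unit file /etc/systemd/system/consul.service (INI format). The unit reads agent configuration from the /etc/consul.d directory:

    [Unit]
    Description=Consul
    Requires=network-online.target
    After=network-online.target
     
    [Service]
    Restart=on-failure
    ExecStart=/usr/local/bin/consul agent -config-dir="/etc/consul.d"
    User=consul

    [Install]
    WantedBy=multi-user.target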
    
  2. To run the Consul agent locally in the foreground (dev mode):

    consul agent -dev -node "$(hostname)" -config-dir="/etc/consul.d"

    -node “$(hostname)” is specified for macOS users because Consul uses your hostname as the default node name. If your hostname contains periods, DNS queries to that node will not work with Consul. To avoid this, use the -node flag to set a node name without periods (for example, -node machine).

    Start Consul Server in background (macOS)

    Because HashiCorp’s Homebrew tap was used to install Consul, start it as a background service:

    brew services start hashicorp/tap/consul

    Alternatively, on Linux (in an environment that provides a start script):

    /bin/start_consul.sh

    Sample response:

    Starting HashiCorp Consul in Server Mode...
    CMD: nohup consul agent -config-dir=/consul/config/ > /consul.out &
    Log output will appear in consul.out...
    nohup: redirecting stderr to stdout
    Consul server startup complete.
    

  3. Start the Consul server as a systemd service (using the unit file from step 1):

    sudo systemctl daemon-reload
    sudo systemctl start consul

    No message is returned unless there is an error.

    Leave (Stop) Consul gracefully

    CAUTION: When operating as a server, a graceful leave is important to avoid causing a potential availability outage affecting the consensus protocol.

  4. Gracefully stop the Consul agent by having it leave the Consul datacenter and shut down:

    consul leave

    QUESTION: No need to specify the node (like in start) because Gossip is supposed to propagate updated membership state across the cluster. That’s “Discovery” at work.

    CAUTION: Leaving a server affects the Raft peer-set, which results in auto-reconfiguration of the cluster to have fewer servers.

    The command notifies other members that the agent left the datacenter. When an agent leaves, its local services running on the same node and their checks are removed from the catalog, and Consul doesn’t try to contact that node again.

    Log entries in a sample response (without date/time stamps):

    [INFO]  agent.server: server starting leave
    [INFO]  agent.server.serf.wan: serf: EventMemberLeave: wilsonmar-N2NYQJN46F.dc1 127.0.0.1
    [INFO]  agent.server: Handled event for server in area: event=member-leave server=wilsonmar-N2NYQJN46F.dc1 area=wan
    [INFO]  agent.router.manager: shutting down
    [INFO]  agent.server.serf.lan: serf: EventMemberLeave: wilsonmar-N2NYQJN46F 127.0.0.1
    [INFO]  agent.server: Removing LAN server: server="wilsonmar-N2NYQJN46F (Addr: tcp/127.0.0.1:8300) (DC: dc1)"
    [WARN]  agent.server: deregistering self should be done by follower: name=wilsonmar-N2NYQJN46F partition=default
    [DEBUG] agent.server.autopilot: will not remove server as a removal of a majority of servers is not safe: id=40fee474-cf41-1063-2790-c8ff2b14d4af
    [INFO]  agent.server: Waiting to drain RPC traffic: drain_time=5s
    [INFO]  agent: Requesting shutdown
    [INFO]  agent.server: shutting down server
    [DEBUG] agent.server.usage_metrics: usage metrics reporter shutting down
    [INFO]  agent.leader: stopping routine: routine="federation state anti-entropy"
    [INFO]  agent.leader: stopping routine: routine="federation state pruning"
    [INFO]  agent.leader: stopping routine: routine="intermediate cert renew watch"
    [INFO]  agent.leader: stopping routine: routine="CA root pruning"
    [INFO]  agent.leader: stopping routine: routine="CA root expiration metric"
    [INFO]  agent.leader: stopping routine: routine="CA signing expiration metric"
    [INFO]  agent.leader: stopped routine: routine="intermediate cert renew watch"
    [INFO]  agent.leader: stopped routine: routine="CA root expiration metric"
    [INFO]  agent.leader: stopped routine: routine="CA signing expiration metric"
    [ERROR] agent.server: error performing anti-entropy sync of federation state: error="context canceled"
    [INFO]  agent.leader: stopped routine: routine="federation state anti-entropy"
    [DEBUG] agent.server.autopilot: state update routine is now stopped
    [INFO]  agent.leader: stopped routine: routine="CA root pruning"
    [DEBUG] agent.server.autopilot: autopilot is now stopped
    [INFO]  agent.leader: stopping routine: routine="federation state pruning"
    [INFO]  agent.leader: stopped routine: routine="federation state pruning"
    [INFO]  agent.server.autopilot: reconciliation now disabled
    [INFO]  agent.router.manager: shutting down
    [INFO]  agent: consul server down
    [INFO]  agent: shutdown complete
    [DEBUG] agent.http: Request finished: method=PUT url=/v1/agent/leave from=127.0.0.1:62886 latency=11.017448542s
    [INFO]  agent: Stopping server: protocol=DNS address=127.0.0.1:8600 network=tcp
    [INFO]  agent: Stopping server: protocol=DNS address=127.0.0.1:8600 network=udp
    [INFO]  agent: Stopping server: address=127.0.0.1:8500 network=tcp protocol=http
    [INFO]  agent: Waiting for endpoints to shut down
    [INFO]  agent: Endpoints down
    [INFO]  agent: Exit code: code=0
    

    Consul automatically tries to reconnect to a failed node, assuming that it may be unavailable because of a network partition, and that it may be coming back.


Consul web GUI

  1. With the Consul agent running, open the UI (served on the HTTP port, 8500 by default):

    open "http://localhost:8500/ui/${DATACENTER1_ID}/services"

    Consul GUI

    The Consul GUI provides a mouse-clickable way for you to conveniently work with these (explained below):

    • Services (in the Service Catalog)

    • Nodes: the machines running Consul agents

    • Key/Value: entries in the Consul KV datastore

    • ACL (Access Control List) tokens, policies, and roles, which control access to Consul's data and APIs

    • Intentions to allow or deny connections between specific services by name (instead of IP addresses) in the Service Graph

API

  1. Custom programs (written in Go, etc.) can communicate with Consul using HTTP API calls defined in:

    https://www.consul.io/api

  2. To list nodes in JSON using API:

    curl "http://localhost:8500/v1/catalog/nodes"
    [
      {
     "ID": "019063f6-9215-6f2c-c930-9e84600029da",
     "Node": "Judiths-MBP",
     "Address": "127.0.0.1",
     "Datacenter": "dc1",
     "TaggedAddresses": {
       "lan": "127.0.0.1",
       "wan": "127.0.0.1"
     },
     "Meta": {
       "consul-network-segment": ""
     },
     "CreateIndex": 9,
     "ModifyIndex": 10
      }
    ]
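A custom program can make the same call using only the Python standard library, as in the sketch below. The function names are hypothetical. The agent address assumes the default local HTTP port 8500, and the token header is sent only when one is given (that is, when ACLs require it).

```python
import json
import urllib.request


def parse_nodes(payload: str) -> list:
    """Return (node name, address, datacenter) tuples from /v1/catalog/nodes JSON."""
    return [(n["Node"], n["Address"], n["Datacenter"]) for n in json.loads(payload)]


def list_nodes(consul: str = "http://localhost:8500", token: str = "") -> list:
    """GET /v1/catalog/nodes from a Consul agent, sending an ACL token if given."""
    req = urllib.request.Request(f"{consul}/v1/catalog/nodes")
    if token:
        req.add_header("X-Consul-Token", token)
    with urllib.request.urlopen(req) as resp:
        return parse_nodes(resp.read().decode("utf-8"))
```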
    

TODO: DNS -consul specifies installation of HashiCorp Consul agent.

Prepared Queries

  • https://www.consul.io/api-docs/query

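NOTE: Prepared queries are created and managed only through the HTTP API (there is no CLI command for them). Once created, a query can be executed through the API, or through DNS as <query-name>.query.consul.

Prepared queries support richer lookups (such as tag filtering and failover to other datacenters) than the fixed entry points exposed by DNS.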

To get a set of healthy nodes which provide a given service:

  1. Edit a prepared query template file in this format:

    {
      "Template": {
     "Type": "name_prefix_match",
     "Regexp": "^geo-db-(.*?)-([^\\-]+?)$",
     "RemoveEmptyTags": false
      }
    }
    

    Automate Geo-Failover with Prepared Queries:

  2. Register a prepared query (named, for example, “banking-app”) using an inline heredoc:

    curl "${CONSUL_URL_WITH_PORT_VER}/query" \
     --request POST \
     --data @- << EOF
    {
      "Name": "banking-app",
      "Service": {
     "Service": "banking-app",
     "Tags": ["v1.2.3"],
     "Failover": {
       "Datacenters": ["dc2", "dc3"]
     }
      }
    }
    EOF
    

    Alternatively, instead of a heredoc, save the JSON to a file:

    CONSUL_QUERY_FILENAME="payload.json"
    
  3. Create the query from that file, passing a valid token in the X-Consul-Token header:

    curl --request POST \
     --header "X-Consul-Token: ${CONSUL_AGENT_TOKEN}" \
     --data "@${CONSUL_QUERY_FILENAME}" \
     "${CONSUL_URL_WITH_PORT_VER}/query"
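The create-then-execute flow can be sketched in Python. `failover_query()` builds the same banking-app definition shown above. `consul_call()` sends it with a POST to `/v1/query` (which returns the new query's ID), and the query can then be run through `/v1/query/<name>/execute`. The helper names are hypothetical, and the usage comments assume `CONSUL_URL_WITH_PORT_VER` is `http://localhost:8500/v1`.

```python
import json
import urllib.request


def failover_query(name: str, service: str, tags: list, datacenters: list) -> dict:
    """Build a prepared-query definition like the banking-app example above."""
    return {
        "Name": name,
        "Service": {
            "Service": service,
            "Tags": tags,
            "Failover": {"Datacenters": datacenters},
        },
    }


def consul_call(method: str, url: str, token: str = "", body=None):
    """Send a JSON request to the Consul HTTP API and return the decoded JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    if token:
        req.add_header("X-Consul-Token", token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Usage (needs a running agent):
#   base = "http://localhost:8500/v1"
#   q = failover_query("banking-app", "banking-app", ["v1.2.3"], ["dc2", "dc3"])
#   created = consul_call("POST", f"{base}/query", token, q)   # {"ID": "..."}
#   result = consul_call("GET", f"{base}/query/banking-app/execute", token)
#   print(len(result["Nodes"]), "healthy instances")
```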
    

Prepared queries are also subject to ACLs:

Query execution is subject to node/node_prefix and service/service_prefix policies.


Chaos Engineering

Practicing use of the above should be part of your pre-production Chaos Engineering/Incident Management process.

Failure modes:

  1. Failure of single app node (Consul should notice and send alert)

  2. Failure of a Consul Non-Voting server (if setup for performance)
  3. Failure of a Consul Follower server (triggers replacement)
  4. Failure of the Consul Leader server (triggering an election)

  5. Failure of an entire Consul cluster Availability Zone
  6. Failure of an entire Consul cluster Region
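Whether a server failure is survivable comes down to Raft quorum: a strict majority of the voting servers must remain. A small Python sketch makes the arithmetic concrete, for example for the `bootstrap_expect: 3` setting used earlier. Non-voting servers do not count toward quorum, which is why failure mode 2 does not threaten it.

```python
def quorum(servers: int) -> int:
    """Votes needed for Raft consensus: a strict majority of voting servers."""
    return servers // 2 + 1


def failure_tolerance(servers: int) -> int:
    """How many voting servers can fail while the cluster still has a quorum."""
    return servers - quorum(servers)


for n in (1, 2, 3, 4, 5, 7):
    print(f"{n} servers: quorum {quorum(n)}, tolerates {failure_tolerance(n)} failure(s)")
```

Adding a fourth server gives no extra tolerance over three (both tolerate one failure), which is why odd cluster sizes are the usual choice.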

Degraded modes:

  1. Under-performing app node

  2. Under-performing Consul Leader server
  3. Under-performing Consul Follower server
  4. Under-performing Consul Non-voting server

  5. Under-performing transfer between Consul Availability Zones
  6. Under-performing WAN Gossip protocol transfer between Consul Regions

Down for maintenance

  1. To take a service offline, enable maintenance mode for it (omit -service to put the whole node into maintenance):

    consul maint -enable -service redis -reason "Server patching"
    

    This action is logged, which should trigger an alert to the SOC.

  2. To bring it back online, disable maintenance mode:

    consul maint -disable -service redis
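The same maintenance toggles are available through the agent HTTP API. `PUT /v1/agent/service/maintenance/<service_id>` handles a service and `PUT /v1/agent/maintenance` handles the whole node; each takes an `enable` query parameter and an optional `reason`. The Python sketch below has hypothetical helper names and could be called from an automated patching job.

```python
import urllib.parse
import urllib.request


def maintenance_url(consul: str, enable: bool, service_id: str = "",
                    reason: str = "") -> str:
    """Build the agent API URL for node (no service_id) or service maintenance mode."""
    if service_id:
        path = f"/v1/agent/service/maintenance/{service_id}"
    else:
        path = "/v1/agent/maintenance"
    params = {"enable": "true" if enable else "false"}
    if reason:
        params["reason"] = reason
    return f"{consul}{path}?{urllib.parse.urlencode(params)}"


def set_maintenance(consul: str, enable: bool, service_id: str = "",
                    reason: str = "", token: str = "") -> int:
    """PUT the maintenance request to the agent and return the HTTP status code."""
    req = urllib.request.Request(
        maintenance_url(consul, enable, service_id, reason), method="PUT")
    if token:
        req.add_header("X-Consul-Token", token)
    with urllib.request.urlopen(req) as resp:
        return resp.status
```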
    

Backup Consul data to Snapshots

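Consul servers serve their state from memory (persisted through the Raft log and snapshots in the agent's data directory) rather than from a conventional database on disk.

To back up that state, capture complete point-in-time snapshots (gzipped tar files) of Consul’s committed state. A snapshot includes key/value entries, the service catalog, prepared queries, sessions, and ACLs.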

  1. Specify the ACL Token (such as “12345678-1234-abcd-5678-1234567890ab”) (also used for UI login):

    export CONSUL_HTTP_TOKEN="${CONSUL_ACL_TOKEN}"
    
  2. PROTIP: Name files with a timestamp in UTC time zone, such as 2022-05-16T03:10:15.386UTC.tgz

    brew install coreutils
    CONSUL_BACKUP_FILENAME="$( gdate -u +'%Y-%m-%dT%H:%M:%S.%3N%Z' ).tgz"
    

    Snapshots are normally taken by the LEADER server. When the cluster has no leader, a FOLLOWER can take one if the -stale flag is specified.

  3. Create the snapshot manually using either the CLI or the API:

    consul snapshot save "${CONSUL_BACKUP_FILENAME}"
    
    curl --header "X-Consul-Token: ${CONSUL_ACL_TOKEN}" \
    "${CONSUL_URL_WITH_PORT_VER}/snapshot" -o "${CONSUL_BACKUP_FILENAME}"
    
  4. Inspect the metadata of a snapshot file on the local filesystem:

    consul snapshot inspect "${CONSUL_BACKUP_FILENAME}"
  5. PROTIP: It’s more secure to transfer snapshots offsite, held under an account separate from day-to-day operations.

    • Amazon S3
    • Azure Blob Storage
    • Google Cloud Storage

    For example, define an S3 bucket. PROTIP: Use one cloud service account to write snapshots and a different one to read them.

    Enterprise Snapshot Agent

    Enterprise-licensed users can run the Consul Snapshot Agent service to take snapshots automatically at a set interval.

  6. Ensure that an enterprise license is configured.

  7. Define the configuration file in a directory such as /etc/consul-snapshot.d/. This sample takes a snapshot every 30 minutes:

    {
      "snapshot_agent": {
        "http_addr": "127.0.0.1:8500",
        "token": "12345678-1234-abcd-5678-1234567890ab",
        "datacenter": "dc1",
        "snapshot": {
          "interval": "30m",
          "retain": 336,
          "deregister_after": "8h"
        },
        "aws_storage": {
          "s3_region": "us-east-1",
          "s3_bucket": "my-consul-snapshots-bucket"
        }
      }
    }
    

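    In PRODUCTION, ACLs are enabled, so a token needs to be generated and included in the file.

    336 snapshots are retained (336 × 30 minutes = 7 days of history), and the oldest are discarded automatically.

    With deregister_after, the snapshot agent's service registration is removed if the agent stays unhealthy for more than 8 hours.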

  8. Run:

    consul snapshot agent -config-dir=/etc/consul-snapshot.d
    

    Registration is done automatically.

    https://www.consul.io/commands/snapshot/agent

    Service file

    A systemd unit file on Linux, such as:

    /etc/systemd/system/snapshot.service

    [Unit]
    Description="HashiCorp Consul Snapshot Agent"
    Documentation=https://www.consul.io/
    Requires=network-online.target
    After=consul.service
    ConditionFileNotEmpty=/etc/snapshot.d/snapshot.json
     
    [Service]
    User=consul
    Group=consul
    ExecStart=/usr/local/bin/consul snapshot agent -config-dir=/etc/snapshot.d/
    KillMode=process
    Restart=on-failure
    LimitNOFILE=65535
     
    [Install]
    WantedBy=multi-user.target
    
    • https://unix.stackexchange.com/questions/506347/why-do-most-systemd-examples-contain-wantedby-multi-user-target

    Restore from Snapshot

    Snapshots are intended for full Disaster Recovery, not for selective restore back to a specific point in the past (like GitHub can do).

  9. To restore to a fresh set of Consul servers:

    consul snapshot restore "${CONSUL_BACKUP_FILENAME}"

    CAUTION: A Consul server stops processing while performing a restore. You don’t want it working anyway.

    Alternatively, using the API:

    curl --header "X-Consul-Token: ${CONSUL_ACL_TOKEN}" \
    --request PUT \
    --data-binary "@${CONSUL_BACKUP_FILENAME}" \
    "${CONSUL_URL_WITH_PORT_VER}/snapshot"
    

    PROTIP: There is no selective restore of data.

  10. After each change in server membership, record each server's ID and address. During manual outage recovery, that information goes into this file in the data directory to re-establish quorum:

    raft/peers.json

    The file lists each Raft server:

    [
      {
     "id": "12345678-1234-abcd-5678-1234567890ab",
     "address": "10.1.0.1:8300",
     "non-voter": false
      }
      ...
    ]
    

    See https://learning.oreilly.com/library/view/consul-up-and/9781098106133/ch02.html#building-consensus-raft

    PROTIP: As per CAP Theorem, Raft emphasizes Consistency (every read receives the most recent write value) over Availability.
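The naming convention from step 2 and the retention behavior of the snapshot agent (`retain: 336`) can be mimicked for snapshots kept outside the Enterprise agent. This Python sketch assumes timezone-aware timestamps and names that follow the `YYYY-MM-DDTHH:MM:SS.mmmUTC.tgz` pattern, so sorting the names as strings also sorts them chronologically. It only decides which files to discard; deleting them (locally or in S3) is left out.

```python
from datetime import datetime, timezone


def snapshot_name(when: datetime) -> str:
    """UTC timestamped name with milliseconds, e.g. 2022-05-16T03:10:15.386UTC.tgz."""
    when = when.astimezone(timezone.utc)
    return when.strftime("%Y-%m-%dT%H:%M:%S.") + f"{when.microsecond // 1000:03d}UTC.tgz"


def to_discard(names: list, retain: int) -> list:
    """Return the oldest names beyond the newest `retain` snapshots."""
    ordered = sorted(names)  # ISO-style names sort chronologically
    return ordered[:max(len(ordered) - retain, 0)]
```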


Service Graph Intentions

The Consul GUI lets you search for connections and define connections between specific services by name (instead of by IP address):

Consul Intentions GUI

PROTIP: Working with service names using a GUI not only reduces hassle but also minimizes mistakes, which have dire Security consequences.

  1. On the CLI, deny the web service from connecting to any service:

    consul intention create -deny web '*'
    
  2. On the CLI, allow the web service to connect to db (the database):

    consul intention create -allow web db
    

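    Intentions are defined by service name, not by the hosts or IP addresses where the services run. They are enforced on inbound connections by the destination service's sidecar proxy.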
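The two commands above combine into "web may reach only db" because of intention precedence. A match with an exact destination outranks one with a wildcard destination, and within that, an exact source outranks a wildcard source. The Python sketch below is a simplified model that ignores namespaces, partitions, and L7 permissions. Its `default_allow` flag is a stand-in for the ACL default policy.

```python
def precedence(source: str, destination: str) -> int:
    """Higher wins: an exact destination outranks a wildcard destination,
    then an exact source outranks a wildcard source."""
    return (2 if destination != "*" else 0) + (1 if source != "*" else 0)


def allowed(src: str, dst: str, intentions: list, default_allow: bool) -> bool:
    """Decide a connection using the most specific matching intention.

    intentions: list of (source, destination, action) with action "allow" or "deny".
    """
    matches = [
        (precedence(s, d), action)
        for s, d, action in intentions
        if s in (src, "*") and d in (dst, "*")
    ]
    if not matches:
        return default_allow
    return max(matches)[1] == "allow"  # on a tie, "deny" sorts higher and wins
```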

Services

  • https://www.consul.io/docs/discovery/services

Consul learns about services from service definitions registered with the local agent, typically in a file on the service's machine.

  1. Edit the file:

    {
      "service": {
        "id": "unique-server-01",
        "name": "retail-web-1234567890",
        "token": "12345678-1234-abcd-5678-1234567890ab",
        "tags": ["v1.02","production"],
        "address": "10.1.2.2",
        "port": 80,
        "checks": [ {
          "args": ["/usr/local/bin/check_mem.py"],
          "interval": "30s"
        } ]
      }
    }
    

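    A script check is used for memory (“mem”) because memory use is internal to the host and can't be probed over the network. Script checks require enable_local_script_checks on the agent.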

    https://www.consul.io/docs/discovery/checks

  2. Construct the file CONSUL_SVC_REGIS_FILE such as /etc/consul.d/redis.json (or hcl):

    {
      "service": {
        "name": "retail-web",
        "token": "12345678-1234-abcd-5678-1234567890ab",
        "port": 80,
        "check": {
          "id": "http",
          "name": "web check",
          "tcp": "localhost:80",
          "interval": "5s",
          "timeout": "3s"
        }
      }
    }
    
  3. A service instance is defined by a service name + service ID.

    QUESTION: “web check”?

  4. PROTIP: Give Consul read permission on the directory/file used above, and reference the file through a variable so the same CLI works in dev and prod (fewer mistakes):

    CONSUL_SVC_REGIS_FILE="redis.hcl"
  5. Define the address of the Consul agent that registers the service:

    CONSUL_SVC_REGIS_FRONT="http://localhost:8500"

    Alternatively, in production (for example, over HTTPS on its default port 8501):

    CONSUL_SVC_REGIS_FRONT="https://consul.example.com:8501"
    
  6. Register the service:

    consul services register redis.hcl

    Alternatively, make an API call. The agent API takes a JSON body (not HCL) with the fields of the "service" block at the top level:

    curl -X PUT --data "@${CONSUL_SVC_REGIS_FILE}" \
    "${CONSUL_SVC_REGIS_FRONT}/v1/agent/service/register"
    
  7. Consul does not watch that file after loading it, so reload the agent after changing it:

    consul reload
  8. “Service discovery” finds available service instance addresses and ports.

  9. TODO: Define default connection limits, for better security.

  10. Consul API Gateway:

    • https://www.youtube.com/watch?v=JtVDliGL3mE Video for Consul API Gateway with Jeff Apple, PM of API Gateway
    • https://www.hashicorp.com/blog/announcing-hashicorp-consul-api-gateway
    • https://learn.hashicorp.com/tutorials/consul/kubernetes-api-gateway?in=consul/developer-mesh
    • https://www.hashicorp.com/blog/consul-api-gateway-now-generally-available Feb 24 2022

  11. QUESTION: Linux security modules integrated into the operating system, such as AppArmor, SELinux, and seccomp?

    See https://www.consul.io/docs/security/security-models/core

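  12. Consul spreads requests across healthy instances: DNS answers are returned in randomized order, and sidecar proxies load-balance connections to upstream instances.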

  13. Define a shell variable:

    CONSUL_CONFIG_KIND="extra-config"
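  14. Define a CONSUL_CONFIG_FILE. This config_entries form belongs in a server agent's configuration file and is applied when the server starts:

    config_entries {
      bootstrap {
        kind = "proxy-defaults"
        name = "global"
        config {
          local_connect_timeout_ms = 1000
          handshake_timeout_ms = 1000
        }
      }
      bootstrap {
        kind = "service-defaults"
        name = "web"
        namespace = "default"
        protocol = "http"
      }
    }
    
  15. To apply an entry at runtime instead, put a single entry (one kind and name, without the config_entries wrapper) in its own file and write it:

    consul config write "${CONSUL_CONFIG_FILE}"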

  16. Read back the entry (proxy-defaults entries are always named global):

    consul config read -kind proxy-defaults -name global

  17. “Discover” nodes and services by sending dig queries to the Consul agent’s DNS server, which listens on port 8600 by default:

    REMEMBER: Only healthy instances are returned.

    If the service runs from the Docker image “hashicorp/counting-service:0.0.2” and is registered as counting:

    dig @127.0.0.1 -p 8600 "counting.service.consul"

    Alternatively, discover an app registered as appb: dig @127.0.0.1 -p 8600 appb.service.consul

    If running locally:

    dig @127.0.0.1 -p 8600 "$(hostname).node.consul"
    ; <<>> DiG 9.10.6 <<>> @127.0.0.1 -p 8600 wilsonmar-N2NYQJN46F.node.consul
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 16775
    ;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 2
    ;; WARNING: recursion requested but not available
     
    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 4096
    ;; QUESTION SECTION:
    ;wilsonmar-N2NYQJN46F.node.consul. IN	A
     
    ;; ANSWER SECTION:
    wilsonmar-N2NYQJN46F.node.consul. 0 IN	A	127.0.0.1
     
    ;; ADDITIONAL SECTION:
    wilsonmar-N2NYQJN46F.node.consul. 0 IN	TXT	"consul-network-segment="
     
    ;; Query time: 2 msec
    ;; SERVER: 127.0.0.1#8600(127.0.0.1)
    ;; WHEN: Sun May 08 22:35:21 MDT 2022
    ;; MSG SIZE  rcvd: 113
    

    QUESTION: SRV lookups

  18. Connect

    NOTE: Unhealthy nodes are filtered out.

    TODO: This approach enables automatic load balancing. Decentralizes DNS.
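Registration through the agent API (step 6) can also be scripted in Python. This sketch assumes a JSON (not HCL) definition file such as redis.json. It unwraps the file's `"service"` block, because `/v1/agent/service/register` expects the service fields at the top level, and then PUTs the result. The helper names are hypothetical. Field names are sent exactly as written in the file; if your Consul version rejects the lower-case keys, use the capitalized names from the API docs (Name, Port, Check).

```python
import json
import urllib.request


def registration_body(definition: dict) -> dict:
    """Unwrap a {"service": {...}} file definition into the agent API's request body."""
    return definition.get("service", definition)


def register_service(consul: str, definition: dict, token: str = "") -> int:
    """PUT a service definition to the local agent and return the HTTP status."""
    data = json.dumps(registration_body(definition)).encode("utf-8")
    req = urllib.request.Request(f"{consul}/v1/agent/service/register",
                                 data=data, method="PUT")
    req.add_header("Content-Type", "application/json")
    if token:
        req.add_header("X-Consul-Token", token)
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For example, `register_service("http://localhost:8500", json.load(open("/etc/consul.d/redis.json")))` should return 200 once the agent accepts the definition.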

(Consul) Nodes (Health Checks)

Red x’s identify Consul nodes which failed health checks.

Moreover, Consul servers Gossip with each other about state changes.

Consul can use several techniques to obtain health info: Docker, gRPC, TCP, TTL heartbeats, and Nagios-compatible scripts.

  1. To perform a health check manually using an API call:

    curl http://127.0.0.1:8500/v1/health/checks/my-service

    Parse the JSON response:

    [
      {
     "Node": "foobar",
     "CheckID": "service:redis",
     "Name": "Service 'redis' check",
     "Status": "passing",
     "Notes": "",
     "Output": "",
     "ServiceID": "redis",
     "ServiceName": "redis",
       "ServiceTags": ["primary"]   
      }
    ]
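The Python sketch below parses that response to find failing checks. The helper names are hypothetical. A check's Status is one of passing, warning, or critical. To list only the healthy instances of a service rather than its checks, the `/v1/health/service/<name>?passing` endpoint is the usual choice.

```python
import json
import urllib.request


def failing_checks(checks: list) -> list:
    """Return (node, check name, status) for every check that is not passing."""
    return [(c["Node"], c["Name"], c["Status"]) for c in checks
            if c["Status"] != "passing"]


def service_checks(consul: str, service: str) -> list:
    """GET /v1/health/checks/<service> and decode the JSON list."""
    with urllib.request.urlopen(f"{consul}/v1/health/checks/{service}") as resp:
        return json.load(resp)
```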
    

Consul External Services Monitor (ESM)

  • https://github.com/hashicorp/consul-esm
  • https://learn.hashicorp.com/tutorials/consul/service-registration-external-services

When a local Consul agent cannot be installed locally, such as in cloud-managed services or incompatible hardware, to keep Consul’s service catalog up to date, periodically poll those services by installing the Consul ESM on ___. Such a health check is added to service registration like this:

token = "12345678-1234-abcd-5678-1234567890ab"
  check {
    id = "some-check"
    http = "http://localhost:9002/health"
    method = "GET"
    interval = "1s"
    timeout = "1s"
  }
   

ACL (Access Control List) Operations

  • https://www.udemy.com/course/hashicorp-consul/learn/lecture/24724816#questions/17665170/

Consul ACLs use tokens and policies to control access to Consul's UI, API, CLI, and agent operations. They are not network firewall rules on ports.

ACLs are used to:

  • Add & Remove nodes to the datacenter
  • Add & Remove services
  • Discover services
  • Consul KV (CRUD) transactions
  • API/CLI operations to interact with the datacenter
  • Block Catalog Access

As in Vault, an ACL token is linked to one or more policies, and each policy aggregates one or more rules.

SECURITY PROTIP: To reduce the “blast radius”, create a rules.hcl file for each node. For each node, specifically name the node within each node’s rules.hcl file.

TODO: Use a templating utility to create a rules.hcl file containing a different node name for each node.
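One way to handle the TODO above without a separate templating tool is a short Python script. This sketch makes some assumptions. The rules give each node's token write access to that node only and read access to service definitions, which is a common shape for agent tokens. The node names and the `<node>-rules.hcl` file naming are hypothetical.

```python
from pathlib import Path

# Hypothetical per-node policy: write on this node only, read on all services.
RULES_TEMPLATE = '''node "{node}" {{
  policy = "write"
}}
service_prefix "" {{
  policy = "read"
}}
'''


def write_node_rules(nodes: list, out_dir: Path) -> list:
    """Write <node>-rules.hcl for each node and return the file paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for node in nodes:
        path = out_dir / f"{node}-rules.hcl"
        path.write_text(RULES_TEMPLATE.format(node=node))
        paths.append(path)
    return paths
```

Each file can then be loaded with, for example, `consul acl policy create -name consul-client-1-agent -rules @consul-client-1-rules.hcl` (the names are hypothetical).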

  1. Environment Variable names I use in scripts involving ACL:

    ACL_POLICY_FILE_NAME="some-service-policy.hcl"
    ACL_POLICY_NAME="some-service-policy"
    ACL_POLICY_DESC="Token"

  2. Create the file defined in ACL_POLICY_FILE_NAME:

    # Policy A
    service "web" {
    policy = "read"
    }
    key_prefix "foo-path/" {
    policy = "write"
    }
    
    # Policy B
    service "db" {
    policy = "deny"
    }
    node_prefix "" {
    policy = "read"
    }
    

    Policy dispositions in rules are “read”, “write”, “list” (for key_prefix rules), and “deny”.

    TODO: To define according to “Least Privilege” principles, provide “remove” permissions to a separate account than the account which performs “add”.

  3. Initiate the policy using the policy file:

    consul acl policy create -name "${ACL_POLICY_NAME}" \
    -rules @"${ACL_POLICY_FILE_NAME}"
    
  4. Create the Token GUID from the policy created:

    ACL_TOKEN=$( consul acl token create -description "${ACL_POLICY_DESC}" \
    -policy-name "${ACL_POLICY_NAME}" -format=json | jq -r .SecretID )
    
  5. Add the ACL_TOKEN value to the service definition:

    service {
      name = "dashboard"
      port = 9002
      token = "12345678-1234-abcd-5678-1234567890ab"
    }
    

D. In a single datacenter (with Kubernetes)

In HashiCorp’s YouTube channel covering all their 8 products:

Rosemary Wang (joatmon08.github.io, Developer Advocate) with J. Cole Morrison hold fun hashicorplive Twitch parties [about two hours each] to show how to learn Consul “the hard way” by setting it up from scratch, using code from github.com/jcolemorrison/getting-into-consul

Consul

Consul offers three types of gateways (mesh, ingress, and terminating) in the data path. They validate authenticity and enforce intentions on traffic flows between services. Enterprise Academy:

( https://play.instruqt.com/hashicorp/tracks/vault-advanced-data-protection-with-transform)

Kubernetes with Consul

Kubernetes with Service Mesh and Consul

This feature is called “Consul Connect” (Consul's service mesh).

Envoy install

To ensure a specific version tested with the tutorial, instead of using brew install func-e envoy:

  1. Install Envoy proxy (specifically version 1.20.1) using https://func-e.io/:

    curl https://func-e.io/install.sh | bash -s -- -b /usr/local/bin
    
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                  Dload  Upload   Total   Spent    Left  Speed
    100  9791  100  9791    0     0  17341      0 --:--:-- --:--:-- --:--:-- 17421
    tetratelabs/func-e info checking GitHub for latest tag
    tetratelabs/func-e info found version: 1.1.3 for v1.1.3/darwin/arm64
    tetratelabs/func-e info installed /usr/local/bin/func-e
    

    On Apple Silicon (ARM) Macs, download the darwin/amd64 build instead:

    export FUNC_E_PLATFORM=darwin/amd64
    func-e use 1.20.1
    
    
    
    downloading https://archive.tetratelabs.io/envoy/download/v1.20.1/envoy-v1.20.1-darwin-amd64.tar.xz
    
  2. Move Envoy from the .func-e folder to a path common in $PATH:

    sudo mv ~/.func-e/versions/1.20.1/bin/envoy  /usr/local/bin/
    
  3. Verify that envoy can be found in your PATH:

    envoy --version
    envoy  version: ea23f47b27464794980c05ab290a3b73d801405e/1.20.1/Modified/RELEASE/BoringSSL
    

    NOTE: brew install envoy installs version 1.22.2 (at time of writing).

Recordings

A series of recordings live on Twitch.tv by Developer Evangelists Rosemary Wang and J. Cole Morrison:


Sidecar proxy injection

Consul comes with a built-in sidecar proxy, but also supports the Envoy proxy (originally developed at Lyft). (QUESTION: This means that migration to Consul can occur gradually?)

You can use Helm, but the consul-k8s CLI is now the recommended way because it validates your environment, gives much better error messages, and helps with a clean installation.

  1. To register (inject) Consul as a sidecar proxy, add this annotation to the Pod manifest:

    apiVersion: v1
    kind: Pod
    metadata:
      name: cats
      annotations:
        "consul.hashicorp.com/connect-inject": "true"
    spec:
      containers:
        - name: cats
          image: grove-mountain/cats:1.0.1
          ports:
            - containerPort: 8000
              name: http
    
  2. YAML files:

    • helm-consul-values.yaml changes the default settings to give a name to the datacenter, specify the number of replicas, and enable Injection
    • consul-helm
    • counting.yaml
    • dashboard.yaml

  3. As instructed, install Helm:

    brew install helm
  4. Ensure you have access to the Consul Helm chart and you see the latest chart version listed. If you have previously added the HashiCorp Helm repository, run helm repo update.

    helm repo add hashicorp https://helm.releases.hashicorp.com
    helm search repo hashicorp/consul
    NAME                CHART VERSION   APP VERSION DESCRIPTION
     hashicorp/consul    0.35.0          1.10.3      Official HashiCorp Consul Chart
    
  5. Install Consul with the default configuration. This creates a consul Kubernetes namespace (if not already present) and installs Consul into it:

    helm install consul hashicorp/consul --set global.name=consul --create-namespace -n consul

    NAME: consul

    Alternatively:

    helm install consul -f helm-consul-values.yaml ./consul-helm
    
  6. In a new Terminal window:

    kubectl port-forward svc/consul-consul-ui 8080:80
    Forwarding from 127.0.0.1:8080 -> 8500
    Forwarding from [::1]:8080 -> 8500
    
  7. Register with the Consul agent (which doesn’t start the sidecar proxy):

    {
    "service": {
       "name": "front-end-sidecar",
       "port": 8080,
       "connect": {
          "sidecar_service": {}
       }
    }
    }
    
  8. Registering a Service Proxy:

    {
    "service": {
       "id": "someweb-01",
       "name": "front-end-sidecar",
       "tags": ["v1.02","production"],
       "address": "",
       "port": 80,
       "connect": {
          "sidecar_service": {
             "proxy": {
                "upstreams": [{
                   "destination_name": "db01"
                }]
             }
          }
       }
    }
    }
    

    CAUTION: Even though it’s a “name”, the destination_name value must exactly match the name under which the upstream service is registered.

    https://www.udemy.com/course/hashicorp-consul/learn/lecture/24649144#questions

  9. Start the Sidecar proxy process.

    ???

  10. View the Consul dashboard:

    http://localhost:8080/ui/datacenter/services

    References about Kubernetes with Consul:


Service Discovery Registry DNS Queries

LEARN: In environments where Infosec limits DNS traffic to the default UDP port 53, set up dnsmasq or BIND to forward queries from port 53 to 8600. That way Consul doesn't need the root privileges required to bind to ports below 1024.

Consul Service Registry process

Consul servers maintain a DNS “Services Registry”

  1. Each service (such as the web service in this example) is registered:

    service {
      name = "web"
      port = 9090
      token = "12345678-1234-abcd-5678-1234567890ab"
      connect {
        sidecar_service {
          port = 20000
          proxy {
            upstreams {
              destination_name = "payments"
              local_bind_address = "127.0.0.1"
              local_bind_port = 9091
            }
          }
        }
      }
    }
    
    • Proxy Defaults to control proxy configuration
    • Service Defaults configures defaults for all instances of a service

    Discovery chain: Service Router -> Service Splitter -> Service Resolver

    • Service Router defines where to send Layer 7 traffic
    • Service Splitter defines how to divide traffic for a single HTTP route
    • Service Resolver matches service instances with Consul upstreams

    PROTIP: Include a health check stanza in the service registration, such as:

    service {
      ...
      check {
        id       = "mem-util"
        name     = "Memory utilization"
        # Script checks run only if the agent sets enable_local_script_checks = true.
        args     = ["/usr/local/bin/check_mem.py"]
        interval = "10s"
      }
    }
    

    Once registered, a service should appear as available within the Consul service registry.
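Once services are registered, a client can ask for only the instances whose checks pass. A minimal sketch, assuming a local agent on port 8500: it builds the URL for the health endpoint /v1/health/service/<name>?passing and reduces its JSON list to address/port pairs, falling back to the node address when a service registered with an empty address. The helper names are illustrative, not part of any Consul client library.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a Consul agent listening locally on the default HTTP port.
CONSUL_HTTP_ADDR = "http://127.0.0.1:8500"


def health_url(service, tag=None, addr=CONSUL_HTTP_ADDR):
    """URL that returns only instances of `service` whose checks pass."""
    params = {"passing": "true"}
    if tag:
        params["tag"] = tag
    query = urllib.parse.urlencode(params)
    return f"{addr}/v1/health/service/{urllib.parse.quote(service)}?{query}"


def healthy_endpoints(entries):
    """Reduce the health endpoint's JSON list to (address, port) pairs.

    An empty Service.Address means the service uses its node's address.
    """
    endpoints = []
    for entry in entries:
        svc = entry["Service"]
        address = svc.get("Address") or entry["Node"]["Address"]
        endpoints.append((address, svc["Port"]))
    return endpoints


def fetch_healthy(service, tag=None, addr=CONSUL_HTTP_ADDR):
    """Query a running agent (network call) and return healthy endpoints."""
    with urllib.request.urlopen(health_url(service, tag, addr)) as resp:
        return healthy_endpoints(json.load(resp))
```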

Centralized ???

Consul External Services Monitor (ESM)

When a Consul agent cannot be installed locally, such as on cloud-managed services or incompatible hardware, keep Consul’s service catalog up to date by installing the Consul ESM on ___ to periodically poll those services.

Such a health check is added to the service registration:

token = "12345678-1234-abcd-5678-1234567890ab"
check {
  id = ""
}
   
  1. Discover DNS SRV record

    • https://www.wikiwand.com/en/SRV_record

    curl http://localhost:8500/v1/catalog/service/redis

    PROTIP: Consul clients return only healthy nodes and services because Consul maintains their health status.

  2. Each local Consul agent caches lookups for 3 days.

    Each entry can be tagged, such as

    tag.<service>.service.<datacenter>.<domain>

    tag.<service>.service.<datacenter>.${DNS_TLD}

    db.redis.service.dc1.consul

    PROTIP: Consul is the #1 discovery tool with AWS Route53 (via delegation from resolver)

    Traditional DNS services (BIND, iptables, dnsmasq) can be configured to forward requests with the DNS_TLD suffix (“consul”):

  • To configure dnsmasq to forward DNS requests for the “consul” domain to Consul:

    server=/consul/127.0.0.1#8600
  • To configure a BIND server:

    zone "consul" IN {
      type forward;
      forward only;
      forwarders { 127.0.0.1 port 8600; };
    };
     
  • To configure iptables in Linux servers:

    iptables -t nat -A PREROUTING -p udp -m udp --dport 53 -j REDIRECT --to-ports 8600
    iptables -t nat -A PREROUTING -p tcp -m tcp --dport 53 -j REDIRECT --to-ports 8600
    iptables -t nat -A OUTPUT -d localhost -p udp -m udp --dport 53 -j REDIRECT --to-ports 8600
    iptables -t nat -A OUTPUT -d localhost -p tcp -m tcp --dport 53 -j REDIRECT --to-ports 8600

    References about templating/generating JSON & YAML:

    • https://learnk8s.io/templating-yaml-with-code
    • Jsonnet
    • https://golangexample.com/a-tool-to-apply-variables-from-cli-env-json-toml-yaml-files-to-templates/
    • https://github.com/krakozaure/tmpl?ref=golangexample.com
    • https://wryun.github.io/rjsone/
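The DNS naming scheme and the port-8600 forwarding above can be illustrated with a small sketch that builds a Consul DNS name and encodes a minimal DNS query for its SRV record using only the standard library. It is a teaching sketch, not a resolver: it assumes the default “consul” domain and an agent answering DNS on 127.0.0.1:8600, and it only counts the answers in the reply instead of decoding them. Function names are illustrative.

```python
import socket
import struct

QTYPE_SRV = 33  # DNS record type number for SRV (RFC 2782)
QCLASS_IN = 1   # Internet class


def consul_dns_name(service, tag=None, datacenter=None, domain="consul"):
    """Build [tag.]<service>.service[.<datacenter>].<domain>."""
    labels = ([tag] if tag else []) + [service, "service"]
    if datacenter:
        labels.append(datacenter)
    labels.append(domain)
    return ".".join(labels)


def build_srv_query(name, query_id=0x1234):
    """Encode a DNS query with one SRV question and recursion desired."""
    # Header: ID, flags (RD bit set), QDCOUNT=1, ANCOUNT, NSCOUNT, ARCOUNT.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", QTYPE_SRV, QCLASS_IN)


def count_srv_answers(name, server=("127.0.0.1", 8600), timeout=2.0):
    """Send the query over UDP and return ANCOUNT from the reply header."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_srv_query(name), server)
        reply, _ = sock.recvfrom(4096)
    return struct.unpack("!H", reply[6:8])[0]
```

For example, count_srv_answers(consul_dns_name("redis", tag="db", datacenter="dc1")) asks the local agent about db.redis.service.dc1.consul, the same name a dig @127.0.0.1 -p 8600 SRV lookup would use.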

Consul workflows beyond Kubernetes

  • Service Discovery (Kubernetes equivalent: kube-dns, kube-proxy): identify and connect any service on any cloud or runtime with Consul DNS.

  • Service Configuration (Kubernetes equivalent: ConfigMaps): Consul also updates F5 and other load balancer rules, for dynamic configuration across distributed services (in milliseconds).

  • Segmentation (Kubernetes equivalent: Network Policy + Controller): network infrastructure automation.

Service Discovery With Consul on Kubernetes

Service Mesh

Multi-service Service Mesh: secure service-to-service traffic with Mutual TLS certificates, plus enable progressive application delivery practices:

  • Application networking and security with identity-based authorization
  • L7 traffic management
  • Service-to-service encryption
  • Health checking to automatically remove services that fail health checks

Consul Enterprise Academy: Service Mesh

Deploying a Service Mesh at Enterprise Scale With Consul - HashiConf Global 2021

Beyond:

  • Access Control
  • Billing
  • Networking
  • Identity
  • Resource Management


Mutual TLS

  • https://www.consul.io/docs/security/encryption#rpc-encryption-with-tls
  • https://www.udemy.com/course/hashicorp-consul/learn/lecture/24723260#questions

To encrypt traffic between nodes, each asset is given a cryptographic identity in the form of a TLS certificate (in X.509, SPIFFE-compatible format). Consul also provides a proxy that enforces communication between nodes using “Mutual TLS”, where each party exchanges certificates with the other.
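Consul encodes a service’s identity in the certificate’s SPIFFE ID URI, which to my understanding has the shape spiffe://<trust-domain>/ns/<namespace>/dc/<datacenter>/svc/<service>, with the trust domain formed from the cluster ID plus “.consul”. A hedged sketch that builds and parses such URIs; treat the exact format as an assumption, and note the trust domain in the example is made up.

```python
from urllib.parse import urlparse


def service_spiffe_id(trust_domain, service, datacenter="dc1", namespace="default"):
    """Build a Consul-style service identity URI (assumed format)."""
    return f"spiffe://{trust_domain}/ns/{namespace}/dc/{datacenter}/svc/{service}"


def parse_service_spiffe_id(uri):
    """Split a service SPIFFE ID into its parts, rejecting other shapes."""
    parsed = urlparse(uri)
    if parsed.scheme != "spiffe" or not parsed.netloc:
        raise ValueError(f"not a SPIFFE ID: {uri}")
    parts = parsed.path.strip("/").split("/")
    # Expect alternating key/value segments: ns/<..>/dc/<..>/svc/<..>
    if len(parts) != 6 or parts[0::2] != ["ns", "dc", "svc"]:
        raise ValueError(f"not a service identity: {uri}")
    return {
        "trust_domain": parsed.netloc,
        "namespace": parts[1],
        "datacenter": parts[3],
        "service": parts[5],
    }


# Hypothetical trust domain; a real one comes from the cluster's CA config.
TRUST_DOMAIN = "11111111-2222-3333-4444-555555555555.consul"
web_id = service_spiffe_id(TRUST_DOMAIN, "web")
```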

Consul’s cloud auto-join provider for Kubernetes enables nodes running outside of Kubernetes to join a Consul cluster running on Kubernetes, discovering the servers through the Kubernetes API.

Consul can auto-inject certificates into Kubernetes Envoy sidecars to secure communication traffic (within the Service Mesh).

RECOMMENDED: Have Consul use HashiCorp Vault to generate dynamic X.509 certificates.

Consul Connect (Service Mesh)

Integration between Consul and Kubernetes is achieved by running Consul Service Mesh (aka Consul Connect) on Kubernetes:

Catalog Sync: Sync Consul services into first-class Kubernetes services and vice versa. This enables Kubernetes to easily access external services and for non-Kubernetes nodes to easily discover and access Kubernetes services.

  1. Have Vault act as the Certificate Authority (CA) for Consul Connect. On an already configured Vault, enable:

    vault secrets enable pki
    vault secrets enable consul
    
  2. A sample Consul configuration to use Vault for Connect:

    connect {
      enabled     = true
      ca_provider = "vault"
      ca_config {
        address               = "https://vault.example.com:8200"
        token                 = "s.1234567890abcdef12"
        root_pki_path         = "connect_root"
        intermediate_pki_path = "connect_inter"
        leaf_cert_ttl         = "24h"
        rotation_period       = "2160h"
        intermediate_cert_ttl = "8760h"
        private_key_type      = "rsa"
        private_key_bits      = 2048
      }
    }
    
  3. Configure access to Consul to create tokens (using the admin token):

    vault write consul/config/access \
      address=consul:8501 \
      scheme=https \
      token=12345678-1234-abcd-5678-1234567890ab
    
  4. Create a role for each permission set:

    vault write consul/roles/my-role policies=readonly
    
  5. Generate credentials (lease-id, lease_duration 768h, lease_renewable true, token):

    vault read consul/creds/my-role
  6. For each access, human users generate a new ACL token from Vault.
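Steps 5 and 6 above can also be done over Vault’s HTTP API. A minimal sketch, assuming the consul secrets engine is mounted at consul/ and that the response carries the generated Consul ACL token in data.token: it builds the GET /v1/consul/creds/my-role request with the X-Vault-Token header and pulls out the lease and token fields. The helper names and the sample payload values are invented for illustration.

```python
import urllib.request

# Assumption: the same Vault address as in the Connect CA config above.
VAULT_ADDR = "https://vault.example.com:8200"


def creds_request(role, vault_token, addr=VAULT_ADDR, mount="consul"):
    """Build (but do not send) a request for new Consul ACL credentials."""
    return urllib.request.Request(
        f"{addr}/v1/{mount}/creds/{role}",
        headers={"X-Vault-Token": vault_token},
    )


def parse_creds(payload):
    """Extract lease details and the generated Consul token from the JSON."""
    return {
        "lease_id": payload["lease_id"],
        "lease_duration_s": payload["lease_duration"],  # reported in seconds
        "renewable": payload["renewable"],
        "consul_token": payload["data"]["token"],
    }
```

Because lease_duration is reported in seconds, the 768h lease in step 5 arrives as 2764800.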


Assist or Replace Kubernetes

  • https://learn.hashicorp.com/tutorials/nomad/consul-service-mesh
  • https://www.consul.io/docs/k8s/installation/install

Consul combines with Nomad, Vault, and Terraform to provide a full alternative to Kubernetes for Docker container orchestration:

Hashicorp replace Kubernetes

Nomad, by itself, is a cluster manager and task scheduler.

Nomad, like Kubernetes, orchestrates Docker containers. But Nomad also orchestrates non-containerized apps. Nomad demonstrated its scalability in its “C2M Challenge”, which showed it to be versatile and lightweight enough to support over 2,000,000 tasks.

The smallest units of deployment in Nomad are called “Tasks” – the equivalent to “Pods” in Kubernetes.

Kubernetes (as of publishing date) claims to support clusters up to 5,000 nodes, with 300,000 total containers, and no more than 150,000 pods.

Nomad, originally launched in 2015, is part of Cloudflare’s development environment [transcript] (Cloudflare routes 10% of the world’s internet traffic) and a cornerstone of Roblox’s and Pandora’s scaling.

Nomad may not be as commonly used as Kubernetes, but it already has a tremendous influence.


D. In a single datacenter using Kubernetes

  1. The repo for using Consul on Kubernetes is at

    https://github.com/hashicorp/consul-k8s

  2. Get the official Helm chart:

    git clone https://github.com/hashicorp/consul-k8s.git

    The chart is in the repo’s charts/consul folder.

    (previously https://github.com/hashicorp/consul-helm.git)

  3. Customize file values.yaml such as:

    global:
      enabled: true
      image: "consul:1.5.1"
      imageK8S: "hashicorp/consul-k8s:0.8.1"
      domain: consul
      datacenter: primarydc
    server:
      enabled: true
      replicas: 3
      bootstrapExpect: 3
    

    See https://www.consul.io/docs/k8s/helm

  4. Identify the latest release for image: “consul” at:

    https://github.com/hashicorp/consul/releases

    which was v1.12.0 on April 20, 2022.

  5. STAR: Identify the latest release for imageK8S: “hashicorp/consul-k8s” at:

    https://github.com/hashicorp/consul-k8s/releases

    which, at time of writing, was v0.44.0 (May 17, 2022).

    This is reflected at: https://artifacthub.io/packages/helm/hashicorp/consul

    See https://www.consul.io/docs/k8s/installation/install

  6. Deploy using Helm:

    helm install consul ./consul-k8s/charts/consul -f values.yaml
    

E. In a single 6-node datacenter (survive loss of an Availability Zone)

HA (High Availability)

In order for a datacenter to withstand the sudden loss of a server within a single Availability Zone or the loss of an entire Availability Zone, set up 6 servers for the best resilience plus performance under load:

Consul 6 nodes

The yellow star in the diagram above marks the LEADER Consul server. The leader is responsible for ingesting new log entries of cluster changes, writing that to durable storage, and replicating to followers.

PROTIP: Only the LEADER processes requests. FOLLOWERs do not answer requests themselves (by default they forward them to the LEADER), as their job is just to receive replication data (enjoy the food and stand by like a Prince). This architecture is similar to Vault’s.

Consul Non-voting for scale

IMPORTANT: For better scalability, use Consul’s Enterprise “Autopilot” mechanism to set up “NON-VOTER” Consul server nodes to handle additional processing for higher performance under load. See https://play.instruqt.com/HashiCorp-EA/tracks/consul-autopilot

The NON-VOTER is in Zone 2 because leadership may switch to different FOLLOWER servers over time.

So keep the above in mind when using this page to describe the Small and Large server type in each cloud.

PROTIP: The recommended maximum number of Consul client nodes for a single datacenter is 5,000.

CAUTION: A Consul cluster cannot operate in a single Availability Zone.

Actually, HashiCorp’s Consul Enterprise Reference Architecture for a single cluster is 5 Consul server nodes across 3 availability zones.

Consul Ref. Arch

Within an Availability Zone, if a voting FOLLOWER becomes unavailable, a non-voting member in the same Availability Zone is promoted to a voting member:

Consul promote in AZ

Raft consensus algorithm

Consider these dynamic illustrations about how the Raft mechanism works:

  • http://thesecretlivesofdata.com/raft/ provides a visualization
  • https://raft.github.io/

To ensure data consistency among nodes across Availability Zones, the Raft consensus algorithm (a simpler alternative to Paxos) maintains consistent state storage for updating catalog, session, prepared query, ACL, and KV state.

Each transaction is considered “committed” when a quorum (more than half) of the servers has stored it.

If the LEADER server fails, an election is automatically held among a quorum (adequate number of) FOLLOWERs to elect a new LEADER from among candidates.
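The quorum arithmetic behind these server counts is simple enough to write down. A short sketch that counts voting servers only (non-voters are ignored): it computes the quorum size, floor(n/2) + 1, and how many voting servers can fail while a quorum remains. It shows why 6 voters tolerate no more failures than 5; both lose quorum on the third failure. Function names are illustrative.

```python
def quorum_size(voters: int) -> int:
    """Smallest majority of voting servers: floor(n/2) + 1."""
    return voters // 2 + 1


def failure_tolerance(voters: int) -> int:
    """Voting servers that can fail while a quorum is still reachable."""
    return voters - quorum_size(voters)


def is_committed(acks: int, voters: int) -> bool:
    """A log entry commits once a quorum (leader included) has stored it."""
    return acks >= quorum_size(voters)


for n in (1, 3, 5, 6, 7):
    print(f"{n} voters: quorum {quorum_size(n)}, "
          f"tolerates {failure_tolerance(n)} failure(s)")
```

For 3, 5, 6, and 7 voters this gives failure tolerances of 1, 2, 2, and 3.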

TUTORIAL:

Serf LAN & WAN Gossip

  • https://learn.hashicorp.com/tutorials/consul/federation-gossip-wan
  • https://www.consul.io/docs/intro/vs/serf

To ensure that data is distributed without assuming reliable communication, Consul uses the Gossip protocol powered by the multi-platform Serf library, open-sourced by HashiCorp at https://github.com/hashicorp/serf (written in Golang). Serf’s Gossip protocol is based on a modified version of the SWIM (Scalable Weakly-consistent Infection-style Process Group Membership) protocol.

Serf provides for:

  • Events broadcasting to perform cross-datacenter requests based on Membership information

  • Failure detection to gracefully handle loss of connectivity

No Vault - Hard Way

If Vault is not used, do it the hard way:

  1. Generate Gossip encryption key (a 32-byte AES GCM symmetric key that’s base64-encoded).

  2. Arrange for regular key rotation (using the Keyring built in Consul)

  3. Install encryption key on each agent.

  4. Review Gossip Telemetry output.
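Step 1 is what consul keygen does. As a sketch of the same idea (not Consul’s own code), Python’s secrets module can produce 32 random bytes and base64-encode them; the key then goes into each agent’s encrypt setting. The helper names are illustrative.

```python
import base64
import secrets

KEY_BYTES = 32  # 32-byte key, per step 1 above


def generate_gossip_key() -> str:
    """Return a base64-encoded 32-byte random key for gossip encryption."""
    return base64.b64encode(secrets.token_bytes(KEY_BYTES)).decode("ascii")


def is_32_byte_key(key: str) -> bool:
    """Check that a key is valid base64 that decodes to exactly 32 bytes."""
    try:
        raw = base64.b64decode(key, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    return len(raw) == KEY_BYTES
```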

NOTE: To manage membership and broadcast messages to the cluster, refer to the Serf documentation.


F. For HA on multiple datacenters federated over WAN

REMEMBER: Like Vault, Consul Datacenter federation is not a solution for data replication. There is no built-in replication between datacenters. consul-replicate is what replicates KV between datacenters.

The Enterprise edition of Consul enables communication across datacenters by federating multiple datacenters, coordinated using the WAN Gossip protocol.

Consul Federation

  • https://learn.hashicorp.com/tutorials/consul/federation-gossip-wan?in=consul/networking

Setup Network Areas

Create compatible areas in each datacenter:

  1. Define DATACENTER IDs

    DATACENTER1_ID="dc1"
    DATACENTER2_ID="dc2"
    
  2. Repeat for each DATACENTER ID value:

    consul operator area create \
    -peer-datacenter="${DATACENTER1_ID}"
    
    consul operator area create \
    -peer-datacenter="${DATACENTER2_ID}"
    
  3. Run for the first datacenter with its DATACENTER_IP value:

    consul operator area join \
    -peer-datacenter="${DATACENTER1_ID}"  "${DATACENTER_IP}"
    

    This establishes the handshake.

    consul-replicate

  4. To perform cross-datacenter Consul K/V replication, install a specific tag of the consul-replicate daemon to run continuously:

    https://github.com/hashicorp/consul-replicate/tags

    The daemon consul-replicate integrates with Consul to manage application configuration from a central data center, with low-latency asynchronous replication to other data centers, thus avoiding the need for smart clients that would need to write to all data centers and queue writes to handle network failures.

    QUESTION: No changes since 2017, so doesn’t work with TLS1.3, arm64, new Docker versions. Developer Seth Vargo is now at Google.

    https://learn.hashicorp.com/tutorials/consul/federation-gossip-wan?in=consul/networking

    Replicate ACL entries

    Cache ACLs for them to “ride out partitions”.

  5. Configure primary datacenter servers and clients

    {
      "datacenter": "dc1",
      "primary_datacenter": "dc1",
      "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true
      }
    }
    
  6. Create ACL policy

    acl = "write"
    operator = "write"
    service_prefix "" {
      policy = "read"
      intentions = "read"
    }
    

    REMEMBER: Intentions follow a top-down ruleset using Allow or Deny intentions. More specific rules are evaluated first.

  7. Create ACL replication token

    consul acl token create \
      -description "ACL replication token" \
      -policy-name acl-replication
    

    Sample response:

    AccessorID:
    SecretID:
    Description:
    Local: false
    Create Time:
    Policies:
    
  8. Configure secondary datacenter agents (servers and clients):

    {
      "datacenter": "dc2",
      "primary_datacenter": "dc1",
      "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true,
        "enable_token_replication": true
      }
    }
    
  9. Apply replication token to servers in secondary datacenter:
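The REMEMBER note in step 6 (more specific intentions are evaluated first) can be made concrete. A simplified sketch, assuming the open-source precedence order without namespaces or partitions: exact source and exact destination rank highest, then wildcard source with exact destination, then exact source with wildcard destination, then wildcard on both. When nothing matches, the ACL default_policy decides (“deny” in the configs above). Class and function names are illustrative, not Consul’s API.

```python
from typing import List, NamedTuple


class Intention(NamedTuple):
    source: str        # service name or "*"
    destination: str   # service name or "*"
    action: str        # "allow" or "deny"


def precedence(intention: Intention) -> int:
    """Higher is more specific (simplified OSS table, no namespaces)."""
    exact_src = intention.source != "*"
    exact_dst = intention.destination != "*"
    return {
        (True, True): 9,
        (False, True): 8,
        (True, False): 6,
        (False, False): 5,
    }[(exact_src, exact_dst)]


def authorize(source: str, destination: str, intentions: List[Intention],
              default_allow: bool = False) -> bool:
    """Apply the most specific matching intention, else the default policy."""
    matching = [
        i for i in intentions
        if i.source in (source, "*") and i.destination in (destination, "*")
    ]
    if not matching:
        return default_allow
    return max(matching, key=precedence).action == "allow"
```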


Enterprise configuration

From v1.10.0 on, a full license file must be defined in the server config file before installation:

log_level     = "INFO"
server         = true
ui             = true
datacenter     = "us-east-1"
license_path   = "/opt/consul/consul.hclic"
client_addr    = "0.0.0.0"
bind_addr      = "10.1.4.11"
advertise_addr = "10.1.4.11"
advertise_addr_wan = "10.1.4.11"
   

Within CLI:

license_path = "/etc/consul.d/consul.hclic"
   
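advertise_addr and advertise_addr_wan specify the addresses this agent advertises to other agents; advertise_addr_wan must be reachable from outside the datacenter.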

Agent configurations have a different IP address and these settings to auto-join based on cloud (AWS) tags:

data_dir  = "/opt/consul/data"
bootstrap_expect = 5
retry_join       = ["provider=aws region=us-east-1 tag_key=consul tag_value=true"]
retry_join_wan   = ["10.1.2.3","10.1.2.4"]
connect = {
   enabled = true
}
performance = {
   raft_multiplier = 1
}
   

license_path - PROTIP: some use “.txt” or “.hcl” instead of “.hclic” to avoid the need to change text editor preferences based on file extension.

retry_join specifies the cloud provider and other metadata for auto-discovery by other Consul agents.
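The cloud auto-join string in retry_join is a space-separated list of key=value pairs. A hedged sketch that splits such entries into a dictionary and separates them from plain addresses like those in retry_join_wan; it assumes values contain no spaces and does not validate which keys each provider accepts. Function names are illustrative.

```python
from typing import Dict, List, Tuple


def parse_auto_join(entry: str) -> Dict[str, str]:
    """Parse 'provider=aws region=... tag_key=... tag_value=...' into a dict."""
    config = {}
    for token in entry.split():
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        config[key] = value
    if "provider" not in config:
        raise ValueError("cloud auto-join entries need a provider key")
    return config


def split_join_targets(entries: List[str]) -> Tuple[List[Dict[str, str]], List[str]]:
    """Separate cloud auto-join specs from plain IP addresses or hostnames."""
    clouds, addresses = [], []
    for entry in entries:
        if "provider=" in entry:
            clouds.append(parse_auto_join(entry))
        else:
            addresses.append(entry)
    return clouds, addresses
```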

retry_join_wan specifies the IP address of each datacenter ingress.

WAN encryption has its own encryption key.

connect refers to Consul Connect (disabled by default for security).

raft_multiplier = 1 overrides the default value of 5 (meant for dev usage) for high-performance production usage. This setting scales the time between failed-leader detection and new leader election. Higher numbers extend that time (slower) to reduce leadership churn and the associated unavailability.

TLS configuration

Consul has root and intermediate CA capability built-in to create certificates.

Vault can also be used.

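Server certificates are named “server.<datacenter>.<domain>” (such as server.dc1.consul); verify_server_hostname checks outgoing connections against that name.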

  1. Generate TLS .pem files (for example, with consul tls ca create and consul tls cert create -server -dc dc1).

  2. Add “verify_” TLS encryption settings to the Consul Agent config file:

    ...
    verify_incoming = true
    verify_outgoing = true
    verify_server_hostname = true
     
    ca_file = "consul-agent-ca.pem"
    cert_file = "dc1-server-consul-0.pem"
    key_file = "dc1-server-consul-0-key.pem"
    encrypt = "xxxxxxxx"
    

Enterprise Autopilot CLI Commands

For write redundancy through automatic replication across several zones, add a tag “az” for “availability zone” to invoke the Enterprise feature “Consul Autopilot”:

autopilot = {
  redundancy_zone_tag = "az"
  min_quorum          = 5
}
node_meta = {
   az = "Zone1"
}
   

The Enterprise Autopilot feature performs automatic, operator-friendly management of Consul servers, including cleanup of dead servers, monitoring the state of the Raft cluster, automated upgrades, and stable server introduction.

Autopilot enables Enterprise Redundancy Zones to improve resiliency and scaling of a Consul cluster. It can add “non-voting” servers which are promoted to voting status when a voting server fails. Except during such a failure, these non-voting servers do not participate in quorum, including leader election.

  1. To get Autopilot configuration settings:

    consul operator autopilot get-config

    Sample response:

    CleanupDeadServers = true
    LastContactThreshold = 200ms
    MaxTrailingLogs = 250
    MinQuorum = 0
    ServerStabilizationTime = 10s
    RedundancyZoneTag = ""
    DisableUpgradeMigration = false
    UpgradeVersionTag = ""
    

    Alternately, make an API call for JSON response:

    curl http://127.0.0.1:8500/v1/operator/autopilot/configuration
    {
      "CleanupDeadServers": true,
      "LastContactThreshold": "200ms",
      "MaxTrailingLogs": 250,
      "MinQuorum": 0,
      "ServerStabilizationTime": "10s",
      "RedundancyZoneTag": "",
      "DisableUpgradeMigration": false,
      "UpgradeVersionTag": "",
      "CreateIndex": 5,
      "ModifyIndex": 5
    }
    
  2. Start a Consul server
  3. See which Consul servers joined:

    consul operator raft list-peers
    Node             ID                                    Address            State     Voter  RaftProtocol
    consul-server-1  12345678-1234-abcd-5678-1234567890ab  10.132.1.194:8300  leader    true   3
    

    After a quorum of servers is started (third new server), autopilot detects an equal number of old nodes vs. new nodes and promotes new servers as voters. This triggers a new leader election, and demotes the old nodes as non-voting members.
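The list-peers table is easy to check from a script, for example to confirm there is exactly one leader and to count voters during an upgrade like the one described above. A minimal sketch that parses the whitespace-separated columns; it assumes node names contain no spaces, and the second row in the sample is a hypothetical follower added for illustration.

```python
from typing import Dict, List


def parse_list_peers(output: str) -> List[Dict[str, str]]:
    """Turn `consul operator raft list-peers` text into one dict per row."""
    lines = [line for line in output.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def summarize(peers: List[Dict[str, str]]) -> Dict[str, object]:
    """Report the leader(s) and how many peers are voters vs. non-voters."""
    return {
        "leaders": [p["Node"] for p in peers if p["State"] == "leader"],
        "voters": sum(p["Voter"] == "true" for p in peers),
        "non_voters": sum(p["Voter"] != "true" for p in peers),
    }


# First row from the sample above; the second row is hypothetical.
SAMPLE = """
Node             ID                                    Address            State     Voter  RaftProtocol
consul-server-1  12345678-1234-abcd-5678-1234567890ab  10.132.1.194:8300  leader    true   3
consul-server-2  87654321-4321-dcba-8765-ba0987654321  10.132.1.195:8300  follower  false  3
"""
```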


Mesh Gateway

When performing cross-cloud service communication:

multi-cluster-comm

services avoid exposing themselves on public networks by using Mesh Gateways (built upon Envoy), which sit on the public internet to accept L4 traffic secured with mTLS. Mesh Gateways perform NAT (Network Address Translation) to route traffic to endpoints in the private network.

Consul provides an easy SPOC (Single Point of Contact) to specify rules for communication instead of requesting Networking to manually configure a rule in the firewall.

  1. Generate GATEWAY_TOKEN value

  2. Start the Mesh Gateway:

    consul connect envoy \
      -gateway mesh \
      -register \
      -service "mesh-gateway" \
      -address "${MESH_PRIVATE_ADDRESS}" \
      -wan-address "${MESH_WAN_ADDRESS}" \
      -admin-bind 127.0.0.1:0 \
      -token="${GATEWAY_TOKEN}"
    
  3. Configure one Consul client with access to each datacenter WAN link:

    • Envoy

    • Enable gRPC

    Telemetry and capacity tests

    Adequate reserve capacity for each component is necessary to absorb sudden increases in activity.

    Alerts are necessary to request manual or automated intervention.

    Those alerts are based on metrics for each component described at https://www.consul.io/docs/agent/telemetry

    Artificial loads need to be applied to ensure that alerts and interventions will actually occur when appropriate. Load testing exposes the correlation of metric values at various levels of load. All this is part of a robust Chaos Engineering needed for pre-production.

At scale, customers need to optimize for stability at the Gossip layer.

Manage from another Terminal

  1. While a Consul agent is running in one Terminal, open another Terminal shell to interact with it:

    consul members

    A sample successful response:

    Node         Address         Status  Type    Build  Protocol  DC   Partition Segment
    Judiths-MBP  127.0.0.1:8301  alive   server  1.12.0  2         dc1  default <all>
    

    PROTIP: The above command is only needed once to join a cluster. After that, agents Gossip with each other to propagate membership information with each other.

    This error response reflects that CLI commands are a wrapper for API calls:

    Error retrieving members: Get "http://127.0.0.1:8500/v1/agent/members?segment=_all": dial tcp 127.0.0.1:8500: connect: connection refused
    

    BTW, to list the members of the WAN gossip pool:

    consul members -wan
  2. For more detail about Tags:

    consul members -detailed

    Sample response:

    Node                  Address         Status  Tags
    wilsonmar-N2NYQJN46F  127.0.0.1:8301  alive   acls=0,ap=default,build=1.12.0:09a8cdb4,dc=dc1,ft_fs=1,ft_si=1,id=40fee474-cf41-1063-2790-c8ff2b14d4af,port=8300,raft_vsn=3,role=consul,segment=<all>,vsn=2,vsn_max=3,vsn_min=2,wan_join_port=8302
    

    Rejoin existing server

    If a Consul server fails in a multi-server cluster, bring the server back online using the same IP address.

    consul agent -server -bootstrap-expect=3 \
    -bind=192.172.2.4 -retry-join=192.172.2.3
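The Tags column of consul members -detailed packs agent metadata into comma-separated key=value pairs. A small sketch that unpacks the sample shown above. The server check reflects my understanding that server agents carry role=consul (clients carry role=node); treat that as an assumption. The same data is also available as JSON from the /v1/agent/members endpoint that appears in the error message above.

```python
from typing import Dict


def parse_member_tags(tags: str) -> Dict[str, str]:
    """Split 'k1=v1,k2=v2,...' into a dict (values may contain ':' or '<>')."""
    result = {}
    for pair in tags.split(","):
        key, _, value = pair.partition("=")
        result[key] = value
    return result


def is_server(tags: Dict[str, str]) -> bool:
    # Assumption: server agents advertise role=consul.
    return tags.get("role") == "consul"


SAMPLE_TAGS = ("acls=0,ap=default,build=1.12.0:09a8cdb4,dc=dc1,ft_fs=1,ft_si=1,"
               "id=40fee474-cf41-1063-2790-c8ff2b14d4af,port=8300,raft_vsn=3,"
               "role=consul,segment=<all>,vsn=2,vsn_max=3,vsn_min=2,wan_join_port=8302")
```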
    

Consul Tutorials from HashiCorp

https://learn.hashicorp.com/consul

https://cloud.hashicorp.com/docs/consul/specifications

Leader/Follower (instead of Master/Slave)

https://learn.hashicorp.com/tutorials/cloud/get-started-consul?in=consul/cloud-get-started


G. Integrations to legacy VMs, mainframes, etc.

  • https://medium.com/hashicorp-engineering/supercomputing-with-hashicorp-5c827dcb2db8

Use this to learn about configuring HashiCorp Consul to integrate across the entire Enterprise landscape of technologies (another major differentiator of HashiCorp Consul).

Multi-platform (VMWare, mainframe)

VIDEO: Many enterprises also have legacy applications running on VMware or still on a mainframe.

That’s where HashiCorp Consul comes in, with multi-platform, multi-cloud support.

Consul Multi-cloud Envoy

VIDEO: Kubernetes was designed with features to address each, but Consul synchronizes across several Kubernetes instances (in different clouds) and also synchronizes with Serverless, Cloud Foundry, OpenShift, legacy VMs, even mainframes.

Consul multi-platform

Consul provides better security along with less toil (productivity) for both Kubernetes and legacy platforms, across several clouds.

That’s full enterprise capabilities.

“Multi-platform and multi-cloud choose you, due to corporate mergers and acquisitions and capacity limits in some cloud regions”

To see how Consul behaves on Power9 (PPC) and IBM Z (s390x) “mainframe supercomputers” without the expense, emulate them with Hercules or QEMU on a pure x86_64 Windows PC or a Xeon Linux workstation with KVM; it can also be done on a Mac. Power9 ended up being much simpler than s390x.

Using Vagrant

  1. VIDEO: Based on a Kubernetes 5-node cluster created using this Helm chart:

  2. Install Vagrant and download the Vagrantfile

    brew install vagrant  # Vagrant 2.2.19
    curl -LO https://raw.githubusercontent.com/hashicorp/consul/master/demo/vagrant-cluster/Vagrantfile
    

    CAUTION: As of this writing, Vagrant does not work on Apple M (ARM) chipset on new macOS laptops.

    vagrant up

    SSH into each server: vagrant ssh n1

    helm install consul ./consul-helm -f ./consul-helm/demo.values.yaml
    1. Install Consul binary
    2. Add Consul Connect to a Kube app
    3. Integrate legacy apps with Kubernetes

Kubernetes runs a sample “emojify” app, which runs an NGINX website calling the “facebox” service API, which in turn runs a machine-learning model to add emoji images over the faces of people in input photos (from Honeycomb.io).

Consul Emojify demo

“502 Bad Gateway” appears during deployment.

Connect to a Payment service outside Kubernetes.


Customize HTTP Response Headers

  1. Ask whether your app should have additional security headers, such as X-XSS-Protection, for API responses.

Collaborations

Ambassador’s Edge Stack (AES) for service discovery.

Competitors

See https://www.consul.io/docs/intro/vs

“[23:07] “Consul Connect is probably the most mature simply because of Consul. Consul is a decade of polished technology, battle-tested in each production environment. It’s a safe choice in terms of stability and features.” – The Best Service Mesh: Linkerd vs Kuma vs Istio vs Consul Connect comparison + Cilium and OSM on top

Service Discovery: Hystrix, Apache, Eureka, SkyDNS

CASE STUDY: “Self-Service Service Mesh With HCP Consul”: Tide abandoned its adoption of AWS AppMesh in favor of HashiCorp Consul, making the transition in only 6 weeks with no downtime and no big-bang migration.

Istio

GitLab

https://konghq.com/kong-mesh

Cisco

H3C

ManageEngine OpManager

Extreme Networks, Inc

Arista Networks

Big Cloud Fabric

Equinix Performance Hub

HPE Synergy

NSX for Horizon

OpenManage Network Manager

CenturyLink

Huawei Cloud Fabric

Aricent

Cloudscaling

Cumulus

HostDime

ArgoCD

Compare against this reference architecture diagram:

References

https://www.hashicorp.com/blog/consul-1-12-hardens-security-on-kubernetes-with-vault

https://www.pagerduty.com/docs/guides/consul-integration-guide

Simplifying Infrastructure and Network Automation with HashiCorp (Consul and Nomad) and Traefik

VIDEO: “Community Office Hours: HashiCorp Consul on AWS ECS” by Rosemary Wong and Luke Kysow

VIDEO: “Service Mesh and Your Legacy Apps: Connecting to Kubernetes with Consul” by Marc LeBlanc (with Arctiq)

“A Practical Guide to HashiCorp Consul — Part 1 “ by Velotio Technologies

https://thenewstack.io/3-consul-service-mesh-myths-busted/

https://www.youtube.com/watch?v=UHwoEGSfDlc&list=PL81sUbsFNc5ZgO3FpSLKNRIIvCBvqm-JA&index=33 The Meshery Adapter for HashiCorp Consul

https://webinars.devops.com/getting-hashicorp-terraform-into-production (on Azure) by Mike Tharpe with TechStrong

https://github.com/alvin-huang/consul-kv-github-action GitHub Action to pull a value from Consul KV

https://www.hashicorp.com/resources/unboxing-service-mesh-interface-smi-spec-consul-kubernetes

BOOK: “HashiCorp Infrastructure Automation Certification Guide” by Ravi Mishra

Packt BOOK: “Full Stack Development with JHipster - Second Edition” has a section on management of a full-featured sample Java Spring app using Consul instead of the default Eureka (JHipster Registry), which only supports Spring Boot. The author says the main advantages of using Consul are:

  • It has a lower memory footprint.
  • It can be used with services that are written in any programming language.
  • It focuses on consistency rather than availability.

“Consul also provides service discovery, failure detection, multi-datacenter configuration, and key-value storage.”


HashiCorp Corporate Social

Twitter: @hashicorp

Ambassadors (first announced March, 2020)

LinkedIn: https://www.linkedin.com/company/hashicorp

Facebook: https://www.facebook.com/HashiCorp

END