The sidecar proxy enables business logic to hand off cross-cutting operational concerns to a control plane.
Overview
“Service mesh” architecture gives microservices applications a standard way to hand off service-to-service access control, authentication, encrypted communication, monitoring, logging, timeout handling, load balancing, health checks, and other operational cross-cutting concerns to a sidecar proxy within each pod, which works with a “control plane” common to all services.
NOTE: Content here reflects my personal opinions and is not intended to represent any employer (past or present). “PROTIP:” highlights information I haven’t seen elsewhere on the internet: hard-won, little-known but significant facts based on my personal research and experience.
Decentralized microservices apps, Dockerized for running in containers, make all network communication through their sidecar proxy (like handing a box to UPS to deliver).
Each sidecar proxy can communicate with backends in different zones (generically named A, B, and C in the diagram). Policies sent to each sidecar can specify zones and how much traffic is sent to each zone. Each zone can be in a different cloud (thus multi-cloud).
This approach also enables security-related policies to be applied (such as limiting outflows from some countries), detection of zonal failures, and automatic rerouting around traffic anomalies (including DDoS attacks).
Each sidecar proxy and backend service report periodic state to the global load balancer (GLB) so it can make decisions that take into account latency, cost, load, current failures, etc. This enables centralized visualizations engineers use to understand and operate the entire distributed system in context.
This means app developers no longer need to bother coding for a long list of operational cross-cutting concerns:
- collection and reporting of telemetry (health checks, logs, metrics, traces)
- TLS termination (SSL certificate and key handling)
- handling of protocols HTTP/2, WebSocket, gRPC, and Redis, as well as raw TCP traffic
- rate limiting (DoS mitigation)
- timeout and back-off handling when a response is not received
- circuit breakers
- fault injection (for chaos engineering to improve reliability)
- enforcement of policy decisions
- load balancing
- staged rollouts with percentage-based traffic splits (see the sketch after this list)
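As a sketch of how such a percentage-based traffic split can be declared in mesh configuration rather than in app code, the following Istio example sends 90% of traffic to one version and 10% to another. The service name hostname and the version: v1 / version: v2 pod labels are assumptions for illustration, not from the original text:
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: hostname
spec:
  host: hostname
  subsets:              # named groups of pods, selected by labels
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: hostname
spec:
  hosts:
  - hostname
  http:
  - route:
    - destination:
        host: hostname
        subset: v1
      weight: 90        # 90% of requests stay on the current version
    - destination:
        host: hostname
        subset: v2
      weight: 10        # 10% canary traffic
EOF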
Embedding the above functionality in each app program may provide the best performance and scalability, but requires polyglot coding to implement the library in many languages. It can also be cumbersome to coordinate upgrades of new versions of each library across all services.
Several sidecar proxy programs have been created (see the control plane vendors list below).
Logically, communication of packets/requests travels through a “Data Plane”.
Control Plane
There is also a “Control Plane” which, rather than exchanging packets/requests, traffics in policies and configuration settings that enable capabilities such as:
- deploy control (blue/green and/or traffic shifting),
- authentication and authorization settings,
- route table specification (e.g., when service A requests /foo what happens), and
- load balancer settings (e.g., timeouts, retries, circuit breakers, etc.; see the sketch after this list).
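A sketch of such route table and load balancer settings expressed as mesh configuration, using Istio syntax (the service name service-a, the /foo prefix handling, and the timeout value are illustrative assumptions):
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: service-a
spec:
  hosts:
  - service-a                 # requests addressed to this service...
  http:
  - match:
    - uri:
        prefix: /foo          # ...with this path prefix
    route:
    - destination:
        host: service-a
    timeout: 3s               # fail the call if no response within 3 seconds
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: service-a
spec:
  host: service-a
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN     # load balancer policy applied by every sidecar
EOF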
The control plane aggregates telemetry data for display on dashboards such as the hero image above.
Individual apps interact with a proxy (Kubernetes sidecar) running alongside each service instance. The sidecars communicate with the control plane. This out-of-process architecture puts the hard stuff in one place, allows app developers to focus on business logic, and removes the need for a separate library in each language for operational concerns.
The “Control Plane” acts as a traffic controller that handles tracing, monitoring, logging, alerting, A/B testing, rolling deploys, canary deploys, rate limiting, retry/circuit-breaker behavior, authentication, and authorization, and can create new instances based on application-wide policies.
The control plane includes an application programming interface, a command‑line interface, and a graphical user interface for managing the app.
Within a service mesh, apps create service instances from service definitions (templates). Thus, the term “service” refers to both the definitions and the instances themselves.
Several products provide a “control plane UI” (web portal/CLI) to set global system configuration settings and policies, as well as:
- Dynamic service discovery
- certificate management (acts as a Certificate Authority (CA) and generates certificates to allow secure mTLS communication in the data plane; see the sketch after this list)
- automatic self-healing and zone failover (to maximize uptime)
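As a sketch of the mTLS side of this in Istio (recent releases, 1.5 and later; older releases used MeshPolicy instead), a single mesh-wide PeerAuthentication resource is enough to require that all sidecar-to-sidecar traffic use certificates issued by the control plane CA:
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # applying in the root namespace makes it mesh-wide
spec:
  mtls:
    mode: STRICT            # reject plain-text traffic between sidecars
EOF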
Control Plane vendors
Several control plane vendors compete on features, configurability, extensibility, and usability:
- Istio (its control plane is Istiod), backed by Google, IBM, and Lyft (which contributed its Envoy proxy, which works within Kubernetes as a sidecar proxy instance)
- open-source Nelson, which uses Envoy as its proxy and builds a robust service mesh control plane around the HashiCorp stack (i.e., Nomad, etc.)
- Kong Mesh, the licensed side of open-source kuma.io, donated to the CNCF. It’s built on top of Envoy.
- SmartStack, which creates a control plane using HAProxy or NGINX.
Cloud Foundry Spring Cloud?
Istio
https://istio.io/ aims to provide a uniform way to secure, connect, and monitor microservices. It provides rich automatic tracing, monitoring, and logging of all services in a “service mesh” (the network of microservices).
https://istio.io/docs/reference/config/
Istio provides APIs that let it integrate into any logging platform, telemetry system, or policy system.
Istio makes it easy to create a network of deployed services with load balancing, service-to-service authentication, monitoring, and more.
“Without any changes in service code” applies only if the app has not implemented its own mechanism duplicative of Istio, like retry logic (which can bring a system down without attenuation mechanisms).
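For example, rather than hand-rolling retry loops in application code, retry behavior (with a bounded number of attempts and per-try timeouts, which act as the attenuation mechanism) can be pushed into Istio route configuration. A minimal sketch, assuming a service named hostname:
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: hostname-retries
spec:
  hosts:
  - hostname
  http:
  - route:
    - destination:
        host: hostname
    retries:
      attempts: 3            # at most 3 retries, not unbounded
      perTryTimeout: 2s      # each attempt gets its own deadline
      retryOn: 5xx,reset     # only retry server errors and connection resets
EOF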
gRPC
gRPC is a high-performance, open-source universal RPC framework built on top of HTTP/2 to enable streaming between client and server.
It originated as project “stubby” within Google and is now a F/OSS project with open specs.
https://grpc.io/blog/principles:
- Clients open one long-lived connection to a gRPC server
- A new HTTP/2 stream for each RPC call
gRPC avoids mistakes of SOAP WSDL:
- Protobuf vs. XML
References:
- https://www.youtube.com/watch?v=RoXT_Rkg8LA by Twilio
- https://github.com/salesforce/reactive-grpc
- Lyft Envoy uses a gRPC bridge to unlock Python gevent clients.
- https://www.youtube.com/watch?v=hNFM2pDGwKI Introduction to gRPC: A general RPC framework that puts mobile and HTTP/2 first (M.Atamel, R.Tsang)
Envoy (from Lyft)
Envoy provides robust APIs for dynamically managing its configuration.
Envoy is aware of Docker containers.
It speaks HTTP/2 (“H2”) on both sides, so it supports gRPC.
It does traffic shadowing (forking traffic to a test cluster for live performance testing), as sketched below.
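A sketch of what shadowing looks like when Envoy is driven by Istio (the hostname service and its v1/v2 subsets are assumptions for illustration): live traffic is served by v1 while a copy of each request is mirrored, fire-and-forget, to v2.
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: hostname-shadow
spec:
  hosts:
  - hostname
  http:
  - route:
    - destination:
        host: hostname
        subset: v1       # responses come only from v1
    mirror:
      host: hostname
      subset: v2         # v2 receives a copy; its responses are discarded
EOF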
Envoy is written in C++11.
VIDEO: Lyft’s Envoy: From Monolith to Service Mesh, Feb 14, 2017 by Matt Klein (Lyft), explains from a developer’s viewpoint the move to SoA and its issues.
L7 reverse proxy at edge (replacement for NGINX).
Lyft uses LightStep for tracing, WaveFront for stats (via statsd).
References:
- https://www.envoyproxy.io/
- https://lyft.github.io/envoy
- Twitter: @EnvoyProxy
NGINX
NGINX built an equivalent of Istio’s Envoy sidecar proxy.
https://www.nginx.com/blog/what-is-a-service-mesh/
https://www.nginx.com/blog/introducing-the-nginx-microservices-reference-architecture/
Linkerd
Linkerd (https://linkerd.io) is a Cloud Native Computing Foundation (CNCF) incubating project; the CNCF also hosts graduates Kubernetes and Prometheus, plus Helm, OpenTracing, gRPC, etc.
Its data plane proxy is written in the Rust programming language (its control plane is written in Go).
Linkerd provides Grafana dashboards and CLI debugging tools for Kubernetes services with no cluster-wide installation required.
Its customers include Salesforce, Walmart, PayPal, Expedia, Comcast.
References:
- https://linkerd.io/2/getting-started/ for installation, etc.
Patterns
Circuit breaker
The circuit breaker pattern isolates unhealthy instances, then gradually brings them back into the healthy instance pool if warranted.
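In Istio, this maps onto a DestinationRule with connection-pool limits and outlier detection, which ejects hosts that keep returning errors and lets them back in after a cool-down. A minimal sketch, assuming a service named hostname (field names vary slightly across Istio versions; consecutive5xxErrors is the current name, older releases used consecutiveErrors):
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: hostname-circuit-breaker
spec:
  host: hostname
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100           # cap concurrent TCP connections
      http:
        http1MaxPendingRequests: 10   # cap queued requests
    outlierDetection:
      consecutive5xxErrors: 5         # eject a host after 5 straight errors
      interval: 10s                   # how often hosts are scanned
      baseEjectionTime: 30s           # minimum time a host stays ejected
      maxEjectionPercent: 50          # never eject more than half the pool
EOF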
Workshop
There are several quite thorough hands-on workshops using GKE (Google Kubernetes Engine):
- https://github.com/retroryan/istio-workshop is the original, worked on by Ryan and others; it contains exercises.
- https://github.com/srinandan/istio-workshop: “In this workshop, you’ll learn how to install and configure Istio, an open source framework for connecting, securing, and managing microservices, on Google Kubernetes Engine, Google’s hosted Kubernetes product. You will also deploy an Istio-enabled multi-service application.”
- https://github.com/jamesward/istio-workshop from Nov 2017 is a whole workshop with code.
Install prereqs and Istio
- See my tutorial to Install Git and other utilities
- See my tutorial to Install Kubernetes (minikube)
- See my tutorial to Install Helm 3.4.2-catalina
- See my tutorial to Bring up Terminal
- Rather than downloading Istio from https://github.com/istio/istio/releases, install istioctl using Homebrew:
brew install istioctl
- Verify the installed version:
istioctl version
If your pods have not already been set up, you will see an error like:
unable to retrieve Pods: Get "https://127.0.0.1:32768/api/v1/namespaces/istio-system/pods?fieldSelector=status.phase%3DRunning&labelSelector=app%3Distiod": dial tcp 127.0.0.1:32768: connect: connection refused
- See my tutorial to Start minikube with Docker driver
- Start Istio using Helm:
kubectl apply -f install/kubernetes/helm/helm-service-account.yaml
helm init --upgrade --service-account tiller
helm install install/kubernetes/helm/istio --name istio \
   --namespace istio-system \
   --set gateways.istio-ingressgateway.type=NodePort \
   --set gateways.istio-egressgateway.type=NodePort \
   --set sidecarInjectorWebhook.enabled=false \
   --set global.mtls.enabled=false \
   --set tracing.enabled=true
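Before proceeding, a quick check (not part of the original workshop steps) that the control plane pods came up:
kubectl get pods -n istio-system
Each Istio component pod should reach the Running state before you continue.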
Verifying Istio is meshing
To enable Istio by default for resources deployed into the environment, “label” the namespace to enable auto-injection of the sidecar proxy:
- First, clean up the hostname environment. We previously disabled automatic injection of the Istio proxy, which suits an environment where we wish to transition an application to Istio gradually, or where multiple application environments exist and not all of them use a service mesh:
kubectl delete -f hostname.yaml
- Reset the istio release to include the auto-injection webhook:
helm upgrade istio install/kubernetes/helm/istio \
   --namespace istio-system \
   --set gateways.istio-ingressgateway.type=NodePort \
   --set gateways.istio-egressgateway.type=NodePort \
   --set sidecarInjectorWebhook.enabled=true \
   --set global.mtls.enabled=false \
   --set tracing.enabled=true
- Label the namespace with the “istio-injection” key:
kubectl label namespace default istio-injection=enabled
- Re-install the hostname.yaml app, and we should see that the sidecar is automatically injected:
kubectl apply -f hostname.yaml
kubectl get pods -l app=hostname
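To double-check the injection (these are standard kubectl/istioctl commands, not from the original workshop), each pod should now report 2/2 containers (the app plus istio-proxy) and the proxies should show up as synced:
kubectl get namespace -L istio-injection   # confirms the label is set
kubectl get pods -l app=hostname           # READY column should show 2/2
istioctl proxy-status                      # each sidecar should be SYNCED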
- Use fortio, a command-line load-testing tool written in Go:
./fortio load -c 3 -n 20 -qps 0 http://hostname/version
qps = Queries Per Second (-qps 0 means no rate limit)
- List processing stats:
./fortio-faults
Sample Apps
Blueperf, the Public Cloud Environment Performance Analysis Application, contains a (Java) microservices application for the fictional airline company “Acme Air”. By Joe McClure at IBM, using IBM Cloud Kubernetes Service (IKS). Its components:
- acmeair-mainservice-java (GUI)
- acmeair-authservice-java handles JWT authentication (set environment variable SECURE_SERVICE_CALLS=false to disable authentication; see the sketch after this list).
- acmeair-bookingservice-java handles getting, making, and cancelling flight bookings.
- acmeair-customerservice-java
- acmeair-flightservice-java queries flights and reward miles.
- acmeair-driver - a workload driver for the Acme Air Sample Application.
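A hedged sketch of flipping that switch on a running Kubernetes deployment (the deployment name acmeair-authservice is hypothetical; check kubectl get deployments for the real one):
# Disable JWT authentication for test runs (hypothetical deployment name)
kubectl set env deployment/acmeair-authservice SECURE_SERVICE_CALLS=false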
Isotope is a synthetic app with configurable topology.
“Kubernetes: Service Mesh with Istio”, released 2018 by Robert Starmer (of Kumulus), is part of the “Master Cloud-Native Infrastructure with Kubernetes” Learning Path on LinkedIn. The Jaeger Operator is covered.
The course uses Istio 1.0.2 and Helm within minikube on a macOS machine. It covers:
- Adding Istio to a microservice
- Traffic routing and deployment
- Creating advanced route rules with Istio
- Modifying routes for Canary deployments
- Establishing MTLS credentials
- Connecting to non-MTLS services
- Connecting Istio to OpenTracing
- Improving microservice robustness
- Forcing aborts in specific applications, done by Istio recognizing cookie headers as triggers (see the sketch after this list)
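A sketch of that trigger pattern in Istio configuration (the hostname service and the user=chaos-test cookie value are made up for illustration): requests carrying the cookie get a forced 503, while everything else is routed normally.
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: hostname-fault
spec:
  hosts:
  - hostname
  http:
  - match:
    - headers:
        cookie:
          regex: ".*user=chaos-test.*"   # only requests carrying this cookie
    fault:
      abort:
        httpStatus: 503                  # return an error instead of forwarding
        percentage:
          value: 100.0
    route:
    - destination:
        host: hostname
  - route:                               # all other traffic is unaffected
    - destination:
        host: hostname
EOF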
Rock Stars
Ray Tsang (@saturnism, saturnism.me), Google Cloud Platform Developer Advocate in NYC:
- Making Microservices Micro with Istio Service Mesh Nov 10, 2017 at Devoxx
Kelsey Hightower:
- Istio and Kubernetes (conversation)
References
What is a service mesh? May 27, 2018 by Defog Tech
Microservices in the Cloud with Kubernetes and Istio (Google I/O ‘18) May 9, 2018 by Sandeep Dinesh
APIs, Microservices, and the Service Mesh (Cloud Next ‘19) by Dino Chiesa
By Rick Hightower (https://www.linkedin.com/in/rickhigh/):
- https://www.linkedin.com/pulse/why-you-might-need-istio-rick-hightower/ “Istio decorates a network stream as AOP decorates a method call. Istio decorates a network stream as a Servlet Filter decorates an HTTP request/response.”
- https://www.linkedin.com/pulse/istio-hard-way-rick-hightower/
- https://www.linkedin.com/pulse/istio-hard-way-round-2-rick-hightower/?published=t
- https://www.linkedin.com/pulse/service-mesh-compared-aop-servlet-filters-rick-hightower/
Social
https://www.instagram.com/explore/tags/servicemesh/