How to get proactive health metrics for managing cloud-native applications quicker, with less toil
Overview
This is a hands-on tour of how cloud-native applications can get more proactive health metrics, faster, with less work. The contribution of this write-up is a logically presented and deep yet succinct tutorial that incorporates many of the videos and documentation about the subject. Commentary along the way includes “NOTE” and “PROTIP” flags to hard-won advice available nowhere else, without the marketing generalities.
NOTE: Content here is my personal opinion, and not intended to represent any employer (past or present). “PROTIP:” here highlights information I haven’t seen elsewhere on the internet because it is hard-won, little-known but significant facts based on my personal research and experience.
The objective here is to enable you to say “hell yeah” in job interviews:
- Possesses expertise designing, analyzing, and troubleshooting large-scale distributed systems.
- Takes a system problem-solving approach
Lightstep’s observability platform provides intuitive GUI dashboards that distill actionable insight about apps instrumented by logs, errors, exceptions, metrics, and traces.
Lightstep’s Change Intelligence answers the question “What caused that change in measured values?”.
Lightstep provides a “back end” service for the OpenTelemetry framework (contracted to “otel”), which provides a vendor-agnostic (open source) way to collect logs, errors, exceptions, metrics, and traces about individual API requests and other events. The two components of Otel are an API library and an SDK for various languages.
Fun fact: the word telemetry comes from the Greek words tele, meaning “remote”, and metron, meaning “measure”.[4]
Otel also offers a Collector agent which runs on the operating system to collect system metrics (such as memory, CPU, and attached storage consumption). The Collector also sends metrics to a metrics back end.
OpenTelemetry gives developers and SREs a consistent (and easier) way to collect telemetry.
Back-ends (Lightstep’s competitors) include Splunk, Prometheus, Dynatrace, New Relic, Datadog, Stackdriver.
BTW Lightstep is built by the team that launched observability at Google (Dapper).
Competitors
Many vendors say their tool “helps you make data-driven decisions and reduce the time to investigate issues so you can free up resources for more important activities”.
Because tracing propagates context (and logging does not), Lightstep can choose what to keep logically rather than by random sampling, which both reduces the amount of monitoring data and yields higher-quality signals. So it can be argued that Splunk and others can partner with Lightstep and OpenTelemetry.
More logical sampling can mean fuller granularity for selected items.
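One way to picture sampling that is "logical" rather than random is to decide from the trace ID, so every span of a kept trace is kept together. The sketch below is only an illustration of that idea, not Lightstep's actual algorithm. The function name `keep_trace` and the sample events are hypothetical.

```python
import hashlib

def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Decide from the trace ID alone whether to keep a trace.

    The same trace ID always gives the same answer, so all spans
    belonging to one request are kept (or dropped) together.
    """
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Compare the first 8 bytes of the hash against a threshold in [0, 2**64].
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < int(sample_rate * 2**64)

# Hypothetical events: each one carries the trace ID it belongs to.
events = [
    {"trace_id": "trace-a", "service": "web"},
    {"trace_id": "trace-a", "service": "db"},
    {"trace_id": "trace-b", "service": "web"},
    {"trace_id": "trace-b", "service": "db"},
]

kept = [e for e in events if keep_trace(e["trace_id"], 0.5)]
```

With purely random per-event sampling, a kept "web" event could lose its matching "db" event. Here each trace is either complete or absent.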
vs Dynatrace
AppDynamics
New Relic
IBM Instana
Honeycomb.io
Observability Platform?
“Observability (o11y) is the ability to ask questions of your system and learn from it. Instrumentation is the tooling that provides your telemetry. Metrics, logs, and traces are the components of telemetry.” - Keynote: (Open)Telemetry Makes Observability Simple - Sarah Novotny & Liz Fong-Jones CNCF [Cloud Native Computing Foundation]
OpenTelemetry “makes robust, portable telemetry a built-in feature of cloud-native software”.
VIDEO: Observability vs. APM vs. Monitoring
Who cares? (Customer stories)
OpenTelemetry is the second-most active project in CNCF (after Kubernetes).
Among OpenTelemetry’s 245+ active contributors from 45+ companies are:
- Spotify
- Shopify
- Postmates
- Mailchimp
- Zillow
- GitHub - https://github.blog/2021-05-26-why-and-how-github-is-adopting-opentelemetry/ May 26, 2021
- Twilio
- CNCF Fluentbit
- Kafka adopting OpenTelemetry
Social
Hands-on intro
- Look at Lightstep’s home on GitHub: https://github.com/lightstep
- These are Lightstep packages for configuring OpenTelemetry:
  - https://github.com/lightstep/otel-launcher-java
  - https://github.com/lightstep/otel-launcher-python
  - https://github.com/lightstep/otel-launcher-go (VIDEO)
  - https://github.com/lightstep/otel-launcher-node (JavaScript)
  - Ruby?
  - Erlang?
NOTE: Each platform has a different default format for log output.
“several pieces of GitHub’s infrastructure use different statsd dialects, which means we have to special-case our telemetry code in different places – a non-trivial amount of work!”
OpenTelemetry defines semantic conventions, for example for creating a metric for HTTP latency or HTTP QPS.
The SDK contains … sampling, creating a trace, or creating a metric …
Java
- Pick the Java language launcher: https://github.com/lightstep/otel-launcher-java
CI/CD and API secret handling
- https://github.com/lightstep/otel-launcher-java
- Notice the .circleci folder. You may prefer using other CI/CD tools.
Modern CI/CD utilities come with a Secrets Manager that encrypts API keys, then decrypts them automatically during runs.
- Sign up for the Lightstep free Community plan: https://app.lightstep.com/signup/developer
- Obtain a Lightstep API token from the Lightstep.com GUI.
- Paste the API key in your CI/CD secrets manager.
In the shell script that launches your app, define the environment variable that provides your Lightstep API Access Token:
export LS_ACCESS_TOKEN=my-access-token-etc
See https://www.freecodecamp.org/news/how-to-securely-store-api-keys-4ff3ea19ebda/
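On the application side, the token can be read from the environment instead of being hard-coded. This is a minimal sketch, assuming the `LS_ACCESS_TOKEN` variable exported above. The helper names `load_access_token` and `redact` are hypothetical and not part of any Lightstep library.

```python
import os

def load_access_token(env=os.environ, name="LS_ACCESS_TOKEN"):
    """Return the token from the environment, or fail loudly if it is missing."""
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(
            f"{name} is not set; export it or define it in your CI/CD secrets manager"
        )
    return token

def redact(token):
    """Mask a secret for log output, showing at most its last 4 characters."""
    if len(token) <= 4:
        return "****"
    return "*" * (len(token) - 4) + token[-4:]

if __name__ == "__main__":
    try:
        print("Using Lightstep token", redact(load_access_token()))
    except RuntimeError as err:
        print("Configuration error:", err)
```

Failing at startup when the variable is missing tends to be easier to diagnose than a rejected export later, and logging only the redacted form keeps the secret out of CI logs.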
NOTE: collect -> Monitor (dashboards) -> analyze
- Notice that the .circleci configuration uses the Java 8 JDK: https://github.com/lightstep/otel-launcher-java/blob/main/.circleci/config.yml
- Notice the use of GitHub’s CodeQL for code analysis in: https://github.com/lightstep/otel-launcher-java/blob/main/.github/workflows/codeql-analysis.yml
- Download the latest version of lightstep-opentelemetry-javaagent.jar:
NOTE: The Lightstep OpenTelemetry Agent is a configuration layer over the OpenTelemetry Instrumentation Agent.
- VIDEO: OpenTelemetry Deep Dive: Java, referencing code at https://github.com/tedsuo/otel-java-basics from https://github.com/carlosalberto/otel-java-basics.
[31:45] In client/App.java, the sample client makes 5 calls to the “hello world” server app.
[32:19] In server/App.java, the “OkHttpClient()” is instrumented automatically.
The agent automatically propagates context.
Collector on servers
The Lightstep Collector runs on the operating system to collect system metrics (such as memory, CPU, and attached storage consumption). It is also a type of proxy which sends metrics to a metrics back end.
The OpenTelemetry Collector receives data in several formats created in the industry:
- Jaeger protocol
- Zipkin protocol
- OpenCensus protocol
- OpenTelemetry protocol (OTLP)
Integrations -> Exporters
Processing (Ingesting)
filter -> modify -> batch -> other
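The processing chain above can be pictured as a series of small stages, each consuming the previous stage's output. This is a rough sketch of the idea in plain Python, not the Collector's actual implementation or configuration. The function names and span fields are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Span = Dict[str, object]

def filter_spans(spans: Iterable[Span], keep: Callable[[Span], bool]) -> Iterator[Span]:
    """Filter stage: drop spans that the predicate rejects."""
    return (span for span in spans if keep(span))

def modify_spans(spans: Iterable[Span], extra: Span) -> Iterator[Span]:
    """Modify stage: add (or overwrite) attributes on every span."""
    for span in spans:
        yield {**span, **extra}

def batch_spans(spans: Iterable[Span], size: int) -> Iterator[List[Span]]:
    """Batch stage: group spans so they can be exported in fewer requests."""
    batch: List[Span] = []
    for span in spans:
        batch.append(span)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch

spans = [
    {"name": "/healthz"},
    {"name": "/cart"},
    {"name": "/pay"},
    {"name": "/healthz"},
    {"name": "/home"},
]

kept = filter_spans(spans, lambda s: s["name"] != "/healthz")
tagged = modify_spans(kept, {"env": "dev"})
batches = list(batch_spans(tagged, size=2))
```

Here the noisy health-check spans are dropped first, so the later stages and the export do less work.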
OpenTelemetry
OTel Java Launcher: https://github.com/lightstep/otel-launcher-java/
Quickstart Guide: https://opentelemetry.lightstep.com/java
OpenTelemetry Java: https://github.com/open-telemetry/opentelemetry-java
- Build:
  make
- Run the server and wait for it to be ready:
  make run-server
- Run the client to perform a few requests:
  make run-client
OTLP (OpenTelemetry Protocol)
The Collector enables both push and pull from the back-end.
OpenTelemetry.io: OpenTelemetry is a standardization effort to provide an open source framework with a single set of APIs, libraries, and instrumentation resources to capture distributed traces and metrics from applications.
https://github.com/open-telemetry/opentelemetry-specification
OpenTelemetry is the result of a merger of OpenTracing with OpenCensus.
VIDEO: Keynote: History of OpenTelemetry. Priyanka Sharma, GM of CNCF, introduces Ted Young and Morgan McLean to discuss the history and story behind OpenTelemetry, and what it means for the future of the project. [4:18] “A language to describe distributed systems”
API like Log4j
https://www.w3.org/blog/2019/12/trace-context-enters-proposed-recommendation/ HTTP headers from B3 to W3C trace-context for Context Propagation https://www.w3.org/TR/trace-context/
https://pkg.go.dev/go.opencensus.io/plugin/ochttp/propagation/b3
- How to get instrumented with OpenTelemetry in under 10 minutes in Java
- Common distributed tracing use cases and why they are the foundation for observability
- Best practices and common pitfalls for distributed tracing
- How to prepare to roll out OpenTelemetry across your organization
https://www.youtube.com/watch?v=_OXYCzwFd1Y Modern Observability with OpenTelemetry
from Google Dapper and its system of exemplars.
https://docs.lightstep.com/docs/how-lightstep-works
In distributed tracing, a trace is a view into a request as it moves through a distributed system. A Trace ID is associated with the whole request (transaction), which spans several operations, and a Span ID (operation ID) is associated with each operation within a service. A span is a named, timed operation that represents a piece of the workflow.
Multiple spans represent different parts of the workflow and are pieced together to create a trace.
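To make the relationship concrete, here is a small sketch of how spans sharing one trace ID can be pieced together into a trace by following parent links. The `Span` fields and the sample data are hypothetical, chosen for illustration rather than taken from the OpenTelemetry data model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Span:
    trace_id: str             # shared by every span in the same request
    span_id: str              # unique per operation
    parent_id: Optional[str]  # None marks the root span
    name: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms

def render_trace(spans: List[Span]) -> List[str]:
    """Piece spans together into an indented tree, ordered by start time."""
    children: Dict[Optional[str], List[Span]] = {}
    for span in sorted(spans, key=lambda s: s.start_ms):
        children.setdefault(span.parent_id, []).append(span)

    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in children.get(parent_id, []):
            lines.append(f"{'  ' * depth}{span.name} ({span.duration_ms} ms)")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines

spans = [
    Span("t1", "s2", "s1", "auth", 5, 25),
    Span("t1", "s1", None, "GET /checkout", 0, 120),
    Span("t1", "s3", "s1", "db query", 30, 100),
]

for line in render_trace(spans):
    print(line)
```

The spans can arrive in any order, since the parent IDs and timestamps are enough to rebuild the workflow.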
context object
Serializing context objects into HTTP headers is called “injection”.
De-serializing context objects from incoming headers is called “extraction”; injection and extraction together make up context “propagation” downstream.
Within the context headers, a “traceparent” has a trace-id and span-id with a sampling flag. A “tracestate” has vendor-specific internal details.
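As an illustration, the sketch below parses and re-serializes a `traceparent` value in the version `00` layout from the W3C Trace Context specification (version, trace-id, parent span-id, flags, separated by hyphens). The helper names are hypothetical, and this is not a complete implementation of the specification: it ignores `tracestate` and future versions.

```python
import re
from typing import NamedTuple

# version (2 hex) - trace-id (32 hex) - span-id (16 hex) - flags (2 hex)
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)

class TraceParent(NamedTuple):
    version: str
    trace_id: str
    span_id: str
    sampled: bool

def parse_traceparent(header: str) -> TraceParent:
    """Extract (de-serialize) the fields of a traceparent header value."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    version, trace_id, span_id, flags = match.groups()
    # The specification treats all-zero IDs and version "ff" as invalid.
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        raise ValueError(f"invalid traceparent: {header!r}")
    sampled = bool(int(flags, 16) & 0x01)  # lowest bit is the sampled flag
    return TraceParent(version, trace_id, span_id, sampled)

def format_traceparent(tp: TraceParent) -> str:
    """Inject (serialize) the fields back into a header value."""
    flags = "01" if tp.sampled else "00"
    return f"{tp.version}-{tp.trace_id}-{tp.span_id}-{flags}"

header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
parsed = parse_traceparent(header)
```

A downstream service would keep `trace_id`, generate a new span ID for its own work, and send the updated header on its outgoing calls.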
Additionally, a “project ID” can be added to the context “Baggage”, which contains arbitrary key-value pairs, for A/B testing, etc.
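Baggage travels as a comma-separated list of key=value pairs in a `baggage` HTTP header, with values percent-encoded. The sketch below shows that round trip under simplifying assumptions: the keys `project_id` and `ab_group` are made up for illustration, and optional per-entry properties (after a `;`) are dropped rather than preserved.

```python
from typing import Dict
from urllib.parse import quote, unquote

def inject_baggage(items: Dict[str, str]) -> str:
    """Serialize key-value pairs into a baggage header value."""
    return ",".join(
        f"{key}={quote(str(value), safe='')}" for key, value in items.items()
    )

def extract_baggage(header: str) -> Dict[str, str]:
    """De-serialize a baggage header value back into key-value pairs."""
    items: Dict[str, str] = {}
    for member in header.split(","):
        pair = member.split(";", 1)[0].strip()  # drop optional properties
        if "=" not in pair:
            continue  # skip empty or malformed entries
        key, value = pair.split("=", 1)
        items[key.strip()] = unquote(value.strip())
    return items

header = inject_baggage({"project_id": "42", "ab_group": "B test"})
baggage = extract_baggage(header)
```

Because baggage is sent with every downstream request, it is usually kept small and free of secrets.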
with a label (attribute) for each dimension
Previously called “OpenTracing” created in 2016.
not sample to identify outliers
or tracing (structs) follows service to service; context propagates state. joins
OpenTelemetry: OpenTelemetry is the unified initiative that takes the best of both OpenTracing and OpenCensus forward.
OpenTracing (@OpenTracing, https://twitter.com/opentracing): Lightstep tracers work with the OpenTracing API to create and send span data to the Lightstep web application. See the OpenTracing Registry for details on out-of-the-box instrumentation for common packages and frameworks.
OpenCensus: Lightstep supports ingesting trace data from OpenCensus-instrumented applications via exporters.
Jaeger Agent (@JaegerTracing, https://twitter.com/JaegerTracing): Lightstep can ingest data directly from a Jaeger Agent.
Zipkin: Lightstep can ingest data directly from Zipkin.
Chisel: A tooling library that comes with Lightstep and OpenTracing built in, and that works with Pedestal (a popular set of Clojure libraries for building APIs).
OpenTracing instrumentation
- https://github.com/opentracing-contrib/java-jdbc OpenTracing Instrumentation for JDBC
- https://github.com/opentracing-contrib/java-spring-web OpenTracing Java Spring Web instrumentation
- https://github.com/opentracing-contrib/java-kafka-client OpenTracing Instrumentation for Apache Kafka Client
- https://github.com/opentracing-contrib/csharp-netcore OpenTracing instrumentation for .NET Core & .NET 5 apps
- https://github.com/uber-common/opentracing-python-instrumentation A collection of Python instrumentation tools for the OpenTracing API
- https://github.com/opentracing-contrib/python-sqlalchemy https://github.com/carlosalberto/python-sqlalchemy OpenTracing instrumentation for SQLAlchemy
- https://github.com/carlosalberto/python-pyramid OpenTracing instrumentation for the Pyramid framework
Additionally:
- https://github.com/zalando/opentracing-toolbox Best-of-breed OpenTracing utilities, instrumentations and extensions
- https://github.com/RisingStack/opentracing-auto Out of the box distributed tracing for Node.js applications with OpenTracing.
Database
References
https://www.youtube.com/watch?v=HExcLWA2b8M Live Interview with Skyscanner: Observability Best Practices & OpenTelemetry Lightstep
https://www.youtube.com/watch?v=NpE9nNmI9g4 SLA vs SLO vs SLI: All you need to know Lightstep
https://www.youtube.com/watch?v=FlghuHDlQdM Beyond Getting Started: Using OpenTelemetry to Its Full Potential - Sergey Kanzhelev (Microsoft) & Morgan McLean (Google) CNCF [Cloud Native Computing Foundation]
https://www.youtube.com/watch?v=FbHbDikEUYg Introduction to OpenTelemetry on Kubernetes by infrastructure atscale
https://www.youtube.com/watch?v=J0XOGlf1bwk What’s Lightstep? (Apr 2, 2020)
- trace by specific customer
- add custom tags
- error analysis
https://www.youtube.com/watch?v=_OXYCzwFd1Y Modern Observability with OpenTelemetry (11:49)
“We’ve seen people reduce logging 95% by adopting Tracing” – Why distributed tracing will replace (most) logging for cloud-native architectures interviews Lightstep co-founder and CEO, Ben Sigelman, and OpenTelemetry co-founder, Ted Young.
Why Developers and SREs Choose Lightstep for Observability interviews users “Lightstep drove us right to where it’s a problem”
https://www.youtube.com/watch?v=GGRAvY8_7Ps Announcing Change Intelligence (Feb 4, 2021) identifies “most likely causes of performance changes” based on baseline history
https://www.youtube.com/watch?v=MQ0NXN5n0Es OpenTelemetry insights: How will Traces and Metrics interact by Ted Young and Josh MacDonald at Lightstep
https://www.youtube.com/watch?v=1vMu7iskQaY Getting Started with OpenTelemetry - Ted Young, Lightstep (Continuous Delivery Foundation)
https://www.youtube.com/watch?v=1DxMHqYIvkQ OpenTelemetry Auto-Instrumentation Deep Dive - Carlos Alberto Cortez & Alex Boten, LightStep (CNCF [Cloud Native Computing Foundation])
Webinar: How OpenTelemetry is Eating the World (from CNCF) presented by Steve Flanders (Splunk Dir. of Engineering, Otel Collector approver)
Tutorial: OpenTelemetry Java Instrumentation with Spring Boot in under 5 minutes (from Lightstep)
Introduction to Tracing: OpenTelemetry & OpenTracing by That DevOps Guy
https://www.youtube.com/watch?v=TmFBDsnLbAY Distributed Tracing with Micronaut (45:49) by Object Computing
https://www.youtube.com/watch?v=vtuffPM5zXc OpenTelemetry in practice - Ilya Kaznacheev (GoLab conference)
https://www.youtube.com/watch?v=88ZjCbT6LPc OpenTelemetry Java Auto-Instrumentation SIG 2020/03/19 (OpenTelemetry)
https://www.youtube.com/watch?v=CFLZJSwbYI0 Spring Tips: Zipkin and Distributed Tracing (16:21) by SpringDeveloper
https://www.youtube.com/watch?v=RvCcWltMY7U Spring Boot OpenTracing instrumentation, using Jaeger and Zipkin by Pavol Loffay
https://www.youtube.com/watch?v=mNMw148wpZ4 MicroServices | Distributed Logging & Tracing by Byte Programming
What Is OpenTelemetry? by New Relic
OpenTelemetry Architecture Overview by John Watson New Relic https://opensource.newrelic.com/projects/open-telemetry Tracer SDK
Microservices and Kubernetes Observability | Metrics, Logs, Tracing, Chaos Experiments by Tech Primers
[4] VIDEO: freecodecamp.org’s OpenTelemetry Course - Understand Software Performance by Code with Ania Kubów (https://github.com/kubowania who began with JavaScript games in 2019)
- Run Zipkin from Docker Hub:
  docker run --rm -d -p 9411:9411 --name zipkin openzipkin/zipkin
https://www.novatec-gmbh.de/en/blog/ocelot-meets-lightstep/
More on Security
This is one of a series on Security in DevSecOps:
- Security actions for teamwork and SLSA
- Code Signing on macOS
- Git Signing
- GitHub Data Security
- Azure Security-focus Cloud Onramp
- AWS Onboarding
- AWS Security (certification exam)
- AWS IAM (Identity and Access Management)
- SIEM (Security Information and Event Management)
- Intrusion Detection Systems (Google/Palo Alto)
- SOC2
- FedRAMP
- CAIQ (Consensus Assessment Initiative Questionnaire) by cloud vendors
- AKeyless cloud vault
- Hashicorp Vault
- Hashicorp Terraform
- SonarQube
- WebGoat known insecure PHP app and vulnerability scanners
- Security certifications
- Quantum Supremacy can break encryption in minutes
- Pen Testing
- Threat Modeling
- WebGoat (deliberately insecure Java app)