
Wilson Mar




How to get proactive health metrics for managing cloud-native applications quicker, with less toil



This is a hands-on tour of how cloud-native applications can get more proactive health metrics, faster, with less work. The contribution of this write-up is a logically presented, deep yet succinct tutorial that incorporates many of the videos and documentation about the subject. Commentary along the way includes “NOTE” and “PROTIP” flags marking hard-won advice available nowhere else, without the marketing generalities.

NOTE: Content here is my personal opinion and is not intended to represent any employer (past or present). “PROTIP:” flags highlight information I haven’t seen elsewhere on the internet: hard-won, little-known but significant facts based on my personal research and experience.

The objective here is to enable you to say “hell yeah” in job interviews:

  • Possesses expertise designing, analyzing, and troubleshooting large-scale distributed systems.
  • Takes a system problem-solving approach

Lightstep’s observability platform provides intuitive GUI dashboards that distill actionable insight about apps instrumented by logs, errors, exceptions, metrics, and traces.

Lightstep’s Change Intelligence answers the question “What caused that change in measured values?”.


Lightstep provides a “back end” service for the OpenTelemetry framework (abbreviated “OTel”), which provides a vendor-agnostic (open source) way to collect logs, errors, exceptions, metrics, and traces about individual API requests and other events. One of the two components of OTel is an API library for various languages.


Fun fact: the word telemetry comes from the Greek words tele, meaning “remote”, and metron, meaning “measure”.[4]

Otel also offers a Collector agent which runs on the operating system to collect system metrics (such as memory, CPU, and attached storage consumption). The Collector also sends metrics to a metrics back end.

OTel gives developers and SREs a consistent (and easier) way to collect telemetry.

Back-ends (Lightstep’s competitors) include Splunk, Prometheus, Dynatrace, New Relic, Datadog, Stackdriver.

BTW Lightstep is built by the team that launched observability at Google (Dapper).


Many vendors say their tool “helps you make data-driven decisions and reduce the time to investigate issues so you can free up resources for more important activities”.

Lightstep can both reduce the amount of monitoring data and yield higher-quality signals, because tracing propagates context (and logging does not), which allows selective rather than random sampling. So it can be argued that Splunk and others can partner with Lightstep and OpenTelemetry.

More logical sampling can mean fuller granularity for selected items.
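The contrast between random and more logical sampling can be sketched in a few lines of Python. This is an illustration only, not Lightstep’s actual algorithm: the TraceSummary record, its fields, and the 500 ms threshold are hypothetical names and values chosen for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class TraceSummary:
    """Hypothetical per-trace summary (not a Lightstep or OTel type)."""
    trace_id: str
    duration_ms: float
    error: bool


def random_sample(traces, rate, rng):
    """Keep roughly `rate` of all traces, regardless of what they contain."""
    return [t for t in traces if rng.random() < rate]


def selective_sample(traces, slow_ms):
    """Keep every trace that failed or was slower than `slow_ms`."""
    return [t for t in traces if t.error or t.duration_ms >= slow_ms]


# 98 ordinary traces, one error, and one slow outlier.
traces = [TraceSummary(f"t{i}", 20.0, False) for i in range(98)]
traces.append(TraceSummary("t98", 20.0, True))
traces.append(TraceSummary("t99", 900.0, False))

kept_random = random_sample(traces, rate=0.05, rng=random.Random(0))
kept_selective = selective_sample(traces, slow_ms=500.0)

print("random kept:", [t.trace_id for t in kept_random])
print("selective kept:", [t.trace_id for t in kept_selective])
```

The selective rule always keeps the error and the outlier at full detail, while the random rule keeps them only by luck.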

vs Dynatrace


New Relic

IBM Instana


Observability Platform?

“Observability (o11y) is the ability to ask questions of your system and learn from it. Instrumentation is the tooling that provides your telemetry. Metrics, logs, and traces are the components of telemetry.” - Keynote: (Open)Telemetry Makes Observability Simple - Sarah Novotny & Liz Fong-Jones CNCF [Cloud Native Computing Foundation]

OpenTelemetry “makes robust, portable telemetry a built-in feature of cloud-native software”.

VIDEO: Observability vs. APM vs. Monitoring

Who cares? (Customer stories)

OpenTelemetry is the second-most active project in CNCF (after Kubernetes).

Among OpenTelemetry’s 245+ active contributors from 45+ companies are:

  • Spotify
  • Shopify
  • Postmates
  • Mailchimp
  • Zillow
  • GitHub - https://github.blog/2021-05-26-why-and-how-github-is-adopting-opentelemetry/ May 26, 2021
  • Twilio

  • CNCF Fluentbit
  • Kafka adopting Open Telemetry




Lightstep’s YouTube channel

Lightstep on LinkedIn

Hands-on intro

  1. Look at Lightstep’s home on GitHub:


  2. These are Lightstep packages for configuring OpenTelemetry:

    • https://github.com/lightstep/otel-launcher-java
    • https://github.com/lightstep/otel-launcher-python
    • https://github.com/lightstep/otel-launcher-go (VIDEO)
    • https://github.com/lightstep/otel-launcher-node (JavaScript)
    • Ruby?
    • Erlang?

    NOTE: Each platform has a different default format for log output.

    “several pieces of GitHub’s infrastructure use different statsd dialects, which means we have to special-case our telemetry code in different places – a non-trivial amount of work!”

    Semantic conventions define how to create a metric such as HTTP latency or HTTP QPS (queries per second).

    The SDK contains … sampling, creating a trace, or creating a metric …


  3. Pick the Java language launcher:

    • https://github.com/lightstep/otel-launcher-java

    CI/CD and API secret handling

  4. Notice the .circleci folder. You may prefer using other CI/CD tools.

    Modern CI/CD utilities come with a Secrets Manager that encrypts API keys, then decrypts them automatically during runs.

  5. Sign up for the Lightstep free Community plan:


  6. Obtain a Lightstep API token from the Lightstep.com GUI

  7. Paste the API key in your CI/CD secrets manager

    In the calling shell script, define the environment variable that provides your Lightstep API access token:

    export LS_ACCESS_TOKEN=my-access-token-etc

    See https://www.freecodecamp.org/news/how-to-securely-store-api-keys-4ff3ea19ebda/

    NOTE: collect -> Monitor (dashboards) -> analyze

  8. Notice that the .circleci configuration uses the Java 8 JDK within:


  9. Notice use of GitHub’s CodeQL for


  10. Download the latest version of lightstep-opentelemetry-javaagent.jar:

    NOTE: The Lightstep OpenTelemetry Agent is a configuration layer over OpenTelemetry Instrumentation Agent.

  11. VIDEO: OpenTelemetry Deep Dive: Java referencing code at https://github.com/tedsuo/otel-java-basics from https://github.com/carlosalberto/otel-java-basics.

    [31:45] In client/App.java, the sample client makes 5 calls to the “hello world” server app.

    [32:19] In server/App.java, the “OkHttpClient()” is instrumented automatically.

    The agent automatically propagates context.

    Collector on servers


    The Lightstep Collector runs on the operating system to collect system metrics (such as memory, CPU, and attached storage consumption). It is also a type of proxy which sends metrics to a metrics back end.

    The OpenTelemetry Collector receives data in several formats created in the industry:

    • Jaeger protocol
    • Zipkin protocol
    • OpenCensus protocol
    • OpenTelemetry protocol (OTLP)

    Integrations -> Exporters

    Processing (Ingesting)

    filter -> modify -> batch -> other
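The processing chain named above can be pictured as a series of small stages, each consuming the output of the previous one. The sketch below is plain Python, not the Collector’s real configuration or API. The span dictionaries, the /healthz filter rule, and the deployment.environment attribute value are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Span = Dict[str, str]  # hypothetical stand-in for a real span record


def filter_spans(spans: Iterable[Span], keep: Callable[[Span], bool]) -> Iterator[Span]:
    """Filter stage: drop spans the back end does not need."""
    for span in spans:
        if keep(span):
            yield span


def modify_spans(spans: Iterable[Span], extra: Span) -> Iterator[Span]:
    """Modify stage: add attributes to every span that passes through."""
    for span in spans:
        yield {**span, **extra}


def batch_spans(spans: Iterable[Span], size: int) -> Iterator[List[Span]]:
    """Batch stage: group spans so they are exported in fewer requests."""
    batch: List[Span] = []
    for span in spans:
        batch.append(span)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


def run_pipeline(spans: Iterable[Span]) -> List[List[Span]]:
    kept = filter_spans(spans, keep=lambda s: s["name"] != "/healthz")
    tagged = modify_spans(kept, {"deployment.environment": "staging"})
    return list(batch_spans(tagged, size=2))


incoming = [
    {"name": "/healthz"},
    {"name": "/hello"},
    {"name": "/hello"},
    {"name": "/checkout"},
]
batches = run_pipeline(incoming)
print(batches)
```

Because each stage is a generator, spans stream through the chain one at a time and only the batch stage holds any of them in memory.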


OTel Java Launcher: https://github.com/lightstep/otel-launcher-java/

Quickstart Guide: https://opentelemetry.lightstep.com/java

OpenTelemetry Java: https://github.com/open-telemetry/opentelemetry-java

  1. Build:


  2. Run the server and wait for it to be ready:

    make run-server

  3. Run the client to perform a few requests:

    make run-client

OTLP (OpenTelemetry Protocol)

The Collector supports both push and pull exchanges with the back end.

OpenTelemetry.io describes OpenTelemetry as a standardization effort: an open source framework that provides a single set of APIs, libraries, and instrumentation resources to capture distributed traces and metrics from applications.


OpenTelemetry is the result of a merger of OpenTracing with OpenCensus.

VIDEO: Keynote: History of OpenTelemetry Priyanka Sharma GM of CNCF introduces Ted and Morgan McLean to discuss the history and story behind OpenTelemetry, and what it means for the future of the project. [4:18] “A language to describe distributed systems”

API like Log4j

HTTP headers for context propagation moved from B3 to W3C trace-context: https://www.w3.org/blog/2019/12/trace-context-enters-proposed-recommendation/ and https://www.w3.org/TR/trace-context/


  • How to get instrumented with OpenTelemetry in under 10 minutes in Java
  • Common distributed tracing use cases and why they are the foundation for observability
  • Best practices and common pitfalls for distributed tracing
  • How to prepare to roll out OpenTelemetry across your organization

https://www.youtube.com/watch?v=_OXYCzwFd1Y Modern Observability with OpenTelemetry

from Google Dapper and its system of exemplars.


In distributed tracing, a trace is a view into a request as it moves through a distributed system. A Trace ID identifies the whole request and is shared by every operation performed on its behalf, while a Span ID (operation ID) identifies each individual operation. A span is a named, timed operation that represents a piece of the workflow.

Multiple spans represent different parts of the workflow and are pieced together to create a trace.
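How spans are pieced together can be sketched with a small Python example. The Span record below is a simplified, hypothetical structure (real OpenTelemetry spans carry more fields). The sketch assumes each span records its trace ID, its own span ID, and the span ID of its parent.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    """Simplified, hypothetical span record for illustration."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of a trace
    name: str
    start_ms: int
    end_ms: int


def group_by_trace(spans: List[Span]) -> Dict[str, List[Span]]:
    """Collect spans that share a trace ID, whatever order they arrived in."""
    traces: Dict[str, List[Span]] = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return dict(traces)


def render_trace(spans: List[Span]) -> List[str]:
    """Rebuild the parent/child tree and return one indented line per span."""
    children: Dict[Optional[str], List[Span]] = defaultdict(list)
    for span in sorted(spans, key=lambda s: s.start_ms):
        children[span.parent_id].append(span)

    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in children.get(parent_id, []):
            lines.append(f"{'  ' * depth}{span.name} ({span.end_ms - span.start_ms} ms)")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


spans = [
    Span("t1", "c", "a", "db.query", 20, 45),
    Span("t2", "d", None, "GET /status", 0, 3),
    Span("t1", "a", None, "GET /hello", 0, 50),
    Span("t1", "b", "a", "auth.check", 5, 15),
]
for line in render_trace(group_by_trace(spans)["t1"]):
    print(line)
```

The parent ID is what lets a back end reassemble a trace even when spans from different services arrive out of order.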

context object

Serializing a context object into HTTP headers is called “injection”.

De-serializing the context object from those headers in the downstream service is called “extraction”. Together, the two steps achieve “propagation”.

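Within the context headers, a “traceparent” header carries a version, a trace-id, and a span-id (called parent-id in the spec), plus trace flags that include the sampling flag. A “tracestate” header carries vendor-specific internal details.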
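As a rough illustration of the traceparent layout in the W3C Trace Context recommendation linked above, the Python sketch below writes and reads the header by hand. The SpanContext class and the inject, extract, and child_of helpers are hypothetical names for this example, not the OpenTelemetry API. The sketch handles only version 00 headers and ignores tracestate.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Dict, Optional

# version(2 hex)-trace-id(32 hex)-parent-id(16 hex)-flags(2 hex)
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


@dataclass(frozen=True)
class SpanContext:
    """Hypothetical holder for the values carried by traceparent."""
    trace_id: str
    span_id: str
    sampled: bool


def inject(ctx: SpanContext, headers: Dict[str, str]) -> None:
    """Serialize the context into an outgoing request's headers."""
    flags = "01" if ctx.sampled else "00"
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract(headers: Dict[str, str]) -> Optional[SpanContext]:
    """Rebuild the context from incoming headers; None if absent or invalid."""
    match = _TRACEPARENT.fullmatch(headers.get("traceparent", ""))
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # This sketch accepts only version 00; all-zero IDs are invalid per the spec.
    if version != "00" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_of(parent: SpanContext) -> SpanContext:
    """Start a new span in the same trace: same trace ID, fresh span ID."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


upstream = SpanContext("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", True)
headers: Dict[str, str] = {}
inject(upstream, headers)
print(headers["traceparent"])

received = extract(headers)
downstream = child_of(received)
print(downstream.trace_id == upstream.trace_id)
```

The downstream service keeps the trace ID and sampling decision it received and mints only a new span ID, which is what ties its work back to the caller.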

Additionally, a “project ID” can be added to the context “Baggage”, which contains arbitrary key-value pairs (for A/B testing, etc.).
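Baggage travels as a comma-separated list of key=value members in a “baggage” HTTP header. The sketch below shows the idea in Python. The helper names and the project.id and ab.variant keys are made up for this example, keys are assumed to be simple tokens, and optional per-member properties (after a semicolon) are simply dropped.

```python
from typing import Dict
from urllib.parse import quote, unquote


def inject_baggage(baggage: Dict[str, str], headers: Dict[str, str]) -> None:
    """Serialize key-value pairs into a single baggage header."""
    members = [f"{key}={quote(value, safe='')}" for key, value in baggage.items()]
    headers["baggage"] = ",".join(members)


def extract_baggage(headers: Dict[str, str]) -> Dict[str, str]:
    """Parse the baggage header back into a dictionary."""
    baggage: Dict[str, str] = {}
    for member in headers.get("baggage", "").split(","):
        member = member.split(";", 1)[0].strip()  # ignore optional properties
        if "=" not in member:
            continue
        key, value = member.split("=", 1)
        baggage[key.strip()] = unquote(value.strip())
    return baggage


outgoing: Dict[str, str] = {}
inject_baggage({"project.id": "checkout-42", "ab.variant": "B group"}, outgoing)
print(outgoing["baggage"])
print(extract_baggage(outgoing))
```

Every downstream service that extracts the header sees the same pairs, so keep baggage small and free of secrets.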

with a label (attribute) for each dimension

Previously called “OpenTracing” created in 2016.

not sample to identify outliers

or tracing (structs), which follows a request from service to service as context propagates state, enabling joins

OpenTelemetry: OpenTelemetry is the unified initiative that takes the best of both OpenTracing and OpenCensus forward.

OpenTracing (@OpenTracing, https://twitter.com/opentracing): Lightstep tracers work with the OpenTracing API to create and send span data to the Lightstep web application. See the OpenTracing Registry for details on out-of-the-box instrumentation for common packages and frameworks.

OpenCensus: Lightstep supports ingesting trace data from OpenCensus-instrumented applications via exporters.

Jaeger Agent (@JaegerTracing, https://twitter.com/JaegerTracing): Lightstep can ingest data directly from a Jaeger Agent.

Zipkin: Lightstep can ingest data directly from Zipkin.

Chisel: A tooling library that comes with Lightstep and OpenTracing built in and that works with Pedestal (a popular set of Clojure libraries for building APIs).

OpenTracing instrumentation

  • https://github.com/opentracing-contrib/java-jdbc OpenTracing Instrumentation for JDBC

  • https://github.com/opentracing-contrib/java-spring-web OpenTracing Java Spring Web instrumentation

  • https://github.com/opentracing-contrib/java-kafka-client OpenTracing Instrumentation for Apache Kafka Client

  • https://github.com/opentracing-contrib/csharp-netcore OpenTracing instrumentation for .NET Core & .NET 5 apps

  • https://github.com/uber-common/opentracing-python-instrumentation A collection of Python instrumentation tools for the OpenTracing API

  • https://github.com/opentracing-contrib/python-sqlalchemy https://github.com/carlosalberto/python-sqlalchemy OpenTracing instrumentation for SQLAlchemy

  • https://github.com/carlosalberto/python-pyramid OpenTracing instrumentation for the Pyramid framework


  • https://github.com/zalando/opentracing-toolbox Best-of-breed OpenTracing utilities, instrumentations and extensions

  • https://github.com/RisingStack/opentracing-auto Out of the box distributed tracing for Node.js applications with OpenTracing.




https://www.youtube.com/watch?v=HExcLWA2b8M Live Interview with Skyscanner: Observability Best Practices & OpenTelemetry Lightstep

https://www.youtube.com/watch?v=NpE9nNmI9g4 SLA vs SLO vs SLI: All you need to know Lightstep

https://www.youtube.com/watch?v=FlghuHDlQdM Beyond Getting Started: Using OpenTelemetry to Its Full Potential - Sergey Kanzhelev (Microsoft) & Morgan McLean (Google) CNCF [Cloud Native Computing Foundation]

https://www.youtube.com/watch?v=FbHbDikEUYg Introduction to OpenTelemetry on Kubernetes by infrastructure atscale

https://www.youtube.com/watch?v=J0XOGlf1bwk What’s Lightstep? (Apr 2, 2020)

  • trace by specific customer
  • add custom tags
  • error analysis


“We’ve seen people reduce logging 95% by adopting Tracing” – Why distributed tracing will replace (most) logging for cloud-native architectures interviews Lightstep co-founder and CEO, Ben Sigelman, and OpenTelemetry co-founder, Ted Young.

Why Developers and SREs Choose Lightstep for Observability interviews users “Lightstep drove us right to where it’s a problem”

https://www.youtube.com/watch?v=GGRAvY8_7Ps Announcing Change Intelligence (Feb 4, 2021) identifies “most likely causes of performance changes” based on baseline history

https://www.youtube.com/watch?v=MQ0NXN5n0Es OpenTelemetry insights: How will Traces and Metrics interact by Ted Young and Josh MacDonald at Lightstep


https://www.youtube.com/watch?v=1vMu7iskQaY Getting Started with OpenTelemetry - Ted Young, Lightstep Continuous Delivery Foundation

https://www.youtube.com/watch?v=1DxMHqYIvkQ OpenTelemetry Auto-Instrumentation Deep Dive - Carlos Alberto Cortez & Alex Boten, LightStep CNCF [Cloud Native Computing Foundation]

Webinar: How OpenTelemetry is Eating the World (from CNCF) presented by Steve Flanders (Splunk Dir. of Engineering, Otel Collector approver)

Tutorial: OpenTelemetry Java Instrumentation with Spring Boot in under 5 minutes (from Lightstep)

Introduction to Tracing : OpenTelemetry & Opentracing by That DevOps Guy

https://www.youtube.com/watch?v=TmFBDsnLbAY Distributed Tracing with Micronaut Object Computing

https://www.youtube.com/watch?v=vtuffPM5zXc OpenTelemetry in practice - Ilya Kaznacheev GoLab conference

https://www.youtube.com/watch?v=88ZjCbT6LPc OpenTelemetry Java Auto-Instrumentation SIG 2020/03/19 OpenTelemetry

https://www.youtube.com/watch?v=CFLZJSwbYI0 Spring Tips: Zipkin and Distributed Tracing SpringDeveloper

https://www.youtube.com/watch?v=RvCcWltMY7U Spring Boot OpenTracing instrumentation, using Jaeger and Zipkin Pavol Loffay

https://www.youtube.com/watch?v=mNMw148wpZ4 MicroServices | Distributed Logging & Tracing Byte Programming

What Is OpenTelemetry? New Relic

OpenTelemetry Architecture Overview by John Watson New Relic https://opensource.newrelic.com/projects/open-telemetry Tracer SDK

Microservices and Kubernetes Observability | Metrics, Logs, Tracing, Chaos Experiments by Tech Primers

[4] VIDEO: freecodecamp.org’s OpenTelemetry Course - Understand Software Performance by Code with Ania Kubów (https://github.com/kubowania who began with JavaScript games in 2019)

  1. Run Zipkin from DockerHub:

    docker run --rm -d -p 9411:9411 --name zipkin openzipkin/zipkin

