Wilson Mar

Where Docker images are stored for Kubernetes to run

Overview

A registry of Docker images is crucial for Kubernetes because a Docker Registry supplies Kubernetes with images it uses to instantiate each Docker container. When the Docker Registry goes down, Kubernetes can no longer pull the images it needs to start new containers.

NOTE: Content here is my personal opinion, and not intended to represent any employer (past or present). “PROTIP:” here highlights information I haven’t seen elsewhere on the internet because it is hard-won, little-known but significant facts based on my personal research and experience.

“Selecting a container registry for your Docker environment can sometimes feel like choosing what to eat at a Chinese restaurant that features a hundred items on its menu. The number of choices can be overwhelming, and you may not understand exactly what each option entails. As a result, you end up ordering General Tso’s chicken because it’s the only thing you really recognize. Then you spend the rest of the evening questioning whether you made the right choice and ate an authentic Chinese meal.” (Chris Tozzi)

Docker Hub

Docker Inc’s on-line Docker Hub (https://hub.docker.com) houses many public Docker images, free to pull.

Docker Inc. also open-sourced its on-premises Docker Registry server, even though Docker Inc. also earns money from its on-premises Docker Trusted Registry product.

Public Docker Hub API

Docker Hub has a public API which doesn’t require authentication.

“The Docker Registry HTTP API is the protocol to facilitate distribution of images to the docker engine. It interacts with instances of the docker registry, which is a service to manage information about docker images and enable their distribution.”

  1. List the first 10 tags in the image for Debian (the operating system):

    curl 'https://registry.hub.docker.com/v2/repositories/library/debian/tags/'
    

    The response starts with:

    {"count": 511, "next": "https://registry.hub.docker.com/v2/repositories/library/debian/tags/?page=2", "previous": null, "results": [{"name": "unstable-slim", "full_size": 27750048, "images": [{"size": 31296808, "architecture": "ppc64le", "variant": null, "features": null, "os": "linux", "os_version": null, "os_features": null}, {"size": 26435103, "architecture": "s390x", "variant": null, "features": null, "os": "linux", "os_version": null, "os_features": null}, {"size": 25698949, "architecture": "arm", "variant": "v5", "features": null ....
    
  2. Install the jq JSON query utility, which formats JSON responses, on a Mac:

    brew install jq
  3. Add to the previous command piping to jq:

    curl -s 'https://registry.hub.docker.com/v2/repositories/library/debian/tags/'|jq '."results"[]["name"]'
    

    -s silences curl’s download progress statistics.

    jq prints each tag name on its own line. Sample result (for Debian):

    "unstable-slim"
    "unstable-20191014-slim"
    "unstable-20191014"
    "unstable"
    "testing-slim"
    "testing-backports"
    "testing-20191014-slim"
    "testing-20191014"
    "testing"
    "stretch-slim"
    

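The response shown above carries a "next" URL, so the complete tag list can be collected by following that URL page after page. The following is a minimal sketch, assuming curl and jq are installed. The function names tags_url and list_all_tags are hypothetical, and the page_size query parameter is an assumption about the Docker Hub API that is not shown in the text above.

```shell
# Build the URL of the first page of tags for a repository.
# $1 = repository such as library/debian, $2 = page size (assumed parameter).
tags_url() {
  printf 'https://registry.hub.docker.com/v2/repositories/%s/tags/?page_size=%s\n' \
    "$1" "${2:-100}"
}

# Print every tag name, following the "next" link until it is null.
list_all_tags() {
  url=$(tags_url "$1" "${2:-100}")
  while [ -n "$url" ]; do
    page=$(curl -s "$url")
    # One tag name per line, without the surrounding quotes.
    printf '%s\n' "$page" | jq -r '.results[].name'
    # "next" is null on the last page; "// empty" turns that into an empty string.
    url=$(printf '%s\n' "$page" | jq -r '.next // empty')
  done
}

# Usage (needs network access):
#   list_all_tags library/debian
```

Fetching larger pages means fewer HTTP round trips than the default of 10 results per page.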
Vulnerability Identification Services

Look among the Docker images.

Although many images are “Docker Certified”, what does that mean?

Several other organizations provide a service for “deep scanning” of Docker images:

  • Xray from JFrog (which also makes the binary repository Artifactory)

  • Nexus from Sonatype, which also makes a binary repository of the same name

  • Black Duck

  • WhiteSource

Private Online Registries

If you want to keep your Docker image private or want security vetting of images for vulnerabilities, you would have to pay (see Enterprise Docker).

There are several other Docker Registry services:

  • Quay.io (pronounced “key”), which Red Hat provides.

  • Artifactory (licensed), from JFrog

Private On-premises Docker Registry

Docker Inc. has open-sourced its Docker Registry server software at https://github.com/docker/distribution/tree/master/registry.

Looking among the files in the root of the repo, notice the server is written in the Go language.

Private Docker Registry Server Install

As one would expect, Docker Registry is installed within a Docker container. For install instructions, see https://docs.docker.com/registry/deploying

  1. This command can be used to start the Registry as a single container:

    docker run -d \
      --restart=always \
      --name ...-registry \
      -v "$(pwd)"/certs:/certs \
      -e REGISTRY_HTTP_ADDR=0.0.0.0:443 \
      -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/domain.crt \
      -e REGISTRY_HTTP_TLS_KEY=/certs/domain.key \
      -p 443:443 \
      registry:2
    
  2. Before running that command, generate TLS certificates and place them in the certs folder that the command mounts (as domain.crt and domain.key).
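
For a test setup, the two certificate files can be self-signed. This is a sketch only, assuming OpenSSL 1.1.1 or later (for the -addext option). The function name make_self_signed_cert and the host name registry.example.com are placeholders, and a production registry would normally use a certificate issued by a trusted CA instead.

```shell
# Create a self-signed certificate and key under the names the
# "docker run" command above expects: domain.crt and domain.key.
# $1 = output folder (for example ./certs), $2 = registry host name.
make_self_signed_cert() {
  dir=$1
  host=$2
  mkdir -p "$dir"
  # -nodes leaves the key unencrypted so the registry can read it unattended.
  # The subjectAltName entry is included because current clients check it
  # rather than the CN alone.
  openssl req -x509 -newkey rsa:4096 -nodes -sha256 -days 365 \
    -subj "/CN=$host" \
    -addext "subjectAltName = DNS:$host" \
    -keyout "$dir/domain.key" -out "$dir/domain.crt"
}

# Usage:
#   make_self_signed_cert "$(pwd)/certs" registry.example.com
```

Clients will reject a self-signed certificate until it is added to their trusted certificates.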

However, you’ll likely start the on-premises Docker Registry using a docker compose command so that several containers can be brought up together as a Registry service for use by Kubernetes.

The other container handles Authentication using the OAuth protocol.

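PROTIP: Docker Registry from Docker Inc. does not have a UI. It is not designed to operate in a cluster (for High Availability). Its built-in authentication is minimal (an htpasswd file for basic auth), so real access control is delegated to a separate token server.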

About server install:

QUESTION: How to automatically pull images from Docker Hub if not in the private registry?
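
One possible answer, offered as a sketch and not as a tested recipe: the open-source registry can run as a pull-through cache, which fetches an image from Docker Hub the first time it is requested and serves it locally afterward. The function name, container name, and port below are hypothetical.

```shell
# Start a registry container in pull-through cache (mirror) mode.
# $1 = container name, $2 = host port; both defaults are illustrative.
start_pull_through_cache() {
  # REGISTRY_PROXY_REMOTEURL sets the registry's proxy.remoteurl option,
  # naming the upstream registry to fetch from on a cache miss.
  docker run -d --restart=always --name "${1:-registry-mirror}" \
    -p "${2:-5000}:5000" \
    -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
    registry:2
}

# Usage:
#   start_pull_through_cache registry-mirror 5000
```

A registry running in this mode does not accept pushes, so private images would still need a second registry instance.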

Docker CLI client install

For install instructions, see https://docs.docker.com/registry/deploying

  1. List the Docker packages available for your Mac:

    brew search docker
    
  2. View details about the Docker client formula for your Mac:

    brew info docker
    
  3. Install the Docker GUI client (Docker Desktop) on your Mac:

    brew install --cask docker
    

Private Registry

https://github.com/docker/distribution/blob/master/docs/spec/auth/token.md

Version 2 uses an industry-standard OAuth2 process. The example below is for an account/image “samalba/my-app”.

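A Bash script at https://gist.github.com/alexanderilyin/8cf68f85b922a7f1757ae3a74640d48a obtains a token with a command like this (the URL must be quoted so the shell does not treat & as a background operator):

    token="$(curl -s 'https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/ubuntu:pull' | jq -r '.token')"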

A more detailed explanation:

  1. When the registry requires authorization, it answers the client’s first request with a 401 Unauthorized response whose header says where to obtain a token:

    WWW-Authenticate: Bearer realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:samalba/my-app:pull,push"

    The registry client then issues a GET request to that authorization service for a Bearer token:

    https://auth.docker.io/token?service=registry.docker.io&scope=repository:samalba/my-app:pull,push

  2. Authorization service returns an opaque Bearer token representing the client’s authorized access. The client captures the token returned:

    HTTP/1.1 200 OK
    Content-Type: application/json
    {"token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzI1NiIsImtpZCI6IlBZWU86VEVXVTpWN0pIOjI2SlY6QVFUWjpMSkMzOlNYVko6WEdIQTozNEYyOjJMQVE6WlJNSzpaN1E2In0.eyJpc3MiOiJhdXRoLmRvY2tlci5jb20iLCJzdWIiOiJqbGhhd24iLCJhdWQiOiJyZWdpc3RyeS5kb2NrZXIuY29tIiwiZXhwIjoxNDE1Mzg3MzE1LCJuYmYiOjE0MTUzODcwMTUsImlhdCI6MTQxNTM4NzAxNSwianRpIjoidFlKQ08xYzZjbnl5N2tBbjBjN3JLUGdiVjFIMWJGd3MiLCJhY2Nlc3MiOlt7InR5cGUiOiJyZXBvc2l0b3J5IiwibmFtZSI6InNhbWFsYmEvbXktYXBwIiwiYWN0aW9ucyI6WyJwdXNoIl19XX0.QhflHPfbd6eVF4lM9bwYpFZIV0PfikbyXuLx959ykRTBpe3CYnzs6YBK8FToVb5R47920PVLrh8zuLzdCr9t3w", "expires_in": 3600,"issued_at": "2009-11-10T23:00:00Z"}
    
  3. Client retries the original request with the Bearer token embedded in the request’s Authorization HTTP header.

    Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzI1NiIsImtpZCI6IkJWM0Q6MkFWWjpVQjVaOktJQVA6SU5QTDo1RU42Ok40SjQ6Nk1XTzpEUktFOkJWUUs6M0ZKTDpQT1RMIn0.eyJpc3MiOiJhdXRoLmRvY2tlci5jb20iLCJzdWIiOiJCQ0NZOk9VNlo6UUVKNTpXTjJDOjJBVkM6WTdZRDpBM0xZOjQ1VVc6NE9HRDpLQUxMOkNOSjU6NUlVTCIsImF1ZCI6InJlZ2lzdHJ5LmRvY2tlci5jb20iLCJleHAiOjE0MTUzODczMTUsIm5iZiI6MTQxNTM4NzAxNSwiaWF0IjoxNDE1Mzg3MDE1LCJqdGkiOiJ0WUpDTzFjNmNueXk3a0FuMGM3cktQZ2JWMUgxYkZ3cyIsInNjb3BlIjoiamxoYXduOnJlcG9zaXRvcnk6c2FtYWxiYS9teS1hcHA6cHVzaCxwdWxsIGpsaGF3bjpuYW1lc3BhY2U6c2FtYWxiYTpwdWxsIn0.Y3zZSwaZPqy4y9oRBVRImZyv3m_S9XDHF1tWwN7mL52C_IiA73SJkWVNsvNqpJIn5h7A2F8biv_S2ppQ1lgkbw
    
  4. Registry authorizes the client by validating the Bearer token and the claim set embedded within it and begins the push/pull session as usual.
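
The steps above can be sketched in shell. This is an illustration under stated assumptions: curl and jq are installed, the registry host registry-1.docker.io and the manifest media type are assumptions not given in the text above, and the function names fetch_manifest and jwt_payload are hypothetical.

```shell
# Get a pull token for a repository, then retry the manifest request
# with the token in the Authorization header.
# $1 = repository such as library/ubuntu, $2 = tag such as latest.
fetch_manifest() {
  token=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:$1:pull" |
    jq -r '.token')
  curl -s \
    -H "Authorization: Bearer $token" \
    -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
    "https://registry-1.docker.io/v2/$1/manifests/$2"
}

# Decode the claim set (the middle segment) of a Bearer token.
# The token is a JWT: three base64url segments separated by dots.
jwt_payload() {
  # Convert the base64url alphabet back to standard base64.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops the padding; restore it so "base64 -d" accepts the input.
  case $(( ${#seg} % 4 )) in
    2) seg="$seg==" ;;
    3) seg="$seg=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Usage (needs network access):
#   fetch_manifest library/ubuntu latest
#   jwt_payload "$token" | jq '.access'
```

Decoding the claim set is a quick way to check which repository and actions a token actually grants before debugging a failed push or pull.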

Internal data structure

[Video: animated diagram of the registry’s internal data structure (no audio)]


The local Docker Registry is usually installed as a registry folder under /var/lib. But administrators often mount “/data” on a separate device so that if it fills up it won’t bring down the server.

That’s the folder taking up disk space, as measured by the du -s command.

The full path to each Docker image sits under a v2 folder, for version 2 of the registry, when removal was first enabled.

Each image stored in the registry is defined in a folder under the repositories side of the folder tree. Some images are grouped under an account name.

All content in repositories is stored as blobs under the “blobs” path. Under that is a sha256 (pronounced “shaw 256”) folder. S-H-A is an acronym for the “Secure Hash Algorithm” defined by the US National Security Agency. Hashing creates a sort of summary of a file’s content. That’s why hashes are also called “digests”. Digests are always a fixed number of digits. In the case of SHA-256, that is 64 hexadecimal characters (32 hex pairs). The first two characters are used to create a folder under which that file is stored, so that there are not too many files in a single folder.

The first 7 characters of each digest are used as a short identifier because 7 is usually unique enough to differentiate among the various hashes.
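
To make the layout concrete, here is a small sketch that derives a blob’s digest and storage path from a file’s content. It assumes the GNU sha256sum utility (on a Mac, shasum -a 256 is the usual substitute). The function names are hypothetical, and the path is relative to the registry’s storage root.

```shell
# Digest of a file's content, in the "sha256:<hex>" form used by the registry.
content_digest() {
  printf 'sha256:%s\n' "$(sha256sum "$1" | cut -d' ' -f1)"
}

# Where a blob with that digest is stored, relative to the storage root.
# The first two hex characters become an intermediate folder.
blob_path() {
  hex=${1#sha256:}
  prefix=$(printf '%s' "$hex" | cut -c1-2)
  printf 'blobs/sha256/%s/%s/data\n' "$prefix" "$hex"
}

# The 7-character short form of a digest.
short_id() {
  printf '%s\n' "${1#sha256:}" | cut -c1-7
}

# Usage:
#   digest=$(content_digest layer.tar.gz)
#   blob_path "$digest"
#   short_id "$digest"
```

Because the path is computed from the content alone, two images that contain an identical layer point at the same data file.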

The blobs for each tag are referenced by link files containing addresses to individual data files storing the content. Each revision of an image pushed to the registry has a different SHA and therefore a different tag name.

The current link defines the most recent blob at the head of the chain.

These are all under a manifest folder. A manifest API call returns a manifest listing the different layers containing changes stored as data blobs. Each layer of data within an image can be referenced by several commits into the registry. Just as with Git, this data architecture is how one can fall back to the complete set of files that existed when each push is made into an image repository.

There is nothing under the layers and uploads folders.

In each link file, the SHA defined for each layer is the address of a blob. This is how changes do not bloat the repository disk space like full copies of files with minor revisions.

Any blob can be accessed by any Docker image.

[Figure: Docker Registry folder structure, dockerreg-structure-v03-1727x947.png]

What’s wrong with this picture?

Some tags for images are obsoleted over time when vulnerabilities are found and patched.

The “content addressable” data architecture of the Docker Registry is borrowed from the Git repository structure. That design is for keeping even obsoleted source code forever.

PROTIP: Although it’s convenient for Git users to fall back on various versions, that feature can actually be a security flaw for Docker images. If someone falls back to a previous version, it may contain vulnerabilities which have been fixed in the latest version. So falling back can re-introduce an exploit.

Also, obsolete data remaining in the Docker Registry means it will grow and grow in an unsustainable way.

Removal is complicated

PROTIP: When a particular tag is removed using the API or directly, that does not directly result in much disk space being freed up as deleting a regular file might do.

PROTIP: The docker image rm command removes entire images, not individual tags.

Removing an image does not release hard disk space until a garbage collection operation occurs. A Docker Garbage Collection program needs to first mark every blob referenced in a link, then go back and remove blobs that have no reference to them.
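
The mark-then-sweep idea can be illustrated with a toy script. This is not the registry’s own collector (that is invoked through the registry binary’s garbage-collect subcommand, which also accepts --dry-run). It only mimics the description above on a folder tree laid out like the registry’s, and the function name is hypothetical.

```shell
# Toy mark-and-sweep: list blob digests that no link file references.
# Assumed layout under $1:
#   repositories/.../link          each file holds "sha256:<hex>"
#   blobs/sha256/<xx>/<hex>/data   the stored content
list_unreferenced_blobs() {
  root=$1
  marked=$(mktemp)
  # Mark phase: collect every digest named in a link file.
  # The extra "echo" adds the newline that link files may lack.
  find "$root/repositories" -type f -name link \
    -exec sh -c 'for f; do cat "$f"; echo; done' sh {} + 2>/dev/null |
    sed 's/^sha256://' | sort -u > "$marked"
  # Sweep phase: report each blob folder whose digest was never marked.
  for dir in "$root"/blobs/sha256/*/*/; do
    [ -d "$dir" ] || continue
    digest=$(basename "$dir")
    grep -qx "$digest" "$marked" || printf '%s\n' "$digest"
  done
  rm -f "$marked"
}

# Usage (read-only; it only prints candidates and deletes nothing):
#   list_unreferenced_blobs /var/lib/registry/docker/registry/v2
```

Printing candidates instead of deleting them mirrors a dry run, which is the safer first step on a registry that is still serving pulls.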

Even after that, if disk space was physically allocated incrementally over time, blob files may still populate each extent. That may mean that recovering physical disk space requires copying all extents to another instance with contiguous allocation of disk space.

One impediment is that removal of obsolete files should occur only after whatever replaces them is known to be good.

No archival is needed if the image can be easily rebuilt.



Remove image programs

There are several approaches to remove tags:

A. Use the “docker image rm” call to remove all tags in an image using a single call.

This needs to be done as “docker exec -it …”.

B. Make API DELETE call:

curl -v -X DELETE "http://registryhost:registryport/v2/${docker_image_name}/manifests/${digest}"

PROTIP: The DELETE API does not remove revisions.
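
The DELETE call needs a digest, not a tag name. One way to look the digest up first is sketched below, assuming the registry was started with deletion enabled and returns a Docker-Content-Digest header on manifest requests. The function names and the registry URL are placeholders.

```shell
# Read HTTP response headers on stdin and print the manifest digest.
extract_digest() {
  # Header names are case-insensitive and lines end in CR LF.
  tr -d '\r' | awk 'tolower($1) == "docker-content-digest:" { print $2 }'
}

# Resolve a tag to its digest, then delete the manifest by digest.
# $1 = registry base URL, $2 = image name, $3 = tag.
delete_tag() {
  digest=$(curl -sI \
    -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
    "$1/v2/$2/manifests/$3" | extract_digest)
  # Stop if the tag was not found or no digest header came back.
  [ -n "$digest" ] || return 1
  curl -s -X DELETE "$1/v2/$2/manifests/$digest"
}

# Usage:
#   delete_tag http://registryhost:5000 my-app v1
```

The Accept header matters because the digest returned depends on which manifest format the registry serves for the request.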

C. Physically remove links pointing to blobs using Bash rm commands to remove revisions along with tags.

Either way, garbage collection is necessary to remove blobs.

Registry garbage-collect does not clean up old blobs if a tag has been overwritten but has not been deleted - https://github.com/docker/distribution/issues/2212

Discussions on StackOverflow:

  • https://stackoverflow.com/questions/45046752/docker-registry-garbage-collection/45047696

  • https://stackoverflow.com/questions/25436742/how-to-delete-images-from-a-private-docker-registry

Python scripts:

  • https://github.com/andrey-pohilko/registry-cli A Python script that deletes Docker images

  • https://github.com/ricardobranco777/clean_registry ricardobranco777’s registry clean-up Python script

Other commands and write-ups:

  • https://beta.docs.docker.com/engine/reference/commandline/registry_rmi/ docker registry rmi REPOSITORY:TAG [OPTIONS]

  • https://www.linuxtechi.com/setup-docker-private-registry-centos-7-rhel-7/

  • https://www.server-world.info/en/note?os=CentOS_7&p=docker&f=6

  • https://gist.github.com/qoomon/7c7f16939630cafafceeb83d254194e4

Client

If you have a lot of images, avoid timeouts by configuring your SSH client (in ~/.ssh/config) to keep the connection alive:

Host *
   ServerAliveInterval 300
   ServerAliveCountMax 2
   

JFrog Artifactory as Docker Registry

https://jfrog.com/screencast/artifactory-5-one-minute-setup-docker-registry-as-container-install/


References

An alternative to DockerHub is GitHub Packages operated by GitHub
