Where Docker images are stored for Kubernetes to run
Overview
A registry of Docker images is crucial for Kubernetes because the registry supplies the images Kubernetes uses to instantiate each Docker container. When the Docker Registry goes down, Kubernetes cannot pull any image that is not already cached on a node, so new containers fail to start.
NOTE: Content here is my personal opinion, and is not intended to represent any employer (past or present). “PROTIP:” here highlights information I haven’t seen elsewhere on the internet because it consists of hard-won, little-known but significant facts based on my personal research and experience.
Selecting a container registry for your Docker environment can sometimes feel like choosing what to eat at a Chinese restaurant that features a hundred items on its menu. The number of choices can be overwhelming, and you may not understand exactly what each option entails. As a result, you end up ordering General Tso’s chicken because it’s the only thing you really recognize. Then you spend the rest of the evening questioning whether you made the right choice and ate an authentic Chinese meal. (Chris Tozzi)
Docker Hub
Docker Inc’s on-line Docker Hub (https://hub.docker.com) houses many public Docker images, free to pull.
Docker Inc. also open-sourced its on-premises Docker Registry server, even though it earns money from its on-premises Docker Trusted Registry product.
Public Docker Hub API
Docker Hub has a public API which doesn’t require authentication.
“The Docker Registry HTTP API is the protocol to facilitate distribution of images to the docker engine. It interacts with instances of the docker registry, which is a service to manage information about docker images and enable their distribution.”
- List the first 10 tags of the image for Debian (the operating system):
curl 'https://registry.hub.docker.com/v2/repositories/library/debian/tags/'
The response starts with:
{"count": 511, "next": "https://registry.hub.docker.com/v2/repositories/library/debian/tags/?page=2", "previous": null, "results": [{"name": "unstable-slim", "full_size": 27750048, "images": [{"size": 31296808, "architecture": "ppc64le", "variant": null, "features": null, "os": "linux", "os_version": null, "os_features": null}, {"size": 26435103, "architecture": "s390x", "variant": null, "features": null, "os": "linux", "os_version": null, "os_features": null}, {"size": 25698949, "architecture": "arm", "variant": "v5", "features": null ....
- Install the jq utility, which queries and formats JSON responses, on a Mac:
brew install jq
- Pipe the output of the previous command to jq:
curl -s 'https://registry.hub.docker.com/v2/repositories/library/debian/tags/'|jq '."results"[]["name"]'
The -s option silences curl’s download statistics.
Sample result (for Debian), which jq prints with one name per line:
"unstable-slim"
"unstable-20191014-slim"
"unstable-20191014"
"unstable"
"testing-slim"
"testing-backports"
"testing-20191014-slim"
"testing-20191014"
"testing"
"stretch-slim"
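The same extraction can be sketched in Python, for readers who would rather script it than pipe to jq. This is only an illustration: the helper names `tag_names` and `next_page` are made up here, and `sample` is an abbreviated, hand-trimmed copy of the response shown above rather than live output.

```python
import json
from typing import List, Optional


def tag_names(page: dict) -> List[str]:
    """Return the tag names from one page of a tag listing (like jq '.results[].name')."""
    return [entry["name"] for entry in page.get("results", [])]


def next_page(page: dict) -> Optional[str]:
    """Return the URL of the following page, or None on the last page."""
    return page.get("next")


# Abbreviated sample shaped like the response above (most fields trimmed).
sample = json.loads("""
{"count": 511,
 "next": "https://registry.hub.docker.com/v2/repositories/library/debian/tags/?page=2",
 "previous": null,
 "results": [{"name": "unstable-slim", "full_size": 27750048},
             {"name": "unstable-20191014-slim"}]}
""")

for name in tag_names(sample):
    print(name)
print("next page:", next_page(sample))
```

Following the "next" URL until it is null would walk through all of the tags, not just the first page.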
Vulnerability Identification Services
Look among the Docker images.
Although many images are “Docker Certified”, what does that mean?
Several other organizations provide a service for “deep scanning” of Docker images:
- X-Ray from JFrog (which also makes the binary repository Artifactory)
- Nexus from Sonatype, which also makes a binary repository of the same name
- Black Duck
Private Online Registries
If you want to keep your Docker image private or want security vetting of images for vulnerabilities, you would have to pay (see Enterprise Docker).
There are several other Docker Registry services:
- Quay.io (pronounced “key”), which Red Hat provides.
- Artifactory (licensed)
Private On-premises Docker Registry
Docker Inc. has open-sourced its Docker Registry server software at https://github.com/docker/distribution/tree/master/registry.
Looking among the files in the root of the repo, notice the server is written in the Go language.
Private Docker Registry Server Install
As one would expect, Docker Registry is installed within a Docker container. For install instructions, see https://docs.docker.com/registry/deploying
- This command can be used to start the Registry as a single container:
docker run -d \
  --restart=always \
  --name ...-registry \
  -v "$(pwd)"/certs:/certs \
  -e REGISTRY_HTTP_ADDR=0.0.0.0:443 \
  -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/domain.crt \
  -e REGISTRY_HTTP_TLS_KEY=/certs/domain.key \
  -p 443:443 \
  registry:2
- Generate TLS certificates and place them in the path described in the command.
However, you’ll likely start the on-premises Docker Registry using a docker compose command so that several containers can be brought up together as a Registry service for use by Kubernetes.
Besides the Registry itself, another container handles authentication using the OAuth protocol.
PROTIP: Docker Registry from Docker Inc. does not have a UI. It is not designed to operate in a cluster (for High Availability). It has no built-in authentication.
About server install:
QUESTION: How to automatically pull images from Docker Hub if not in the private registry?
Docker CLI client install
For install instructions, see https://docs.docker.com/registry/deploying
- List the Docker packages available for your Mac:
brew search docker
- Install the Docker command-line client on your Mac:
brew install docker
- Install the Docker GUI client (Docker Desktop) on your Mac:
brew install --cask docker
Private Registry
https://github.com/docker/distribution/blob/master/docs/spec/auth/token.md
Version 2 uses an industry-standard OAuth2 process. The example below is for an account/image “samalba/my-app”.
In a Bash script at https://gist.github.com/alexanderilyin/8cf68f85b922a7f1757ae3a74640d48a
- token="$(curl -s 'https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/ubuntu:pull' | jq -r '.token')"
A more detailed explanation:
- The registry answers an unauthenticated request with a 401 Unauthorized response whose header tells the client where to request a token:
WWW-Authenticate: Bearer realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:samalba/my-app:pull,push"
- Registry client issues a GET request to the authorization service for a Bearer token:
https://auth.docker.io/token?service=registry.docker.io&scope=repository:samalba/my-app:pull,push
- Authorization service returns an opaque Bearer token representing the client’s authorized access.
- Client captures the token returned:
HTTP/1.1 200 OK
Content-Type: application/json
{"token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzI1NiIsImtpZCI6IlBZWU86VEVXVTpWN0pIOjI2SlY6QVFUWjpMSkMzOlNYVko6WEdIQTozNEYyOjJMQVE6WlJNSzpaN1E2In0.eyJpc3MiOiJhdXRoLmRvY2tlci5jb20iLCJzdWIiOiJqbGhhd24iLCJhdWQiOiJyZWdpc3RyeS5kb2NrZXIuY29tIiwiZXhwIjoxNDE1Mzg3MzE1LCJuYmYiOjE0MTUzODcwMTUsImlhdCI6MTQxNTM4NzAxNSwianRpIjoidFlKQ08xYzZjbnl5N2tBbjBjN3JLUGdiVjFIMWJGd3MiLCJhY2Nlc3MiOlt7InR5cGUiOiJyZXBvc2l0b3J5IiwibmFtZSI6InNhbWFsYmEvbXktYXBwIiwiYWN0aW9ucyI6WyJwdXNoIl19XX0.QhflHPfbd6eVF4lM9bwYpFZIV0PfikbyXuLx959ykRTBpe3CYnzs6YBK8FToVb5R47920PVLrh8zuLzdCr9t3w", "expires_in": 3600,"issued_at": "2009-11-10T23:00:00Z"}
Using the Bearer token
- Client retries the original request with the Bearer token embedded in the request’s Authorization HTTP header.
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzI1NiIsImtpZCI6IkJWM0Q6MkFWWjpVQjVaOktJQVA6SU5QTDo1RU42Ok40SjQ6Nk1XTzpEUktFOkJWUUs6M0ZKTDpQT1RMIn0.eyJpc3MiOiJhdXRoLmRvY2tlci5jb20iLCJzdWIiOiJCQ0NZOk9VNlo6UUVKNTpXTjJDOjJBVkM6WTdZRDpBM0xZOjQ1VVc6NE9HRDpLQUxMOkNOSjU6NUlVTCIsImF1ZCI6InJlZ2lzdHJ5LmRvY2tlci5jb20iLCJleHAiOjE0MTUzODczMTUsIm5iZiI6MTQxNTM4NzAxNSwiaWF0IjoxNDE1Mzg3MDE1LCJqdGkiOiJ0WUpDTzFjNmNueXk3a0FuMGM3cktQZ2JWMUgxYkZ3cyIsInNjb3BlIjoiamxoYXduOnJlcG9zaXRvcnk6c2FtYWxiYS9teS1hcHA6cHVzaCxwdWxsIGpsaGF3bjpuYW1lc3BhY2U6c2FtYWxiYTpwdWxsIn0.Y3zZSwaZPqy4y9oRBVRImZyv3m_S9XDHF1tWwN7mL52C_IiA73SJkWVNsvNqpJIn5h7A2F8biv_S2ppQ1lgkbw
- Registry authorizes the client by validating the Bearer token and the claim set embedded within it and begins the push/pull session as usual.
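The first part of this exchange, turning the WWW-Authenticate challenge into the token request URL, can be sketched in Python. This is a minimal illustration rather than how the Docker client is implemented: `parse_challenge` and `token_url` are hypothetical helper names, and the simple regular expression assumes every parameter is a double-quoted value, as in the example above.

```python
import re
from urllib.parse import urlencode


def parse_challenge(header: str) -> dict:
    """Split a 'Bearer key="value",...' challenge into a dict of its parameters."""
    scheme, _, params = header.partition(" ")
    if scheme.lower() != "bearer":
        raise ValueError("not a Bearer challenge")
    return dict(re.findall(r'(\w+)="([^"]*)"', params))


def token_url(challenge: dict) -> str:
    """Build the GET URL for the authorization service named in the challenge."""
    query = urlencode(
        {"service": challenge["service"], "scope": challenge["scope"]},
        safe=":/,",  # keep the scope readable, as in the URL shown above
    )
    return f'{challenge["realm"]}?{query}'


header = ('Bearer realm="https://auth.docker.io/token",'
          'service="registry.docker.io",'
          'scope="repository:samalba/my-app:pull,push"')
print(token_url(parse_challenge(header)))
```

A client would then GET that URL, read the "token" field from the JSON response, and resend the original request with an "Authorization: Bearer" header carrying the token.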
Internal data structure
No audio in this animated diagram video:
The local Docker Registry usually stores its data in a registry folder under /var/lib. But administrators mount “/data” on a separate device so that if it fills up it won’t bring down the server.
That’s the folder taking up disk space, as measured by the du -s command.
The full path to each Docker image is under a v2 folder, for version 2, when removal was first enabled.
The name of each image stored in the registry is defined in a folder under the repositories side of the folder tree. Some images are grouped under an account name.
All content in repositories is stored as blobs under the “blobs” path. Under that is a sha256 (pronounced “shaw 256”) folder. S-H-A is an acronym for the “Secure Hash Algorithm” defined by the US National Security Agency. Hashing creates a sort of summary of a file’s content. That’s why hashes are also called “digests”. Digests are always a fixed number of digits. In the case of 256, that is 256 bits, written as 64 hexadecimal characters (32 hex pairs). The first two characters are used to create a folder under which that file is stored, so that there are not too many files in a single folder.
The first 7 characters of each digest are used as a short tag because 7 is usually unique enough to differentiate among the various hashes.
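How a digest turns into a storage location can be sketched in a few lines of Python. Treat this as an illustration under assumptions: the function name `blob_path` is made up, and the default root of /var/lib/registry and the docker/registry/v2 prefix reflect the usual filesystem layout of the open-source registry, which may differ on your installation.

```python
import hashlib
from pathlib import PurePosixPath


def blob_path(content: bytes, root: str = "/var/lib/registry") -> PurePosixPath:
    """Return where a blob with this content would sit in the blobs tree."""
    digest = hashlib.sha256(content).hexdigest()  # always 64 hex characters
    # The first two characters become an intermediate folder to spread files out.
    return PurePosixPath(root, "docker/registry/v2/blobs/sha256",
                         digest[:2], digest, "data")


path = blob_path(b"example layer content")
print(path)
print("short form:", path.parts[-2][:7])
```

Because the path is derived only from the content, pushing the same bytes twice always lands on the same file.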
The blobs for each tag are referenced by link files containing addresses to individual data files storing the content. Each revision of an image pushed in the registry has a different SHA and therefore a different tab name.
The current link defines the most recent blob at the head of the chain.
These are all under a manifest folder. A manifest API call returns a manifest listing the different layers containing changes stored as data blobs. Each layer of data within an image can be referenced by several commits into the registry. Just as with Git, this data architecture is how one can fall back to the complete set of files that existed when each push was made into an image repository.
There is nothing under the layers and uploads folders.
In each link file, the SHA defined for each layer is the address of a blob. This is how changes do not bloat the repository disk space like full copies of files with minor revisions.
Any blob can be accessed by any Docker image.
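A toy content-addressable store shows why shared layers cost disk space only once. This is a sketch in plain Python, not the registry's code: the `BlobStore` class and the layer contents are invented for illustration.

```python
import hashlib


class BlobStore:
    """Toy content-addressable store: blobs are keyed by their own digest."""

    def __init__(self):
        self.blobs = {}  # digest -> content

    def put(self, content: bytes) -> str:
        digest = "sha256:" + hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)  # identical content is stored once
        return digest


store = BlobStore()
base = b"base OS layer"
# Each "image" is just the list of layer digests its manifest would reference.
image_a = [store.put(base), store.put(b"app A files")]
image_b = [store.put(base), store.put(b"app B files")]

print(len(store.blobs))          # 3 blobs back 4 layer references
print(image_a[0] == image_b[0])  # True: both images point at the same base blob
```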
What’s wrong with this picture?
Some tags for images are obsoleted over time when vulnerabilities are found and patched.
The “content addressable” data architecture of the Docker Registry is borrowed from the Git repository structure. That design is for keeping even obsoleted source code forever.
PROTIP: Although it’s convenient for Git users to fall back on various versions, that feature can actually be a security flaw for Docker images. If someone falls back to a previous version, it may contain vulnerabilities which have been fixed in the latest version. So falling back can re-introduce an exploit.
Also, obsolete data remaining in the Docker Registry means it will grow and grow in an unsustainable way.
Removal is complicated
PROTIP: When a particular tag is removed using the API or directly, that does not directly result in much disk space being freed up as deleting a regular file might do.
PROTIP: The docker image rm command removes entire images, not individual tags.
Removing an image does not release hard disk space until a garbage collection operation occurs. A Docker garbage collection program first needs to mark every blob referenced in a link, then go back and remove every blob that no link references.
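The mark-then-sweep idea can be sketched with two dictionaries. This is a simplified model, not the registry's garbage collector: `garbage_collect` is a hypothetical function, the digests are shortened placeholders, and a real collector walks link files on disk rather than in-memory dicts.

```python
def garbage_collect(blobs: dict, manifests: dict) -> list:
    """Delete unreferenced blobs in place and return the digests removed."""
    marked = set()
    for layer_digests in manifests.values():  # mark: every blob a manifest links to
        marked.update(layer_digests)
    unreferenced = sorted(d for d in blobs if d not in marked)
    for digest in unreferenced:               # sweep: drop everything unmarked
        del blobs[digest]
    return unreferenced


# Placeholder digests; real ones are 64 hex characters.
blobs = {"sha256:aaa": b"base", "sha256:bbb": b"old app", "sha256:ccc": b"new app"}
manifests = {"my-app:2": ["sha256:aaa", "sha256:ccc"]}  # the old manifest is already deleted

print(garbage_collect(blobs, manifests))  # ['sha256:bbb']
print(sorted(blobs))                      # ['sha256:aaa', 'sha256:ccc']
```

The sketch also shows why deleting a tag alone frees nothing: space is released only when the sweep runs and finds blobs that no remaining manifest references.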
Even after that, if disk space was physically allocated incrementally over time, blob files may still populate each extent. That may mean that recovering physical disk space requires copying all extents to another instance with contiguous allocation of disk space.
One impediment is that obsolete files should be removed only after whatever replaces them is known to be good.
No archival is needed if the image can be easily rebuilt.
Remove image programs
There are several approaches to remove tags:
A. Use the “docker image rm” call to remove all tags in an image using a single call.
This needs to be done as “docker exec -it …”.
B. Make API DELETE call:
curl -v -X DELETE http://registryhost:registryport/v2/${docker_image_name}/manifests/${digest}
PROTIP: The DELETE API does not remove revisions.
C. Physically remove links pointing to blobs using Bash rm commands to remove revisions along with tags.
Either way, garbage collection is necessary to remove blobs.
Registry garbage-collect does not clean up old blobs if a tag has been overwritten but has not been deleted - https://github.com/docker/distribution/issues/2212
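Approach B can also be scripted. The sketch below only builds the DELETE request with Python's standard library and does not send it. The function name `delete_manifest_request`, the host "registryhost:5000", the image name "my-app", and the all-zero digest are placeholders made up for illustration.

```python
import urllib.request


def delete_manifest_request(registry: str, image: str, digest: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE call for one manifest digest."""
    url = f"http://{registry}/v2/{image}/manifests/{digest}"
    return urllib.request.Request(url, method="DELETE")


# Placeholder values; substitute your own registry host, image name, and digest.
req = delete_manifest_request("registryhost:5000", "my-app", "sha256:" + "0" * 64)
print(req.get_method(), req.full_url)

# urllib.request.urlopen(req) would actually send it; it is left out here
# so that running the sketch has no side effects.
```

As with the curl form, this removes only the manifest reference. The blobs stay on disk until garbage collection runs.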
Discussions on StackOverflow:
- https://stackoverflow.com/questions/45046752/docker-registry-garbage-collection/45047696
- https://stackoverflow.com/questions/25436742/how-to-delete-images-from-a-private-docker-registry
Python scripts:
- https://github.com/andrey-pohilko/registry-cli A Python script that deletes Docker images
- https://github.com/ricardobranco777/clean_registry ricardobranco777’s registry clean-up Python script
- https://beta.docs.docker.com/engine/reference/commandline/registry_rmi/ docker registry rmi REPOSITORY:TAG [OPTIONS]
- https://www.linuxtechi.com/setup-docker-private-registry-centos-7-rhel-7/
- https://www.server-world.info/en/note?os=CentOS_7&p=docker&f=6
- https://gist.github.com/qoomon/7c7f16939630cafafceeb83d254194e4
Client
If you have a lot of images, avoid timeouts of your terminal’s SSH session by adding keep-alive settings to the SSH client configuration (~/.ssh/config):
Host *
  ServerAliveInterval 300
  ServerAliveCountMax 2
JFrog Artifactory as Docker Registry
https://jfrog.com/screencast/artifactory-5-one-minute-setup-docker-registry-as-container-install/
References
An alternative to DockerHub is GitHub Packages operated by GitHub