Greetings, and welcome to the first edition of What the, Kubernetes!
Today's topics: CVE-2017-1002101, init-containers and YOU!
The context
Upgrading a cluster instance group from v1.7.13 to v1.7.14 introduced me to the first-run attempt at solving the problem outlined in the CVE.
The solution to the vulnerability (for the most part affecting untrusted, multi-tenant clusters) involved forcing all configMap and secret bind-mounts to read-only.
The tool we'll use in this post is Helm 2.7.2.
The Problem
A CI component that had been running successfully on v1.7.13 began crash-looping after the upgrade to v1.7.14:
$ kubectl get pod -l app=docker-ci
NAME READY STATUS RESTARTS AGE
docker-ci-4050235671-n487p 0/1 CrashLoopBackOff 4 1m
$ kubectl logs docker-ci-4050235671-n487p
[...]
time="2018-03-26T04:57:21Z" level=info msg="containerd successfully booted in 0.016248s" module=containerd
Error starting daemon: Error saving key file: open /etc/docker/.tmp-key.json853378281: read-only file system
Whether or not the problem presents depends on the workload: it affects docker and minio but not gitlab-ci-runner. The explanation is simple: it depends on the programmer who wrote the code. chdir(2): it's a syscall, not a law.
Contrary to kube issue #58720, most temp file writes to one of these mounts are performed by programs during initialization and are gone before initialization is complete.
Regardless, since I've written exactly zero lines of Kubernetes code, I will leave public declaration of opinion on the quality of this fix to others in favor of presenting a solution that can help mitigate the results of this change.
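If you want to see the new behavior for yourself, exec into any pod that mounts a configMap or secret and try to write to the mount; the pod name and path below are illustrative:

$ kubectl exec -it <pod> -- touch /etc/docker/scratch
touch: /etc/docker/scratch: Read-only file system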
The Solution
- initContainer
  - Mount a shared emptyDir volume on /etc/docker
  - Mount the configMap/secret to an alternate directory (/etc/docker_)
  - Copy the contents to the expected/configured location
  - exit
- runtime container
  - Mount the shared emptyDir volume on /etc/docker
  - Initialize dockerd normally

(The same pattern, expressed as a raw pod spec, is sketched just below.)
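A minimal sketch of that pod spec, with the volume names, image, and copy command borrowed from the chart developed later in this post (the pod name is hypothetical, and the annotation-vs-spec wrinkle covered below is ignored for the moment):

apiVersion: v1
kind: Pod
metadata:
  name: docker-init-demo
spec:
  volumes:
    - name: docker-config              # the configMap kube now mounts read-only
      configMap:
        name: docker-config
    - name: docker-config-directory    # writable scratch shared by both containers
      emptyDir: {}
  initContainers:
    - name: config
      image: docker:18-dind
      command: ["/bin/sh", "-c"]
      args: ["cp /etc/docker_/config.json /etc/docker/"]
      volumeMounts:
        - name: docker-config
          mountPath: /etc/docker_
        - name: docker-config-directory
          mountPath: /etc/docker
  containers:
    - name: docker
      image: docker:18-dind
      command: ["/usr/local/bin/dockerd"]
      args: ["--config-file=/etc/docker/config.json"]
      securityContext:
        privileged: true
      volumeMounts:
        - name: docker-config-directory
          mountPath: /etc/docker

The only coupling between the two containers is the docker-config-directory emptyDir; anything dockerd scribbles during startup lands there instead of on the read-only configMap mount.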
One functional loss is incurred with this method: the files in the target directory will not be magically updated. A mounted configMap or secret will eventually reflect changes made to the source configMap/secret. Replicating such functionality could be done with a side-car container (a runtime container, rather than an init-container) that would monitor the API event bus for changes to the configMap of interest, copying new data to the shared volume when necessary.
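A sketch of that side-car idea, simplified: rather than watching the API directly, the container below polls the read-only mount (which kubelet does keep up to date) and re-copies on change. This is an assumption on my part, not part of the chart presented in this post; it would sit alongside the main container under containers:.

- name: config-sync
  image: docker:18-dind                # any image with a shell, cmp, and cp will do
  command: ["/bin/sh", "-c"]
  args:
    - |
      # refresh the writable copy whenever kubelet updates the configMap mount
      while true; do
        if ! cmp -s /etc/docker_/config.json /etc/docker/config.json; then
          cp /etc/docker_/config.json /etc/docker/
        fi
        sleep 30
      done
  volumeMounts:
    - name: docker-config
      mountPath: /etc/docker_
    - name: docker-config-directory
      mountPath: /etc/docker

(Whether the running daemon notices the new file is a separate question.)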
The Hiccup
Init-containers aren't new. They have, however, had a rough start. When deploying to a v1.5-v1.7 cluster, it is necessary to use the pod.beta.kubernetes.io/init-containers annotation to avoid issue #45627. Post-v1.8, it all becomes just another part of .spec.template.spec.initContainers[].
What the...
NOTE: if you are uninterested in the why, this subsection can be skipped.
I'm not sure when init-containers entered the codebase. The feature graduated to beta status in kube v1.5 and ostensibly to GA status in v1.6.
- beta feature annotation: .spec.template.metadata.annotations["pod.beta.kubernetes.io/init-containers"]
- GA spec path: .spec.template.spec.initContainers[]
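Concretely, the same single init-container expressed in both forms, trimmed to the fields that matter; both snippets live under the Deployment's .spec.template, and the image/command are taken from the chart below:

# beta annotation form
metadata:
  annotations:
    pod.beta.kubernetes.io/init-containers: |
      [{"name": "config",
        "image": "docker:18-dind",
        "command": ["/bin/sh", "-c"],
        "args": ["cp /etc/docker_/config.json /etc/docker/"]}]

# GA spec form
spec:
  initContainers:
    - name: config
      image: docker:18-dind
      command: ["/bin/sh", "-c"]
      args: ["cp /etc/docker_/config.json /etc/docker/"]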
The theory goes as follows:
- v1.5: use the beta annotation
- v1.6 through v1.7: GA/beta deprecation phase; either the annotation or the spec form is valid
- v1.8: full GA; annotation is removed.
The full story: issue #45627
The reality of how init-containers are processed on v1.6-v1.7:
- Initial Deployment Received
  - Does the deployment define .spec.template.spec.initContainers[]?
    - No
      - Does the deployment define the beta init-containers annotation?
        - Yes
          - ingest the annotation JSON
          - sync the ingested data to ...spec.initContainers[]
        - No (well, then)
    - Yes
      - Does the deployment define the beta init-containers annotation?
        - Yes
          - Are the ...spec.initContainers[] sync'd with the beta annotation?
            - Yes
              - Capital! Nothing to see here. Carry on!
            - No
              - Hrmph! We know what's really making the wheels turn here!
              - ...spec.initContainers[] dropped into /dev/null
              - re-synchronize ...spec.initContainers[] to reflect the beta annotation
Summary: from kube v1.6 through v1.7 the spec definition never has primacy with the scheduler except on the initial Deployment. When such a manifest is received, kube converts the ...spec.initContainers[] structure into a JSON string and stores it as a beta annotation value. On subsequent updates, modifications to ...spec.initContainers[] not only have no effect, they are overwritten with the existing (deserialized) annotation structure. The only way around this situation is to use only the annotation form on Deployment updates. The API will entice you toward ...spec.initContainers[] by deserializing your annotation value into its GA spec location. Be strong! Until v1.8, define init-containers as if you were still on v1.5--pretend ...spec.initContainers[] doesn't exist until then!
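A quick way to see which form your cluster is actually honouring is to dump the live object and look for the annotation (the deployment name here is illustrative):

$ kubectl get deployment docker-demo -o yaml | grep -A2 "pod.beta.kubernetes.io/init-containers"

If the annotation is present and your edits to .spec.template.spec.initContainers[] keep evaporating, you are living the scenario described above.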
The Helm Chart
So we need a method that will allow for gradual cut-over to v1.8 without having to manage separate charts.
The application used for this demonstration is Docker. dockerd is one of those binaries that uses its config directory for pre-initialization scratch space.
Configuration
The fix for the read-only configuration path can be seen towards the bottom of ./values.yaml (the Volumes and VolumeMounts blocks), with the InitCommands key that drives our init-container named templates sitting just above Env.
NOTE: The following chart files have been pruned for this post (the unpruned versions are not reproduced here). All indentation is at 2-space increments. Look for any lines in the chart containing the term indent and adjust them if you adapt this to a different indentation interval.
values.yaml
2
3 Image: docker
4 ImageTag: &itag "18-dind"
5
6 deploymentEnvironment: &env demo
7
8 Plug: docker
9
10 NodeSelectors: []
11
12 InitCommands:
13 -
14 name: config
15 command: cp /etc/docker_/config.json /etc/docker/
16
17
18 Env:
19 -
20 name: DOCKER_HOST
21 value: localhost:49152
22 -
23 name: IMAGE_TAG
24 value: *itag
52
53 # Volumes
54 Volumes:
55 -
56 name: docker-config
57 configMap:
58 name: docker-config
59 items:
60 -
61 key: config
62 path: config.json
63 mode: 0600
64 -
65 name: docker-config-directory
66 emptyDir: {}
67
68
69
70 VolumeMounts:
71 -
72 name: docker-config
73 mountPath: /etc/docker_
74 -
75 name: docker-config-directory
76 mountPath: /etc/docker
Q: Dear Stephen: Why are your init-container commands listed in values.yaml?
A: I am glad you asked! As template markup gets thicker, readability decreases. Having critical aspects of a deployment hidden within a tangle of unrelated symbols and formatting has the danger of obscuring what the target workload is. I've been meaning to work out the gotpl incantations to make this happen and this series seemed to be a perfect reason to do it!
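It also means that adding another init step later is a values-only change: each entry in InitCommands becomes its own init-container. For example (the second entry here is hypothetical, not part of the demo chart):

InitCommands:
  -
    name: config
    command: cp /etc/docker_/config.json /etc/docker/
  -
    name: perms
    command: chmod 0700 /etc/docker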
./settings/one
1
2 {
3 "log-driver": "gcplogs",
4 "group": "root",
5 "iptables": true,
6 "ip-masq": true
7 }
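The chart's configmap.yaml template (referenced by the chksum/config annotation in the Deployment below) is not reproduced in this post. A minimal sketch of how ./settings/one could be wired into the docker-config configMap, assuming Helm's .Files.Get and not necessarily matching the chart's actual file:

./templates/configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: docker-config
  namespace: {{ .Release.Namespace }}
data:
  config: |
{{ .Files.Get "settings/one" | indent 4 }}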
Named Templates (doc link)
op.ed time
The following is the meat of the presented solution. It involves Helm. Helm is a templating utility that is working its way towards fulfilling its stated goal of being a package manager for Kubernetes.
Because, um... well.. Kubernetes and uh... Golang, Helm, unsurprisingly, uses Go templates. If its notably inelegant appearance displeases you, well, the large pile of sand is over there. And here is your mallet. And you were born with the other critical piece to that puzzle. Go for it!
For everyone else, without further ado, third party plugins or wrapper scripts, I give you...
The Meat (or salty, smokey-flavored tempeh)
NOTE: the filenames prefixed with an underscore signal to helm that the contents are not Kube manifests.
First up: the InitMethod template. See this if you are unfamiliar with prefix (Polish) notation, which the comparisons on lines 8 and 11 use.
Within this wee mess we have a thing that, when included in another template, will emit a term (annotation or spec) indicative of the form supported by the target kube cluster.
./templates/_helpers.yaml
1 {{/* vim: set filetype=sls sw=2 ts=2: */}}
2
3
4 {{- define "InitMethod" -}}
5 {{- $major := .Capabilities.KubeVersion.Major -}}
6 {{- $minor_ := ( splitList "+" .Capabilities.KubeVersion.Minor ) -}}
7 {{- $minor := index $minor_ 0 -}}
8 {{- if and (lt (int $major) 2) (lt (int $minor) 8) }}
9 {{- printf "annotation" -}}
10 {{- else -}}
11 {{- if and (eq (int $major) 1) (ge (int $minor) 8) }}
12 {{- printf "spec" -}}
13 {{- end -}} {{/* else if */}}
14 {{- end -}} {{/* if */}}
15 {{- end -}} {{/* define */}}
Once your eyes are able to blur past the template markup, it is quite straightforward:
- InitMethod
  - What version of Kubernetes are we talking to?
    - less than v1.8: use the annotation form
    - v1.8 and beyond: use the spec form
NOTE: GKE decided to augment the kube minor version with a "+". Lines 6-7 are required to deal with this anomaly.
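To make that concrete, here is roughly how the values flow through InitMethod on a GKE v1.8.x cluster (the values are illustrative):

.Capabilities.KubeVersion.Major  =>  "1"
.Capabilities.KubeVersion.Minor  =>  "8+"        (GKE's augmented form)
splitList "+" "8+"               =>  ["8", ""]
index ["8", ""] 0                =>  "8"
int "8"                          =>  8            (compared against 8 on lines 8 and 11)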
./templates/_init-containers.yaml
1 {{/* vim: set filetype=sls sw=2 ts=2: */}}
2
3 {{- define "InitSpec" }}
4 {{- if eq (include "InitMethod" .) "spec" }}
5 {{- $env := .Values.Env }}
6 {{- $volumes := .Values.VolumeMounts }}
7 {{- $image := ( printf "%s:%s" .Values.Image .Values.ImageTag ) }}
8 initContainers:
9 {{- range .Values.InitCommands }}
10 -
11 name: {{ .name }}
12 image: {{ $image }}
13 command: ["/bin/sh", "-c"]
14 args:
15 - {{ .command | quote }}
16 env:
17 {{ toYaml $env | indent 8 }}
18 volumeMounts:
19 {{ toYaml $volumes | indent 8 }}
20 {{- end }} {{/* range */}}
21 {{- end }} {{/* if */}}
22 {{- end }} {{/* define */}}
23
24
25
26 {{- define "InitAnnotation" }}
27 {{- if eq (include "InitMethod" .) "annotation" }}
28 {{- $env := .Values.Env }}
29 {{- $volumes := .Values.VolumeMounts }}
30 {{- $image := ( printf "%s:%s" .Values.Image .Values.ImageTag ) }}
31 pod.beta.kubernetes.io/init-containers: |
32 [
33 {{- range $ic_index, $ic := .Values.InitCommands }}
34 {{- if $ic_index }},{{end}}
35 {
36 "name": {{ .name | quote }},
37 "image": {{ $image | quote }},
38 "command": ["/bin/sh", "-c"],
39 "args": [ {{ .command | quote }} ],
40 "env":
41 [
42 {{- range $ev_index, $ev := $env }}
43 {{- if $ev_index}},{{end}}
44 {{ toJson $ev | indent 12 }}
45 {{- end }}
46 ],
47 "volumeMounts":
48 [
49 {{- range $vm_index, $vm := $volumes }}
50 {{- if $vm_index }},{{end}}
51 {{ toJson $vm | indent 12 }}
52 {{- end }}
53 ]
54 }
55 {{- end }}
56 ]
57 {{- end }}
58 {{- end }}
59
60 {{- define "InitContainers" }}
61 {{- if eq ( include "InitMethod" . ) "annotation" }}
62 {{- include "InitAnnotation" . }}
63 {{- end }}
64 {{- if eq ( include "InitMethod" . ) "spec" }}
65 {{- include "InitSpec" . }}
66 {{- end }}
67 {{- end }}
See lines 4 & 27 for how InitMethod is called.
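For reference, with the values.yaml above, the InitAnnotation template renders to roughly the following (env and volumeMounts elided):

pod.beta.kubernetes.io/init-containers: |
  [
    {
      "name": "config",
      "image": "docker:18-dind",
      "command": ["/bin/sh", "-c"],
      "args": [ "cp /etc/docker_/config.json /etc/docker/" ],
      "env": [ ... ],
      "volumeMounts": [ ... ]
    }
  ]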
The Cheese (or congealed soy paste cheese analog)
- InitSpec
- InitAnnotation
These are the templates that inject the appropriate init-containers definition when included by a Deployment manifest. They can be included as part of a Chart's boilerplate (if one is so inclined), as they add nothing to the manifest's structure if no init-container commands are defined to drive them.
Encapsulating the if/else logic within the helper templates allows a Deployment manifest template to get away with only two template-related statements. Only the form that provides full functionality on the target cluster actually renders anything; therefore, the desire for a chart that is version-agnostic (viz. init-containers) is satisfied.
templates/deployment.yaml
1 apiVersion: extensions/v1beta1
2 kind: Deployment
3 metadata:
4 name: {{.Values.Plug}}-{{.Values.deploymentEnvironment}}
5 namespace: {{.Release.Namespace}}
6 labels:
7 app: {{.Values.Plug}}
8 env: {{.Values.deploymentEnvironment}}
9 imageTag: {{.Values.ImageTag | quote }}
10 heritage: {{.Release.Service | quote }}
11 release: {{ .Release.Name | quote }}
12 chart: {{.Chart.Name}}-{{.Chart.Version}}
13 spec:
14 selector:
15 matchLabels:
16 app: {{.Values.Plug}}-{{.Values.deploymentEnvironment}}
17 env: {{.Values.deploymentEnvironment}}
18 imageTag: {{.Values.ImageTag | quote }}
19 release: {{ .Release.Name | quote }}
20 template:
21 metadata:
22 labels:
23 app: {{.Values.Plug}}-{{.Values.deploymentEnvironment}}
24 env: {{.Values.deploymentEnvironment}}
25 imageTag: {{.Values.ImageTag | quote }}
26 release: {{ .Release.Name | quote }}
27 annotations:
28 chksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum | quote }}
29 {{- include "InitAnnotation" . | indent 8 }}
30 spec:
31 {{- include "InitSpec" . | indent 6 }}
32 {{- if .Values.NodeSelectors }}
33 nodeSelector:
34 {{- toYaml .Values.NodeSelectors | indent 10 }}
35 {{- end }}
36 volumes:
37 {{ toYaml .Values.Volumes | indent 8 }}
38 containers:
39 -
40 name: docker
41 image: {{.Values.Image}}:{{.Values.ImageTag}}
42 command:
43 - /usr/local/bin/dockerd
44 args:
45 - --config-file=/etc/docker/config.json
46 - -H
47 - 0.0.0.0:49152
48 - --dns
49 - 8.8.8.8
50 - --insecure-registry
51 - registry--ci.ci
52 securityContext:
53 privileged: true
54 ports:
55 -
56 protocol: TCP
57 containerPort: 49152
58 volumeMounts:
59 {{ toYaml .Values.VolumeMounts | indent 12 }}
60 env:
61 {{ toYaml .Values.Env | indent 12 }}
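Once the release is installed (or upgraded), a couple of quick checks confirm the init-container did its job; the label comes from Plug and deploymentEnvironment in values.yaml:

# the pod should pass briefly through Init:0/1 and settle into Running
$ kubectl get pod -l app=docker-demo

# confirm the init-container and its mounts were wired in as expected
$ kubectl describe pod -l app=docker-demo | grep -A6 "Init Containers"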
Conclusion
The original target example for this post was to demonstrate a Django and Celery application. Once I encountered CVE-2017-1002101 I decided to refocus my initial foray towards this simpler and much more immediate problem domain. This example doesn't have complex requirements for making it work. As long as the init-container image has a functioning cp binary it will fit the bill--a valid argument can be made that the post-initial-deployment functionality of this init-container has no effect on the long-term viability of that Deployment (as long as the command is entered correctly the first time).
Because Python runtimes (e.g., Django, Celery, Gunicorn) directly consume application code, it is critical that those runtimes' environments are always in sync. The next post will cover such a deployment and will include the methods demonstrated today.
Thanks for reading!
init-containers CVE-2017-1002101 initContainers kubernetes configMap gotpl k8s sub-path