10. 10. 2022 Lorenzo Candeago DevOps

My OpenShift Journey #5: Run Unprivileged Containers with systemd in OpenShift: Part 2 – Testing

In my previous blog post, we modified the boot parameters to enable cgroups v2 and the user namespace in CRI-O. In this second part I’ll show you how to run a sample container with systemd and check that the modifications we made actually worked.

Setting up a Test Container

To test the new config, let’s use a simple container with systemd enabled, based on Red Hat’s UBI 9 image.

The container can be deployed with a standard manifest:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: systemd-deployment
  labels:
    app: systemd
spec:
  replicas: 1
  selector:
    matchLabels:
      app: systemd
  template:
    metadata:
      labels:
        app: systemd
      annotations:
        io.kubernetes.cri-o.userns-mode: "auto"
    spec:
      serviceAccountName: systemd-test
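      automountServiceAccountToken: true
      containers:
        - name: systemd-test
          image: registry.access.redhat.com/ubi9:latest
          command: ["/sbin/init"]
          # allowPrivilegeEscalation and capabilities are container-level
          # fields, so the securityContext belongs under the container entry
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            runAsNonRoot: false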

Note that we set runAsNonRoot to false because systemd runs as root inside the container (as mentioned before, that root user is mapped to a non-root UID outside of the container namespace). The annotation io.kubernetes.cri-o.userns-mode: "auto" tells CRI-O to run the pod in its own user namespace. To simplify testing, instead of building a dedicated image with systemd, we picked Red Hat’s UBI image and overrode the container’s entry point (command: ["/sbin/init"]) to start systemd and load the necessary services.
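To make the deployment step concrete, here is a minimal sketch of a helper that applies the manifest and waits for the rollout to finish. The function name, the manifest file name systemd-deployment.yaml and the 120-second timeout are assumptions made for illustration, and the systemd-test service account is assumed to exist already.

```shell
# Sketch: apply the Deployment and wait until its pod has been rolled out.
# Assumptions: the file name and namespace defaults are illustrative only.
deploy_systemd_test() {
  local manifest="${1:-systemd-deployment.yaml}"  # hypothetical file name
  local namespace="${2:-default}"

  # Create or update the Deployment, then block until it is available
  oc apply -n "$namespace" -f "$manifest" &&
    oc rollout status -n "$namespace" deployment/systemd-deployment --timeout=120s
}
```

Calling deploy_systemd_test without arguments would use the defaults above. If the rollout never completes, oc describe pod on the new pod is usually the quickest way to see why.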

Let’s Verify that the Changes Actually Worked

First let’s get the name of the pod and some information on which node the pod is running:

oc get pods -n default -o wide | grep systemd
systemd-deployment-5698997785-45tzz   1/1     Running   0          4m50s   10.128.2.94   node04   <none>           <none>

and then log in to the container, using the pod name from your own listing:

oc -n default rsh systemd-deployment-5bfd6fdb56-2ftq4
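Pod names change whenever the Deployment creates a new pod, so the name you copy from one listing may no longer exist a few minutes later. As a small sketch, we can look the pod up by the app=systemd label from the manifest instead of by name. The function name is made up for illustration, and the sketch assumes exactly one matching pod.

```shell
# Sketch: print the name of the first pod carrying the app=systemd label.
# Assumption: only one such pod exists in the namespace (replicas: 1).
systemd_pod() {
  local namespace="${1:-default}"
  oc get pods -n "$namespace" -l app=systemd \
    -o jsonpath='{.items[0].metadata.name}'
}

# Usage against a live cluster:
#   oc -n default rsh "$(systemd_pod default)"
```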

and install the procps package, which provides ps, to inspect the running processes:

sh-5.1# dnf install -y procps

As we can see, inside of the container we are the root user:

sh-5.1# whoami
root

and we can see that the init process inside the container is running with PID 1 as root:

sh-5.1# ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 15:24 ?        00:00:00 /sbin/init
root          18       1  0 15:24 ?        00:00:00 /usr/lib/systemd/systemd-journald
root          29       0  0 15:24 pts/0    00:00:00 sh
root          57      29  0 15:29 pts/0    00:00:00 ps -ef

and that systemd is running:

sh-5.1# systemctl status
● systemd-deployment-5bfd6fdb56-2ftq4
    State: running
     Jobs: 0 queued
   Failed: 0 units
    Since: Mon 2022-10-03 15:24:41 UTC; 17h ago
   CGroup: /
           ├─init.scope
           │ ├─  1 /sbin/init
           │ ├─ 29 sh
           │ ├─277 /bin/sh
           │ ├─291 systemctl status
           │ └─292 "(pager)"
           └─system.slice
             ├─dbus-broker.service
             │ ├─64 /usr/bin/dbus-broker-launch --scope system --audit
             │ └─65 dbus-broker --log 4 --controller 9 --machine-id 4b9d5f875116426badd8e681f903b8f3 --max-bytes 536870912 --max-fds 4096 --max-matches 16384 --audit
             └─systemd-journald.service
               └─18 /usr/lib/systemd/systemd-journald
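Being root inside the container doesn’t by itself prove that a user namespace is in place. As a quick extra check (not part of the original walkthrough), we can look at /proc/self/uid_map: in the initial user namespace it holds the identity mapping 0 0 4294967295, so any other content means UIDs are being remapped. The helper below is a minimal sketch, and its name is made up for illustration.

```shell
# Sketch: succeed (exit 0) when the given uid_map is NOT the identity mapping
# of the initial user namespace, i.e. when UIDs are being remapped.
uids_are_remapped() {
  local map_file="${1:-/proc/self/uid_map}"
  local inside outside length

  # First line of uid_map: <first UID inside> <first UID outside> <range length>
  read -r inside outside length < "$map_file" || [ -n "$inside" ] || return 2

  [ "$inside" != "0" ] || [ "$outside" != "0" ] || [ "$length" != "4294967295" ]
}

# Usage inside the container:
#   uids_are_remapped && echo "running in a user namespace"
```

The exact range you see depends on how CRI-O allocated the namespace, so the check only compares against the identity mapping rather than expecting specific values.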

Now, let’s check the sandbox in the host system: we want to verify that we are running as an unprivileged user on the host.

So first let’s get the container’s ID on the OpenShift node where the pod is running:

sudo crictl ps | grep systemd
CONTAINER           IMAGE                                                                                                                                                       CREATED             STATE               NAME                                 ATTEMPT             POD ID
8158b7ccd8ae1       registry.access.redhat.com/ubi9@sha256:c40e515aaebf3da366419d4eae3f0a9fe95ef88f4b942b7cf8ce421010e3969c                                                     15 hours ago        Running             systemd-test                         0                   5dafb65aeccb1

and then we’ll inspect the container to check that it really is unprivileged and to get the host PID of its init process:

sudo crictl inspect 8158b7ccd8ae1 | jq '.info.privileged, .info.pid'
false
1837568

Now we can check how the pid and the uid of the process running in the container are mapped to the host’s namespace:

sudo pgrep --ns 1837568 | xargs ps -o pid,uid,cmd
    PID   UID CMD
1837568 165536 /sbin/init
1837700 165536 /usr/lib/systemd/systemd-journald
1837981 165536 sh
1889217 165617 /usr/bin/dbus-broker-launch --scope system --audit
1889222 165617 dbus-broker --log 4 --controller 9 --machine-id 4b9d5f875116426badd8e681f903b8f
3315216 165536 /bin/sh

As we can see, the processes that run as root inside the container are mapped to unprivileged UIDs (165536 and above) on the host, and their host PIDs differ from the PIDs seen inside the container.
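The UIDs in this listing follow from simple arithmetic on the namespace’s UID map, which the kernel exposes in /proc/<pid>/uid_map as lines of the form "first UID inside, first UID outside, range length". The sketch below does that translation. The function name is hypothetical, and the mapping 0 165536 65536 used in the usage comment is an assumption that is consistent with the output above, not something we read from the node.

```shell
# Sketch: translate a UID seen inside the container into the UID seen on the
# host, using a file in the /proc/<pid>/uid_map format.
host_uid() {
  local container_uid="$1" map_file="$2"
  local inside outside length

  while read -r inside outside length; do
    # Does this mapping range cover the requested container UID?
    if [ "$container_uid" -ge "$inside" ] &&
       [ "$container_uid" -lt $((inside + length)) ]; then
      echo $((outside + container_uid - inside))
      return 0
    fi
  done < "$map_file"

  return 1  # the UID is not mapped in this namespace
}

# Usage on the node, with the PID obtained from crictl inspect:
#   sudo cat /proc/1837568/uid_map > /tmp/uid_map
#   host_uid 0 /tmp/uid_map    # container root
#   host_uid 81 /tmp/uid_map   # UID 81 is conventionally the dbus user
```

Under the assumed mapping, container UID 0 becomes 165536 and UID 81 becomes 165617, which would explain why the dbus-broker processes show a different UID from the rest.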

Conclusion

We were able to run systemd and a shell as the root user within the container namespace, while remaining an unprivileged user outside of the container. In a future blog post we’ll investigate how to replace the SCC that we’ve used up to now (anyuid) with a more restrictive one, and how to drop more capabilities.

This is still an experimental approach, but in these initial tests it seems to work; your mileage may vary. And thanks again to Marco Caimi from Red Hat for their support.
