In my previous blog post, we modified the boot parameters to enable cgroups v2 and the user namespace in CRI-O. In this second part I’ll show you how to run a sample container with systemd and check that the modifications we made actually worked.
To test the new configuration, let’s use a simple container with systemd enabled, based on Red Hat’s UBI 9 image.
The container can be deployed with a standard manifest:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: systemd-deployment
  labels:
    app: systemd
spec:
  replicas: 1
  selector:
    matchLabels:
      app: systemd
  template:
    metadata:
      labels:
        app: systemd
      annotations:
        io.kubernetes.cri-o.userns-mode: "auto"
    spec:
      serviceAccountName: systemd-test
      automountServiceAccountToken: True
      containers:
      - name: systemd-test
        image: registry.access.redhat.com/ubi9:latest
        command: ["/sbin/init"]
        securityContext:
          allowPrivilegeEscalation: False
          capabilities:
            drop:
            - ALL
          runAsNonRoot: False
Note that we set runAsNonRoot: False, since systemd is executed as root inside the container (as mentioned before, that root user is mapped to a non-root UID outside of the container namespace), and we add the annotation io.kubernetes.cri-o.userns-mode: "auto" to enable the CRI-O user namespace. To simplify testing, instead of building a container image with systemd, we picked Red Hat’s UBI image and overrode the container’s entry point (command: ["/sbin/init"]) to load the necessary services.
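For completeness, here is a minimal sketch of how the manifest could be applied and the rollout awaited. The filename systemd-deployment.yaml and the helper name deploy_systemd_test are assumptions made for illustration; the default namespace matches the one used in the commands in this post, and the systemd-test service account is assumed to exist already.

```shell
#!/bin/sh
# Sketch only: apply the Deployment and wait until it has rolled out.
# Assumptions: the manifest is saved as systemd-deployment.yaml (hypothetical
# filename) and the systemd-test service account already exists.
deploy_systemd_test() {
    manifest="${1:-systemd-deployment.yaml}"
    namespace="${2:-default}"

    # Create or update the Deployment, then block until its pod is ready.
    oc apply -n "$namespace" -f "$manifest" &&
        oc rollout status deployment/systemd-deployment -n "$namespace"
}
```

Calling deploy_systemd_test without arguments uses the defaults above; a different file and namespace can be passed as the first and second argument.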
First let’s get the name of the pod and some information on which node the pod is running:
oc get pods -n default -o wide | grep systemd
systemd-deployment-5698997785-45tzz 1/1 Running 0 4m50s 10.128.2.94 node04 <none> <none>
and then log in to the container:
oc -n default rsh systemd-deployment-5bfd6fdb56-2ftq4
and install the ps utility to check the details of the running processes:
sh-5.1# dnf install -y procps
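The pod lookup above relies on grep and on copying the pod name by hand. As an alternative sketch, the pod can be selected through the app: systemd label from the manifest. The helper name systemd_pod_name is hypothetical, and the jsonpath expression assumes that at least one matching pod exists (oc reports an error otherwise).

```shell
#!/bin/sh
# Sketch only: print the name of the first pod carrying the app=systemd label.
# The namespace defaults to "default", as in the commands in this post.
systemd_pod_name() {
    namespace="${1:-default}"
    oc get pods -n "$namespace" -l app=systemd \
        -o jsonpath='{.items[0].metadata.name}'
}
```

With this helper, the login step becomes oc -n default rsh "$(systemd_pod_name)", which keeps working after the pod is recreated under a new name.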
As we can see, inside of the container we are the root user:
sh-5.1# whoami
root
and we can see that the init process inside the container is running with PID 1 as root:
sh-5.1# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 15:24 ? 00:00:00 /sbin/init
root 18 1 0 15:24 ? 00:00:00 /usr/lib/systemd/systemd-journald
root 29 0 0 15:24 pts/0 00:00:00 sh
root 57 29 0 15:29 pts/0 00:00:00 ps -ef
and that systemd is running:
sh-5.1# systemctl status
● systemd-deployment-5bfd6fdb56-2ftq4
State: running
Jobs: 0 queued
Failed: 0 units
Since: Mon 2022-10-03 15:24:41 UTC; 17h ago
CGroup: /
├─init.scope
│ ├─ 1 /sbin/init
│ ├─ 29 sh
│ ├─277 /bin/sh
│ ├─291 systemctl status
│ └─292 "(pager)"
└─system.slice
├─dbus-broker.service
│ ├─64 /usr/bin/dbus-broker-launch --scope system --audit
│ └─65 dbus-broker --log 4 --controller 9 --machine-id 4b9d5f875116426badd8e681f903b8f3 --max-bytes 536870912 --max-fds 4096 --max-matches 16384 --audit
└─systemd-journald.service
└─18 /usr/lib/systemd/systemd-journald
Now, let’s check the sandbox in the host system: we want to verify that we are running as an unprivileged user on the host.
So first let’s get the container’s ID in the OpenShift node where the pod is running:
sudo crictl ps | grep systemd
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
8158b7ccd8ae1 registry.access.redhat.com/ubi9@sha256:c40e515aaebf3da366419d4eae3f0a9fe95ef88f4b942b7cf8ce421010e3969c 15 hours ago Running systemd-test 0 5dafb65aeccb1
and then we’ll inspect the container to check that it really is unprivileged and to get the host PID of the process it runs:
sudo crictl inspect 8158b7ccd8ae1 | jq '.info.privileged, .info.pid'
false
1837568
Now we can check how the pid and the uid of the process running in the container are mapped to the host’s namespace:
sudo pgrep --ns 1837568 | xargs ps -o pid,uid,cmd
PID UID CMD
1837568 165536 /sbin/init
1837700 165536 /usr/lib/systemd/systemd-journald
1837981 165536 sh
1889217 165617 /usr/bin/dbus-broker-launch --scope system --audit
1889222 165617 dbus-broker --log 4 --controller 9 --machine-id 4b9d5f875116426badd8e681f903b8f
3315216 165536 /bin/sh
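As we can see, the processes running in the container are mapped to unprivileged UIDs on the host: root inside the container (UID 0) shows up as UID 165536, and the processes have ordinary host PIDs instead of the low PIDs seen inside the container.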
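The kernel exposes this mapping in /proc/<pid>/uid_map on the host, where each line holds three fields: the first UID inside the namespace, the first UID outside it, and the length of the range. The sketch below translates a container UID into a host UID from such a file. The helper name map_uid and the sample range length of 65536 are assumptions for illustration; the real values can be read with sudo cat /proc/1837568/uid_map. The second example is an inference rather than something checked in the original test: on RHEL-based images the dbus user typically has UID 81, which would explain the 165617 seen for dbus-broker.

```shell
#!/bin/sh
# map_uid CONTAINER_UID UID_MAP_FILE
# Prints the host UID that a UID inside the user namespace is mapped to.
# Each uid_map line reads: <first UID inside> <first UID outside> <range length>
# Returns a non-zero status if the UID is not covered by any range.
map_uid() {
    awk -v uid="$1" '
        uid >= $1 && uid < $1 + $3 { print $2 + (uid - $1); found = 1; exit }
        END { if (!found) exit 1 }
    ' "$2"
}

# Sample mapping: the start of the range matches the output above, while the
# length (65536) is an assumption.
sample="$(mktemp)"
printf '0 165536 65536\n' > "$sample"
map_uid 0 "$sample"     # prints 165536 (root inside the container)
map_uid 81 "$sample"    # prints 165617 (UID 81 inside the container)
rm -f "$sample"
```

On the node itself, the same function can be pointed at the real file, for example map_uid 0 /proc/1837568/uid_map (run as root).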
We were able to run a shell as root user within the container namespace, while being a non-privileged user outside of the container. In a future blog post we’ll investigate how to further limit the SCC that we’ve used up to now (anyuid) to a more specific SCC, and investigate how to remove more capabilities.
This is still an experimental approach, but for these initial tests it seems to work: your mileage may vary. And thanks again to Marco Caimi from Red Hat for their support.
Did you find this article interesting? Are you an “under the hood” kind of person? We’re really big on automation and we’re always looking for people in a similar vein to fill roles like this one as well as other roles here at Würth Phoenix.