In this demo we will show how the no_new_privs bit can be used to block privilege escalation through SUID binaries and binaries with file capabilities.
NOTE: The tests below were run on a Fedora 38 machine with Podman v4.6.2; results may vary with other OS or Podman versions.
First, we will see how no_new_privs can be used to neutralize SUID binaries.
- Create a small container image that has the `whoami` binary configured with the SETUID bit:

  ```
  cat <<EOF > /tmp/whoami-setuid.dockerfile
  FROM fedora:38
  RUN chmod +s /usr/bin/whoami
  ENTRYPOINT /usr/bin/whoami
  EOF
  podman build -f /tmp/whoami-setuid.dockerfile -t whoami-setuid
  ```
- If we run the image as user `1024` without setting the no_new_privs bit, this is what we get:

  ```
  podman run -it --rm --user=1024 whoami-setuid
  root
  ```

  NOTE: As you can see, the privilege escalation happened.
- If we run the image as user `1024` with the no_new_privs bit set:

  ```
  podman run -it --rm --user=1024 --security-opt=no-new-privileges whoami-setuid
  1024
  ```

  NOTE: In this case, the privilege escalation was blocked.
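A quick way to confirm whether the bit is active for the current process, inside or outside a container, is to read it from procfs; a minimal sketch (Linux only):

```shell
# NoNewPrivs is 1 when the no_new_privs bit is set for this thread,
# 0 otherwise; the bit is inherited across fork and execve.
grep NoNewPrivs /proc/self/status
```

Running this as the container command with and without `--security-opt=no-new-privileges` shows the flag flipping between 0 and 1.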
In this demo we will see how no_new_privs can be used to prevent users from running binaries with file capabilities when those capabilities are not in the thread's permitted and effective capability sets.
- Run the container as user 1024 and without setting the no_new_privs bit:

  ```
  podman run --rm -it --entrypoint /bin/bash --user 1024 -e APP_PORT=80 --name reversewords-test quay.io/mavazque/reversewords-captest:latest
  ```

  NOTE: The image we are using for this test ships a small web service with the NET_BIND_SERVICE file capability configured on the binary.
- Get the file capabilities of the web service binary:

  ```
  getcap /usr/bin/reverse-words
  /usr/bin/reverse-words = cap_net_bind_service+eip
  ```
- Get the container's thread capabilities:

  ```
  grep Cap /proc/1/status
  CapInh: 0000000000000000
  CapPrm: 0000000000000000
  CapEff: 0000000000000000
  CapBnd: 00000000800405fb
  CapAmb: 0000000000000000
  ```
- Decode the thread's bounding set capabilities:

  ```
  capsh --decode=00000000800405fb
  0x00000000800405fb=cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_sys_chroot,cap_setfcap
  ```
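The capsh decoding can also be checked by hand: each capability is one bit in the mask, indexed by its number in `<linux/capability.h>`. A small sketch, using the CapBnd value from above and CAP_NET_BIND_SERVICE (capability number 10):

```shell
mask=$((0x00000000800405fb))   # CapBnd value read from /proc/1/status
capnum=10                      # CAP_NET_BIND_SERVICE in <linux/capability.h>
# Shift the mask right by the capability number and test the lowest bit
if [ $(( (mask >> capnum) & 1 )) -eq 1 ]; then
  echo "cap_net_bind_service is present in the mask"
fi
```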
- Execute the binary:

  ```
  /usr/bin/reverse-words
  2023/10/17 07:26:06 Starting Reverse Api v0.0.21 Release: NotSet
  2023/10/17 07:26:06 Listening on port 80
  ```

  NOTE: As expected, the binary was able to gain the NET_BIND_SERVICE capability and bind to port 80.
- Now we run the container with the no_new_privs bit set:

  ```
  podman run --rm -it --entrypoint /bin/bash --security-opt no-new-privileges --user 1024 -e APP_PORT=80 --name reversewords-test quay.io/mavazque/reversewords-captest:latest
  ```
- File caps and thread caps remain the same as in the previous run; let's run the binary:

  ```
  /usr/bin/reverse-words
  2023/10/17 07:26:19 Starting Reverse Api v0.0.21 Release: NotSet
  2023/10/17 07:26:19 Listening on port 80
  2023/10/17 07:26:19 listen tcp :80: bind: permission denied
  ```

  NOTE: This time the binary couldn't raise the capability into the thread's effective set due to the no_new_privs bit.
- This time we run as root (uid 0) and with no_new_privs set:

  ```
  podman run --rm -it --entrypoint /bin/bash --security-opt no-new-privileges --user 0 -e APP_PORT=80 --name reversewords-test quay.io/mavazque/reversewords-captest:latest
  ```
- Get the container's thread capabilities:

  ```
  grep Cap /proc/1/status
  CapInh: 0000000000000000
  CapPrm: 00000000800405fb
  CapEff: 00000000800405fb
  CapBnd: 00000000800405fb
  CapAmb: 0000000000000000
  ```
- Decode the thread's effective capabilities:

  ```
  capsh --decode=00000000800405fb
  0x00000000800405fb=cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_sys_chroot,cap_setfcap
  ```
- This time NET_BIND_SERVICE is already in the thread's effective set, which means we can use it: no_new_privs only blocks gaining new privileges, not using ones we already have.

  ```
  /usr/bin/reverse-words
  2023/10/17 07:27:26 Starting Reverse Api v0.0.21 Release: NotSet
  2023/10/17 07:27:26 Listening on port 80
  ```
In this demo we're going to show how to audit containers abusing setuid/sudo on an OpenShift cluster.

Even though file capabilities can lead to privilege escalation, we have good tools today to keep pods from getting such capabilities via SCCs: the restricted-v2 SCC does a pretty good job of limiting the capabilities available to pods by default. On the other hand, we have the "becoming root / executing as root" problem, which the v2 SCCs introduced back in OCP 4.11 help to mitigate, since v2 SCCs set allowPrivilegeEscalation to false.

To showcase a scenario where an application abuses setuid, kudos to r/linuxquestions for providing the testing app (quay.io/fherrman/my-ubi-setuid:0.4):
```c
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    setuid(0);  /* become root; works because of the setuid bit on the binary */
    execl("/bin/bash", "bash", "-p", (char *)NULL);  /* -p: keep the elevated euid */
}
```
As you can see, the container image ships this binary with the setuid bit set at /usr/bin/bashwrap:

```
$ ls -l /usr/bin/bashwrap
-rwsr-xr-x. 1 root root 17488 Nov 12 22:42 /usr/bin/bashwrap
```
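The lowercase `s` in `-rwsr-xr-x` is the setuid bit stacked on top of the owner's execute bit; numerically this is mode 4755. A quick illustration on a scratch file (the path /tmp/demo-suid is hypothetical, and GNU coreutils is assumed for `stat -c`):

```shell
touch /tmp/demo-suid
chmod 4755 /tmp/demo-suid           # 4 = setuid bit, 755 = rwxr-xr-x
ls -l /tmp/demo-suid | cut -c1-10   # prints -rwsr-xr-x
stat -c '%a' /tmp/demo-suid         # prints 4755
```

Note that setting the bit is just a mode change; the escalation only happens when the file is owned by root and then executed by another user.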
Now that we have introduced the test app, let's see how a user can build a container image with a binary like the one above in order to become root under an SCC that doesn't allow running as root but does allow privilege escalation.
- Create a namespace for our tests:

  ```
  oc create ns test-priv-esc
  ```
- Deploy our application:

  - Since v2 SCCs restrict privilege escalation, we grant the default SA in this project access to the old restricted SCC to showcase this issue:

    ```
    oc -n test-priv-esc adm policy add-scc-to-user restricted -z default
    ```

  - Deploy the application:

    ```
    cat <<EOF | oc -n test-priv-esc create -f -
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      creationTimestamp: null
      labels:
        app: privescalation
      name: privescalation
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: privescalation
      strategy: {}
      template:
        metadata:
          creationTimestamp: null
          labels:
            app: privescalation
        spec:
          containers:
          - image: quay.io/fherrman/my-ubi-setuid:0.4
            command: ["sleep","9999"]
            name: my-ubi-setuid
            resources: {}
            securityContext:
              allowPrivilegeEscalation: true
    status: {}
    EOF
    ```
- If we check the SCC assigned to our pod, we will see that the restricted SCC was assigned to our workload:

  ```
  oc -n test-priv-esc get pod -l app=privescalation -o yaml | grep scc
  openshift.io/scc: restricted
  ```
- The restricted SCC does not allow containers to run with a root uid (0). Let's review how our setuid binary is configured and try to execute it:

  - Connect to the pod:

    ```
    oc -n test-priv-esc rsh deployment/privescalation
    sh-4.4$
    ```

  - Check the app binary configuration:

    ```
    ls -l /usr/bin/bashwrap
    -rwsr-xr-x. 1 root root 17488 Nov 12 22:42 /usr/bin/bashwrap
    ```

  - Execute the app:

    ```
    /usr/bin/bashwrap
    bash-4.4#
    ```

  - Check our effective uid:

    ```
    id
    uid=1000680000(1000680000) gid=0(root) euid=0(root) groups=0(root),1000680000
    ```

    NOTE: As you can see, our effective uid is 0.
- This means that at this point the container process is running as root on the node. Let's verify it:

  - Get the node where the pod is running:

    ```
    OCP_NODE=$(oc -n test-priv-esc get pod -l app=privescalation -o jsonpath='{.items[*].spec.nodeName}')
    ```

  - Open a debug session into that node:

    ```
    oc debug node/${OCP_NODE}
    chroot /host
    ```

  - In a different terminal, connect to the pod, execute our app and run a "sleep 288":

    ```
    oc -n test-priv-esc rsh deployment/privescalation
    /usr/bin/bashwrap
    sleep 288
    ```

  - Back in the node shell, check the process owner:

    ```
    ps -ef | grep "sleep 288" | grep -v grep
    root 1352231 1352181 0 07:31 pts/0 00:00:00 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 288
    ```

    NOTE: As you can see, the process is running as root.
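An alternative to ps when checking a process's identity is procfs: the Uid line in /proc/&lt;pid&gt;/status lists the real, effective, saved and filesystem uids. A minimal sketch using our own shell's pid as a stand-in for the sleep process:

```shell
# For the escalated process on the node, all four uid fields read 0
grep '^Uid:' /proc/$$/status
```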
- At this point we have demonstrated how setuid binaries can be abused. Now let's see what we can do on the node to detect when this happens.

- We will use auditd rules to monitor when a process performs privilege escalation by changing from a non-root uid to root. The audit rule we will use is the following one:

  NOTE: The rule below only matches 64-bit syscalls; to catch the 32-bit ones as well, add an equivalent rule with arch=b32.

  ```
  -a always,exit -F arch=b64 -S execve -C uid!=euid -F euid=0 -k setuid-abuse
  ```
- To deliver this rule to our worker nodes we will create the following MachineConfig:

  ```
  cat <<EOF | oc create -f -
  apiVersion: machineconfiguration.openshift.io/v1
  kind: MachineConfig
  metadata:
    labels:
      machineconfiguration.openshift.io/role: worker
    name: 99-auditd-setuid-rule
  spec:
    config:
      ignition:
        version: 3.1.0
      storage:
        files:
        - contents:
            source: data:text/plain;charset=utf-8;base64,LWEgYWx3YXlzLGV4aXQgLUYgYXJjaD1iNjQgLVMgZXhlY3ZlIC1DIHVpZCE9ZXVpZCAtRiBldWlkPTAgLWsgc2V0dWlkLWFidXNlCg==
          mode: 420
          overwrite: true
          path: /etc/audit/rules.d/setuid-abuse.rules
  EOF
  ```
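The MachineConfig carries the audit rule base64-encoded in the Ignition data URL; it's worth sanity-checking that the payload matches the rule before applying it:

```shell
# Decode the Ignition payload and confirm it is the expected audit rule
echo 'LWEgYWx3YXlzLGV4aXQgLUYgYXJjaD1iNjQgLVMgZXhlY3ZlIC1DIHVpZCE9ZXVpZCAtRiBldWlkPTAgLWsgc2V0dWlkLWFidXNlCg==' | base64 -d
# -a always,exit -F arch=b64 -S execve -C uid!=euid -F euid=0 -k setuid-abuse
```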
- After our nodes restart, the audit rules will be in place. If we repeat the same operation in our application pod, we will see entries like this in the audit log:

  NOTE: The command below should be run on the node (you can use oc debug node as we did before).

  ```
  grep setuid-abuse /var/log/audit/audit.log
  type=SYSCALL msg=audit(1637601395.568:124): arch=c000003e syscall=59 success=yes exit=0 a0=4006b0 a1=7ffee2255690 a2=7ffee2255818 a3=1 items=2 ppid=25611 pid=28357 auid=4294967295 uid=1000680000 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=4294967295 comm="bash" exe="/usr/bin/bash" subj=system_u:system_r:container_t:s0:c15,c26 key="setuid-abuse" ARCH=x86_64 SYSCALL=execve AUID="unset" UID="unknown(1000680000)" GID="root" EUID="root" SUID="root" FSUID="root" EGID="root" SGID="root" FSGID="root"
  ```
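With entries like that in the log, a little shell is enough to pull out the interesting fields, for example the uid that performed the escalation. A sketch over a shortened copy of the sample record above:

```shell
# Shortened SYSCALL record taken from the demo output
line='type=SYSCALL msg=audit(1637601395.568:124): ppid=25611 pid=28357 auid=4294967295 uid=1000680000 gid=0 euid=0 comm="bash" exe="/usr/bin/bash" key="setuid-abuse"'
# Extract the real uid; " uid=" (with a leading space) avoids matching auid=
echo "$line" | sed -n 's/.* uid=\([0-9]*\).*/\1/p'
# 1000680000
```

In a real pipeline, `ausearch -k setuid-abuse -i` gives you the same records with uids and syscall numbers already interpreted.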
- The audit log gives us several fields (uid, euid, comm, exe, SELinux context) that we can use to identify the container performing the privilege escalation.

- We could forward this audit log to a SIEM system and create alerts based on these audit rules.