README: document shenanigans of linux capabilities
Showing 1 changed file with 107 additions and 0 deletions.
@@ -1,3 +1,110 @@ | ||
# 🐊🪙 Crocochrome

Crocochrome is a chromium supervisor.

## On linux capabilities

This project, by nature of using a supervisor _and_ chromium, interfaces strongly with linux capabilities inside kubernetes and/or docker. This is a complex topic and even the maintainers' knowledge of it is not perfect. Below we try to summarize our findings, our assumptions, and the things we have verified.

The first relevant bit is that the crocochrome supervisor wants to run chromium as a different user. We do this to prevent chromium from reading certain files (while allowing crocochrome to do so), and to prevent the chromium process from interacting with crocochrome altogether, e.g. reading its environment or sending signals to it. To run chromium as a different user, crocochrome uses Go's ability to specify a `syscall.Credential` when launching a process, and it specifies the uid/gid of the `nobody` user.
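
As an illustration of the mechanism described above, the following is a minimal sketch of launching a process under another uid/gid with `syscall.Credential`. The helper name `commandAs` is hypothetical, and the value `65534` for `nobody` is an assumption that should be checked against `/etc/passwd` in the image; this is not crocochrome's actual code.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// commandAs (hypothetical helper) builds a command that will be started with
// the given uid and gid. Starting it only succeeds if the caller is allowed
// to switch users: root, or a process holding CAP_SETUID and CAP_SETGID.
func commandAs(uid, gid uint32, name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Credential: &syscall.Credential{Uid: uid, Gid: gid},
	}
	return cmd
}

func main() {
	// Assumption: 65534 is the uid/gid of nobody in this image.
	cmd := commandAs(65534, 65534, "chromium")
	if err := cmd.Start(); err != nil {
		// Without the required capabilities this is where the error shows up.
		fmt.Println("could not start chromium as another user:", err)
		return
	}
	fmt.Println("started chromium with pid", cmd.Process.Pid)
}
```

The credential is applied by the Go runtime in the child between `fork` and `exec`, so the supervisor itself keeps its own identity.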

Normally, a process running as non-root wouldn't have the ability to do this. To allow it, we add the `cap_setuid`, `cap_setgid` and `cap_kill` capabilities to the crocochrome binary: the first two allow it to start a process as another user (any user), and the third allows it to kill any other process. For this to be effective, we also need to add those three capabilities to the "bounding set". In kubernetes, this is done through the container's `securityContext`. These two places where we specify capabilities (the binary and the container `securityContext`) interact in the following ways:
- If the capabilities are not specified anywhere, crocochrome will return an error while trying to run chromium as a different user: it lacks the permission to do so.
- If the capabilities are set in the `securityContext` but not in the binary, the same thing happens. Capabilities added to the `securityContext` (bounding set) are not granted automatically to any binary inside the container.
- If the capabilities are set in the binary but not in the `securityContext`, the CRI will refuse to even start crocochrome, as the binary carries capabilities that the bounding set does not allow.
- If the capabilities are set both in the binary and in the `securityContext`, crocochrome will start and will be able to start and kill processes that run as different users.
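
To find out which of these scenarios a running process is actually in, its capability sets can be read from `/proc/self/status`. The sketch below parses the `CapBnd`, `CapPrm` and `CapEff` fields and checks the three capabilities discussed above. The function names are hypothetical; the capability numbers are the ones defined in `linux/capability.h`.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Capability numbers as defined in linux/capability.h.
const (
	capKill   = 5
	capSetgid = 6
	capSetuid = 7
)

// capSets extracts the CapPrm, CapEff and CapBnd bitmasks from the contents
// of a /proc/<pid>/status file.
func capSets(status string) (map[string]uint64, error) {
	sets := map[string]uint64{}
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		switch key {
		case "CapPrm", "CapEff", "CapBnd":
			mask, err := strconv.ParseUint(strings.TrimSpace(val), 16, 64)
			if err != nil {
				return nil, fmt.Errorf("parsing %s: %w", key, err)
			}
			sets[key] = mask
		}
	}
	return sets, sc.Err()
}

// has reports whether capability number c is present in mask.
func has(mask uint64, c uint) bool { return mask&(1<<c) != 0 }

func main() {
	raw, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read status:", err)
		return
	}
	sets, err := capSets(string(raw))
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, name := range []string{"CapBnd", "CapPrm", "CapEff"} {
		m := sets[name]
		fmt.Printf("%s: setuid=%t setgid=%t kill=%t\n",
			name, has(m, capSetuid), has(m, capSetgid), has(m, capKill))
	}
}
```

A capability that shows up in `CapBnd` but not in `CapPrm` or `CapEff` corresponds to the second scenario: allowed by the container, but not granted to the process.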

A third variable is `allowPrivilegeEscalation`. As per the [k8s docs](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/), setting it to `false` enforces the `no_new_privs` flag. This flag is described in the [linux docs](https://www.kernel.org/doc/Documentation/prctl/no_new_privs.txt), but the relevant part is the following one:

> With no_new_privs set [...] file capabilities will not add to the permitted set [...]

This means that setting `allowPrivilegeEscalation: false` effectively results in the same scenario as if we had never added the capabilities to our binary (the second scenario in the list above).
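
The scenarios above, including the effect of `no_new_privs`, can be modelled with the transformation rule documented in capabilities(7), which for a non-root process with empty inheritable and ambient sets reduces to `P'(permitted) = F(permitted) & P(bounding)`. The sketch below is only a model of that rule, not kernel code. It assumes that the binary's file capabilities have the effective bit set (as `setcap` with `+ep` would produce), which this document does not state; in that case the kernel fails the `execve` with `EPERM` if the full file permitted set cannot be obtained. All names are hypothetical.

```go
package main

import "fmt"

// capSet is a bitmask of capabilities, one bit per capability number.
type capSet uint64

const (
	capKill   capSet = 1 << 5
	capSetgid capSet = 1 << 6
	capSetuid capSet = 1 << 7
)

// execPermitted models the permitted set of a non-root process after execve,
// assuming empty inheritable and ambient sets. ok is false when the exec
// itself would fail with EPERM.
func execPermitted(filePermitted, bounding capSet, fileEffective, noNewPrivs bool) (permitted capSet, ok bool) {
	if noNewPrivs {
		// With no_new_privs, file capabilities do not add to the permitted set.
		return 0, true
	}
	permitted = filePermitted & bounding
	if fileEffective && permitted != filePermitted {
		// The binary asks for capabilities it cannot fully obtain.
		return 0, false
	}
	return permitted, true
}

func main() {
	want := capSetuid | capSetgid | capKill
	cases := []struct {
		name           string
		file, bounding capSet
		noNewPrivs     bool
	}{
		{"nothing set", 0, 0, false},
		{"securityContext only", 0, want, false},
		{"binary only", want, 0, false},
		{"both", want, want, false},
		{"both, allowPrivilegeEscalation: false", want, want, true},
	}
	for _, c := range cases {
		// Assumption: a binary with file capabilities also has the effective bit.
		permitted, ok := execPermitted(c.file, c.bounding, c.file != 0, c.noNewPrivs)
		fmt.Printf("%-40s exec ok=%t, can switch user=%t\n",
			c.name, ok, ok && permitted&capSetuid != 0)
	}
}
```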

### On chromium in particular

Being a web browser, chromium implements a series of security measures to try to isolate individual processes that run external code (javascript). This functionality is on by default, and can be disabled by launching chromium with `--no-sandbox`.

On linux, chromium achieves this isolation by creating new PID and network namespaces for child processes using [clone with `CLONE_NEWPID | CLONE_NEWNET`](https://man7.org/linux/man-pages/man2/clone.2.html). Linux restricts the use of these flags to processes that have `CAP_SYS_ADMIN` (or, it is implied, run as `root`).
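
To make the restriction concrete, here is a small sketch that asks for the same two namespaces from Go through `SysProcAttr.Cloneflags`. It only illustrates the kernel restriction; it is not how chromium itself sets up its sandbox, and the names `isolatedCommand` and `sandboxFlags` are hypothetical.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// sandboxFlags are the namespace flags mentioned above.
const sandboxFlags = syscall.CLONE_NEWPID | syscall.CLONE_NEWNET

// isolatedCommand builds a command whose child would be created in new PID
// and network namespaces.
func isolatedCommand(name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{Cloneflags: sandboxFlags}
	return cmd
}

func main() {
	cmd := isolatedCommand("true")
	if err := cmd.Run(); err != nil {
		// Expected when the caller lacks CAP_SYS_ADMIN.
		fmt.Println("could not create namespaces:", err)
		return
	}
	fmt.Println("child ran in new PID and network namespaces")
}
```

Running this as an unprivileged user is expected to hit the error branch, while running it as root or with `CAP_SYS_ADMIN` should succeed.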

It would seem that chromium tries multiple ways of doing this. If the `chromium` process is started as non-root, or without `CAP_SYS_ADMIN`, it will try to use a helper called `chrome-sandbox`, located at `/usr/lib/chromium/chrome-sandbox`. This helper has the setuid bit set, so it will run as root regardless of the user who invokes it.

We have verified that chromium will _not_ use this helper if it has `CAP_SYS_ADMIN`, by removing that file and starting chromium with the sandbox enabled. Chromium still starts:
```
# We have run rm /usr/lib/chromium/chrome-sandbox in this image.
18:43:00 ~/Devel/crocochrome $> dr --cap-add=sys_admin localhost:5000/browser:latest
[0529/164303.395670:ERROR:bus.cc(407)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
[0529/164303.395966:ERROR:bus.cc(407)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
[0529/164303.395993:ERROR:bus.cc(407)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
Fontconfig error: No writable cache directories
[0529/164303.403245:INFO:config_dir_policy_loader.cc(118)] Skipping mandatory platform policies because no policy file was found at: /etc/chromium/policies/managed
[0529/164303.403264:INFO:config_dir_policy_loader.cc(118)] Skipping recommended platform policies because no policy file was found at: /etc/chromium/policies/recommended
DevTools listening on ws://0.0.0.0:5222/devtools/browser/b6f2885e-1f1b-4119-8182-12eea1bae20f
[0529/164303.405298:WARNING:bluez_dbus_manager.cc(248)] Floss manager not present, cannot set Floss enable/disable.
[0529/164303.410270:WARNING:sandbox_linux.cc(420)] InitializeSandbox() called with multiple threads in process gpu-process.
```
If we do not add that specific capability, chromium won't start:
```
# We have run rm /usr/lib/chromium/chrome-sandbox in this image.
18:47:58 ~/Devel/crocochrome $> dr --cap-add=all --cap-drop=sys_admin localhost:5000/browser:latest
[0529/164805.302206:FATAL:zygote_host_impl_linux.cc(126)] No usable sandbox! Update your kernel or see https://chromium.googlesource.com/chromium/src/+/main/docs/linux/suid_sandbox_development.md for more information on developing with the SUID sandbox. If you want to live dangerously and need an immediate workaround, you can try using --no-sandbox.
```
Interestingly, the latter scenario does **not** reproduce in k8s: chromium is able to start with the sandbox enabled, without the `sys_admin` capability, and with the `/usr/lib/chromium/chrome-sandbox` helper removed from the image.

Normally, chromium sets up its sandbox through a helper binary, `chrome-sandbox`, which has the setuid bit set:
```
~ $ find / -type f -perm -4000
/usr/lib/chromium/chrome-sandbox
~ $ ls -l /usr/lib/chromium/chrome-sandbox
-rwsr-xr-x 1 root root 235048 May 15 06:08 /usr/lib/chromium/chrome-sandbox
```
As is customary for setuid binaries, this means that regardless of which user execs `chrome-sandbox`, the resulting process will run as root.
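
For reference, the same check that `find -perm -4000` performs can be sketched in Go. Note that Go does not expose the setuid bit as the octal `04000`; it uses the separate `fs.ModeSetuid` flag. The names `isSetuid` and `helperMode` are hypothetical, and `helperMode` simply mirrors the `-rwsr-xr-x` listing above.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
)

// helperMode mirrors the -rwsr-xr-x permissions shown in the listing above.
const helperMode = fs.ModeSetuid | 0o755

// isSetuid reports whether mode describes a regular file with the setuid bit.
func isSetuid(mode fs.FileMode) bool {
	return mode.IsRegular() && mode&fs.ModeSetuid != 0
}

func main() {
	path := "/usr/lib/chromium/chrome-sandbox"
	info, err := os.Lstat(path)
	if err != nil {
		fmt.Println("cannot stat helper:", err)
		return
	}
	fmt.Printf("%s setuid=%t\n", path, isSetuid(info.Mode()))
}
```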

Another side effect of `allowPrivilegeEscalation: false` (and thus `no_new_privs`) is that:

> [...] the setuid and setgid bits will no longer change the uid or gid

This, however, does not seem to prevent chromium from running *without* `--no-sandbox`. Even when running on kubernetes with `allowPrivilegeEscalation: false` _and_ removing the setuid bit from every file in the image (`find / -type f -perm -4000 -exec chmod ugo-s {} +`), chromium still starts with the sandbox enabled:
```
[0529/173417.131639:ERROR:bus.cc(407)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
[0529/173417.132032:ERROR:bus.cc(407)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
[0529/173417.132056:ERROR:bus.cc(407)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
[0529/173417.133214:ERROR:zygote_host_impl_linux.cc(273)] Failed to adjust OOM score of renderer with pid 29: Permission denied (13)
Fontconfig error: No writable cache directories
Fontconfig error: No writable cache directories
[0529/173417.139017:INFO:config_dir_policy_loader.cc(118)] Skipping mandatory platform policies because no policy file was found at: /etc/chromium/policies/managed
[0529/173417.139027:INFO:config_dir_policy_loader.cc(118)] Skipping recommended platform policies because no policy file was found at: /etc/chromium/policies/recommended
DevTools listening on ws://0.0.0.0:5222/devtools/browser/7bf56a77-f480-415e-996d-b6e1d9861362
[0529/173417.140865:WARNING:bluez_dbus_manager.cc(248)] Floss manager not present, cannot set Floss enable/disable.
[0529/173417.144703:WARNING:sandbox_linux.cc(420)] InitializeSandbox() called with multiple threads in process gpu-process.
```
The logs above are produced by the following pod:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: chromium
  labels:
    app.kubernetes.io/name: chromium
spec:
  securityContext:
    runAsUser: 6666
    runAsGroup: 6666
    fsGroup: 6666
  containers:
    - name: chromium-tip
      image: localhost:5000/browser # Same image as we ran with docker
      imagePullPolicy: Always
      securityContext:
        runAsNonRoot: true
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["all"]
```