A Docker blog post states:
“Docker containers are, by default, quite secure; especially if you take care of running your processes inside the containers as non-privileged users (i.e. non root).”
When you run as root, you can access a broader range of kernel services. For instance, you can:
- read, write, delete, or modify system files and resources;
- snoop on what your programs are doing internally;
- manipulate network interfaces, routing tables, and netfilter rules;
- mount, unmount, or remount filesystems;
- shut down or reboot the machine;
- change file ownership, permissions, and extended attributes, overriding regular permissions;
- and much more.
The main point is that as root you can exercise more kernel code; if there is a vulnerability in that code, you can trigger it as root but not as a regular user. Additionally, if someone finds a way to break out of a container, then regardless of which user they were inside the container, they break out as whichever user the LXC process itself runs as on the host OS.
According to the official repo, the Docker daemon binds to a Unix socket instead of a TCP port. By default that Unix socket is owned by the user root, and other users can only access it using sudo. In other words, the Docker daemon always runs as the root user, which means that if you break out of a container, you break out as root.
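To check the socket ownership described above on your own host, here is a minimal sketch. It assumes the commonly used default socket path /var/run/docker.sock; the SOCK_PATH variable and the owner_uid helper are hypothetical names introduced only for this illustration.

```shell
# Print the numeric UID that owns a path (uses POSIX `ls -n`, no GNU stat needed).
owner_uid() {
    ls -ldn "$1" | awk '{print $3}'
}

# Assumed default socket location; set SOCK_PATH (hypothetical) if yours differs.
sock="${SOCK_PATH:-/var/run/docker.sock}"

if [ -e "$sock" ]; then
    uid="$(owner_uid "$sock")"
    if [ "$uid" -eq 0 ]; then
        echo "$sock is owned by root"
    else
        echo "$sock is owned by uid $uid"
    fi
else
    echo "no Docker socket found at $sock"
fi
```

If the socket turns out to be owned by root, anything that can talk to it is effectively talking to a root process, which is why a container breakout matters so much.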
Of course, from the perspective of an application running inside a container, even one running as root, Docker (or, more precisely, LXC: behind the scenes, Docker uses lxc-start to execute the container) tries to address this and other permission-related concerns using kernel namespaces. Anyone familiar with chroot already has a basic idea of what Linux namespaces can do and how they are generally used.
Namespaces provide the first, and most straightforward, form of isolation. Because of Linux namespaces, it became possible to have multiple “nested” process trees. Each process tree can have an entirely isolated set of processes. This can ensure that processes belonging to one process tree cannot inspect or kill – in fact cannot even know of the existence of – processes in other sibling or parent process trees.
Linux namespaces allow other aspects of the operating system to be independently modified as well. This includes the process tree, networking interfaces, mount points, inter-process communication resources and more.
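To make the idea concrete, the following sketch lists the namespaces a process belongs to by reading the symbolic links under /proc/<pid>/ns. It assumes a Linux host with /proc mounted; list_namespaces is a hypothetical helper name, not part of Docker or LXC.

```shell
# List the namespaces of a process (default: the current one) as "type -> id".
list_namespaces() {
    ns_dir="/proc/${1:-self}/ns"
    if [ ! -d "$ns_dir" ]; then
        echo "no namespace information at $ns_dir" >&2
        return 1
    fi
    for entry in "$ns_dir"/*; do
        # Each entry is a symlink whose target identifies the namespace.
        printf '%s -> %s\n' "$(basename "$entry")" "$(readlink "$entry")"
    done
}

list_namespaces || echo "namespace listing unavailable on this system"
```

Two processes that print the same identifier for a given type share that namespace. Running the function once on the host and once inside a container would normally show different identifiers for the isolated types.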
Still, the basic security rule applies: avoid granting unnecessary permissions. So, first things first: avoid giving the user root access inside the Docker container, and always run Docker containers with the -u flag so that they run as an ordinary user. Grant that ordinary user only the permissions it actually requires. For example, if your Dockerfile contains a line like this,
echo "tomcat ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers