In Part 3 — cgroups, we added resource limits using cgroups. At this point, our container already has:
- Filesystem isolation (chroot)
- Network isolation (network namespaces)
- Resource control (cgroups)
But there is still a huge problem.
Since containerization is essentially isolating processes into their own little world, it seems that every service/container needs to come with its own full copy of the filesystem.
If we create 10 containers, we duplicate the same:
- binaries
- libraries
- system utilities
- application files
- configuration files
...10 times as well. This quickly becomes inefficient:
- Wasted disk space
- Slow container creation
- Duplicate data everywhere
Modern container runtimes like Docker solved this problem using layered filesystems.
Instead of copying an entire filesystem for every container, they divide the filesystem into multiple layers based on functionality.
These layers can then be shared between multiple containers.
Each container gets its own virtual filesystem that merges multiple layers together into what looks like a normal root filesystem.
This concept is called a Union Filesystem or Stacked Filesystem.
Let me explain.
Understanding OverlayFS
There are several union filesystems out there; we chose OverlayFS.
OverlayFS combines multiple directories into one single virtual filesystem.
Some layers are read-only, while one layer is writable. Combined, they form a single merged filesystem.
When a process accesses the merged filesystem:
- Reads search through the layers from top to bottom. If the same file exists with different content in several layers, the upper-most copy takes priority.
- Writes always go into the writable layer, and one writable layer is all we need. The other layers are read-only because they are meant to be shared between containers: if they were writable, multiple containers could write into them simultaneously, which would eventually lead to data corruption.
This allows containers to share the same base filesystem while still keeping their own isolated changes.
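The read rule above can be imitated in plain shell without mounting anything. The sketch below is only a userspace simulation, not how the kernel implements OverlayFS: the `resolve` helper and the temporary `upper`, `app` and `base` directories are hypothetical names made up for illustration.

```shell
# Userspace simulation of the OverlayFS read rule (no mount, no root needed).
# resolve REL_PATH LAYER...  prints the first match, searching top layer first.
resolve() {
    local rel="$1" layer
    shift
    for layer in "$@"; do
        if [ -e "$layer/$rel" ]; then
            echo "$layer/$rel"
            return 0
        fi
    done
    return 1   # not present in any layer
}

demo=$(mktemp -d)
mkdir -p "$demo/upper/etc" "$demo/app/etc" "$demo/base/etc"
echo "from base" > "$demo/base/etc/motd"
echo "from app"  > "$demo/app/etc/motd"
echo "only base" > "$demo/base/etc/hosts"

# Layers are passed top to bottom: upper, then app, then base.
cat "$(resolve etc/motd  "$demo/upper" "$demo/app" "$demo/base")"   # prints: from app
cat "$(resolve etc/hosts "$demo/upper" "$demo/app" "$demo/base")"   # prints: only base
```

The file that exists in both `app` and `base` resolves to the `app` copy, while the file that exists only in `base` is still visible, which is the behaviour the merged view gives us.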
The Layers
OverlayFS is composed of multiple layers, where each layer has a very specific responsibility.
base/ — The foundation layer
This is the read-only operating system layer.
It usually contains the minimal Linux filesystem required for the container to function, including binaries, shared libraries, filesystem hierarchy, and system utilities.
Something like:
/bin
/lib
/usr
/etc
Since containers should not modify the original image contents, this layer remains immutable.
app/ — The application layer
This is another read-only layer placed on top of the base filesystem.
It contains everything specific to the containerized application itself.
For an nginx container, for example, this layer could contain the nginx binary, its configuration files, its working directory, and other scripts:
/usr/sbin/nginx
/etc/nginx/
/docker-entrypoint.sh
/var/www/html
Separating the application from the operating system layer is one of the reasons container layers are reusable.
We can have multiple read-only layers; we used only two here for demonstration purposes.
upper/ — The writable container layer
This is the only writable layer in the stack.
Any filesystem modification performed by the container is stored here:
- new files
- modified files
- deleted files
- runtime-generated data
Even if a file originally exists in base/ or app/, modifications never touch those layers directly.
Instead, OverlayFS performs a copy-up operation into upper/.
work/ — Internal OverlayFS workspace
This directory is required internally by the Linux kernel, which uses it as scratch space to prepare files before they appear in upper/. The container never uses it directly.
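One practical constraint is worth knowing here: the kernel's OverlayFS documentation states that workdir must be an empty directory on the same filesystem as upperdir. A minimal sketch of a pre-flight check for the second half of that rule, assuming GNU stat; the `same_fs` helper and the throwaway directories are hypothetical and not part of any tool.

```shell
# same_fs DIR_A DIR_B : succeed if both directories are on the same filesystem.
# Assumes GNU stat, where %d prints the device number backing the path.
same_fs() {
    [ "$(stat -c '%d' "$1")" = "$(stat -c '%d' "$2")" ]
}

# Throwaway directories standing in for upper/ and work/.
demo=$(mktemp -d)
mkdir -p "$demo/upper" "$demo/work"

if same_fs "$demo/upper" "$demo/work"; then
    echo "ok: work/ and upper/ share a filesystem"
else
    echo "warning: work/ and upper/ are on different filesystems" >&2
fi
```

Creating both directories side by side under one parent, as we do in this post, satisfies the rule automatically.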
rootfs/ — The merged filesystem view
This is the final merged view exposed to the container.
This is what the process actually sees.
Internally, it is simply a combination of all previous layers merged together.
Building a Layered Container
Step 1: Create Directory Structure
Let's start by creating the OverlayFS directory structure.
mkdir -p my_container/{base,app,upper,work,rootfs}
tree -L 1 my_container
Step 2: Populate the Base Layer
The base layer acts as the operating system foundation.
cd my_container/base
mkdir -p {bin,sbin,lib,lib64,usr,proc,dev,tmp,etc}
cd usr
ln -s ../bin .
ln -s ../sbin .
ln -s ../lib .
ln -s ../lib64 .
cd ..
Here we are creating a minimal Linux filesystem structure, same as what we did in the previous blogs.
Now let's copy some required binaries.
cp -v /bin/{rm,bash,ls,ps,echo,nc,date} bin/
cp -v /sbin/ip sbin/
And their required libraries:
binaries="bin/rm bin/bash bin/ls bin/ps bin/echo bin/nc bin/date sbin/ip"
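for bin in $binaries; do
    libs=$(ldd "/$bin" | grep -o '/lib[^ ]*')
    for lib in $libs; do
        # Create the target directory first (e.g. lib/x86_64-linux-gnu), otherwise cp fails silently
        mkdir -p ".$(dirname "$lib")"
        cp -v "$lib" ".$lib" 2>/dev/null || true
    done
done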
cd ../..
At this point, the base layer contains a minimal Linux userspace.
Step 3: Populate the App Layer
This layer contains application-specific files.
For demonstration purposes, we will create a very small Bash application and a YAML file representing a config file.
cd my_container/app
mkdir -p opt/myapp
We will add a simple script and a YAML config file, just for aesthetics:
cat > opt/myapp/app.sh << 'EOF'
#!/bin/bash
echo "Hello from the application layer!"
echo "Current time: $(date)"
EOF
chmod +x opt/myapp/app.sh
cat > opt/myapp/config.yml << 'EOF'
app:
  name: MyContainerApp
  version: 1.0
  port: 8080
EOF
cd ../..
Verify the app layer:
tree my_container/app
my_container/app
└── opt
    └── myapp
        ├── app.sh
        └── config.yml
2 directories, 2 files
At this point:
- base/ contains the operating system
- app/ contains application-specific files
Those are our read-only layers, and they will act as the lower layers.
upper/ is the writable layer, and it will act as the upper layer (as the name suggests :) )
Step 4: Mount the Overlay Filesystem
Now we combine all layers into a single merged filesystem.
OverlayFS has special mount options where we specify:
- Read-only layers (lowerdir)
- Writable layer (upperdir)
- Internal working directory (workdir)
sudo mount -t overlay overlay -o \
lowerdir=my_container/app:my_container/base,\
upperdir=my_container/upper,\
workdir=my_container/work \
my_container/rootfs
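In lowerdir, layers are listed from top to bottom: the leftmost directory is the top-most layer and the rightmost is the bottom one. In other words, the stack is built right-to-left.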
lowerdir=my_container/app:my_container/base means:
- base is the bottom layer
- app is stacked above it
If the same file exists in multiple layers, the top-most layer wins.
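Since the order inside lowerdir is easy to get backwards, it can help to build the option string from an explicit top-to-bottom list. A small sketch, where `build_lowerdir` is a hypothetical helper and not part of mount; it prints the resulting command instead of running it.

```shell
# build_lowerdir LAYER... : join layers with ':' in the order given.
# The first argument becomes the top-most lower layer, the last the bottom.
build_lowerdir() {
    local out="" layer
    for layer in "$@"; do
        out="${out:+$out:}$layer"
    done
    echo "$out"
}

lower=$(build_lowerdir my_container/app my_container/base)
opts="lowerdir=$lower,upperdir=my_container/upper,workdir=my_container/work"

# Print the command instead of running it, so nothing is mounted by accident.
echo "sudo mount -t overlay overlay -o $opts my_container/rootfs"
```

Adding a third read-only layer then only means passing one more argument in the right position.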
Verify the new mount:
mount | grep my_container
overlay on /root/my_container/rootfs type overlay (rw,relatime,lowerdir=my_container/app:my_container/base,upperdir=my_container/upper,workdir=my_container/work ...)
This long mount command may look complicated at first glance, but it is actually really simple.
Verify the merged filesystem
ls my_container/rootfs
bin dev etc lib lib64 opt proc sbin tmp usr
ls my_container/rootfs/opt/myapp/
app.sh config.yml
The rootfs/ directory now contains:
- Everything from base/
- Everything from app/
- An initially empty writable layer (upper/)
Understanding Copy-on-Write
Now comes one of the most important concepts in container filesystems.
What happens if we modify a file?
As we mentioned above, lower layers are read-only.
So the process cannot modify them directly.
Instead, OverlayFS uses a mechanism called Copy-on-Write (CoW).
The workflow goes as follows:
- A process modifies a file that belongs to one of the lower layers
- The file gets copied into the upper/ layer
- The modification happens there
- The merged filesystem now exposes the modified version
- Since OverlayFS prioritizes upper layers during reads, the new modified file becomes the one visible
This is how multiple containers can safely share the same read-only layers.
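The workflow above can be mimicked in userspace to make the copy-up step concrete. This is only a simulation under stated assumptions: `cow_append` and the temporary `lower` and `upper` directories are hypothetical, and the real copy-up happens inside the kernel, not through `cp`.

```shell
# cow_append REL_PATH TEXT : append TEXT to a file without touching lower/.
# If the file only exists in the lower layer, copy it up first.
cow_append() {
    local rel="$1" text="$2"
    mkdir -p "$(dirname "$upper/$rel")"
    if [ ! -e "$upper/$rel" ] && [ -e "$lower/$rel" ]; then
        cp -p "$lower/$rel" "$upper/$rel"      # the "copy-up"
    fi
    echo "$text" >> "$upper/$rel"              # the write lands in upper/ only
}

demo=$(mktemp -d)
lower=$demo/lower
upper=$demo/upper
mkdir -p "$lower/etc" "$upper"
echo "port: 8080" > "$lower/etc/app.conf"

cow_append etc/app.conf "debug: true"

cat "$lower/etc/app.conf"   # still the original single line
cat "$upper/etc/app.conf"   # original line followed by the appended one
```

The lower copy keeps its original content, so any other container stacked on the same lower layer would still see the unmodified file.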
Step 5: Test Copy-on-Write Behavior
Now let's test this behavior ourselves.
Create a new file inside the merged filesystem:
echo "hello overlayfs" > my_container/rootfs/tmp.txt
find my_container -iname "tmp.txt"
my_container/upper/tmp.txt
my_container/rootfs/tmp.txt
Interesting...
Even though we created the file inside rootfs/, the actual file was stored inside upper/.
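Here, rootfs/ only represents the merged filesystem view. Because tmp.txt is a brand-new file, nothing had to be copied up; a copy-up only happens when a file that already exists in a lower layer is modified.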
Layer Priority Example
Suppose both lower layers contain the same file:
app/etc/myapp/config.yml
base/etc/myapp/config.yml
The version from app/ will appear inside the merged filesystem because it is stacked above base/.
File Deletion and Whiteouts
But what actually happens when we delete a file?
Deleting a file inside the merged filesystem does not actually remove it from the lower read-only layers, because they are, well... read-only :)
Instead, OverlayFS creates a special marker called a whiteout inside the upper/ layer to hide the original file.
This marker basically tells the filesystem: "Pretend this file does not exist anymore."
Let's investigate this behavior ourselves.
We will create a file in the app/ layer, remove it through the merged rootfs/ view, and inspect what actually happens.
Unmount the filesystem first:
sudo umount my_container/rootfs
Create the file:
touch my_container/app/to_delete.txt
Mount the filesystem again using the same mount command from above, and re-enter the container.
Now from inside the container, delete to_delete.txt:
rm to_delete.txt
From another terminal:
sudo find . -iname "to_delete.txt" -exec ls -lh {} \;
-rw-r--r-- 1 admin admin 0 May 28 12:11 ./my_container/app/to_delete.txt
c--------- 2 root root 0, 0 May 28 12:12 ./my_container/upper/to_delete.txt
Interesting...
The original file inside the app/ layer still exists.
But OverlayFS created another object inside the writable upper/ layer.
Notice the first character in the permissions output:
c---------
The c means this is now a character device, not a regular file.
So, what kind of character device is this?
Let's inspect it using the file command:
file ./my_container/upper/to_delete.txt
./my_container/upper/to_delete.txt: character special (0/0)
Hmm...
The file is a special file with 0,0!
It is a device file, and 0,0 are its identifiers: the major number and the minor number.
Together, they define which device the file actually represents.
OverlayFS uses this special 0,0 character device as a marker to indicate:
"This file should not appear in the merged filesystem."
So even though the original file still physically exists in the lower layer, the whiteout entry hides it from the final merged view.
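Knowing this, whiteouts can be spotted in an upper layer with ordinary tools. A hedged sketch, assuming GNU stat and find; `is_whiteout` is a hypothetical helper name, and the scan at the end only runs if my_container/upper exists in the current directory.

```shell
# is_whiteout PATH : succeed if PATH is a character device numbered 0,0.
# GNU stat prints the major and minor numbers in hex via %t and %T.
is_whiteout() {
    [ -c "$1" ] && [ "$(stat -c '%t,%T' "$1")" = "0,0" ]
}

# /dev/null is a character device too, but numbered 1,3, so it is not a whiteout.
if is_whiteout /dev/null; then
    echo "/dev/null: whiteout"
else
    echo "/dev/null: not a whiteout"
fi

# List every whiteout recorded in the writable layer, if that directory exists.
if [ -d my_container/upper ]; then
    find my_container/upper -type c | while read -r path; do
        if is_whiteout "$path"; then
            echo "whiteout: $path"
        fi
    done
fi
```

Checking the device numbers rather than only the file type avoids mistaking a real character device for a deletion marker.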
Tying Everything Together
That's enough experiments. Now let's tie everything together and run our container with the new layered filesystem.
# Create network namespace
sudo ip netns add container_net
# Set up veth pair
sudo ip link add veth-host type veth peer name veth-container
sudo ip link set veth-container netns container_net
# Configure host side
sudo ip addr add 192.168.0.1/24 dev veth-host
sudo ip link set veth-host up
# Configure container side
sudo ip netns exec container_net ip link set lo up
sudo ip netns exec container_net ip addr add 192.168.0.2/24 dev veth-container
sudo ip netns exec container_net ip link set veth-container up
# Enter the container
sudo ip netns exec container_net chroot my_container/rootfs /bin/bash
With that, we've created a container with an overlay filesystem.
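For completeness, everything created in this post can be undone once you exit the container shell. The sketch below is an assumption about a reasonable cleanup order (unmount first, then remove the namespace); the `run` and `teardown` helpers and the `DRY_RUN` variable are hypothetical, and it defaults to a dry run that only prints the commands.

```shell
# run CMD... : print the command in dry-run mode, execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# teardown : undo the overlay mount and the network namespace from this post.
teardown() {
    run sudo umount my_container/rootfs
    # Deleting the namespace destroys veth-container inside it; a veth pair
    # goes away as a unit, so veth-host on the host side disappears as well.
    run sudo ip netns del container_net
}

teardown   # set DRY_RUN=0 in the environment to actually execute the commands
```

The contents of upper/ survive the unmount, so mounting the overlay again brings back the container's changes.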
Until next time.