In Part 3 — cgroups, we added resource limits using cgroups. At this point, our container already has:

Filesystem isolation (chroot)
Network isolation (network namespaces)
Resource control (cgroups)

But there is still a huge problem.

Since containerization is essentially isolating processes into their own little world, it seems that every service/container needs to come with its own full copy of the filesystem.

If we create 10 containers, we duplicate the same:

binaries
libraries
system utilities
application files
configuration files

...10 times as well. This quickly becomes inefficient:

Wasted disk space
Slow container creation
Duplicate data everywhere

Modern container runtimes like Docker solved this problem using layered filesystems.

Instead of copying an entire filesystem for every container, they divide the filesystem into multiple layers based on functionality.

These layers can then be shared between multiple containers.

Each container gets its own virtual filesystem that merges multiple layers together into what looks like a normal root filesystem.

This concept is called a Union Filesystem or Stacked Filesystem.

Let me explain.

Understanding OverlayFS

There are multiple union filesystems out there; we chose OverlayFS.

OverlayFS combines multiple directories into one single virtual filesystem.

Some layers are read-only, while one layer is writable. Combined, they form a single merged filesystem.

When a process accesses the merged filesystem:

Reads search through the layers from top to bottom, so if the same file exists with different content in several layers, the top-most copy wins.
Writes always go into the single writable layer. The other layers stay read-only because they are meant to be shared between containers: if they were writable, several containers could write to them at the same time and corrupt each other's data.

This allows containers to share the same base filesystem while still keeping their own isolated changes.
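The read rule can be imitated in plain shell to make it concrete. This is only a userspace illustration, since OverlayFS does the lookup inside the kernel. The `resolve` function and the temporary directory layout are made up for this sketch.

```shell
# Userspace illustration only: the real lookup happens inside the kernel.
demo=$(mktemp -d)
mkdir -p "$demo/base/etc" "$demo/app/etc" "$demo/upper/etc"

echo "from base" > "$demo/base/etc/motd"
echo "from base" > "$demo/base/etc/hosts"
echo "from app"  > "$demo/app/etc/motd"

# Print the path of the first layer (top to bottom) that contains the file.
resolve() {
    rel=$1
    for layer in upper app base; do
        if [ -e "$demo/$layer/$rel" ]; then
            echo "$demo/$layer/$rel"
            return 0
        fi
    done
    return 1
}

cat "$(resolve etc/motd)"    # the copy in app/ shadows the one in base/
cat "$(resolve etc/hosts)"   # only base/ has this file
```

The first `cat` prints the app/ version because app/ is searched before base/, while the second falls through to base/.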

The Layers

OverlayFS is composed of multiple layers, where each layer has a very specific responsibility.

`base/` — The foundation layer

This is the read-only operating system layer.

It usually contains the minimal Linux filesystem required for the container to function, including binaries, shared libraries, filesystem hierarchy, and system utilities.

Something like:

/bin
/lib
/usr
/etc

Since containers should not modify the original image contents, this layer remains immutable.

`app/` — The application layer

This is another read-only layer placed on top of the base filesystem.

It contains everything specific to the containerized application itself.

For an nginx container for example, this layer could contain the nginx binary, its configuration file, working directory, other scripts, and so on:

/usr/sbin/nginx
/etc/nginx/
/docker-entrypoint.sh
/var/www/html

Separating the application from the operating system layer is one of the reasons container layers are reusable.

We can have multiple read-only layers. We used only two here for demonstration purposes.

`upper/` — The writable container layer

This is the only writable layer in the stack.

Any filesystem modification performed by the container is stored here:

new files
modified files
deleted files
runtime-generated data

Even if a file originally exists in base/ or app/, modifications never touch those layers directly.

Instead, OverlayFS performs a copy-up operation into upper/.

`work/` — Internal OverlayFS workspace

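This directory is required internally by the Linux kernel for OverlayFS operations. It must be an empty directory on the same filesystem as upper/, and OverlayFS uses it as scratch space to prepare files before moving them into upper/.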
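The kernel expects the work directory to be empty and to sit on the same filesystem as the upper directory. A quick pre-mount check could look like the sketch below. The `same_fs` helper is hypothetical, and it assumes GNU `stat`, where `%d` prints the device number of the filesystem holding the path.

```shell
# Hypothetical helper: succeed if both paths live on the same filesystem.
# Assumes GNU stat (%d = device number of the containing filesystem).
same_fs() {
    [ "$(stat -c '%d' "$1")" = "$(stat -c '%d' "$2")" ]
}

# Throwaway directories standing in for upper/ and work/.
demo=$(mktemp -d)
mkdir -p "$demo/upper" "$demo/work"

if same_fs "$demo/upper" "$demo/work" && [ -z "$(ls -A "$demo/work")" ]; then
    echo "work/ is empty and shares a filesystem with upper/"
fi
```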

`rootfs/` — The merged filesystem view

This is the final merged view exposed to the container.

This is what the process actually sees.

Internally, it is simply a combination of all previous layers merged together.

Building a Layered Container

Step 1: Create Directory Structure

Let's start by creating the OverlayFS directory structure.

mkdir -p my_container/{base,app,upper,work,rootfs}

tree -L 1 my_container

Step 2: Populate the Base Layer

The base layer acts as the operating system foundation.

cd my_container/base

mkdir -p {bin,sbin,lib,lib64,usr,proc,dev,tmp,etc}

cd usr
ln -s ../bin .
ln -s ../sbin .
ln -s ../lib .
ln -s ../lib64 .
cd ..

Here we are creating a minimal Linux filesystem structure, the same as in the previous posts.

Now let's copy some required binaries.

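# date is included because the demo app in Step 3 calls it
cp -v /bin/{rm,bash,ls,ps,echo,nc,date} bin/
cp -v /sbin/ip sbin/

And their required libraries:

binaries="bin/rm bin/bash bin/ls bin/ps bin/echo bin/nc bin/date sbin/ip"

for bin in $binaries; do
    # Pull the library paths out of the ldd output
    libs=$(ldd /$bin | grep -o '/lib[^ ]*')

    for lib in $libs; do
        # Create the target subdirectory (e.g. lib/x86_64-linux-gnu) before copying
        mkdir -p ".$(dirname "$lib")"
        cp -v "$lib" ".$lib" 2>/dev/null || true
    done
done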

cd ../..

The ldd output format varies between distributions. On Ubuntu, some libraries may live under /usr/lib/ rather than /lib/, so double-check the copied libraries match your environment if something fails to run inside the container.
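If the `grep` pattern turns out to be too fragile on your distribution, one alternative is to pick the first field that starts with `/` on each line of `ldd` output. The `lib_paths` helper below is a hypothetical sketch of that idea. It is fed a hard-coded sample of `ldd` output (addresses shortened) rather than a live call, so the result does not depend on the machine.

```shell
# Hypothetical helper: read ldd output on stdin, print one absolute library path per line.
lib_paths() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^\//) { print $i; break } }'
}

# Sample ldd output, used here instead of a live ldd call.
sample='    linux-vdso.so.1 (0x00007ffc)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1a)
    /lib64/ld-linux-x86-64.so.2 (0x00007f1b)'

printf '%s\n' "$sample" | lib_paths
```

In the loop above, `ldd "/$bin" | lib_paths` could stand in for the `grep` call. Lines without an absolute path, such as the vdso entry, are skipped.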

At this point, the base layer contains a minimal Linux userspace.

Step 3: Populate the App Layer

This layer contains application-specific files.

For demonstration purposes, we will create a very small Bash application and a YAML file representing a config file.

cd my_container/app
mkdir -p opt/myapp

We will add a simple script and a YAML config file:

cat > opt/myapp/app.sh << 'EOF'
#!/bin/bash

echo "Hello from the application layer!"
echo "Current time: $(date)"
EOF

chmod +x opt/myapp/app.sh

cat > opt/myapp/config.yml << 'EOF'
app:
  name: MyContainerApp
  version: 1.0
  port: 8080
EOF

cd ../..

Verify the app layer:

tree my_container/app

my_container/app
└── opt
    └── myapp
        ├── app.sh
        └── config.yml

2 directories, 2 files

At this point:

base/ contains the operating system
app/ contains application-specific files

Those are our read-only layers, and they will act as the lower layers.

upper/ is the writable layer, and it will act as the upper layer (as the name suggests :) )

Step 4: Mount the Overlay Filesystem

Now we combine all layers into a single merged filesystem.

OverlayFS has special mount options where we specify:

Read-only layers (lowerdir)
Writable layer (upperdir)
Internal working directory (workdir)

sudo mount -t overlay overlay \
    -o lowerdir=my_container/app:my_container/base,upperdir=my_container/upper,workdir=my_container/work \
    my_container/rootfs

In lowerdir, layers are listed from top to bottom: the leftmost directory is the top-most layer and the rightmost is the bottom layer.

lowerdir=my_container/app:my_container/base means:

base is the bottom layer
app is stacked above it

If the same file exists in multiple layers, the top-most layer wins.
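Because the order inside lowerdir matters, it can help to build the string from an ordered list. The sketch below only prints mount commands instead of running them, so it needs no root. The `join_lowerdir` and `overlay_mount_cmd` helpers, the shared `layers/` directory, and the container names are all hypothetical. They illustrate how two containers could reuse the same read-only layers while keeping their own upper/ and work/ directories.

```shell
# Join layer directories into a lowerdir value. Arguments go top-most layer first.
join_lowerdir() {
    result=""
    for layer in "$@"; do
        result="${result:+$result:}$layer"
    done
    printf '%s\n' "$result"
}

# Print (not run) the mount command for one container directory.
# Hypothetical layout: shared layers in layers/, per-container upper/, work/, rootfs/.
overlay_mount_cmd() {
    name=$1
    lower=$(join_lowerdir layers/app layers/base)
    printf 'mount -t overlay overlay -o lowerdir=%s,upperdir=%s/upper,workdir=%s/work %s/rootfs\n' \
        "$lower" "$name" "$name" "$name"
}

overlay_mount_cmd container_a
overlay_mount_cmd container_b
```

Both printed commands share the same lowerdir value and differ only in the per-container directories.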

Verify the new mount:

mount | grep my_container

overlay on /root/my_container/rootfs type overlay (rw,relatime,lowerdir=my_container/app:my_container/base,upperdir=my_container/upper,workdir=my_container/work ...)

Verify the merged filesystem:

ls my_container/rootfs

bin  dev  etc  lib  lib64  opt  proc  sbin  tmp  usr

ls my_container/rootfs/opt/myapp/

app.sh  config.yml

The rootfs/ directory now contains:

Everything from base/
Everything from app/
An initially empty writable layer (upper/)

Understanding Copy-on-Write

Now comes one of the most important concepts in container filesystems.

What happens if we modify a file?

As we mentioned above, lower layers are read-only.

So the process cannot modify them directly.

Instead, OverlayFS uses a mechanism called Copy-on-Write (CoW).

The workflow goes as follows:

A process modifies a file that belongs to one of the lower layers
The file gets copied into the upper/ layer
The modification happens there
The merged filesystem now exposes the modified version
Since OverlayFS prioritizes upper layers during reads, the modified version becomes the one visible in rootfs/

Note that the original file in the lower layer is untouched throughout this entire process — only the copy in upper/ changes.

This is how multiple containers can safely share the same read-only layers.
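The workflow above can be imitated with two ordinary directories. This is a userspace imitation under made-up names (`cow_append`, a temporary `lower/` and `upper/`), not how the kernel implements it, but it shows why the lower copy never changes.

```shell
# Userspace imitation of copy-up; the real operation happens inside the kernel.
demo=$(mktemp -d)
mkdir -p "$demo/lower/etc" "$demo/upper"
echo "port: 8080" > "$demo/lower/etc/app.conf"

# Append a line to a file "in the merged view" without touching lower/.
cow_append() {
    rel=$1
    line=$2
    if [ ! -e "$demo/upper/$rel" ]; then
        mkdir -p "$demo/upper/$(dirname "$rel")"
        cp -p "$demo/lower/$rel" "$demo/upper/$rel"   # the copy-up step
    fi
    printf '%s\n' "$line" >> "$demo/upper/$rel"
}

cow_append etc/app.conf "debug: true"

cat "$demo/lower/etc/app.conf"   # still the original single line
cat "$demo/upper/etc/app.conf"   # original line plus the appended one
```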

Step 5: Test Copy-on-Write Behavior

Now let's test this behavior ourselves.

Create a new file inside the merged filesystem:

echo "hello overlayfs" > my_container/rootfs/tmp.txt

find my_container -iname "tmp.txt"

my_container/upper/tmp.txt
my_container/rootfs/tmp.txt

Interesting.

Even though we created the file inside rootfs/, the actual file was stored inside upper/.

Here, rootfs/ only represents the merged filesystem view.
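Creating a new file does not trigger a copy-up, because nothing exists in the lower layers to copy. To watch copy-up happen on the real overlay, one option is to edit a file that comes from app/. The sketch below assumes the overlay from Step 4 is mounted and that it is run from the directory containing my_container. It skips the check otherwise.

```shell
rootfs=my_container/rootfs
file=opt/myapp/config.yml

if mountpoint -q "$rootfs" 2>/dev/null; then
    # Edit a file that originally lives in the read-only app/ layer.
    echo "# edited through the merged view" >> "$rootfs/$file"
    # app/ should still hold the original; upper/ should hold the edited copy.
    diff "my_container/app/$file" "my_container/upper/$file" || true
    result="checked"
else
    echo "overlay is not mounted at $rootfs; run Step 4 first" >&2
    result="skipped"
fi
```

If the mount is in place, `diff` should report only the appended comment line, which now exists in upper/ but not in app/.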

Layer Priority Example

Suppose both lower layers contain the same file:

app/etc/myapp/config.yml
base/etc/myapp/config.yml

The version from app/ will appear inside the merged filesystem because it is stacked above base/.

File Deletion and Whiteouts

But what actually happens when we delete a file?

Deleting a file inside the merged filesystem does not actually remove it from the lower read-only layers, because they are, well... read-only :)

Instead, OverlayFS creates a special marker called a whiteout inside the upper/ layer to hide the original file.

This marker basically tells the filesystem: "Pretend this file does not exist anymore."

Let's investigate this behavior ourselves.

We will create a file in the app/ layer, delete it through the merged rootfs/ view, and inspect what actually happens.

Unmount the filesystem first:

sudo umount my_container/rootfs

Create the file:

touch my_container/app/to_delete.txt

Mount the filesystem again using the same mount command from above, then enter the container (for example with sudo chroot my_container/rootfs /bin/bash).

Now from inside the container, delete to_delete.txt:

rm to_delete.txt

From another terminal:

sudo find . -iname "to_delete.txt" -exec ls -lh {} \;

-rw-r--r-- 1 admin admin 0 May 28 12:11 ./my_container/app/to_delete.txt
c--------- 2 root root 0, 0 May 28 12:12 ./my_container/upper/to_delete.txt

Interesting.

The original file inside the app/ layer still exists.

But OverlayFS created another object inside the writable upper/ layer.

Notice the first character in the permissions output:

c---------

The c means this is a character device, not a regular file.

So, what kind of character device is this?

Let's inspect it using the file command:

file ./my_container/upper/to_delete.txt

./my_container/upper/to_delete.txt: character special (0/0)

The file is a character special device with identifiers 0, 0: its major number and minor number.

Together, they identify what device this file represents. The combination 0, 0 is not assigned to any real device, which is precisely the point: OverlayFS uses it as a special value to mark a deletion.

So even though the original file still physically exists in the lower layer, the whiteout entry hides it from the final merged view.
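The same check can be scripted to list every whiteout in the writable layer. The helpers below are hypothetical and assume GNU `stat`, where `%t` and `%T` print the major and minor device numbers in hexadecimal. Listing is skipped if my_container/upper does not exist in the current directory.

```shell
# Hypothetical helper: succeed if the path is a character device with device number 0/0.
# Assumes GNU stat (%t = major, %T = minor, both in hex).
is_whiteout() {
    [ -c "$1" ] && [ "$(stat -c '%t:%T' "$1")" = "0:0" ]
}

# Print every whiteout found under a directory such as my_container/upper.
list_whiteouts() {
    find "$1" -type c | while IFS= read -r path; do
        if is_whiteout "$path"; then
            printf '%s\n' "$path"
        fi
    done
}

if [ -d my_container/upper ]; then
    list_whiteouts my_container/upper
fi
```

A regular character device such as /dev/null does not match, because its device number is not 0/0.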

Tying Everything Together

That's enough experiments. Now let's tie everything together and run our container with the new layered filesystem.

# Create network namespace
sudo ip netns add container_net

# Set up veth pair
sudo ip link add veth-host type veth peer name veth-container
sudo ip link set veth-container netns container_net

# Configure host side
sudo ip addr add 192.168.0.1/24 dev veth-host
sudo ip link set veth-host up

# Configure container side
sudo ip netns exec container_net ip link set lo up
sudo ip netns exec container_net ip addr add 192.168.0.2/24 dev veth-container
sudo ip netns exec container_net ip link set veth-container up

# Enter the container
sudo ip netns exec container_net chroot my_container/rootfs /bin/bash

With that, we've built a container backed by a proper layered filesystem.
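When you are done experimenting, the pieces can be torn down in reverse order. The sketch below is one possible cleanup, using the names from this post. The `run` wrapper and the `DRY_RUN` variable are made up here: by default the commands are only printed, and setting `DRY_RUN=0` would actually execute them.

```shell
# Teardown sketch. DRY_RUN defaults to 1, so the commands are only printed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

teardown() {
    run sudo umount my_container/rootfs
    # Deleting the namespace destroys veth-container, and its peer veth-host goes with it.
    run sudo ip netns delete container_net
}

teardown
```

The contents of upper/ survive the unmount, so remounting later brings the container's changes back. Removing them is a separate, deliberate step.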

Until next time.

Building Containers from Scratch (Part 4): Union Filesystems with Overlay
