Docker basics | Overlay Filesystems

Overlays are a very important concept when it comes to Docker storage. This article explains what they are and how they work.

Everything in UNIX is a file (see http://www.it-automation.com/2021/06/17/what-everything-is-a-file-really-means-in-linux.html). Put simply, that means everything can be treated as a stream of 0s and 1s.

Let’s assume we have a file that is represented by 00101010 and we want to make some changes to it. Before we make those changes we want to save the original version, so that we can go back to it at any time. The downside is that we have now doubled the amount of space needed to store our versions. Doing that 1000 times means we need roughly 1000 times the amount of space.

Typically we don’t change the entire file but only a small portion of it.

# Version 0 
00101010
# Version 1
00110010
# Version 2
10110010
# Version 3 (working copy)
10010010

So in order to save precious disk space we want to use a different approach. It does not make sense to store the entire original file if we only make slight changes to it. Instead, we only save those parts of the file that we have changed. By doing so we get the following patch files.

# Working copy
10010010
# Patch 2
--1-----
# Patch 1
0-------
# Patch 0
---01---
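
As a rough sketch of how such patch files could be derived, the following Python snippet compares two neighbouring versions and keeps only the bits that differ. The helper name make_patch and the use of "-" for unchanged positions are assumptions made for this illustration; real snapshot tools work on blocks rather than single bits.

```python
def make_patch(current, previous):
    """Return the bits of `previous` that differ from `current`.

    Positions where both versions agree are marked with '-'.
    """
    if len(current) != len(previous):
        raise ValueError("both versions must have the same length")
    return "".join(p if p != c else "-" for c, p in zip(current, previous))


# Versions 0 to 3 as listed in the article.
versions = ["00101010", "00110010", "10110010", "10010010"]

# Patch i turns version i+1 back into version i.
patches = [make_patch(versions[i + 1], versions[i]) for i in range(len(versions) - 1)]

for number in reversed(range(len(patches))):
    print(f"Patch {number}: {patches[number]}")
```

Each patch only records the positions that changed between two versions, which is why it stays small as long as the changes are small.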

Now if we want to go back to version 2, we apply patch 2. If we want to go back to version 1, we apply patch 2 and then patch 1. And if we want to go back to version 0, we apply patch 2, then patch 1 and then patch 0. This approach is very common when doing snapshots (like LVM or VMs).
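
Going back in time can be sketched in a few lines of Python. The function apply_patch is a hypothetical helper for this article, not part of any snapshot tool; it assumes that "-" in a patch means "keep the existing bit".

```python
def apply_patch(data, patch):
    """Overwrite every position of `data` where `patch` holds a bit.

    A '-' in the patch keeps the existing bit.
    """
    return "".join(d if p == "-" else p for d, p in zip(data, patch))


working_copy = "10010010"
patches = {2: "--1-----", 1: "0-------", 0: "---01---"}

# Walk backwards one version at a time.
version_2 = apply_patch(working_copy, patches[2])
version_1 = apply_patch(version_2, patches[1])
version_0 = apply_patch(version_1, patches[0])

print(version_2)  # 10110010
print(version_1)  # 00110010
print(version_0)  # 00101010
```

Note that the patches have to be applied in order: patch 0 only produces version 0 when it is applied to version 1.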

People are often curious about why snapshots are initially very small and grow over time. This is because the snapshot doesn’t store the entire data but only the differences from our working copy. The more we change, the more differences we get and the more disk space we need.

While we use snapshots to go back in time, we can also flip the entire thing around and go forward using the same idea.

# Version 0
00101010
# Patch 1
---10---
# Patch 2
1-------
# Patch 3 (working copy)
--0-----

Each patch forms a separate layer. Stacking all layers on top of each other assembles our working copy. Looking at the stack from top to bottom shows that our working copy is assembled from distinct bits that come from different layers. An important thing to notice is that some of the bits from lower layers are invisible to us because they are covered by a layer above.

--0-----
1-------
---10---
00101010
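
The top-down view of the stack can be sketched like this. The function flatten is a made-up name for this example; it assumes the layers are given top layer first and that "-" means the layer does not cover that position.

```python
def flatten(layers):
    """Resolve a stack of layers given top layer first.

    Each position takes the first bit, seen from the top, that is not '-'.
    """
    visible = []
    for column in zip(*layers):
        bit = next((b for b in column if b != "-"), "-")
        visible.append(bit)
    return "".join(visible)


stack = [
    "--0-----",  # patch 3 (top)
    "1-------",  # patch 2
    "---10---",  # patch 1
    "00101010",  # version 0 (bottom)
]

print(flatten(stack))  # 10010010
```

The result matches the working copy, even though none of the lower layers has been modified.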

Basically, this is the idea behind how Docker images are assembled, except that it works at the file level rather than on single bits.

We may have the following layers.

file1, -----, -----, file4
-----, -----, file3, -----
file1, file2, file3, -----

Even though file1 and file3 from the original layer are no longer visible to us, they are still there. Deleting or changing a large number of files can therefore grow image sizes considerably. But once we understand the root cause we can think about ways to fix that. Features like multi-stage builds are useful for dealing with that problem.
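
To make the size effect more tangible, here is a small Python sketch of the three file layers from above. The file sizes are invented for illustration and merged_view is a hypothetical helper; this models only the idea of layering, not how Docker actually stores layers on disk.

```python
# Layers listed from top to bottom; each maps a file name to an assumed size in MB.
layers = [
    {"file1": 5, "file4": 20},               # top layer: replaces file1, adds file4
    {"file3": 10},                           # middle layer: replaces file3
    {"file1": 5, "file2": 50, "file3": 10},  # original (bottom) layer
]


def merged_view(layers):
    """Return {file name: size} as seen from the top of the stack."""
    view = {}
    for layer in layers:  # walk from top to bottom
        for name, size in layer.items():
            view.setdefault(name, size)  # the highest copy of a file wins
    return view


visible = merged_view(layers)
visible_size = sum(visible.values())
stored_size = sum(size for layer in layers for size in layer.values())

print(sorted(visible))  # ['file1', 'file2', 'file3', 'file4']
print(visible_size)     # 85
print(stored_size)      # 100
```

With these made-up numbers we only see 85 MB of files, but all layers together take up 100 MB, because the covered copies of file1 and file3 are still stored in the bottom layer.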
