Multi-stage builds

This is an example application that demonstrates one possible use case for a multi-stage Docker build. We chose an example that, we hope, is a reasonable candidate for a containerized workflow: a statically generated website with an added search feature.

Assuming that we want to eventually serve the site, the example would include at least the following steps:

Generate a search index
Build the site
Serve the site

Usage

Quickstart usage guide for this repository

Generate the certificates:

make generate_certs

Host the site locally. Note that the up target is simply a wrapper for the docker-compose up command. It builds and runs an nginx container as a foreground process, with the logs redirected to stdout and stderr.

(sudo) make up

To stop the container:

(sudo) make down

To remove the image:

(sudo) make rmi

Generate the search index

The blog consists of webpages that can be served statically. The content of these pages is kept in plain text files in Markdown format, which makes it relatively easy to generate a search index. We recursively walk the content directory tree, read each file, and use its front matter, content and metadata to populate the index.

blog/content/
├── _index.md
├── page
│   └── about.md
├── post
│   ├── 2022-03-01-multi-stage-builds.md
│   ├── 2022-11-30-k8s-applications.md
│   ├── 2023-01-16-containers.md
│   ├── 2023-01-19-kubernetes.md
│   ├── 2023-02-03-docker-compose.md
│   └── 2023-02-20-gitflow.md
└── search
    └── _index.md

4 directories, 9 files

The files will be parsed into an array of JSON objects with the following attributes. Note that the tags field is included only for the sake of completeness. We're not using tags in this example, so it will be empty and excluded from our final index.

{"title":"...",
 "tags":"...",
 "href":"...",
 "content":"..."}

A GitHub gist provided an excellent basis for what we want to do here. It required only a few tiny modifications to the Gruntfile to generate the index. The Gruntfile is a Node.js script that contains a task whose specific function is to create the index.
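To make the walk-and-parse step concrete, here is a minimal Node.js sketch. It is not the Gruntfile task from the gist, only an illustration under a few assumptions: front matter is taken to be simple `key: value` lines between two `---` delimiters, and the function names (`parseMarkdown`, `walk`, `buildIndex`) and the way `href` is derived from the file path are hypothetical.

```javascript
const fs = require("fs");
const path = require("path");

// Split a Markdown file into front matter and body.
// Assumption: front matter is plain "key: value" lines between "---" lines.
function parseMarkdown(text) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(text);
  if (!match) {
    return { meta: {}, body: text };
  }
  const meta = {};
  for (const line of match[1].split(/\r?\n/)) {
    const sep = line.indexOf(":");
    if (sep === -1) continue; // skip lines that are not "key: value"
    const key = line.slice(0, sep).trim();
    // Strip one pair of surrounding quotes, if present.
    const value = line.slice(sep + 1).trim().replace(/^["']|["']$/g, "");
    meta[key] = value;
  }
  return { meta, body: match[2] };
}

// Recursively collect every .md file below dir.
function walk(dir) {
  const files = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      files.push(...walk(full));
    } else if (entry.name.endsWith(".md")) {
      files.push(full);
    }
  }
  return files;
}

// Build one index entry per Markdown file.
// Assumption: href is the file path relative to contentDir, without ".md".
function buildIndex(contentDir) {
  return walk(contentDir).map((file) => {
    const { meta, body } = parseMarkdown(fs.readFileSync(file, "utf8"));
    const rel = path.relative(contentDir, file).split(path.sep).join("/");
    return {
      title: meta.title || "",
      tags: meta.tags || "",
      href: "/" + rel.replace(/\.md$/, ""),
      content: body.trim(),
    };
  });
}
```

Calling `buildIndex("blog/content")` and passing the result to `JSON.stringify` would yield an array of objects with the four attributes listed above. The real task may differ, for instance in how it handles other front matter formats or builds links.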

The generation of the index is encapsulated in the first stage in our Dockerfile:

# Generate the index
FROM node:16-alpine AS indexer
WORKDIR /opt/blog-search
COPY . .
RUN npm install -g grunt-cli \
 && npm install \
 && grunt lunr-index

The base image for this stage is node:16-alpine, a "small" variant of the Node LTS image. The table in the Space savings section lists it at 117MB. While we need to spin up a container from this image to run the program that generates our index, the index itself is all we need in production. As outlined above, the site is purely static, so all content being served is known ahead of time. The index of this content is tiny by comparison: 0.026MB in the same table.

Multi-stage builds give us the option of copying exactly this sliver of data out of the indexer stage once it has been built, which reduces the space this step contributes to the final image by roughly three orders of magnitude.

Build the site

We want to include the index that we just built in the public folder of the site that we are building. Docker's COPY instruction, with its --from flag, lets us copy files from an earlier stage:

# Build the site
FROM klakegg/hugo:0.101.0-busybox AS builder
WORKDIR /opt/blog-search
COPY --from=indexer /opt/blog-search/ .
WORKDIR /opt/blog-search/blog
RUN hugo

The Hugo Docker image comes in at a reasonably small 53MB. Again though, the content that we are actually creating is much smaller (the Space savings table lists public/ at 8.9MB). We can simply copy this sliver of data into the next stage. To be able to do this, each stage of the multi-stage Dockerfile needs a clear name, assigned with AS in its FROM line.

Serve the site

Let's assume that we want to serve the site in a production-ready container. Typically we would use a production-grade server like nginx or Apache. Full disclosure: I'm still making the transition from someone who can scrape together a webpage with HTML/CSS/JS to someone who actually understands how the thing gets served over the internet. The nginx config used in this example was tweaked from an example by Edwin Lyon.

Anyway, we're just using the example to illustrate a point about layering during the Docker build process. The nginx:1.23.3-alpine image weighs in at a feathery 40.7MB.

FROM nginx:1.23.3-alpine AS server
COPY --from=builder /opt/blog-search/blog/public/ /var/www/html/public/
EXPOSE 443/tcp
CMD ["/usr/sbin/nginx", "-g", "daemon off;"]

Space savings

If we kept everything from all three stages in a single image:

| component | size (MB) |
|---|---|
| node:16-alpine | 117.0 |
| PagesIndex.json | 0.026 |
| hugo:0.101.0-busybox | 53.0 |
| public/ | 8.9 |
| nginx:1.23.3-alpine | 40.7 |
| Total | 219.626 |

A multi-stage approach produces an image that is less than a quarter of that size:

| component | size (MB) |
|---|---|
| PagesIndex.json | 0.026 |
| public/ | 8.9 |
| nginx:1.23.3-alpine | 40.7 |
| Total | 49.626 |
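The arithmetic behind the two totals can be checked with a few lines of Node.js. This is only a sketch: the sizes are copied from the tables above, the nginx tag follows the Dockerfile, and the variable names are made up for illustration.

```javascript
// Sizes in MB, as listed in the tables above.
const sizes = {
  "node:16-alpine": 117.0,
  "PagesIndex.json": 0.026,
  "hugo:0.101.0-busybox": 53.0,
  "public/": 8.9,
  "nginx:1.23.3-alpine": 40.7,
};

// Components that end up in the final image of the multi-stage build.
const keptInFinalImage = ["PagesIndex.json", "public/", "nginx:1.23.3-alpine"];

// Total if everything from all three stages were kept.
const singleImageTotal = Object.values(sizes).reduce((sum, mb) => sum + mb, 0);

// Total for the multi-stage result.
const multiStageTotal = keptInFinalImage.reduce((sum, name) => sum + sizes[name], 0);

// Fraction of the single-image size that the multi-stage image occupies.
const ratio = multiStageTotal / singleImageTotal;

console.log(`single image: ${singleImageTotal.toFixed(3)} MB`);
console.log(`multi-stage:  ${multiStageTotal.toFixed(3)} MB`);
console.log(`ratio:        ${(ratio * 100).toFixed(1)}%`);
```

The ratio comes out below 25%, which is where the "less than a quarter" claim comes from.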


AustrianDataLAB/multi-stage-build
