This is a rewrite of a talk at ngParty III.
There are many, many articles explaining Docker, so I will keep this brief.
Docker is a tool that helps you package your applications into containers. Think of a container as a virtual machine, but not quite: containers share the host kernel and are much faster. It is more like chroot on steroids.
Why do we use Docker? Our environment is reproducible, robust, and idempotent. “It works on my machine” is a solved problem. Setup and deployment are much easier. Our environment is code, which means we version it in git and can do code reviews on it.
If you want to read more, these are great resources:
- What is Docker? by Docker, Inc.
- A Not Very Short Introduction to Docker by Anders Janmyr
- Getting Started with Docker for the Node.js Developer by Heitor Tashiro Sergent
- Docker Overview by Docker, Inc.
- Dockerizing a Node.js web app by Node.js Foundation
There are a few core concepts of Docker:
A container is a shell for your application. Docker uses the metaphor of physical shipping containers: the kind you put on ships and move around. The metaphor works very well: the shipping company does not need to worry about how to handle each particular kind of cargo and does not need special equipment for it. Instead, everything is packed into a standardized container.
The same goes for Docker containers: they have a defined behavior and API, so you do not really need to care what is inside.
Containers also help you isolate environments in a way similar to virtual machines, but much faster and with a smaller footprint. A VM takes seconds, or even minutes, to start; spinning up a container takes milliseconds.
How does one run a container? Using a Docker image.
A Docker image is a snapshot that serves as the container filesystem. An image on its own is static: it does not run anything and does not hold any intermediate state. It only allows you to create containers.
Images come in layers. Each layer is an append-only diff. An image can extend another image: it just adds new layers on top of the existing ones.
Images can share layers. If you have 10 images all based off the same base Node.js image, the common part is stored only once.
So, deleting a file does not make the image smaller! It is still there in the previous layer.
Docker is a command line tool. These are the most common commands you will use every day:
- docker run runs a new container from a given image
- docker build uses a Dockerfile to build a new image
- docker pull downloads an image from a registry
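A quick sketch of how those commands look in practice; the image names and tags here are just examples, not the ones from the talk:

```sh
docker build -t my-app .        # build an image from the Dockerfile in the current directory
docker run -p 3000:3000 my-app  # start a container from that image
docker pull node:6              # download an image from a registry
```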
A Dockerfile is a plain text file with instructions on how to build an image. Each instruction creates a new layer.
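Here is a minimal, deliberately naive sketch of such a file; the image tag, paths, and entry point are illustrative. It also shows the point made above: removing files in a later layer hides them, but their bytes remain in the earlier layer.

```dockerfile
FROM node:6
WORKDIR /app
COPY . .             # one layer: sources and package.json copied together
RUN npm install      # one layer: installed dependencies plus npm's download cache
RUN npm cache clean  # one layer: the cache is now hidden, but its bytes still live in the layer above
CMD ["node", "server.js"]
```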
There are some important bits you should not miss:
Always specify the base image version. The default is :latest, and using latest will break your builds eventually. We tried.
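In Dockerfile terms, the difference is one line; the exact version number below is just an example:

```dockerfile
# "FROM node" implicitly means "FROM node:latest" and will move under you.
# Pin the tag instead:
FROM node:6.9.1
```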
Docker images come in layers, remember? Docker caches layers by default. The cache speeds up builds, but it can bite you too.
How is the cache invalidated? For almost all instructions, changing the instruction discards the cached layer. For the COPY instruction, Docker compares file hashes and discards the cache every time the file contents change. This means that with a naive setup, Docker will install all packages over and over again. That can take 10 minutes or more. You do not want that.
Once you copy package.json first, Docker caches the installed packages regardless of your source code changes. The cache is invalidated only when the contents of package.json change, which is what you want anyway.
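A sketch of that cache-friendly ordering; the tag, paths, and start command are again examples:

```dockerfile
FROM node:6.9.1
WORKDIR /app

# Copy only the dependency manifest first; this layer stays cached until
# package.json itself changes.
COPY package.json .
RUN npm install

# Source code changes invalidate only the layers from here on, so the
# npm install layer above remains cached.
COPY . .
CMD ["node", "server.js"]
```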
This practice is not optimal; some suggest using a volume, or an npm shrinkwrap file.
We do neither, on purpose.
We value that running a full rebuild from time to time helps us identify issues we would not have found otherwise.
It makes some builds take longer, and the build breaks more often. But it breaks sooner, and we learn about a bug much earlier than we would have otherwise. I believe that is a worthwhile tradeoff.
Every built image comes with passing tests. This helps us make sure that we ship tested code. Every change in source code invalidates the layer and makes tests run again.
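One way to bake this in is to make the test run a build step; this assumes a "test" script exists in package.json and builds on the Dockerfile sketch above:

```dockerfile
# If the tests fail, the build fails, so every image that gets shipped has passed them.
RUN npm test
```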
There are several pieces missing in this article which you may need:
- Entrypoint and / or CMD
- Configuring user other than root
- apt-getting more packages
Your node_modules folder probably contains native packages compiled for a different platform and incompatible with Linux containers. You need the image to build its own.
As a bonus, it speeds up the build a lot: Docker for Mac suffers from slow filesystem operations, and not copying the unused node_modules helps.
See the .dockerignore reference
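A minimal example of a .dockerignore that keeps the host's node_modules out of the build context; the entries are illustrative:

```
node_modules
.git
npm-debug.log
```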
We abandoned configuration files and configure our applications using shell environment variables instead. With the Docker toolset, it is easier to manage the environment than to figure out which file belongs where.
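Passing configuration is then a matter of a few flags; the variable names and values here are illustrative:

```sh
docker run -e NODE_ENV=production -e DATABASE_URL=mysql://mysql/app my-app
```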
All logging is best done to standard output: process.stdout. The tooling expects that, so do not bother making your application fancy with log files.
Once you follow the advice above, you will have a Node.js application running inside a container. But the app usually does not run alone; it requires other services. Do they run in containers? Of course!
Now, running one container is simple. Running several of them requires some care. This is where Docker Compose comes to the rescue.
Docker Compose is a great tool that helps you orchestrate multiple containers together.
It is useful even when you have a single container: remembering all command line flags is daunting.
This is how a typical docker-compose.yml looks. It spins up a web service serving HTML pages and a mysql service holding the data.
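A sketch of such a file; the service names, image tags, paths, and credentials are illustrative, not a production setup:

```yaml
version: "2"
services:
  web:
    build: .
    ports:
      - "80:3000"                     # expose the app to the host
    environment:
      - DATABASE_HOST=mysql           # the app connects to the "mysql" hostname
    links:
      - mysql
  mysql:
    image: mysql:5.7
    volumes:
      - ./data/mysql:/var/lib/mysql   # persist the data on the host filesystem
    environment:
      - MYSQL_ROOT_PASSWORD=example   # never put real passwords here
```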
Let us go through some interesting pieces:
Docker will download existing images from a registry automatically. You may also build your own when launching the stack.
Volumes allow persisting files from the container on the host filesystem; here, the mysql data.
Remember the environment variables? They come in handy here.
Do not include production database passwords in a compose file! Consider this an example.
Handling credentials is a talk on its own; we usually use tools like Vault, or get the passwords from somewhere else entirely. Usually, we encrypt the important pieces using Ansible Vault and generate the docker-compose files with Ansible.
You most certainly do not want to have your AWS credentials exposed.
Containers by default do not see each other. You may link them together using a private network created by Docker.
The great thing is that we can simply hardcode hostnames in the app and configure the environment instead:
With Docker links, this comes for free. When running the application directly on the host, one can edit the /etc/hosts file for the same effect (this is what Docker does with links anyway).
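For example, a line like this in /etc/hosts (illustrative) makes the hardcoded hostname resolve locally when running without Docker:

```
127.0.0.1   mysql
```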
There is a caveat, though. Docker knows to boot containers in the correct order, but it is not smart enough to wait until the process inside is ready; that may take seconds or minutes. Your application must account for this and retry failed database connections before giving up.
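One common workaround is a small entrypoint wrapper that waits for the database before starting the app. This is only a sketch: the hostname, port, start command, and the availability of netcat in the image are all assumptions.

```sh
#!/bin/sh
# Retry until the database port accepts connections, then start the app.
until nc -z mysql 3306; do
  echo "waiting for mysql..."
  sleep 1
done
exec node server.js
```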
By default, container ports are not exposed to the outside world. Exposing them is usually not necessary; you do not want your MySQL port to be accessible from anywhere other than the application.
Unless you actually do want to expose some ports; Docker lets you map them easily.
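For instance, mapping container port 3000 to host port 8080 (the numbers are examples):

```sh
docker run -p 8080:3000 my-app
```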
Thanks to built-in Docker restart policies, we no longer use any other process managers. We have used and abandoned init.d scripts, foreman.js and pm2. Docker can keep your containers alive.
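A restart policy is a single flag on docker run (or a restart: key in a compose file); the image name is illustrative:

```sh
# Docker restarts the container if it crashes, unless you stopped it yourself.
docker run --restart=unless-stopped my-app
```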
This is an example of a bad commit which would break the application in a subtle way. A developer deleted some dependencies by mistake. Tests pass, the application works on staging and in production! All because the build script did not run npm prune, so the packages were never actually removed. This would only surface once a new developer joined the team or a new server was provisioned, potentially weeks or months after the commit.
Remember the full builds? This would have failed the build.
By the way: We found this bug during code review. Do you do code reviews? You should.
This is an example of the WordPress README instructions. At first, there were no installation instructions at all: a bad, mind-twisting process of installing everything by hand took several hours. Trial-and-error-driven deployment.
On the left is a screenshot of the README after we introduced Docker. That alone made the process predictable and faster: it only took two hours.
On the right are the README instructions after we added a docker-compose.yml file. Now, a new developer can get from zero to full productivity in five minutes.
Using Docker made our build and deployment more robust, reliable, deterministic, and faster.
On the other hand, local development outside Docker is still faster for some stacks on OSX. Node.js and webpack require file watching during development, and Docker used to have issues with that. I myself usually default to nvm.
Compiling and running Scala applications requires copying several GB of libraries. That is heartbreakingly slow on Mac’s xhyve.
But it is definitely possible. Check out the talk by David Blurton for some great insight:
In this article, I have only scratched the surface of what is possible with Docker. The Docker world is vast and constantly changing, and I believe in a bright future for containers.
I found several great articles that pretty much match our experience:
- Docker in Production: A History of Failure by The HFT Guy
- Docker in Production: A retort by Sysadmin 4 lyfe
- Why Docker is Not Yet Succeeding Widely in Production by Simon Hørup Eskildsen
- Docker Not Ready for Prime Time by David P. Pollak
- Thou shalt not run a database inside a container by Sysadmin 4 lyfe