Introduction to docker

docker

docker is an open source software project supported and provided for free by Docker Inc. The software is available for macOS, Windows, and Linux operating systems. In the words of its initial open source announcement in 2013, docker is:

a LinuX Container (LXC) technology augmented with a high level API providing a lightweight virtualization solution that runs Unix processes in isolation. It provides a way to automate software deployment in a secure and repeatable environment.

(emphasis added). docker containers are:

  • automated because every docker container contains all of its own configuration and is run with the same executable interface, and thus can be started automatically without manual intervention
  • secure because each runs in its own environment isolated from the host and other containers
  • repeatable because the container behavior is guaranteed to be the same on any system that runs the docker software

These three properties make docker an excellent solution to the problems faced by scientists who wish to write reproducible analyses and applications.

docker concepts

There are four critical concepts needed to get started as a docker user:

images

A docker image is a description of a software environment and its configuration. The concept of an image is abstract: images are not run directly. Instead, images are used to instantiate containers, which are runnable. For those familiar with object oriented programming, an image is to a container as a class is to an object.
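The class and object analogy can be made concrete with a small Python sketch. Nothing here talks to docker: the class name, its attributes, and the container names are hypothetical and exist only to illustrate the relationship.

```python
class PythonSlimImage:
    """Hypothetical stand-in for an image: a description, not a running thing."""

    # Configuration shared by everything created from this "image".
    base = "python:2.7-slim"
    workdir = "/app"

    def __init__(self, name):
        # Instantiating the class plays the role of creating a container.
        self.name = name
        self.running = False

    def start(self):
        # Only the instance (the "container") can be started.
        self.running = True
        return f"{self.name} started from {self.base}"


first = PythonSlimImage("container-1")
second = PythonSlimImage("container-2")
first.start()
```

Both objects share the same description, but each has its own state: starting `first` leaves `second` untouched, much as two containers created from one image run independently.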

docker images are usually created, or built, with a Dockerfile. Images are often created by using another image as a base and adding more application-specific configuration and software. For example, a common base image contains a standard Ubuntu installation upon which other software is installed. While it is possible to build an image interactively without writing a Dockerfile, this practice is highly discouraged because the result cannot be reliably reproduced.

Images can be stored either locally or in a public or private Image Registry. In either case, in order to create a container based on an image, the image must be resident in the local docker installation. When an image is built locally, it is automatically added to the local installation. When an image published on a public registry like Docker Hub is used, it is first pulled to the local installation and then used to create a container.

Most docker images have a version associated with them. This enables the image to change over time while maintaining backwards compatibility and reproducibility. The image version, known as a tag, is specified at build time.
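An image version is written after a colon in the image name, as in python:2.7-slim, and docker falls back to the tag latest when none is given. The following is a minimal Python sketch of that naming convention. The function name is hypothetical, and the sketch ignores digest references (those containing an @).

```python
def split_image_reference(reference):
    """Split 'name:tag' into (name, tag), defaulting the tag to 'latest'."""
    # Only a colon after the last '/' separates the tag. An earlier colon
    # belongs to a registry host and port, as in 'localhost:5000/app'.
    last_slash = reference.rfind("/")
    colon = reference.rfind(":")
    if colon > last_slash:
        return reference[:colon], reference[colon + 1:]
    return reference, "latest"


print(split_image_reference("python:2.7-slim"))
print(split_image_reference("ubuntu"))
```

Because an untagged name resolves to latest, which can point to different contents over time, naming an explicit version is the safer choice when reproducibility matters.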

containers

A container is an instance created from an image. You can think of a container as a physical file that has all of the software described by the image bundled together in a form that can be run. Each container is created using a single image.

By default, containers lack the permissions to communicate with the world outside their immediate docker execution environment. When a container is run, the user can specify locations on the host system that are exposed to the docker container by binding files and directories explicitly. The container can only read and write data in locations it has been given permission to access. Containers that run services, like web servers, can also be granted access to certain ports on the host system at run time to allow communication outside of the host. In general, a docker container can only be granted access to the resources available to the user running the container (e.g. a normal user without elevated privileges cannot bind to the reserved ports below 1024 on Linux).
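Binding directories and publishing ports are both requested on the docker run command line, with the -v and -p options respectively. The Python sketch below only assembles such a command as a list of arguments and does not execute it. The helper name, the image name my-web-app:1.0, and the host paths are hypothetical and chosen purely for illustration.

```python
def docker_run_command(image, binds=None, ports=None):
    """Assemble a `docker run` argument list (the command is not executed)."""
    command = ["docker", "run"]
    # -v host_path:container_path exposes a host location to the container.
    for host_path, container_path in (binds or {}).items():
        command += ["-v", f"{host_path}:{container_path}"]
    # -p host_port:container_port publishes a container port on the host.
    for host_port, container_port in (ports or {}).items():
        command += ["-p", f"{host_port}:{container_port}"]
    command.append(image)
    return command


command = docker_run_command(
    "my-web-app:1.0",                        # hypothetical image name
    binds={"/home/user/data": "/app/data"},  # hypothetical host directory
    ports={8080: 80},                        # host port 8080 -> container port 80
)
print(" ".join(command))
# prints: docker run -v /home/user/data:/app/data -p 8080:80 my-web-app:1.0
```

Publishing the service on host port 8080 rather than 80 sidesteps the reserved port range mentioned above.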

Dockerfiles

A Dockerfile is a text file that contains the instructions for building an image. It is the preferred method for building docker images, over creating them interactively.

Dockerfiles are organized into sections that specify different aspects of an image. The following is a simple Dockerfile from the docs:

# Use an official Python runtime as a parent image
# This implicitly looks for and pulls the docker image named 'python'
# annotated with version '2.7-slim' from Docker Hub (if it was not already
# pulled locally)
FROM python:2.7-slim

# Set the working directory to /app inside the container
# The /app directory is created implicitly inside the container
WORKDIR /app

# Copy the current (host) directory contents into the container at /app
ADD . /app

# Install any needed packages specified in requirements.txt
# The file requirements.txt was copied into /app during the ADD step above
RUN pip install --trusted-host pypi.python.org -r requirements.txt

# Make port 80 available to the world outside this container
# This implies that app.py runs a web server on port 80
EXPOSE 80

# Define environment variable $NAME
ENV NAME World

# Run app.py when the container launches
CMD ["python", "app.py"]

The words in all capital letters at the beginning of each line are Dockerfile instructions, each of which performs a different configuration operation on the image.
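To see this structure programmatically, the Python sketch below splits a Dockerfile into pairs of keyword and arguments. It is a simplified illustration, not docker's own parser: the function name is hypothetical, and the sketch does not handle instructions continued across lines with a trailing backslash. The sample text is an abbreviated copy of the Dockerfile above.

```python
def parse_dockerfile(text):
    """Return (keyword, arguments) pairs, skipping comments and blank lines."""
    instructions = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # The first word is the keyword; the rest of the line is its arguments.
        keyword, _, arguments = stripped.partition(" ")
        instructions.append((keyword.upper(), arguments.strip()))
    return instructions


dockerfile = """\
# Use an official Python runtime as a parent image
FROM python:2.7-slim
WORKDIR /app
ADD . /app
EXPOSE 80
CMD ["python", "app.py"]
"""

for keyword, arguments in parse_dockerfile(dockerfile):
    print(keyword, "->", arguments)
```

The first pair returned is the FROM line, which names the parent image that the rest of the file builds upon.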

Image Registry

Image registries are servers that store and host docker images. The software to run a Docker Registry is freely available, but Docker Hub is by far the most popular public registry. Docker images for your own apps can be freely published to and listed on Docker Hub for others to pull and use. Other hosted registries exist, including Amazon Elastic Container Registry and Google Cloud Container Registry.

Exercise

Navigate to Docker Hub and locate the python repository. Explore the page until you find the Dockerfile for python version 3.7-stretch and view it. What parent image was used to build the python:3.7-stretch image?

Locate the parent image on Docker Hub and examine its Dockerfile. What parent image was used to build this image?

Continue looking up the parent images of each Dockerfile you find until you reach the root image. What is its name?