Running docker

Nota Bene

You must be using a computer with docker installed to complete the exercises on this page. If you are attending the BU workshop, refer to the page on connecting to your EC2 instance for instructions on how to SSH into your instance.

Your First Docker Container

Containers are run using the command:

$ docker run <image name>[:<tag>]

The <image name> must name an image that exists either on the local machine or on Docker Hub. The optional :<tag> specifies a particular version of the image to run; if it is omitted, docker uses the latest tag.
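For example, a specific version of the official ubuntu image hosted on Docker Hub (here 18.04, one of its published tags) can be run with:

$ docker run ubuntu:18.04

The first run pulls the 18.04 version of the image; the container's default command then exits immediately without printing anything.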

Exercise

Run a container for the hello-world docker image hosted on Docker Hub.

If you need help, try running docker and docker run without any arguments to see usage information.

Read the text output by the container after it has been run.

Pulling docker images

As part of running a container from a public docker image, the image itself is pulled and stored locally. This only occurs once for each version of an image; subsequently run containers will use the local copy of the image.

If you have never run any docker containers in this environment before, there should be no local images listed by the docker images command:

$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
$

To verify that the hello-world image has been pulled, we again use the docker images command after running the container:

$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
hello-world         latest              2cb0d9787c4d        2 weeks ago         1.85kB
$

This output tells us that we have the latest version of the hello-world image in our local image store.

We can pull images explicitly, rather than doing so implicitly with a docker run call, using the docker pull command:

$ docker pull nginx

This may be useful if we do not want to run a container immediately, or want to perform our own modifications to the image locally prior to running.
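A specific tag can be pulled in the same way, for example the alpine tag of nginx (a smaller variant of the image published on Docker Hub):

$ docker pull nginx:alpine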

Exercise

Pull the nginx image using the docker pull command. Verify that the latest image of nginx has been pulled using docker images.

Managing docker containers

Running detached containers

The hello-world container runs, prints its message, and then exits. If we were running a docker container that provided a service, we would want the container to keep running until we chose to shut it down. An example of this is the nginx web server, which we can run with the command:

$ docker run -d -p 8080:80 nginx

Here, the -d flag tells docker to run the container in the background (detached) and return control to the command line once the container has been set up. The -p 8080:80 flag publishes port 80 on the container, the default port for HTTP traffic, to the unprivileged port 8080 on the local machine. When control has returned to the command line, we can verify that the container is still running using the docker ps command:

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                  NAMES
49af27e82231        nginx               "nginx -g 'daemon of…"   4 minutes ago       Up 4 minutes        0.0.0.0:8080->80/tcp   elastic_mcnulty
$

Exercise

Run an nginx container as above. Verify that the container is running with docker ps.

If the port mapping was specified correctly, local port 8080 should now behave as if a web server is running on it. Verify that this is the case by running:

$ curl localhost:8080

Attaching data volumes to containers

Scientific analyses almost always use some form of data. Docker containers are intended to execute code and are not designed to house data. Directories and data volumes that exist on the host machine can be mounted into the container at run time, enabling the container to read and write data on the host:

$ docker run -d -p 8080:80 --mount type=bind,source="$PWD"/data,target=/data nginx

The directory named data in the current host directory will be mounted as /data in the root directory of the container.
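Docker also accepts a shorter -v syntax for the same bind mount; which form to use is largely a matter of preference, though --mount is more explicit:

$ docker run -d -p 8080:80 -v "$PWD"/data:/data nginx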

Stopping running containers

When a docker container has been run in a detached state, it runs until it is stopped or encounters an error. To stop a running container, we need either the CONTAINER ID or NAMES attribute of the running container from docker ps:

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                  NAMES
49af27e82231        nginx               "nginx -g 'daemon of…"   4 minutes ago       Up 4 minutes        0.0.0.0:8080->80/tcp   elastic_mcnulty
$ docker stop 49af27e82231 # could also have provided elastic_mcnulty
49af27e82231
$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
$

Stopping a container sends signals to the container telling it to shut down. A stopped container can be restarted with docker start, but for one-off workloads it is usually simpler to run a fresh container from the image.

Nota Bene

Docker maintains a record of all containers that have been run on a machine. After they have been stopped, docker ps does not show them, but the containers still exist. To see a list of all containers that have been run, use docker ps -a.

It is good practice to remove old containers if they are no longer needed. You can do this with the command docker container prune.
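For example:

$ docker ps -a             # list all containers, including stopped ones
$ docker container prune   # remove all stopped containers (asks for confirmation)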

Creating docker images

Building a custom image

Chances are there is not an existing docker image that does exactly what you want (but check first!). To create your own image, you must write a Dockerfile. As an example, we will create an image that has the python package scipy installed for us to use. It is common convention to create a new directory named for the image you wish to create, and to place a text file named Dockerfile in it. In the scipy directory, our Dockerfile contains:

# pull a current version of python3
FROM python:3.6

# install scipy with pip
RUN pip install scipy

# when the container is run, put us directly into a python3 interpreter
CMD ["python3"]

To build this docker image, we use the docker build command from within the scipy directory containing the Dockerfile:

$ docker build --tag scipy:latest .
Sending build context to Docker daemon  2.048kB
Step 1/3 : FROM python:3.6
 ---> 638817465c7d
Step 2/3 : RUN pip install scipy
 ---> Running in 1eef65d3b6fd
Collecting scipy
  Downloading https://files.pythonhosted.org/...
Collecting numpy>=1.8.2 (from scipy)
  Downloading https://files.pythonhosted.org/...
Installing collected packages: numpy, scipy
Successfully installed numpy-1.15.0 scipy-1.1.0
Removing intermediate container 1eef65d3b6fd
 ---> 7f34e9147bef
Step 3/3 : CMD ["python3"]
 ---> Running in 5c9d778426e6
Removing intermediate container 5c9d778426e6
 ---> e27603f4ffaf
Successfully built e27603f4ffaf
Successfully tagged scipy:latest
$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED              SIZE
scipy               latest              e27603f4ffaf        About a minute ago   1.15GB
python              3.6                 638817465c7d        25 hours ago         922MB
$

The --tag scipy:latest argument gives our image a name when it is listed in docker images. Notice also that the python:3.6 image has been pulled in the process of building the scipy image.

Now that we have built our image, we can run a container from it and connect to it using docker run with two additional flags:

$ docker run -i -t scipy
Python 3.6.0 (default, Jul 17 2018, 11:04:33)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import scipy
>>>

The -i flag keeps the container's standard input open so that we can type into it, and the -t flag allocates a pseudo-terminal so that the session behaves like a normal interactive terminal.
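As a quick non-interactive check (a sketch; note that a command supplied to docker run overrides the image's CMD, which the next section discusses), the following should print the installed scipy version:

$ docker run scipy python3 -c "import scipy; print(scipy.__version__)"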

Exercise

Create a new Dockerfile where you will install the most recent version of R. Use ubuntu:bionic as the base image. You may follow these instructions, without using the sudo command.

Hint: Use a different RUN line for each command.

Solution
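One possible solution, as a sketch; the repository line and signing key below are taken from the CRAN instructions for Ubuntu bionic at the time of writing and may change, so check the current instructions before relying on them:

# start from the ubuntu:bionic base image
FROM ubuntu:bionic

# avoid interactive prompts (e.g. from tzdata) during apt installs
ENV DEBIAN_FRONTEND=noninteractive

# prerequisites for adding the CRAN apt repository
RUN apt-get update
RUN apt-get install -y software-properties-common gnupg dirmngr

# add the CRAN signing key and repository (values from the CRAN bionic instructions; verify before use)
RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
RUN add-apt-repository "deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/"

# install the most recent R from the CRAN repository
RUN apt-get update
RUN apt-get install -y r-base

# when the container is run, start an R session
CMD ["R"]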

Passing containers CLI arguments

The CMD Dockerfile instruction specifies the default command to run when a container starts. However, it is sometimes convenient to pass command line arguments to a container, for example to run an analysis pipeline on different files, or on files whose names are not known at build time. For instance, we might want to run the following:

$ docker run python process_fastq.py some_reads.fastq.gz

Arguments passed on the docker run command line replace the CMD entirely rather than being passed to it. The ENTRYPOINT instruction, by contrast, prefixes its command to any command line arguments passed to docker:

FROM python:3.6

# we will mount the current working directory to /cwd when the container is run
WORKDIR /cwd

RUN pip install pysam

# ENTRYPOINT instead of CMD
ENTRYPOINT ["python3"]

Any command line arguments passed to docker will be appended to the command(s) specified in the ENTRYPOINT.

If a container is intended to run files that exist on the host, the docker run command must also be supplied with a mount point so the container can access those files. In the example above, the WORKDIR is specified as /cwd, so we bind the host's current working directory to /cwd in the container, giving it access to process_fastq.py and some_reads.fastq:

$ docker run --mount type=bind,source="$PWD",target=/cwd <image name> process_fastq.py some_reads.fastq
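Putting this together, a sketch (assuming the Dockerfile above has been built with the hypothetical tag fastq-tools, and that process_fastq.py and some_reads.fastq exist in the current directory):

$ docker build --tag fastq-tools .
$ docker run --mount type=bind,source="$PWD",target=/cwd fastq-tools process_fastq.py some_reads.fastq

Because of the ENTRYPOINT, the container runs python3 process_fastq.py some_reads.fastq from within /cwd.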