Packaging your own application¶
Workflow Overview¶
The simplest workflow for building a docker image with your own code usually follows these steps:
1. Identify an appropriate base image
2. Identify additional dependencies needed for your application
3. Install those dependencies with the appropriate RUN commands
4. Add your code to the image, either with ADD or git
5. Specify an appropriate CMD or ENTRYPOINT
6. Build your image, repeating steps 2-4 if needed until the build succeeds
7. Run a container of your image and test its behavior
8. Iterate, if needed
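For example, assuming your Dockerfile lives in the current directory and you name the image myapp (a placeholder), the build and test steps (6 and 7) might look like:
# build the image from the Dockerfile in the current directory
$ docker build --tag myapp:latest .
# run a container from the image to check its behavior; --rm removes the container on exit
$ docker run --rm myapp:latest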
Preparing docker image for your code¶
Choosing a base image¶
The first step in creating a docker container is choosing an appropriate base image. In general, picking the most specific image that meets your requirements is desirable. For example, if you are packaging a python app, it is likely advantageous to choose a python base image with the appropriate python version rather than pulling an ubuntu base image and installing python using RUN commands.
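For instance, the sketch below contrasts the two approaches (the tags are only illustrative); the first line alone is usually preferable to the commented alternative:
# specific base image: the python interpreter is already present
FROM python:3.6-slim

# generic base image: python must be installed as an extra step
# FROM ubuntu:bionic
# RUN apt-get update && apt-get install -y python3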
Installing dependencies¶
Once a base image is chosen, any additional dependencies need to be installed. For Debian-based images, the apt package manager is used to manage additional packages; for Fedora-based images, the yum package manager is used. Be sure to check which linux distribution a more specific base image is built on so you know which package manager to use.
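As an illustration (a sketch; wget is just a stand-in package), the corresponding RUN instructions would be:
# on a Debian/Ubuntu based image
RUN apt-get update && apt-get install -y wget

# on a Fedora/CentOS based image
RUN yum install -y wget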
Annoyance Alert
In practice, it can be hard to know ahead of time all of the additional system packages that need to be installed. Often, building an image to completion and running it to identify errors is the most expedient way to create a working image.
Occasionally, a software package dependency, or a specific version of software, is not available in the software repositories for a base image's linux distribution. In these cases, it might be necessary to download and install precompiled binaries manually, or to build a package from source. For example, here is a Dockerfile that builds and installs a specific version of samtools from a source release available on github:
FROM ubuntu:bionic
RUN apt update
# need these packages to download and build samtools:
# https://github.com/samtools/samtools/blob/1.9/INSTALL
RUN apt install -y wget gcc libz-dev ncurses-dev libbz2-dev liblzma-dev \
libcurl3-dev libcrypto++-dev make
RUN wget https://github.com/samtools/samtools/releases/download/1.9/samtools-1.9.tar.bz2 && \
tar jxf samtools-1.9.tar.bz2 && \
cd samtools-1.9 && ./configure && make install
CMD ["samtools"]
Putting your code into a docker image¶
Once your dependencies are installed, the final step is to move your own code into your image. There are primarily two different strategies for doing so:
- Copy source files into the image using the ADD command in the Dockerfile
- Clone a git repository into the image from a publicly hosted repo like github or bitbucket
Nota Bene
In any case, it is a good idea to develop your code in git or another source code versioning system, hosted publicly if possible. Your Dockerfile should be developed and tracked along with your code, so that both can evolve over time while maintaining reproducibility.
Locally¶
The local strategy is convenient when developing software. Running development code in a docker container ensures your testing and debugging environment is consistent with the execution environment where your code will ultimately run. To build from a local source tree:
- Create a Dockerfile in the root directory where your code resides
- Prepare the Dockerfile for your code as in Preparing docker image for your code
- Copy all of the source files into a directory (e.g. /app) in the container with ADD . /app
- Perform any setup that comes bundled with your package source (e.g. pip install -r requirements.txt or python setup.py) with the RUN command
- Set the CMD entry point appropriately for your app
- Build your image with an appropriate tag
- Run and test your application, ideally with unit tests
Assuming we have written a python application named app.py, from within the source code directory containing the application we could write the following Dockerfile:
# Use an official Python runtime as a parent image
FROM python:2.7-slim
# Copy the current (host) directory contents into the container at /app
ADD . /app
# Install any needed packages specified in requirements.txt
RUN pip install --trusted-host pypi.python.org -r /app/requirements.txt
# set the working directory to /cwd; a host directory can be bind mounted here when the container is run
WORKDIR /cwd
# Run app.py when the container launches
ENTRYPOINT ["python", "/app/app.py"]
When a container is run, app.py will be run directly and passed any additional arguments specified to the docker run command.
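For example, building and running this image might look like the following sketch (the file paths are placeholders, and the host directory holding them still needs to be mounted, as shown in Running your docker container below); everything after the image name is passed through to app.py:
$ docker build --tag app:latest .
$ docker run --rm app:latest --in=/cwd/some_data.txt --out=/cwd/results.csv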
Cloning from github/bitbucket¶
For software projects hosted on github or bitbucket, or when you do not want to keep a Dockerfile alongside your application source code, the Dockerfile can instead clone and install a git repo rather than adding code locally. Instead of the ADD command used above, use RUN git clone <repo url>:
FROM python:3.6
# need git to clone the repo (update package lists and install non-interactively)
RUN apt-get update && apt-get install -y git
# git clone repo instead of ADD
RUN git clone https://bitbucket.org/bubioinformaticshub/docker_test_app /app
RUN pip install --trusted-host pypi.python.org -r /app/requirements.txt
# set the working directory to /cwd; a host directory can be bind mounted here when the container is run
WORKDIR /cwd
# use ENTRYPOINT so we can pass files on the command line
ENTRYPOINT ["python", "/app/app.py"]
Cloning a public repo into a Docker image in this way has the advantage that the environment where you write your code can be the same as, or different from, the platform where the code is run.
There is one additional caveat to this method of adding code to your image. To save on build time, docker caches the sequential steps in your Dockerfile when building an image, and only reruns the steps from the first command where a change has been made. The ADD command automatically detects whether local files have changed and re-copies them into the container on docker build. Cloning a repo from github or bitbucket, however, does not re-trigger a rebuild when the code in the remote repo changes. When cloning your application from a public git repo, the --no-cache flag must be provided to your docker build command:
$ docker build --no-cache --tag app:latest .
This invalidates all build cache and re-clones your repo on each build.
Running your docker container¶
Once your code has been loaded into an image, containers for your image can be run in the normal way with docker run. Any host directories containing files needed for the analysis must be mounted:
$ docker run --mount type=bind,source=/data,target=/data \
--mount type=bind,source=$PWD,target=/cwd app \
--in=/data/some_data.txt --out=/data/some_data_output.csv
Remember that any time your code changes you will need to rebuild your image, including the --no-cache flag if you pull your code from a git repo.
Publishing your docker image¶
Once your docker image is complete and your app is ready to share, you can create a free account on Docker Hub and upload your image. Be sure to provide a full description of what the image does, what software it contains, and how to run it, specifying any directories the container expects to be mounted to access data (e.g. /data). You might alternatively consider hosting your image on the Amazon Elastic Container Registry or the Google Container Registry. If your app will primarily be executed in AWS or Google Cloud environments, it may be preferable to publish your image to the corresponding registry.
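A minimal sketch of the Docker Hub upload, assuming a Docker Hub account named yourusername (a placeholder) and a local image tagged app:latest:
# log in to Docker Hub with your account credentials
$ docker login
# re-tag the image under your Docker Hub namespace
$ docker tag app:latest yourusername/app:latest
# upload the image to Docker Hub
$ docker push yourusername/app:latest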
Hands On Exercise¶
Writing the Dockerfile¶
Write, build, and run a Dockerfile that:
- Uses the python:3.6 base image
- Installs git with apt
- Clones the repo docker_test_app
- Installs the dependencies using the requirements.txt file in the repo
- Configures the ENTRYPOINT to run the script in the repo with python3
Running the Dockerfile with data from an S3 bucket¶
Nota Bene
When you run this app, you should specify the -t flag to your docker run command.
Try running the container using docker run with no arguments to see the usage.
A fastq file that can be passed to this script has been made available on a shared S3 bucket. You will download this file to your local instance using the aws cli. First, you must run aws configure and provide your access keys. Specify us-east-1 as the region. The bucket address of the file is:
s3://buaws-training-shared/test_reads.fastq.gz
Download the file using the aws cli and pass it to the app using docker run. You must mount the directory where you downloaded the fastq file using the --mount command line option as above.
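A sketch of the commands, assuming the exercise image was tagged fastq_app (a placeholder) and that the script takes the path to the fastq file as its argument:
# download the fastq file from the shared bucket to the current directory
$ aws s3 cp s3://buaws-training-shared/test_reads.fastq.gz .
# run the container with the current directory mounted at /cwd
$ docker run -t --rm --mount type=bind,source=$PWD,target=/cwd \
    fastq_app /cwd/test_reads.fastq.gz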