
Sat, Jul 13, 2024

Docker Development Environments

Software and data science packages move fast. By the time you have an environment you are comfortable with, new tooling and package updates are already being released. This makes it very difficult to manage dependencies as your development environment grows in complexity. It also leads to an ever-increasing amount of clutter on your local system. Managing all of this can become a significant cost and takes time away from writing code.

Not to mention the frustration of a deployment that fails despite working locally. In addition, sharing code becomes a brittle process: there is no guarantee a team member will be able to reliably replicate your environment.

These headaches are far from the only ones, but they illustrate the value proposition of containerized development environments. You keep your local environment simple and clean, you can version control your development environment, collaborative sharing is a breeze, and deployment to production is reliable.

When you finish this post you will have the tools you need to create a reusable development environment in a Docker container. This will be a very simple template, but you can use it as a starting point to modify and extend to meet your particular needs for any project. Future posts will explore more complex environments and different approaches to the foundation laid out below.

Let’s meander together.

Setup and assumptions

If you prefer a TL;DR approach, you can find the code referenced below in the following GitHub repo. The code used in this tutorial is located in /python/simple.

$ git clone https://github.com/meanderio/docker-dev.git

Structure

To get started, let’s create a new folder to work in and jump into it. In your terminal, type the commands below.

$ mkdir simple
$ cd simple

From here, let’s set up the folder structure and create some empty files.

$ mkdir app
$ touch Dockerfile compose.yaml 
$ touch app/requirements.txt app/app.py app/data.csv

Your working directory should now have the structure you see below. The important files will be Dockerfile and compose.yaml. The others create a very simple Python project.

$ tree
.
├── Dockerfile
├── compose.yaml
└── app
    ├── app.py
    ├── data.csv
    └── requirements.txt

2 directories, 5 files

Now we need to add some text to each of these files. They are shown below so you can copy and paste their contents. The assumption is that you have a working knowledge of Docker concepts and can read a minimal amount of Python code.

Dockerfile

# syntax=docker/dockerfile:1

FROM python:3.11-slim AS base

# prevents creation of .pyc files
ENV PYTHONDONTWRITEBYTECODE=1

# avoids stdout and stderr buffering
ENV PYTHONUNBUFFERED=1

# sets the working directory
WORKDIR /app

# copy and install python dependencies
COPY ./app/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# copy project files
COPY ./app/* .

# add working dir to python path
ENV PYTHONPATH="${PYTHONPATH}:/app"

# use bash as interactive entrypoint
ENTRYPOINT [ "/bin/bash" ]

compose.yaml

services:
  app:
    container_name: simple-python
    build: .
    volumes:
      - type: bind
        source: ./app
        target: /app
    stdin_open: true
    tty: true
    entrypoint: /bin/bash

requirements.txt

pandas

app.py

import pandas as pd

df: pd.DataFrame  = pd.read_csv("data.csv")
dfg: pd.DataFrame = df.groupby("user_id").agg({
    "transaction_price": ["sum", "mean"], 
    "transaction_id": "nunique"
})
print(dfg)

data.csv

user_id,transaction_id,transaction_price
1234,2024-0001,2.95
1234,2024-0003,12.95
1234,2024-0077,23.05
3425,2024-0212,13.45
3425,2024-0212,17.99
9876,2024-0999,7.68
9876,2024-0999,9.98
9876,2024-0999,29.95
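If you want a preview of what app.py will compute before deploying anything, the same aggregation can be run against the sample rows inline. This is a self-contained sketch using pandas, the only dependency in requirements.txt. Note that the repeated transaction_id values for users 3425 and 9876 are what make the nunique aggregation interesting: those users have more price rows than unique transactions.

```python
import io

import pandas as pd

# the same rows as data.csv, inlined so this runs standalone
csv = io.StringIO("""user_id,transaction_id,transaction_price
1234,2024-0001,2.95
1234,2024-0003,12.95
1234,2024-0077,23.05
3425,2024-0212,13.45
3425,2024-0212,17.99
9876,2024-0999,7.68
9876,2024-0999,9.98
9876,2024-0999,29.95
""")

df = pd.read_csv(csv)

# per user: total spend, average spend, and count of distinct transactions
dfg = df.groupby("user_id").agg({
    "transaction_price": ["sum", "mean"],
    "transaction_id": "nunique",
})
print(dfg)
```

For example, user 1234 has three distinct transactions, while user 9876 has three price rows but only one unique transaction_id.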

Deployment

The critical piece of code to take note of is in the compose.yaml file. Specifically, the volumes section.

...
    volumes:
      - type: bind
        source: ./app
        target: /app
...

This creates a bind mount between the local subfolder ./app and the container folder /app. As a result, any changes you make while working in the containerized development environment will be reflected in your local environment. For example, if you edit and save a change to app.py in the container, that change will be reflected in app.py on your local machine. This works in both directions.
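As a side note, Compose also supports a short syntax for the same bind mount, which you may encounter in other projects. The long form above is more explicit, but the two are equivalent:

```yaml
services:
  app:
    volumes:
      # host path : container path
      - ./app:/app
```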

Okay, time to deploy this setup and get to work. Run the Docker command below to build your image and launch a running container from it.

$ docker compose up --build -d

The --build flag instructs Docker to build your image from the Dockerfile. The -d flag instructs Docker to run the container in a detached state, which gives us our terminal prompt back to run other commands.

You can see your running container with the command below.

$ docker ps
CONTAINER ID   IMAGE       COMMAND       NAMES
eac0f45defae   simple-app  "/bin/bash"   simple-python

Remote development

What you have just accomplished is quite useful: your remote development environment is running. This next section will show you how to attach to this environment in VS Code, which is made possible by an extension called Dev Containers. Here’s how to connect to your container.

  1. Open a new VS Code window.
  2. Click the button in the bottom lefthand corner. If you hover over it, it says Open a Remote Window.
  3. Choose Attach to Running Container…
  4. Select your container.

This will launch a new VS Code window. You should see all the files we created in ./app. But remember, you are in a Docker container, completely isolated from your local machine. Only the edits you make to the files will be reflected on your local machine, thanks to the volume bind mount. If you open a terminal and install a new Python package, it will not be installed on your local machine.
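One quick way to convince yourself of this isolation is to run a small sanity check in the container’s terminal. With the python:3.11-slim base image used above, it should report Linux and Python 3.11 regardless of your host OS (a simple sketch, not a rigorous check):

```python
import platform
import sys

# the container runs Debian-based Linux from python:3.11-slim,
# so on a macOS or Windows host this output should differ from
# what the same script prints locally
print(platform.system())
print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
```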

Summary

We have learned how to set up and launch a Docker container with a simple code base. We then learned how to connect to this container in a way that makes remote development simple. The best part: all the work we do in the remote environment is reflected back to our local machine. You can now stand up, tear down, and share your development environment with minimal friction.

This is a powerful and extensible paradigm. You should see this structure as a starting point. Build on it to meet your specific needs. Use it to reduce the time it takes to set up new projects. Let me know what you create.