Docker Development Environments
Software and data science packages move fast. By the time you have an environment you are comfortable with, new tooling and package updates are being released. This makes it very difficult to manage dependencies as your development environment grows in complexity. It also leads to an ever-increasing amount of clutter on your local system. The time spent managing all this can become a significant cost and takes away from writing code.
Then there is the frustration of a deployment that fails despite working locally. In addition, sharing code becomes a brittle process: there is no guarantee a team member will be able to reliably replicate your environment.
These headaches are far from the only ones, but they illustrate the value proposition of containerized development environments: you keep your local environment simple and clean, you can version control your development environment, collaboration is a breeze, and deployment to production becomes more reliable.
When you finish this post, you will have the tools you need to create a reusable development environment in a Docker container. This will be a very simple template, but you can use it as a starting point and extend it to meet the needs of any project. Future posts will explore more complex environments and different approaches to the foundation laid out below.
Let’s meander together.
Setup and assumptions
- Install Docker Desktop or Docker Engine
- Install VS Code and the Dev Containers extension
- You have a basic understanding of Docker concepts
If you prefer a TL;DR approach, you can find the code referenced below in the following GitHub repo. The code used in this tutorial is located in /python/simple.
$ git clone https://github.com/meanderio/docker-dev.git
Structure
To get started, let’s create a new folder to work in and jump into it. In your terminal, type the commands below.
$ mkdir simple
$ cd simple
From here, let’s set up the folder structure and create some empty files.
$ mkdir app
$ touch Dockerfile compose.yaml
$ touch app/requirements.txt app/app.py app/data.csv
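If you are on a system without `touch` (for example, a stock Windows shell), the same layout can be created with a short standard-library Python sketch. The `scaffold` function and `FILES` list are hypothetical names; they simply mirror the `mkdir` and `touch` commands above.

```python
from pathlib import Path

# Files to create, relative to the project root (mirrors the mkdir/touch commands)
FILES = [
    "Dockerfile",
    "compose.yaml",
    "app/requirements.txt",
    "app/app.py",
    "app/data.csv",
]


def scaffold(root: Path) -> list[Path]:
    """Create the project skeleton under root and return the file paths."""
    created = []
    for rel in FILES:
        path = root / rel
        # equivalent of `mkdir -p` for the parent directory
        path.parent.mkdir(parents=True, exist_ok=True)
        # equivalent of `touch`: create the file if missing, leave contents alone
        path.touch(exist_ok=True)
        created.append(path)
    return created
```

Calling `scaffold(Path("simple"))` from your working folder should produce the same layout as the `tree` output that follows.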
Your working directory should now have the structure you see below.
The important files will be Dockerfile and compose.yaml. The others create a very simple Python project.
$ tree
.
├── Dockerfile
├── compose.yaml
└── app
    ├── app.py
    ├── data.csv
    └── requirements.txt
2 directories, 5 files
Now we need to add content to each of these files. They are shown below so you can copy and paste them. The assumption is that you have a working knowledge of Docker concepts and can read a minimal amount of Python code.
Dockerfile
# syntax=docker/dockerfile:1
FROM python:3.11-slim AS base
# prevents creation of .pyc files
ENV PYTHONDONTWRITEBYTECODE=1
# avoids stdout and stderr buffering
ENV PYTHONUNBUFFERED=1
# sets the working directory
WORKDIR /app
# copy and install python dependencies
COPY ./app/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# copy project files (directory contents, preserving any subfolders)
COPY ./app/ .
# add working dir to python path
ENV PYTHONPATH=/app
# use bash as interactive entrypoint
ENTRYPOINT [ "/bin/bash" ]
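To confirm that the environment variables set in the Dockerfile actually take effect, you could save a small script such as ./app/check_env.py (a hypothetical file name) and run it inside the container with `python check_env.py`. This is a minimal sketch; the `runtime_report` function is hypothetical, and `/app` is assumed to be the directory added to PYTHONPATH.

```python
import os
import sys


def runtime_report(app_dir: str = "/app") -> dict[str, bool]:
    """Report whether the Python settings from the Dockerfile are active."""
    pythonpath = os.environ.get("PYTHONPATH", "")
    return {
        # a non-empty PYTHONDONTWRITEBYTECODE makes Python start with sys.dont_write_bytecode = True
        "no_pyc_files": sys.dont_write_bytecode,
        # PYTHONUNBUFFERED only needs to be set to a non-empty string
        "unbuffered": bool(os.environ.get("PYTHONUNBUFFERED")),
        # PYTHONPATH entries are separated by os.pathsep (":" on Linux)
        "app_on_pythonpath": app_dir in pythonpath.split(os.pathsep),
    }


if __name__ == "__main__":
    for name, ok in runtime_report().items():
        print(f"{name}: {ok}")
```

Inside the container all three values should print True; on your host they will likely be False.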
compose.yaml
services:
  app:
    container_name: simple-python
    build: .
    volumes:
      - type: bind
        source: ./app
        target: /app
    stdin_open: true
    tty: true
    entrypoint: /bin/bash
requirements.txt
pandas
app.py
import pandas as pd
df: pd.DataFrame = pd.read_csv("data.csv")
dfg: pd.DataFrame = df.groupby("user_id").agg({
    "transaction_price": ["sum", "mean"],
    "transaction_id": "nunique",
})
print(dfg)
data.csv
user_id,transaction_id,transaction_price
1234,2024-0001,2.95
1234,2024-0003,12.95
1234,2024-0077,23.05
3425,2024-0212,13.45
3425,2024-0212,17.99
9876,2024-0999,7.68
9876,2024-0999,9.98
9876,2024-0999,29.95
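If you want to sanity-check what the groupby in app.py computes, the same aggregation can be sketched with only the standard library: per user, the sum and mean of transaction_price and the number of distinct transaction_id values (what pandas calls nunique). The `summarize` function is hypothetical, it reads the CSV contents from an inlined string so it runs anywhere, and it keeps user_id as a string, whereas pandas would infer an integer column.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Same contents as app/data.csv, inlined so the sketch is self-contained
DATA = """user_id,transaction_id,transaction_price
1234,2024-0001,2.95
1234,2024-0003,12.95
1234,2024-0077,23.05
3425,2024-0212,13.45
3425,2024-0212,17.99
9876,2024-0999,7.68
9876,2024-0999,9.98
9876,2024-0999,29.95
"""


def summarize(csv_text: str) -> dict[str, dict[str, float]]:
    """Group rows by user_id; compute price sum/mean and distinct transaction count."""
    prices: dict[str, list[float]] = defaultdict(list)
    txn_ids: dict[str, set[str]] = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        prices[row["user_id"]].append(float(row["transaction_price"]))
        txn_ids[row["user_id"]].add(row["transaction_id"])
    return {
        user: {
            "sum": sum(vals),
            "mean": mean(vals),
            # count of distinct transaction ids, like pandas' "nunique"
            "nunique": len(txn_ids[user]),
        }
        for user, vals in prices.items()
    }


for user, stats in summarize(DATA).items():
    print(user, stats)
```

Users 3425 and 9876 each reuse a single transaction_id across several rows, so their distinct count is 1 even though they have multiple rows.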
Deployment
The critical piece of code to take note of is in the compose.yaml file, specifically the volumes section.
...
    volumes:
      - type: bind
        source: ./app
        target: /app
...
This creates a bind mount between the local subfolder ./app and the /app folder inside the container.
As a result, any changes you make while working in the containerized development environment will be reflected in your local environment. For example, if you edit and save a change to app.py in the container, the change will be reflected in app.py on your local machine. This works in both directions.
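One way to see the bind mount working is to write a file from inside the container and look for it in ./app on your host. The sketch below is hypothetical: `write_marker`, `read_marker`, and the `.bind-check` file name are made up for illustration.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def write_marker(directory: Path, name: str = ".bind-check") -> Path:
    """Write a timestamped marker file into directory and return its path."""
    marker = directory / name
    stamp = datetime.now(timezone.utc).isoformat()
    marker.write_text(f"written at {stamp}\n")
    return marker


def read_marker(directory: Path, name: str = ".bind-check") -> Optional[str]:
    """Return the marker's contents if it is visible in directory, else None."""
    marker = directory / name
    return marker.read_text() if marker.exists() else None
```

Inside the container you would call `write_marker(Path("/app"))`; back on your host, `read_marker(Path("app"))` run from the simple folder should return the same text. Writing the file on the host and reading it in the container checks the other direction.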
Okay, time to deploy this setup and get to work. Run the command below to build your Docker image and launch a running container from it.
$ docker compose up --build -d
The --build flag tells Docker Compose to build the image from the Dockerfile before starting the container, even if an image already exists. The -d flag runs the container in detached mode (in the background), which gives you your terminal prompt back so you can run other commands.
You can see your running container with the below command.
$ docker ps
CONTAINER ID   IMAGE        COMMAND       NAMES
eac0f45defae   simple-app   "/bin/bash"   simple-python
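If you would rather script this check, a minimal Python sketch can shell out to `docker ps` with its `--filter` and `--format` options and look for the container name set in compose.yaml. It assumes the `docker` CLI is on your PATH; `parse_names` and `container_running` are hypothetical helpers.

```python
import subprocess


def parse_names(ps_output: str) -> list[str]:
    """Split `docker ps --format {{.Names}}` output into container names."""
    return [line.strip() for line in ps_output.splitlines() if line.strip()]


def container_running(name: str = "simple-python") -> bool:
    """Return True if a running container has exactly this name.

    Requires the docker CLI; raises CalledProcessError if `docker ps` fails
    (for example, when the Docker daemon is not running).
    """
    result = subprocess.run(
        ["docker", "ps", "--filter", f"name={name}", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    # the name filter also matches partial names, so compare exactly
    return name in parse_names(result.stdout)
```

Parsing the names separately keeps the exact-match logic testable without a running Docker daemon.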
Remote development
What you have just accomplished is already useful: your containerized development environment is running. However, this next section will show you how to attach to that environment in VS Code. This is made possible by an extension called Dev Containers. Here’s how to connect to your container.
- Open a new VS Code window.
- Click the button in the bottom left-hand corner. If you hover over it, the tooltip reads Open a Remote Window.
- Choose Attach to Running Container…
- Select your container.
This will launch a new VS Code window. You should see all the files we created in ./app. But remember, you are in a Docker container, isolated from your local machine. Only edits to files in the bind-mounted /app folder are reflected on your local machine. If you open a terminal and install a new Python package, it is installed only inside the container, not on your local machine. To keep it, add it to requirements.txt and rebuild the image.
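If you ever lose track of which side you are on, a common heuristic is to check for the `/.dockerenv` file that Docker creates at the root of a container’s filesystem. This is a convention rather than a guaranteed interface, and other container runtimes may not create it, so treat this sketch (the `probably_in_docker` name is hypothetical) as a hint only.

```python
from pathlib import Path


def probably_in_docker(root: Path = Path("/")) -> bool:
    """Heuristic: Docker places a .dockerenv file at the container's filesystem root."""
    return (root / ".dockerenv").exists()


if __name__ == "__main__":
    where = "inside a Docker container" if probably_in_docker() else "on the host (probably)"
    print(f"Running {where}")
```

The `root` parameter exists only so the check can be exercised against a temporary directory; in practice you would call it with no arguments.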
Summary
We have learned how to set up and launch a Docker container with a simple code base. We then learned how to connect to this container in a way that makes remote development simple. The best part: every edit we make to the project files in the container is reflected back on our local machine. You can now stand up, tear down, and share your development environment with minimal friction.
This is a powerful and extensible paradigm. You should see this structure as a starting point. Build on it to meet your specific needs. Use it to reduce the time it takes to set up new projects. Let me know what you create.