Reducing a 4GB data science image with multi-stage Dockerfiles
#1
I'm containerizing our team's Python data science pipeline to improve reproducibility, and I'm trying to adhere to Docker best practices. My current image is over 4GB because it includes the entire Anaconda distribution and all development tools. I know I should use multi-stage builds and slim base images, but I'm struggling to structure it so the final image still has all necessary scientific libraries while keeping the build process maintainable for my colleagues who aren't Docker experts. Any examples of streamlined data science Dockerfiles would be a huge help.
#2
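Nice problem to solve. Two-stage builds are your friend here: stage 1 (the builder) does all the heavy dependency resolution and compilation, and stage 2 copies only the finished result into a lean runtime image, so compilers, package caches, and the full Anaconda distribution never reach the final image. Use a slim Python base for the final stage and install most libraries from prebuilt wheels rather than doing a full conda install inside the runtime.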
#3
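Concrete pattern using micromamba:

    # Stage 1 (builder): resolve and install the conda environment, then drop caches
    FROM mambaorg/micromamba:1.5 AS builder
    COPY environment.yml /tmp/environment.yml
    RUN micromamba create -n ds -y -f /tmp/environment.yml && micromamba clean --all --yes

    # Stage 2 (runtime): slim base plus the prebuilt environment only
    FROM python:3.11-slim AS runtime
    COPY --from=builder /opt/conda /opt/conda
    ENV PATH=/opt/conda/envs/ds/bin:$PATH
    COPY . /app
    WORKDIR /app
    CMD ["python", "main.py"]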
#4
Alternative approach: pure pip with prebuilt wheels. Pin every version in requirements.txt, keep a small builder stage that prepares the wheels, and install from them in the final stage with --no-cache-dir. Lay out the image so /app holds your code and a /data volume holds datasets (mounted at run time, not baked into the image); run as a non-root user, and install apt packages only when a library actually needs them.
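A sketch of that layout, assuming BuildKit (the default builder in recent Docker releases) so the final stage can bind-mount the wheels for a single RUN instead of copying them into a layer that would stay in the image. The user name app, UID 1000, and the build-essential package are illustrative choices, not requirements; main.py is the entry point used earlier in the thread.

```dockerfile
# syntax=docker/dockerfile:1

# Stage 1 (builder): turn every pinned requirement into a wheel.
# Compilers live only here, for any package without a prebuilt wheel.
FROM python:3.11-slim AS builder
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*
COPY requirements.txt /tmp/requirements.txt
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r /tmp/requirements.txt

# Stage 2 (runtime): no compilers, no pip cache, no leftover wheel files.
FROM python:3.11-slim AS runtime
# Hypothetical non-root user "app"; /data is the mount point for datasets.
RUN useradd --create-home --uid 1000 app \
    && mkdir /data && chown app:app /data
# Wheels and requirements are bind-mounted for this RUN only, so they never
# become part of an image layer (requires BuildKit).
RUN --mount=type=bind,from=builder,source=/wheels,target=/wheels \
    --mount=type=bind,source=requirements.txt,target=/tmp/requirements.txt \
    pip install --no-cache-dir --no-index --find-links=/wheels -r /tmp/requirements.txt
WORKDIR /app
COPY --chown=app:app . /app
VOLUME ["/data"]
USER app
CMD ["python", "main.py"]
```

Because the dependency layers depend only on requirements.txt and the code is copied last, editing code alone reuses the cached wheel and install layers. Datasets are then mounted at run time, for example with docker run --rm -v "$PWD/data:/data" followed by your image name.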
#5
To help your team, ship a tiny template repo: a ready-made Dockerfile using the two-stage pattern, a sample environment.yml or requirements.txt, a short README, and a simple CI job that builds a fresh image and runs a smoke test against it.
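One way to give the CI job its smoke test without extra scripts is a last Dockerfile stage that only imports the key libraries, so the build fails if any of them is missing or broken. This assumes the shipping stage is named runtime (as in FROM python:3.11-slim AS runtime); the module list numpy, pandas, sklearn is a placeholder for whatever your pipeline actually imports.

```dockerfile
# Appended after the runtime stage. CI runs:
#   docker build --target smoke .
# and treats a failed build as a failed smoke test.
FROM runtime AS smoke
# Placeholder module list: replace with the libraries your pipeline needs.
RUN python -c "import numpy, pandas, sklearn; print('smoke test: imports ok')"
```

Since smoke is now the last stage, build the image you actually ship with docker build --target runtime . so colleagues get the runtime stage explicitly.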
#6
If you're planning GPU workloads, you'll need a CUDA-enabled base image, and probably a separate CPU-only image so CPU tasks don't carry the CUDA libraries. For CPU-only pipelines, keep it simple: a slim runtime plus pinned versions installed from prebuilt wheels, so rebuilds are reproducible and nothing has to be compiled from source.
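A sketch of keeping one Dockerfile for both variants by turning the base image and the requirements file into build arguments. The defaults give the CPU-only image. The file names requirements-cpu.txt and requirements-gpu.txt and the ds-pipeline tags are hypothetical, and the GPU base image is left as a placeholder because it has to match your CUDA and driver versions and must already provide Python 3.11 and pip. For brevity this shows a single runtime stage without the wheel builder.

```dockerfile
# An ARG declared before the first FROM can be used in FROM lines.
ARG BASE_IMAGE=python:3.11-slim
FROM ${BASE_IMAGE} AS runtime

# ARGs must be re-declared inside a stage to be visible there.
ARG REQUIREMENTS=requirements-cpu.txt
COPY ${REQUIREMENTS} /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

WORKDIR /app
COPY . /app
CMD ["python", "main.py"]

# CPU build (defaults):
#   docker build -t ds-pipeline:cpu .
# GPU build (placeholder base image):
#   docker build --build-arg BASE_IMAGE=<cuda-image-with-python> \
#     --build-arg REQUIREMENTS=requirements-gpu.txt -t ds-pipeline:gpu .
```

The GPU image also needs the NVIDIA Container Toolkit on the host, and containers are started with docker run --gpus all.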