DevOps

Docker Multi-Stage Builds: From 1.2GB to 180MB in Production


Most Dockerfiles in production are doing it wrong. They copy everything, install everything, and ship everything — including build tools, devDependencies, and source code that the running application never needs.

The result: images that are 800MB–1.2GB when they could be 150–200MB. That difference compounds across every pull, every deploy, every node in your cluster.

The Problem with Standard Dockerfiles

Here's a Dockerfile that works but costs you time and money on every deployment:

FROM node:20
WORKDIR /app
COPY . .
RUN npm install
RUN npm run build
EXPOSE 5000
CMD ["npm", "start"]

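This image includes the full Debian-based node:20 base image, every devDependency (including the TypeScript compiler and other build tooling), the complete source tree copied by COPY . ., and the compiled output sitting next to it.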

Total: ~900MB. But the actual runtime needs are just the compiled JavaScript, production dependencies, and a minimal Node.js binary — roughly 180MB.

Worse: Docker layer caching breaks on every code change. Because COPY . . comes before npm install, any file change — even a one-line fix — invalidates the dependency cache and forces a full reinstall. That's 30–90 seconds wasted on every build.
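The cache rule behind this is mechanical. Docker decides reuse one instruction at a time: a COPY step hits the cache only if the checksums of the copied files are unchanged, and once any step misses, every later step in that stage is rebuilt. The TypeScript sketch below models just that rule, assuming each step declares which build-context files it reads; `Step` and `rebuiltSteps` are hypothetical names for illustration, not a Docker API.

```typescript
// Simplified model of Docker's build cache (hypothetical, for illustration only).
interface Step {
  instruction: string;
  inputs: string[]; // build-context files this step copies; RUN steps copy none
}

// Returns the steps that re-execute after `changedFiles` change. A step misses the
// cache if any of its inputs changed, and every step after the first miss is rebuilt.
function rebuiltSteps(steps: Step[], changedFiles: string[]): string[] {
  const rebuilt: string[] = [];
  let cacheBroken = false;
  for (const step of steps) {
    const inputChanged = step.inputs.some((file) => changedFiles.indexOf(file) !== -1);
    if (inputChanged) cacheBroken = true;
    if (cacheBroken) rebuilt.push(step.instruction);
  }
  return rebuilt;
}

// A hypothetical build context.
const allFiles = ["package.json", "package-lock.json", "src/index.ts"];

// COPY . . first: the install step sits after a step that reads every file.
const naiveSteps: Step[] = [
  { instruction: "COPY . .", inputs: allFiles },
  { instruction: "RUN npm install", inputs: [] },
  { instruction: "RUN npm run build", inputs: [] },
];

// Package files first: the install step only sits after the manifests.
const orderedSteps: Step[] = [
  { instruction: "COPY package*.json ./", inputs: ["package.json", "package-lock.json"] },
  { instruction: "RUN npm ci", inputs: [] },
  { instruction: "COPY . .", inputs: allFiles },
  { instruction: "RUN npm run build", inputs: [] },
];

const naiveRebuild = rebuiltSteps(naiveSteps, ["src/index.ts"]);     // all three steps
const orderedRebuild = rebuiltSteps(orderedSteps, ["src/index.ts"]); // COPY . . and the build only
```

Editing src/index.ts forces npm install to rerun in the first ordering but not in the second, while editing package-lock.json rebuilds every step in both, which is exactly when a reinstall is needed.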

How Multi-Stage Builds Work

Docker multi-stage builds solve both problems with one concept: separate build-time concerns from runtime needs.

Each FROM instruction starts a new stage. You can copy artifacts from earlier stages into later ones, discarding everything else. The final image only contains what the last stage includes.

The simplest optimization — fixing the layer cache:

# Stage 1: Install dependencies (cached unless package.json changes)
FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --prefer-offline --no-audit

# Stage 2: Build application
FROM node:20-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN npm run build

# Stage 3: Production image
FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY package*.json ./
RUN npm ci --omit=dev --prefer-offline --no-audit
EXPOSE 5000
CMD ["node", "dist/index.js"]

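Key improvements: package*.json is copied and installed before the rest of the source, so the npm ci layer stays cached until package.json or package-lock.json changes. The Alpine base replaces the much larger Debian-based node:20 image. The final stage carries only dist and production dependencies and starts the app with node directly instead of npm start. Note that the builder's COPY . . relies on a .dockerignore that excludes node_modules; without it, the host's node_modules is copied over the one from the deps stage.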

The 4-Stage Production Pattern

For production workloads, we use a 4-stage pattern that adds dependency isolation and security hardening:

# Stage 1: Dependencies (cached layer)
FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --prefer-offline --no-audit

# Stage 2: Build
FROM node:20-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY package*.json ./
COPY . .
RUN npm run build

# Stage 3: Production dependencies only
FROM node:20-alpine AS prod-deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev --prefer-offline --no-audit

# Stage 4: Final production image
FROM node:20-alpine
WORKDIR /app

RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001

COPY --from=prod-deps --chown=nodejs:nodejs /app/node_modules ./node_modules
COPY --chown=nodejs:nodejs package*.json ./
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist

USER nodejs
EXPOSE 5000

HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
  CMD node -e "require('http').get('http://localhost:5000/api/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1)).on('error', () => process.exit(1))"

CMD ["node", "dist/index.js"]
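The four HEALTHCHECK flags interact in a way that is easy to misread. Per Docker's documented behavior, the first probe runs one interval after the container starts, a probe fails if it exits non-zero or runs past the timeout, failures inside the start period are not counted unless a probe has already succeeded, and the container turns unhealthy after `retries` consecutive counted failures. This TypeScript sketch models those status rules; `Probe`, `HealthOptions`, and `healthStatus` are hypothetical names, not part of any Docker SDK.

```typescript
// Model of Docker's HEALTHCHECK status rules (illustrative; not a Docker API).
type Health = "starting" | "healthy" | "unhealthy";

interface Probe {
  atSeconds: number; // when the probe ran, in seconds since container start
  ok: boolean;       // exited 0 within --timeout
}

interface HealthOptions {
  startPeriodSeconds: number; // --start-period
  retries: number;            // --retries
}

function healthStatus(probes: Probe[], opts: HealthOptions): Health {
  let current: Health = "starting";
  let consecutiveFailures = 0;
  let succeededOnce = false; // a success ends the start-period grace window early
  for (const probe of probes) {
    if (probe.ok) {
      current = "healthy";
      consecutiveFailures = 0;
      succeededOnce = true;
      continue;
    }
    // Failures during the start period are not counted until a probe has succeeded.
    if (!succeededOnce && probe.atSeconds < opts.startPeriodSeconds) continue;
    consecutiveFailures += 1;
    if (consecutiveFailures >= opts.retries) current = "unhealthy";
  }
  return current;
}

// Flags from the Dockerfile above: --interval=30s --start-period=40s --retries=3.
const dockerfileFlags: HealthOptions = { startPeriodSeconds: 40, retries: 3 };

// An app that never answers: probes run at 30s, 60s, 90s, and 120s and all fail.
const neverUp: Probe[] = [30, 60, 90, 120].map((t) => ({ atSeconds: t, ok: false }));
```

With these flags, an app that never answers is still reported as starting at 90 seconds and flips to unhealthy at the 120-second probe, because the failure at 30 seconds falls inside the 40-second start period. If the app needs longer than that to boot, raising --start-period is the more direct fix than raising --retries.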

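Why 4 stages instead of 3: in the 3-stage version, npm ci --omit=dev runs in the final stage after COPY --from=builder /app/dist, so every code change reruns the production install, and the npm cache it writes is left in the final image. Moving the production install into its own prod-deps stage means it stays cached until package*.json changes, and only node_modules is copied forward. That leaves the final stage free for hardening: a non-root user, explicit file ownership, and a health check.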

Security: Non-Root by Default

Notice the adduser command and the USER nodejs instruction: the application runs as UID 1001, not root.

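This matters because root inside a container is the same UID 0 as root on the host unless user-namespace remapping is enabled. If an attacker gains code execution through an application vulnerability, a non-root process cannot install packages or modify root-owned files in the image, and any container-escape bug starts from an unprivileged account rather than from root.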

The --chown=nodejs:nodejs flag on COPY ensures all files are owned by the non-root user, preventing permission issues at runtime.

Special Cases: Native Modules

Some npm packages (like grpc, sharp, or bcrypt) require native compilation. Alpine doesn't include build tools by default, so you need to handle this:

# Build stage: install native build tools
FROM node:20-alpine AS builder
WORKDIR /app
RUN apk add --no-cache python3 make g++
COPY package*.json ./
RUN npm ci --prefer-offline --no-audit
COPY tsconfig.json ./
COPY src ./src
RUN npm run build

# Production stage: install native deps, then clean up
FROM node:20-alpine
WORKDIR /app
RUN apk add --no-cache libstdc++
COPY package*.json ./
RUN apk add --no-cache --virtual .build-deps python3 make g++ && \
    npm ci --omit=dev --prefer-offline --no-audit && \
    npm cache clean --force && \
    apk del .build-deps
# WORKDIR /app in the builder stage puts the build output at /app/dist
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/index.js"]

The key pattern: install build tools as virtual packages (--virtual .build-deps), compile native modules, then delete the build tools (apk del .build-deps) in the same RUN layer. This keeps the final image small while still producing working native binaries.
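The reason the same RUN matters is that image layers are additive. Deleting a file in a later layer only hides it from the running container; the bytes still ship inside the earlier layer that wrote them. The TypeScript sketch below models that accounting. The `Layer` type, the paths, and the megabyte figures are illustrative placeholders, not measurements of python3, make, or g++.

```typescript
// Illustrative model of layered image sizes; all sizes are placeholders, not measurements.
interface FileEntry {
  path: string;
  sizeMB: number;
}

interface Layer {
  instruction: string;
  added: FileEntry[]; // files written by this instruction
  deleted: string[];  // earlier paths hidden by this instruction (whiteouts)
}

// Shipped size is the sum of what every layer added. A later deletion does not
// remove bytes from the earlier layer that wrote them.
function shippedSizeMB(layers: Layer[]): number {
  return layers.reduce(
    (total, layer) => total + layer.added.reduce((sum, file) => sum + file.sizeMB, 0),
    0,
  );
}

// Paths visible inside the running container after applying layers in order.
function visiblePaths(layers: Layer[]): string[] {
  let visible: string[] = [];
  for (const layer of layers) {
    visible = visible.filter((path) => layer.deleted.indexOf(path) === -1);
    for (const file of layer.added) {
      if (visible.indexOf(file.path) === -1) visible.push(file.path);
    }
  }
  return visible.sort();
}

// Build tools installed, used, and deleted in three separate RUN instructions.
const separateRuns: Layer[] = [
  {
    instruction: "RUN apk add --virtual .build-deps python3 make g++",
    added: [{ path: "build-tools", sizeMB: 200 }],
    deleted: [],
  },
  { instruction: "RUN npm ci --omit=dev", added: [{ path: "node_modules", sizeMB: 10 }], deleted: [] },
  { instruction: "RUN apk del .build-deps", added: [], deleted: ["build-tools"] },
];

// The same work chained with && in one RUN: the tools never reach a committed layer.
const singleRun: Layer[] = [
  {
    instruction: "RUN apk add --virtual .build-deps python3 make g++ && npm ci --omit=dev && apk del .build-deps",
    added: [{ path: "node_modules", sizeMB: 10 }],
    deleted: [],
  },
];
```

Both images expose the same files at runtime, but the version with separate RUN instructions still ships the build tools in an earlier layer (210 MB versus 10 MB with these placeholder sizes). The same reasoning is why npm cache clean --force only shrinks the image when it runs in the same RUN as the install.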

The Cloud Cost Impact

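Image size isn't vanity. It directly affects your cloud bill through registry storage for every tagged version you keep, network transfer on every pull, and the time a newly scaled node spends pulling images before its pods can start.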

For organizations running dozens of microservices across multiple environments, optimized Dockerfiles can reduce container-related costs by 15–25%. When combined with proper Kubernetes cost allocation, the savings compound further because you can right-size pods based on actual requirements rather than bloated image overhead.
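To put the pull side of that in numbers, here is back-of-the-envelope arithmetic in TypeScript using the 1.2GB and 180MB figures from this article's title. It assumes every deploy pulls the full image onto every node with no cached layers, and it treats image size as transfer size even though registries send compressed layers, so it overstates real traffic. `Fleet`, `pullTrafficGB`, and the fleet numbers are hypothetical.

```typescript
// Back-of-the-envelope pull traffic. Assumes every deploy pulls the whole image
// onto every node with no cached layers (an upper bound). Inputs are hypothetical.
interface Fleet {
  services: number;
  nodesPerService: number;
  deploysPerMonth: number;
}

// Monthly pull traffic in GB, using 1 GB = 1000 MB.
function pullTrafficGB(imageMB: number, fleet: Fleet): number {
  const pulls = fleet.services * fleet.nodesPerService * fleet.deploysPerMonth;
  return (imageMB * pulls) / 1000;
}

// Illustrative fleet: 20 services x 5 nodes x 30 deploys a month = 3,000 pulls.
const exampleFleet: Fleet = { services: 20, nodesPerService: 5, deploysPerMonth: 30 };

const trafficBeforeGB = pullTrafficGB(1200, exampleFleet); // unoptimized image
const trafficAfterGB = pullTrafficGB(180, exampleFleet);   // multi-stage image
const trafficSavedGB = trafficBeforeGB - trafficAfterGB;
```

For this illustrative fleet that is 3,600 GB of monthly pull traffic before optimization versus 540 GB after, a difference of 3,060 GB. Substitute your own service count, node count, deploy frequency, and your provider's transfer pricing.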

Track your container costs at the namespace level

CLARITY breaks down Kubernetes costs by cluster, namespace, and deployment across EKS, AKS, and GKE.


Production Checklist

Before shipping any Dockerfile to production, verify these 8 points:

  1. Minimal base image: use node:20-alpine, python:3.12-slim, or a distroless image. Never use the full OS image in production
  2. Package files copied first: COPY package*.json ./ before COPY . . to preserve layer caching
  3. npm ci, not npm install: ci installs exactly what the lockfile specifies and errors if package.json and the lockfile disagree, producing deterministic builds. install can modify the lockfile
  4. Production-only dependencies: --omit=dev in the final stage. No TypeScript compiler in production
  5. Non-root user: create a dedicated user and switch to it with USER before the CMD
  6. Health check: add a built-in HEALTHCHECK instruction rather than relying solely on Kubernetes probes. Docker Engine reports its status, and Compose and Swarm act on it; Kubernetes ignores it, and ECS only runs health checks declared in the task definition, so configure checks for each platform you deploy to
  7. No secrets in the image: use environment variables or secrets managers (AWS Secrets Manager, Vault). Never COPY .env
  8. Cache cleanup: whenever npm runs in the final stage, run npm cache clean --force in the same RUN instruction as the install to reduce layer size

These aren't optimizations. They're baseline requirements for any container running in production. The difference between a team that ships 200MB images and one shipping 1.2GB images isn't skill — it's whether they have a standard Dockerfile template that enforces these patterns.
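Several of these checks are mechanical enough to run in CI. Below is a deliberately naive TypeScript sketch that covers items 1 through 7 for Dockerfiles shaped like the ones in this article. It assumes one instruction per line after joining backslash continuations and only recognizes the exact patterns used here; `parseDockerfile` and `lintDockerfile` are hypothetical helpers, and a dedicated linter such as hadolint is the sturdier choice for real pipelines.

```typescript
// Naive checker for checklist items 1-7 (hypothetical helper, not an existing tool).
// Assumptions: backslash continuations are joined, "#" lines are comments,
// and only the instruction patterns used in this article are recognized.
interface Instruction {
  keyword: string; // e.g. "FROM", "COPY", "RUN"
  text: string;    // the full instruction after joining continuations
}

// Splits a Dockerfile into stages; each stage starts with its FROM instruction.
function parseDockerfile(source: string): Instruction[][] {
  const joined = source.replace(/\\\r?\n/g, " ");
  const stages: Instruction[][] = [];
  for (const raw of joined.split(/\r?\n/)) {
    const text = raw.trim();
    if (text === "" || text.charAt(0) === "#") continue;
    const keyword = text.split(/\s+/)[0].toUpperCase();
    if (keyword === "FROM") stages.push([]);
    if (stages.length === 0) continue; // e.g. ARG before the first FROM
    stages[stages.length - 1].push({ keyword: keyword, text: text });
  }
  return stages;
}

function lintDockerfile(source: string): string[] {
  const problems: string[] = [];
  const stages = parseDockerfile(source);
  if (stages.length === 0) return ["no FROM instruction found"];

  stages.forEach((stage, index) => {
    let copiedWholeContext = false;
    for (const ins of stage) {
      // Item 2: installing after COPY . . reruns the install on every code change.
      if (ins.keyword === "COPY" && /^COPY\s+\.\s+\S+$/i.test(ins.text)) {
        copiedWholeContext = true;
      }
      if (ins.keyword === "RUN" && /npm (ci|install)\b/.test(ins.text) && copiedWholeContext) {
        problems.push(`stage ${index}: dependencies installed after COPY . .`);
      }
      // Item 3: npm install can rewrite the lockfile.
      if (ins.keyword === "RUN" && /npm install\b/.test(ins.text)) {
        problems.push(`stage ${index}: npm install instead of npm ci`);
      }
      // Item 7: secrets files must not be copied into any stage.
      if ((ins.keyword === "COPY" || ins.keyword === "ADD") && /\.env\b/.test(ins.text)) {
        problems.push(`stage ${index}: copies a .env file`);
      }
    }
  });

  const finalStage = stages[stages.length - 1];
  // Item 1: minimal base image for the final stage.
  if (!/alpine|slim|distroless/i.test(finalStage[0].text)) {
    problems.push("final stage: full base image");
  }
  // Item 4: a production install in the final stage must omit devDependencies.
  for (const ins of finalStage) {
    if (ins.keyword === "RUN" && /npm ci\b/.test(ins.text) && !/--omit=dev/.test(ins.text)) {
      problems.push("final stage: npm ci without --omit=dev");
    }
  }
  // Item 5: the last USER must be non-root and come before CMD.
  const keywords = finalStage.map((ins) => ins.keyword);
  const userAt = keywords.lastIndexOf("USER");
  const cmdAt = keywords.lastIndexOf("CMD");
  const user = userAt === -1 ? "" : finalStage[userAt].text.split(/\s+/)[1] || "";
  if (userAt === -1 || (cmdAt !== -1 && userAt > cmdAt) || /^(root|0)(:.*)?$/.test(user)) {
    problems.push("final stage: no non-root USER before CMD");
  }
  // Item 6: built-in health check.
  if (keywords.indexOf("HEALTHCHECK") === -1) {
    problems.push("final stage: no HEALTHCHECK");
  }
  return problems;
}

// Example: the unoptimized Dockerfile from the start of this article.
const unoptimized = [
  "FROM node:20",
  "WORKDIR /app",
  "COPY . .",
  "RUN npm install",
  "RUN npm run build",
  "EXPOSE 5000",
  'CMD ["npm", "start"]',
].join("\n");
const unoptimizedProblems = lintDockerfile(unoptimized);
```

Run against the unoptimized Dockerfile, it reports five problems: dependencies installed after COPY . ., npm install instead of npm ci, a full base image, no non-root USER before CMD, and no HEALTHCHECK. The 4-stage pattern passes every check.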

For teams managing cloud costs seriously, container optimization is one piece of the puzzle. The real savings come from understanding where your compute spend goes at a resource level — which requires visibility tools that go beyond basic dashboard metrics and into peak pattern analysis.

See where your cloud spend actually goes

CLARITY provides resource-level cost attribution across AWS, Azure, and GCP — including container workloads.

