Docker Multi-Stage Builds: From 1.2GB to 180MB in Production

Most Dockerfiles in production are doing it wrong. They copy everything, install everything, and ship everything — including build tools, devDependencies, and source code that the running application never needs.

The result: images that are 800MB–1.2GB when they could be 150–200MB. That difference compounds across every pull, every deploy, every node in your cluster.

The Problem with Standard Dockerfiles

Here's a Dockerfile that works but costs you time and money on every deployment:

FROM node:20
WORKDIR /app
COPY . .
RUN npm install
RUN npm run build
EXPOSE 5000
CMD ["npm", "start"]

This image includes:

Full Node.js OS layer (~350MB) — includes apt, gcc, python, and system tools your app doesn't need
All devDependencies (~200MB) — TypeScript, ESLint, testing frameworks, build tools
Source code (~50MB) — TypeScript files, test files, config files that the compiled app doesn't reference
npm cache (~100MB) — cached packages that served their purpose during install

Total: ~900MB once the production dependencies and compiled output are counted alongside the items above. But the actual runtime needs are just the compiled JavaScript, production dependencies, and a minimal Node.js binary: roughly 180MB.

Worse: Docker layer caching breaks on every code change. Because COPY . . comes before npm install, any file change — even a one-line fix — invalidates the dependency cache and forces a full reinstall. That's 30–90 seconds wasted on every build.
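
The caching problem on its own can be fixed by reordering instructions, before any extra stages are introduced. A minimal sketch of the same single-stage Dockerfile, assuming the project has a package-lock.json (which npm ci requires):

```dockerfile
FROM node:20
WORKDIR /app
# Copy only the manifests first: this layer's cache key changes
# only when package.json or package-lock.json changes.
COPY package*.json ./
RUN npm ci
# Source edits invalidate the cache from this point down, after the install.
COPY . .
RUN npm run build
EXPOSE 5000
CMD ["npm", "start"]
```

This image is just as large as before; only the rebuild time improves. Shrinking it is a separate problem.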

How Multi-Stage Builds Work

Docker multi-stage builds solve both problems with one concept: separate build-time concerns from runtime needs.

Each FROM instruction starts a new stage. You can copy artifacts from earlier stages into later ones, discarding everything else. The final image only contains what the last stage includes.

A three-stage build that fixes the layer cache and trims the final image:

# Stage 1: Install dependencies (cached unless package.json or the lockfile changes)
FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --prefer-offline --no-audit

# Stage 2: Build application
FROM node:20-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN npm run build

# Stage 3: Production image
FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY package*.json ./
RUN npm ci --omit=dev --prefer-offline --no-audit
EXPOSE 5000
CMD ["node", "dist/index.js"]

Key improvements:

Layer caching works: package*.json (the manifest and the lockfile) is copied first. Dependencies are only reinstalled when one of those files changes, not on every code edit
Alpine base: ~50MB vs ~350MB for the full Node.js image
Production-only deps: --omit=dev strips TypeScript, testing tools, and build utilities from the final image
No source code: Only compiled dist/ is copied to the final stage

The 4-Stage Production Pattern

For production workloads, we use a 4-stage pattern that adds dependency isolation and security hardening:

# Stage 1: Dependencies (cached layer)
FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --prefer-offline --no-audit

# Stage 2: Build
FROM node:20-alpine AS builder
WORKDIR /app
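COPY --from=deps /app/node_modules ./node_modules
# COPY . . already brings in package*.json. A .dockerignore that excludes
# node_modules keeps a host copy from overwriting the one from deps.
COPY . .
RUN npm run build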

# Stage 3: Production dependencies only
FROM node:20-alpine AS prod-deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev --prefer-offline --no-audit

# Stage 4: Final production image
FROM node:20-alpine
WORKDIR /app

# -G puts the new user in the nodejs group created by addgroup
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001 -G nodejs

COPY --from=prod-deps --chown=nodejs:nodejs /app/node_modules ./node_modules
COPY --chown=nodejs:nodejs package*.json ./
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist

USER nodejs
EXPOSE 5000

HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
  CMD node -e "require('http').get('http://localhost:5000/api/health', (r) => {process.exit(r.statusCode === 200 ? 0 : 1)})"

CMD ["node", "dist/index.js"]

Why 4 stages instead of 3:

Stage 1 (deps) installs ALL dependencies including devDependencies — needed for building TypeScript, running bundlers, etc.
Stage 3 (prod-deps) installs ONLY production dependencies in a clean environment — no build artifacts leak in
Both stages cache independently. A code change triggers a rebuild (Stage 2) but neither dependency stage re-executes unless package.json or the lockfile changes

Security: Non-Root by Default

Notice the adduser command and the USER nodejs instruction. The application runs as UID 1001, not root.

This matters because:

Container escape attacks are significantly harder without root privileges
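Kubernetes Pod Security Admission (which replaced PodSecurityPolicy, removed in Kubernetes 1.25) can enforce runAsNonRoot: true. The kubelet can only verify a numeric UID, so with a named user such as USER nodejs, also set runAsUser: 1001 in the pod spec or use USER 1001 in the Dockerfile
Compliance frameworks (SOC 2, ISO 27001, NIST) expect least-privilege execution, and a non-root container is a common way to demonstrate it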
ECS Fargate respects the USER directive — your task runs as the specified user, not root

The --chown=nodejs:nodejs flag on COPY ensures all files are owned by the non-root user, preventing permission issues at runtime.
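
One practical wrinkle with runAsNonRoot: true is that Kubernetes checks the image's user by number, and a named user cannot be verified as non-root. A hedged variant of the final stage that switches to the numeric UID and GID instead. The builder stage here is a shortened stand-in for the stages shown earlier, and the -G flag is an addition that puts the user in the nodejs group explicitly:

```dockerfile
# Shortened stand-in for the deps and builder stages shown earlier
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --no-audit
COPY . .
RUN npm run build

FROM node:20-alpine
WORKDIR /app

RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001 -G nodejs

COPY --chown=1001:1001 package*.json ./
RUN npm ci --omit=dev --no-audit
COPY --from=builder --chown=1001:1001 /app/dist ./dist

# Numeric UID:GID, so a runtime can confirm non-root without a name lookup
USER 1001:1001
EXPOSE 5000
CMD ["node", "dist/index.js"]
```

The official Node.js images also ship a built-in node user (UID 1000), so USER 1000 is an option when a custom UID isn't required.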

Special Cases: Native Modules

Some npm packages (like grpc, sharp, or bcrypt) require native compilation. Alpine doesn't include build tools by default, so you need to handle this:

# Build stage: install native build tools
FROM node:20-alpine AS builder
WORKDIR /app
RUN apk add --no-cache python3 make g++
COPY package*.json ./
RUN npm ci --prefer-offline --no-audit
COPY tsconfig.json ./
COPY src ./src
RUN npm run build

# Production stage: install native deps, then clean up
FROM node:20-alpine
WORKDIR /app
RUN apk add --no-cache libstdc++
COPY package*.json ./
# Build tools are added and removed in one RUN so they never persist in a layer
RUN apk add --no-cache --virtual .build-deps python3 make g++ && \
    npm ci --omit=dev --prefer-offline --no-audit && \
    npm cache clean --force && \
    apk del .build-deps
# /app/dist exists because the builder stage sets WORKDIR /app
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/index.js"]

The key pattern: install build tools as virtual packages (--virtual .build-deps), compile native modules, then delete the build tools (apk del .build-deps) in the same RUN layer. This keeps the final image small while still producing working native binaries.
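
An alternative worth considering, since the build is already multi-stage: compile the production dependencies in a throwaway stage and copy the resulting node_modules into the final image, so the compiler never touches the runtime stage at all. This is a sketch rather than the pattern described above. The stage names toolchain and prod-deps are illustrative, and it assumes every stage uses the same Alpine base so the compiled binaries match the runtime's musl libc:

```dockerfile
# Shared base for both dependency stages: build tools plus the manifests
FROM node:20-alpine AS toolchain
WORKDIR /app
RUN apk add --no-cache python3 make g++
COPY package*.json ./

# Full install (including devDependencies) and the TypeScript build
FROM toolchain AS builder
RUN npm ci --no-audit
COPY tsconfig.json ./
COPY src ./src
RUN npm run build

# Production-only install; native modules are compiled here
FROM toolchain AS prod-deps
RUN npm ci --omit=dev --no-audit

# Runtime image: no compiler is ever installed in this stage
FROM node:20-alpine
WORKDIR /app
RUN apk add --no-cache libstdc++
COPY --from=prod-deps /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY package*.json ./
CMD ["node", "dist/index.js"]
```

The trade-off is one more stage to maintain in exchange for a final stage with no install-then-delete step.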

The Cloud Cost Impact

Image size isn't vanity — it directly affects your cloud bill:

ECR/ACR/GCR storage: Smaller images = lower registry storage costs. At scale with dozens of tags and services, this adds up
Pull time: A 180MB image pulls in ~8 seconds. A 1.2GB image takes 40–60 seconds (both figures imply roughly 20–30MB/s of effective throughput). In Kubernetes, this can be the difference between a 15-second rolling update and a 2-minute one
Fargate pricing: AWS Fargate charges for the pull duration before the task is running. Faster pulls = lower startup costs
Auto-scaling responsiveness: When a traffic spike triggers a scale-out, smaller images mean new containers serve traffic faster. The gap between "scaling event" and "handling requests" shrinks from minutes to seconds
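Network transfer: Every node pull transfers the image across the network. At 50 deploys/week across 10 nodes, that is 500 pulls a week. With roughly 1GB saved per pull, the difference between 180MB and 1.2GB is ~500GB/week in data transfer, or about 2TB/month, assuming no layers are already cached on the node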

For organizations running dozens of microservices across multiple environments, optimized Dockerfiles can reduce container-related costs by 15–25%. When combined with proper Kubernetes cost allocation, the savings compound further because you can right-size pods based on actual requirements rather than bloated image overhead.

Production Checklist

Before shipping any Dockerfile to production, verify these 8 points:

Minimal base image: use node:20-alpine, python:3.12-slim, or distroless. Avoid the full OS image in production
Package files copied first: COPY package*.json ./ before COPY . . to preserve layer caching
npm ci, not npm install: ci uses the lockfile exactly, producing deterministic builds. install can modify the lockfile
Production-only dependencies: --omit=dev in the final stage. No TypeScript compiler in production
Non-root user: create a dedicated user and switch to it with USER before the CMD
Health check: include a HEALTHCHECK instruction for Docker, Compose, and Swarm. Kubernetes ignores it and relies on its own probes, and ECS only runs a health check defined in the task definition, so configure those separately on each platform
No secrets in the image: use environment variables or secrets managers (AWS Secrets Manager, Vault). Never COPY .env
Cache cleanup: run npm cache clean --force in the same RUN instruction as the install. In a separate RUN, the cache is already committed to the earlier layer and the image does not shrink

These aren't optimizations. They're baseline requirements for any container running in production. The difference between a team that ships 200MB images and one shipping 1.2GB images isn't skill — it's whether they have a standard Dockerfile template that enforces these patterns.
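
Two checklist items, cache cleanup and keeping secrets out of the image, can also be handled with BuildKit mounts rather than manual cleanup. A minimal sketch, assuming BuildKit is enabled (the default builder in current Docker releases) and using npmrc as a hypothetical secret id for a private registry token:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
# type=cache keeps the npm download cache outside the image layers.
# type=secret exposes the file only during this RUN; it is not written to a layer.
RUN --mount=type=cache,target=/root/.npm \
    --mount=type=secret,id=npmrc,target=/root/.npmrc \
    npm ci --no-audit
```

The secret would be supplied at build time with docker build --secret id=npmrc,src=$HOME/.npmrc . and the mount is simply absent if the flag is omitted.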

For teams managing cloud costs seriously, container optimization is one piece of the puzzle. The real savings come from understanding where your compute spend goes at a resource level — which requires visibility tools that go beyond basic dashboard metrics and into peak pattern analysis.
