Most Dockerfiles in production are doing it wrong. They copy everything, install everything, and ship everything — including build tools, devDependencies, and source code that the running application never needs.
The result: images that are 800MB–1.2GB when they could be 150–200MB. That difference compounds across every pull, every deploy, every node in your cluster.
The Problem with Standard Dockerfiles
Here's a Dockerfile that works but costs you time and money on every deployment:
```dockerfile
FROM node:20
WORKDIR /app
COPY . .
RUN npm install
RUN npm run build
EXPOSE 5000
CMD ["npm", "start"]
```
This image includes:
- Full Node.js OS layer (~350MB) — includes apt, gcc, python, and system tools your app doesn't need
- All devDependencies (~200MB) — TypeScript, ESLint, testing frameworks, build tools
- Source code (~50MB) — TypeScript files, test files, config files that the compiled app doesn't reference
- npm cache (~100MB) — cached packages that served their purpose during install
Total: ~900MB. But the actual runtime needs are just the compiled JavaScript, production dependencies, and a minimal Node.js binary — roughly 180MB.
Worse: Docker layer caching breaks on every code change. Because COPY . . comes before npm install, any file change — even a one-line fix — invalidates the dependency cache and forces a full reinstall. That's 30–90 seconds wasted on every build.
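Reordering the same single-stage Dockerfile is enough to restore the dependency cache, although it does nothing yet for image size. This is a minimal sketch that assumes a package-lock.json is committed, because npm ci refuses to run without one:

```dockerfile
FROM node:20
WORKDIR /app
# Copy only the manifests first: this layer and the install below stay
# cached until package.json or package-lock.json changes
COPY package*.json ./
RUN npm ci
# Source edits only invalidate the layers from here down
COPY . .
RUN npm run build
EXPOSE 5000
CMD ["npm", "start"]
```

Pair this with a .dockerignore that excludes node_modules, otherwise COPY . . can overwrite the freshly installed dependencies with whatever happens to be on the build machine.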
How Multi-Stage Builds Work
Docker multi-stage builds solve both problems with one concept: separate build-time concerns from runtime needs.
Each FROM instruction starts a new stage. You can copy artifacts from earlier stages into later ones, discarding everything else. The final image only contains what the last stage includes.
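To see the discarding in isolation, here is a toy two-stage build (hypothetical file names, not part of any real project) where the first stage leaves a 100MB scratch file next to its real output:

```dockerfile
# Toy build: only files explicitly copied with COPY --from reach the final image
FROM node:20-alpine AS build
WORKDIR /work
# Simulate a build that produces one artifact plus a 100MB scratch file
RUN echo "console.log('built artifact')" > app.js && \
    dd if=/dev/zero of=scratch.bin bs=1048576 count=100

# Final stage: starts again from a clean node:20-alpine filesystem
FROM node:20-alpine
WORKDIR /app
# Copy just the artifact; scratch.bin stays behind in the discarded stage
COPY --from=build /work/app.js ./app.js
CMD ["node", "app.js"]
```

The final image should come out at roughly the size of node:20-alpine itself, because scratch.bin exists only in the build stage's layers.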
The simplest optimization — fixing the layer cache:
```dockerfile
# Stage 1: Install dependencies (cached unless package.json or the lockfile changes)
FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --prefer-offline --no-audit

# Stage 2: Build application
FROM node:20-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN npm run build

# Stage 3: Production image
FROM node:20-alpine
WORKDIR /app
# Install production dependencies before copying the build output,
# so a code change does not invalidate this layer
COPY package*.json ./
RUN npm ci --omit=dev --prefer-offline --no-audit
COPY --from=builder /app/dist ./dist
EXPOSE 5000
CMD ["node", "dist/index.js"]
```
Key improvements:
- Layer caching works: package*.json is copied first, so dependencies are reinstalled only when package.json or the lockfile changes, not on every code edit
- Alpine base: ~50MB vs ~350MB for the full Node.js image
- Production-only deps: --omit=dev strips TypeScript, testing tools, and build utilities from the final image
- No source code: only the compiled dist/ directory is copied to the final stage
The 4-Stage Production Pattern
For production workloads, we use a 4-stage pattern that adds dependency isolation and security hardening:
```dockerfile
# Stage 1: Dependencies (cached layer)
FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --prefer-offline --no-audit

# Stage 2: Build
FROM node:20-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY package*.json ./
COPY . .
RUN npm run build

# Stage 3: Production dependencies only
FROM node:20-alpine AS prod-deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev --prefer-offline --no-audit

# Stage 4: Final production image
FROM node:20-alpine
WORKDIR /app
# Dedicated non-root user whose primary group is nodejs (both ID 1001)
RUN addgroup -g 1001 -S nodejs && \
    adduser -S -u 1001 -G nodejs nodejs
COPY --from=prod-deps --chown=nodejs:nodejs /app/node_modules ./node_modules
COPY --chown=nodejs:nodejs package*.json ./
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist
USER nodejs
EXPOSE 5000
# Healthy only on HTTP 200; a connection error also exits non-zero
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
  CMD node -e "require('http').get('http://localhost:5000/api/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1)).on('error', () => process.exit(1))"
CMD ["node", "dist/index.js"]
```
Why 4 stages instead of 3:
- Stage 1 (deps) installs ALL dependencies including devDependencies — needed for building TypeScript, running bundlers, etc.
- Stage 3 (prod-deps) installs ONLY production dependencies in a clean environment — no build artifacts leak in
- Both dependency stages cache independently. A code change re-runs the build (Stage 2), but neither dependency stage re-executes unless package.json or package-lock.json changes
Security: Non-Root by Default
Notice the adduser and USER nodejs directives. The application runs as UID 1001, not root.
This matters because:
- Container escape attacks are significantly harder without root privileges
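- Kubernetes Pod Security Standards (the replacement for PodSecurityPolicies, which were removed in Kubernetes 1.25) can require runAsNonRoot: true. The kubelet can only verify a numeric user, so either use USER 1001 or set runAsUser in the pod spec; a named user such as USER nodejs on its own causes the container to be rejected
- Compliance frameworks (SOC 2, ISO 27001, NIST) require least-privilege execution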
- ECS Fargate respects the USER directive — your task runs as the specified user, not root
The --chown=nodejs:nodejs flag on COPY ensures all files are owned by the non-root user, preventing permission issues at runtime.
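Creating a user is not strictly necessary: the official Node.js images already ship a non-root user named node with UID 1000. A sketch of a replacement for Stage 4 that reuses it, assuming the builder and prod-deps stages from the 4-stage example stay unchanged:

```dockerfile
# Replacement for Stage 4 only; relies on the builder and prod-deps stages above
FROM node:20-alpine
WORKDIR /app
# The official image's built-in "node" user (UID 1000) owns the app files
COPY --from=prod-deps --chown=node:node /app/node_modules ./node_modules
COPY --chown=node:node package*.json ./
COPY --from=builder --chown=node:node /app/dist ./dist
# Numeric UID so a runAsNonRoot check can verify the user without runAsUser
USER 1000
EXPOSE 5000
# HEALTHCHECK omitted for brevity; reuse the one from Stage 4
CMD ["node", "dist/index.js"]
```

The trade-off is mainly convention: a custom UID such as 1001 makes the app user explicit, while the built-in user saves a layer and is already documented by the image maintainers.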
Special Cases: Native Modules
Some npm packages (like grpc, sharp, or bcrypt) require native compilation. Alpine doesn't include build tools by default, so you need to handle this:
```dockerfile
# Build stage: install native build tools
FROM node:20-alpine AS builder
WORKDIR /app
RUN apk add --no-cache python3 make g++
COPY package*.json ./
RUN npm ci --prefer-offline --no-audit
COPY tsconfig.json ./
COPY src ./src
RUN npm run build

# Production stage: install native deps, then clean up
FROM node:20-alpine
WORKDIR /app
# C++ runtime library that compiled addons link against
RUN apk add --no-cache libstdc++
COPY package*.json ./
# Build tools exist only for the duration of this single RUN layer
RUN apk add --no-cache --virtual .build-deps python3 make g++ && \
    npm ci --omit=dev --prefer-offline --no-audit && \
    npm cache clean --force && \
    apk del .build-deps
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/index.js"]
```
The key pattern: install build tools as virtual packages (--virtual .build-deps), compile native modules, then delete the build tools (apk del .build-deps) in the same RUN layer. This keeps the final image small while still producing working native binaries.
The Cloud Cost Impact
Image size isn't vanity — it directly affects your cloud bill:
- ECR/ACR/GCR storage: Smaller images = lower registry storage costs. At scale with dozens of tags and services, this adds up
- Pull time: A 180MB image pulls in ~8 seconds. A 1.2GB image takes 40–60 seconds. In Kubernetes, this is the difference between a 15-second rolling update and a 2-minute one
- Fargate pricing: AWS Fargate charges for the pull duration before the task is running. Faster pulls = lower startup costs
- Auto-scaling responsiveness: When a traffic spike triggers a scale-out, smaller images mean new containers serve traffic faster. The gap between "scaling event" and "handling requests" shrinks from minutes to seconds
- Network transfer: Every node pull transfers the image across the network. At 50 deploys/week across 10 nodes, that is 500 pulls per week, so the ~1GB gap between 180MB and 1.2GB adds up to ~500GB of extra data transfer per week (over 2TB per month)
For organizations running dozens of microservices across multiple environments, optimized Dockerfiles can reduce container-related costs by 15–25%. When combined with proper Kubernetes cost allocation, the savings compound further because you can right-size pods based on actual requirements rather than bloated image overhead.
Production Checklist
Before shipping any Dockerfile to production, verify these 8 points:
- Minimal base image: use node:20-alpine, python:3.12-slim, or distroless. Never use the full OS image in production
- Package files copied first: COPY package*.json ./ before COPY . . to preserve layer caching
- npm ci, not npm install: ci installs exactly what the lockfile specifies, producing deterministic builds; install can modify the lockfile
- Production-only dependencies: --omit=dev in the final stage. No TypeScript compiler in production
- Non-root user: create a dedicated user and switch to it with USER before the CMD
- Health check: add a HEALTHCHECK instruction. Docker and Compose run it directly; Kubernetes ignores it in favor of liveness and readiness probes, and ECS only runs health checks declared in the task definition
- No secrets in the image: use environment variables or secrets managers (AWS Secrets Manager, Vault). Never COPY .env
- Cache cleanup: npm cache clean --force after install to reduce layer size
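The "no secrets" rule also applies at build time, for example when npm needs a token for a private registry. One option is a BuildKit secret mount, sketched here under the assumption that the token lives in a local .npmrc and that "npmrc" is simply the id chosen for it:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
# The .npmrc is mounted only while this RUN step executes;
# it is never written into an image layer
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc \
    npm ci --prefer-offline --no-audit
```

The secret is supplied at build time with docker build --secret id=npmrc,src=$HOME/.npmrc . so the token stays out of both the build context and the image history.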
These aren't optimizations. They're baseline requirements for any container running in production. The difference between a team that ships 200MB images and one shipping 1.2GB images isn't skill — it's whether they have a standard Dockerfile template that enforces these patterns.
For teams managing cloud costs seriously, container optimization is one piece of the puzzle. The real savings come from understanding where your compute spend goes at a resource level — which requires visibility tools that go beyond basic dashboard metrics and into peak pattern analysis.