Kubernetes Blue-Green Deployment

Blue-green deployment maintains two complete production environments - one serving live traffic (blue), one idle or being prepared (green). Traffic switches between them by changing a Service selector, which makes both cutover and rollback near-instant and largely avoids the version coexistence challenges of rolling deployments.

Operational Pattern

Parallel Environments - Instead of a single Deployment that updates in place, blue-green uses two separate Deployments. At any time, one is “active” (receiving production traffic) and the other is “inactive” (either idle or being updated).

Traffic Switching - A Service points to the active environment via label selectors. To switch traffic, you update the Service selector from version: blue to version: green. The update is a single API write, so the Service never selects a mix of both versions; once the new endpoints have propagated to each node (typically within moments), all new connections go to the new environment.

Instant Rollback - If issues emerge with the green environment, switching back to blue is identical to the original cutover: change the Service selector back. No waiting for gradual redeployment or complex rollback orchestration.

Implementation Approach

Dual Deployments - Create two Deployments with different version labels:

# Blue Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
      - name: app
        image: myapp:v1
---
# Green Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
      - name: app
        image: myapp:v2

Service Selector - The Service initially points to blue:

apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: blue  # Currently routing to blue
  ports:
  - port: 80

Traffic Cutover - To switch to green, update the Service:

kubectl patch service myapp -p '{"spec":{"selector":{"version":"green"}}}'

Existing connections to blue Pods generally continue until they close, so clients holding long-lived keep-alive connections may keep reaching blue for a while. New connections go to green Pods.
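Patching the selector before green is fully ready would send traffic to a half-started environment. A minimal sketch of a guarded cutover follows, assuming the myapp Service and the myapp-blue / myapp-green Deployments shown above; the function name cutover_to and the 120s timeout are illustrative choices, not part of any tool.

```shell
# Sketch: switch the Service to a color only after its Deployment is ready.
# Assumes the Service "myapp" and Deployments "myapp-<color>" from this article.

cutover_to() {
  color="$1"   # "blue" or "green"

  # Wait until the target Deployment's rollout is complete; if it does not
  # finish within the (arbitrary) timeout, stop without touching the Service.
  kubectl rollout status "deployment/myapp-${color}" --timeout=120s || return 1

  # Strategic merge patch: only the version key changes, app=myapp is kept.
  kubectl patch service myapp \
    -p "{\"spec\":{\"selector\":{\"version\":\"${color}\"}}}" || return 1

  # Print the selector value now stored on the Service as confirmation.
  kubectl get service myapp -o jsonpath='{.spec.selector.version}'
}

# Usage: cutover_to green
```

Because the function takes the color as an argument, the same call with blue serves as the rollback path.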

Trade-offs and Costs

Resource Overhead - Blue-green requires maintaining two complete environments. During the transition period, you need 2x capacity - both blue and green Deployments running full replica counts.

For applications with large resource footprints or in resource-constrained clusters, this doubling of resources may be prohibitive. This contrasts with rolling deployment, which needs only modest excess capacity (controlled by maxSurge).
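To make the capacity difference concrete, here is a small arithmetic sketch using the replica count from the manifests above. It assumes a rolling update with maxSurge set as a percentage (25% is the Kubernetes default, and percentages round up); the function names are illustrative.

```shell
# Sketch: peak Pod counts during a release for blue-green vs rolling update.

peak_blue_green() {
  replicas="$1"
  # Both environments run their full replica count at the same time.
  echo $(( replicas * 2 ))
}

peak_rolling() {
  replicas="$1"
  surge_percent="$2"
  # Integer ceiling of replicas * surge_percent / 100, matching how a
  # percentage maxSurge is rounded up to whole Pods.
  surge=$(( (replicas * surge_percent + 99) / 100 ))
  echo $(( replicas + surge ))
}

echo "blue-green peak: $(peak_blue_green 3) Pods"
echo "rolling peak (maxSurge 25%): $(peak_rolling 3 25) Pods"
```

For 3 replicas this prints a peak of 6 Pods for blue-green and 4 Pods for a rolling update.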

Environment Parity - The inactive environment must accurately represent production to make testing meaningful. However, it’s not receiving real traffic, so subtle issues (race conditions under load, edge cases in production data patterns) may not surface until cutover.

Stateful Applications - Blue-green works elegantly for stateless applications but becomes complex with state:

Database migrations must support both versions during cutover
Shared state (caches, session stores) must be version-neutral
File uploads, job queues, and other persistent artifacts need handling

For complex stateful scenarios, careful schema design (for example, additive changes that both versions can read and write) and deliberate handling of persistent volumes are critical.

Resource Cleanup - After a successful green deployment, the blue environment should be scaled down or removed. However, keeping it around (at reduced replica count) maintains fast rollback capability. Balance resource costs against rollback speed requirements.
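One way to keep a warm standby is to scale the inactive Deployment down rather than delete it. The sketch below assumes the resource names used in this article; park_standby is a hypothetical helper, and the standby size is whatever trade-off you choose.

```shell
# Sketch: shrink the inactive environment to a warm standby.

park_standby() {
  color="$1"      # the now-inactive color, e.g. blue
  replicas="$2"   # standby size; 0 frees all capacity but slows rollback

  # Guard: never scale down the color the Service is currently selecting.
  active="$(kubectl get service myapp -o jsonpath='{.spec.selector.version}')"
  if [ "$active" = "$color" ]; then
    echo "refusing to scale down ${color}: it is serving traffic" >&2
    return 1
  fi

  kubectl scale "deployment/myapp-${color}" --replicas="$replicas"
}

# Usage: park_standby blue 1
```

With a reduced standby, a rollback needs a scale-up first, so it is no longer instant at full capacity.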

Testing in Green

One of blue-green’s key advantages is testing the new environment before cutover:

Pre-Production Validation - While green is inactive, you can:

Run smoke tests against green Pods directly (via Pod IPs or temporary Service)
Execute integration tests with production-like data
Load test to verify performance characteristics
Validate monitoring and alerting configurations
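The temporary Service mentioned in the first item can be sketched as follows. The Service name myapp-green-preview, the local port 8080, and the /healthz path are assumptions for illustration; substitute whatever endpoint your application actually exposes.

```shell
# Sketch: expose green under a temporary Service and run one smoke request.

smoke_test_green() {
  # kubectl expose reuses the Deployment's selector (app=myapp,version=green),
  # so the preview Service reaches only green Pods.
  kubectl expose deployment myapp-green --name=myapp-green-preview --port=80 || return 1

  # Forward a local port to the preview Service in the background.
  kubectl port-forward service/myapp-green-preview 8080:80 &
  pf_pid=$!
  sleep 2   # crude wait for the tunnel to come up

  # -f makes curl exit non-zero on HTTP error responses.
  curl -fsS http://localhost:8080/healthz
  status=$?

  # Clean up the tunnel and the temporary Service regardless of the result.
  kill "$pf_pid" 2>/dev/null
  wait "$pf_pid" 2>/dev/null
  kubectl delete service myapp-green-preview
  return "$status"
}

# Usage: smoke_test_green && echo "green looks healthy"
```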

Canary Testing - Before full cutover, send a small percentage of production traffic to green. This requires additional tooling (service mesh, ingress controller with traffic splitting) but provides real-world validation before committing.

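Gradual Cutover - Rather than an instant 100% switch, some implementations shift traffic in steps. A Service selector cannot do this on its own, since it either matches a Pod or it does not; gradual shifting requires weighted routing between two Services (via a service mesh or ingress controller) or custom controllers beyond basic Kubernetes primitives.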
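As one illustration of the extra tooling involved, the ingress-nginx controller supports weighted canary routing through annotations on a second Ingress. The sketch below assumes ingress-nginx is installed and that a canary Ingress named myapp-green-canary (a hypothetical name) already exists, with the same host and path as the main Ingress and a backend Service that selects green Pods.

```shell
# Sketch: set the share of requests that ingress-nginx routes to green.

set_green_weight() {
  weight="$1"   # percentage of requests, 0 to 100

  # Reject empty or non-numeric input, then values above 100.
  case "$weight" in
    ''|*[!0-9]*) echo "weight must be an integer from 0 to 100" >&2; return 1 ;;
  esac
  if [ "$weight" -gt 100 ]; then
    echo "weight must be an integer from 0 to 100" >&2
    return 1
  fi

  # The canary annotations mark this Ingress as the weighted alternative.
  kubectl annotate ingress myapp-green-canary \
    "nginx.ingress.kubernetes.io/canary=true" \
    "nginx.ingress.kubernetes.io/canary-weight=${weight}" \
    --overwrite
}

# Usage: set_green_weight 10   # then 50, then 100 as confidence grows
```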

Rollback Capabilities

Instant Rollback - If green exhibits problems after cutover, switching back to blue is immediate:

kubectl patch service myapp -p '{"spec":{"selector":{"version":"blue"}}}'

New connections go to blue as soon as the selector change propagates, provided blue is still running at sufficient scale. No waiting for Pods to restart or redeploy.
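A rollback is only instant if blue still has Pods to receive the traffic. The following sketch checks that before patching; it assumes the resource names from this article, and rollback_to_blue is an illustrative helper rather than an established command.

```shell
# Sketch: switch back to blue only if blue still has ready Pods.

rollback_to_blue() {
  # status.readyReplicas is omitted when no Pods are ready, so the
  # jsonpath output is empty in that case.
  ready="$(kubectl get deployment myapp-blue -o jsonpath='{.status.readyReplicas}')"

  if [ -z "$ready" ] || [ "$ready" -eq 0 ]; then
    echo "myapp-blue has no ready Pods; scale it up before switching" >&2
    return 1
  fi

  kubectl patch service myapp -p '{"spec":{"selector":{"version":"blue"}}}'
}

# Usage: rollback_to_blue
```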

Time Window - How long can you maintain rollback capability? It depends on state changes:

For stateless apps: indefinitely (until you remove blue)
For apps with database writes: depends on schema compatibility
For apps with irreversible state changes: rollback window may be short

State Synchronization - If green has made incompatible state changes (database writes, cache updates), rolling back to blue may be impossible or require manual state cleanup. Design for backward compatibility or plan one-way migrations.

Relationship to Kubernetes Primitives

Blue-green deployment is a pattern built on basic Kubernetes resources, not a distinct resource type:

Deployments - Uses two standard Deployments with different labels. Each can use rolling updates internally when updating blue or green environments.
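Preparing the next release then amounts to an ordinary rolling update of whichever Deployment is not serving traffic. A minimal sketch, assuming the container name app from the manifests above and a hypothetical image tag myapp:v3:

```shell
# Sketch: roll a new image into the inactive environment only.

prepare_inactive() {
  color="$1"   # the color to update, e.g. green
  image="$2"   # the new image, e.g. myapp:v3 (hypothetical tag)

  # Guard: do not update the environment the Service currently selects.
  active="$(kubectl get service myapp -o jsonpath='{.spec.selector.version}')"
  if [ "$active" = "$color" ]; then
    echo "${color} is live; update the other environment instead" >&2
    return 1
  fi

  # Triggers the Deployment's own rolling update, then waits for it.
  kubectl set image "deployment/myapp-${color}" "app=${image}" || return 1
  kubectl rollout status "deployment/myapp-${color}" --timeout=300s
}

# Usage: prepare_inactive green myapp:v3
```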

Services - Leverages Services’ label selector mechanism. The Service abstraction already provides load balancing and discovery; blue-green just changes which Pods it selects.

Labels - Entirely dependent on label selectors. The version label (blue/green) is the discriminator that routes traffic.
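Since routing is decided purely by labels, you can see which Pods are live by querying with the same labels the Service uses. A short sketch, assuming the myapp Service from this article (selected_pods is an illustrative name):

```shell
# Sketch: list the Pods the Service currently selects.

selected_pods() {
  # Read the active color straight from the Service's selector.
  color="$(kubectl get service myapp -o jsonpath='{.spec.selector.version}')"

  # Query Pods with the same label pair the Service uses.
  kubectl get pods -l "app=myapp,version=${color}" -o wide
}

# Usage: selected_pods
```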

This pattern demonstrates Kubernetes’ compositional nature - sophisticated deployment strategies emerge from composing simple primitives.

Advanced Tooling

While implementable with basic kubectl commands, production blue-green often uses higher-level tools:

Flagger - Automates blue-green (and canary) deployments with progressive traffic shifting, metric-based gates, and automatic rollback.

Argo Rollouts - Provides a Rollout CRD that stands in for the Deployment resource and adds native blue-green support, including traffic management and analysis.
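For a sense of what this looks like, here is a hedged sketch of a blue-green Rollout. It assumes the Argo Rollouts controller is installed and that two Services, here called myapp-active and myapp-preview (hypothetical names), already exist with the selector app: myapp; the controller manages which ReplicaSet each one points to.

```shell
# Sketch: apply a Rollout that uses the blueGreen strategy.

apply_bluegreen_rollout() {
  kubectl apply -f - <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app
        image: myapp:v2
  strategy:
    blueGreen:
      activeService: myapp-active     # Service receiving production traffic
      previewService: myapp-preview   # Service pointing at the new version
      autoPromotionEnabled: false     # hold the cutover until promoted
EOF
}

# Usage: apply_bluegreen_rollout
# Cutover (needs the Argo Rollouts kubectl plugin):
#   kubectl argo rollouts promote myapp
```

Note that a single Rollout replaces the two hand-managed Deployments; the blue and green roles become the active and preview ReplicaSets.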

Service Meshes - Istio, Linkerd, and similar tools enable sophisticated traffic splitting, allowing gradual percentage-based cutover rather than instant switches.

Ingress Controllers - Some ingress controllers (NGINX, Traefik) support weighted traffic routing at the ingress layer, enabling blue-green with external traffic.

Comparison to Other Strategies

vs Rolling Deployment - Rolling requires less capacity but runs old and new versions side by side during the update. Blue-green needs 2x capacity but provides instant cutover and rollback.

vs Recreate - Recreate has downtime but is simplest. Blue-green has no downtime but requires managing parallel environments.

vs Canary - Canary gradually shifts traffic with metric validation. Blue-green is simpler (binary switch) but less progressive.
