SIGKILL (Signal 9) is Kubernetes’ forceful termination signal sent to container processes that haven’t exited after the graceful shutdown period. Unlike SIGTERM which requests clean shutdown, SIGKILL immediately and unconditionally terminates processes without any opportunity for cleanup.
Termination Sequence
SIGKILL represents the final step in Kubernetes’ managed lifecycle termination sequence:
- PreStop Hook executes if configured
- SIGTERM sent to allow graceful shutdown
- Grace Period waits for .spec.terminationGracePeriodSeconds (default 30 seconds)
- SIGKILL forcefully terminates if the process is still running
SIGKILL serves as the safety net ensuring Pods always terminate, even when applications misbehave or hang during shutdown.
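Both of the configurable steps in that sequence live in the Pod spec. A minimal illustrative manifest (the name and image are placeholders) shows where each is set:
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  terminationGracePeriodSeconds: 30      # how long to wait after SIGTERM before SIGKILL
  containers:
  - name: app
    image: myapp:v1
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 5"]   # runs before SIGTERM is sent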
Signal Characteristics
Unblockable - SIGKILL cannot be caught, ignored, or handled by applications. The kernel terminates the process immediately upon signal receipt.
No Cleanup - Applications receive no notification and have no opportunity to:
- Complete in-flight requests
- Close database connections
- Flush buffers to disk
- Release locks
- Clean up temporary files
Immediate Termination - The process is removed from the system instantly. Any resources it held are forcibly reclaimed by the kernel.
This abruptness makes SIGKILL a last resort, not a normal shutdown mechanism.
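A minimal Go sketch of this property (the printed hints are only for local experimentation): the os/signal package documents that SIGKILL and SIGSTOP can never be caught, so registering a handler for SIGKILL is a silent no-op and the process dies the instant kill -9 is delivered.
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	sigs := make(chan os.Signal, 1)
	// Registering SIGKILL is a no-op: per the os/signal docs, SIGKILL and
	// SIGSTOP can never be caught, so only SIGTERM will arrive on this channel.
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGKILL)

	fmt.Println("PID:", os.Getpid(), "- try kill -TERM <pid> vs kill -9 <pid>")
	fmt.Println("received:", <-sigs)
}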
Consequences of Forced Termination
When containers are SIGKILL’d:
Data Loss - Uncommitted transactions are lost. Unflushed buffers don’t make it to disk. In-memory state disappears without persistence.
Resource Leaks - Database connections remain open until timeout, locks held in distributed systems stay locked until TTL expires, message queue messages may be re-delivered, and temporary files accumulate.
Request Failures - In-flight requests fail immediately and are simply dropped if SIGTERM wasn’t handled properly.
Corruption Risk - Files being written may be left in inconsistent states. Databases mid-transaction may have partial updates.
When SIGKILL Occurs
Several scenarios trigger SIGKILL:
Grace Period Expiration - Most common case: the application didn’t exit within terminationGracePeriodSeconds after receiving SIGTERM.
spec:
  terminationGracePeriodSeconds: 30  # SIGKILL after 30 seconds
  containers:
  - name: app
    image: myapp:v1
Force Delete - Operator forces immediate deletion:
kubectl delete pod mypod --force --grace-period=0
This bypasses graceful shutdown entirely - SIGKILL is sent immediately. Use only in emergencies.
Liveness Probe Failures - Container fails liveness probes and needs restarting. Kubernetes sends SIGTERM, then SIGKILL after grace period.
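An illustrative probe configuration (the path, port, and thresholds are placeholders); after failureThreshold consecutive failures the kubelet restarts the container using the same SIGTERM-then-SIGKILL sequence:
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 3   # restart after 3 consecutive probe failures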
Node Pressure - Kubelet evicts Pods during resource pressure (out of memory, disk pressure). Pods are SIGTERM’d then SIGKILL’d if they don’t exit.
Prevention Strategies
Avoiding SIGKILL requires proper lifecycle implementation:
Implement SIGTERM Handling
The primary defense is handling SIGTERM gracefully:
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	server := &http.Server{Addr: ":8080"}

	sigterm := make(chan os.Signal, 1)
	signal.Notify(sigterm, syscall.SIGTERM)
	go func() {
		<-sigterm
		log.Println("Received SIGTERM, shutting down")
		// Graceful shutdown with a timeout shorter than the grace period
		ctx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
		defer cancel()
		if err := server.Shutdown(ctx); err != nil {
			log.Printf("Shutdown error: %v", err)
			os.Exit(1)
		}
		os.Exit(0)
	}()
	server.ListenAndServe()
}
Exit before the grace period expires to avoid SIGKILL.
Adjust Grace Periods
For applications with longer shutdown requirements:
spec:
  terminationGracePeriodSeconds: 90  # Extend for long transactions
  containers:
  - name: batch-processor
    image: processor:v1
Balance giving sufficient time against prolonged Pod termination during deployments.
Use PreStop Hooks Effectively
PreStop hooks can help applications prepare for shutdown:
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 5"]  # Allow connection draining
This delay gives load balancers and endpoint controllers time to stop routing traffic to the Pod before SIGTERM arrives, reducing the work left to do during shutdown.
Monitor Shutdown Behavior
Track whether containers are being SIGKILL’d:
# Check container termination reasons
kubectl describe pod mypod | grep -A 5 "Last State"
Look for Terminated with Signal 9, indicating SIGKILL. This reveals shutdown problems.
Handling Inevitable SIGKILL
Some scenarios make SIGKILL unavoidable:
Infinite Loops - Application bugs causing hangs will eventually be SIGKILL’d. This is desirable - it’s better than hanging forever.
Force Delete - Operators sometimes must force delete stuck resources. Accept that cleanup may not occur.
Out of Memory - OOMKilled Pods are SIGKILL’d immediately by the kernel, bypassing Kubernetes’ graceful shutdown. No grace period is possible.
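Memory limits are what the kernel enforces here. An illustrative container resources block (values are placeholders); exceeding the memory limit gets the container OOMKilled with SIGKILL, regardless of the grace period:
resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"   # exceeding this limit triggers the kernel OOM killer (SIGKILL)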
For these cases, build resilience:
Idempotent Operations - Design operations to tolerate being killed mid-execution and restarted.
External Cleanup - Use separate cleanup jobs to find and remove abandoned resources.
Distributed Locks with TTL - Ensure locks auto-release even if holder is SIGKILL’d.
Transaction Isolation - Rely on database transaction isolation to handle abrupt termination.
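As one sketch of the TTL-lock strategy above, using the go-redis client (the import path, Redis address, key, and TTL are illustrative assumptions, not a prescribed library): the lock is acquired with an expiry, so even if the holder is SIGKILL’d before releasing it, the lock frees itself once the TTL passes.
package main

import (
	"context"
	"log"
	"time"

	"github.com/redis/go-redis/v9" // illustrative client choice
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"}) // placeholder address

	// Acquire the lock only if it is free, with a 30s TTL as the safety net.
	ok, err := rdb.SetNX(ctx, "lock:job-42", "worker-1", 30*time.Second).Result()
	if err != nil || !ok {
		log.Fatal("lock not acquired")
	}

	doWork() // if the process is SIGKILL'd here, the TTL releases the lock

	// Best-effort release on the happy path; the TTL covers the unhappy path.
	rdb.Del(ctx, "lock:job-42")
}

// doWork stands in for the critical section protected by the lock.
func doWork() { time.Sleep(5 * time.Second) }
A production lock would also verify ownership before deleting (for example via a Lua script); the point here is only that the TTL, not the exiting process, guarantees release.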
Integration with Deployment Patterns
SIGKILL impacts deployment strategies:
Rolling Deployment - If new Pods aren’t becoming ready and old Pods hit grace period limits, multiple Pods may be SIGKILL’d, potentially causing service degradation.
Recreate Deployment - All Pods are terminated simultaneously. If they all hit grace period limits, mass SIGKILL occurs. Ensure grace periods accommodate shutdown needs.
Node Draining - During cluster maintenance, nodes are drained. Pods get the grace period to move, then are SIGKILL’d if they don’t exit.
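The drain command exposes the same mechanics directly; the flags below are standard kubectl drain options, while the node name is a placeholder:
# Give each Pod up to 60 seconds to exit gracefully before it is force-killed
kubectl drain mynode --grace-period=60 --ignore-daemonsets --timeout=10m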
Observability
Detecting SIGKILL provides valuable operational insight:
Container Exit Codes - Exit code 137 (128 + 9) indicates SIGKILL:
kubectl get pods -o jsonpath='{.items[*].status.containerStatuses[*].lastState.terminated.exitCode}'
Termination Reasons:
kubectl get pods -o jsonpath='{.items[*].status.containerStatuses[*].lastState.terminated.reason}'
Look for Error, OOMKilled, or explicit SIGKILL mentions.
Metrics - Track SIGKILL frequency:
# Containers whose last termination was OOMKilled or a generic error exit
kube_pod_container_status_last_terminated_reason{reason=~"OOMKilled|Error"}
High SIGKILL rates indicate shutdown problems requiring investigation.
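A sketch of a Prometheus alerting rule built on that series (the group and alert names, window, and severity are illustrative):
groups:
- name: shutdown-health
  rules:
  - alert: ContainerForcefullyTerminated
    expr: |
      sum by (namespace, pod) (
        kube_pod_container_status_last_terminated_reason{reason=~"OOMKilled|Error"}
      ) > 0
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Last container termination in {{ $labels.namespace }}/{{ $labels.pod }} was not a clean exit"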
Best Practices
Never Rely on SIGKILL - Design applications assuming graceful shutdown will succeed. SIGKILL should be the exception, not the norm.
Test Shutdown Timing - Measure actual shutdown duration and set grace periods accordingly:
# Wall-clock time from delete until the Pod is gone (includes graceful shutdown)
time kubectl delete pod mypod
Handle SIGTERM Properly - The only reliable way to avoid SIGKILL consequences is handling SIGTERM correctly. There’s no workaround for proper signal handling.
Monitor Container Exits - Alert on frequent SIGKILL occurrences. They indicate either application problems or misconfigured grace periods.
Document Shutdown Requirements - Make shutdown time requirements explicit in runbooks. This helps operators configure appropriate grace periods.
Build Resilience - Even with perfect SIGTERM handling, SIGKILL can occur (OOM, force delete, kernel panics). Design for graceful degradation when it happens.
SIGKILL is Kubernetes’ safety mechanism ensuring Pods always terminate. By understanding when and why it occurs, you can design applications that minimize its impact while accepting it as an inevitable part of managed lifecycle patterns.