Automated Placement is a foundational Kubernetes pattern that addresses how Pods are intelligently assigned to nodes in a cluster. The Kubernetes scheduler solves the complex problem of matching container resource demands with available node capacity while honoring policies and constraints.
The Problem
As systems adopt cloud-native architecture and decompose into microservices, the number of deployable units grows dramatically. Manually assigning thousands of containers and Pods to nodes becomes impractical and error-prone.
Effective placement decisions are critical because they directly impact:
- Availability - Poor placement creates single points of failure
- Performance - Resource contention and network latency depend on placement
- Capacity utilization - Inefficient packing wastes cluster resources
- Cost - Over-provisioning to compensate for poor placement increases infrastructure costs
The Solution: Kubernetes Scheduler
The Kubernetes scheduler provides automated, policy-driven Pod placement by actively monitoring unscheduled Pods and binding them to suitable nodes. The scheduler evaluates resource profiles, node capacity, and placement policies to find optimal node assignments.
Unlike static placement, the scheduler handles:
- Initial placement when Pods are first created
- Scale-up as new Pods are added during horizontal scaling
- Rescheduling when workloads are moved off unhealthy or overloaded nodes and must be placed again
- Constraint satisfaction ensuring compliance with affinity, anti-affinity, and taint rules
Scheduling Prerequisites
Effective automated placement requires several prerequisites:
Available Node Resources
Nodes must have sufficient capacity after accounting for system reservations. Node capacity is reduced by:
Kube-Reserved - Resources reserved for Kubernetes system daemons (kubelet, container runtime)
System-Reserved - Resources for operating system daemons (sshd, systemd)
Eviction Thresholds - Memory buffer reserved to prevent system Out-of-Memory events that would crash the entire node
Only the remaining allocatable capacity is available for scheduling Pods.
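As a rough sketch, these reservations are declared in the kubelet configuration; the values below are placeholders, not recommendations:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
kubeReserved:                # held back for Kubernetes daemons (kubelet, container runtime)
  cpu: "100m"
  memory: "256Mi"
systemReserved:              # held back for operating system daemons (sshd, systemd)
  cpu: "100m"
  memory: "256Mi"
evictionHard:                # kubelet evicts Pods once available memory drops below this
  memory.available: "200Mi"

Allocatable capacity is then roughly the node's capacity minus kubeReserved, systemReserved, and the eviction threshold.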
Container Resource Demands
Pods must declare their resource requirements through requests and limits. The scheduler uses resource requests to determine Pod placement, ensuring nodes have adequate capacity.
This ties directly to runtime dependencies - properly declaring resource needs enables the scheduler to make informed decisions and prevents resource contention that degrades performance.
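A minimal Pod sketch with declared demands might look like this (the name and image are illustrative); the scheduler places the Pod based on requests, while limits are enforced at runtime:

apiVersion: v1
kind: Pod
metadata:
  name: web-frontend                        # illustrative name
spec:
  containers:
  - name: app
    image: example.com/web-frontend:1.0     # hypothetical image
    resources:
      requests:                             # used by the scheduler for placement decisions
        cpu: 100m
        memory: 200Mi
      limits:                               # enforced by the runtime, not used for placement
        cpu: 200m
        memory: 400Mi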
Scheduler Configuration
The scheduling process operates in phases using scheduling profiles and plugins:
Filtering (Predicates) - Identify feasible nodes that meet minimum requirements. A node is filtered out if it lacks sufficient resources, carries taints the Pod does not tolerate, or fails other hard constraints.
Scoring (Priorities) - Rank feasible nodes by desirability. Scoring functions consider factors like resource balance, Pod affinity, and topology spread to find the optimal placement.
The highest-scoring feasible node is selected for Pod binding.
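As a hedged sketch of scheduler configuration, a profile could adjust the scoring strategy of the NodeResourcesFit plugin, for example to pack Pods more densely; the strategy choice and weights here are illustrative:

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  pluginConfig:
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: MostAllocated      # favor packing onto fewer nodes; LeastAllocated spreads instead
        resources:
        - name: cpu
          weight: 1
        - name: memory
          weight: 1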
Node Selection Mechanisms
While automated placement works well by default, several mechanisms allow expressing placement preferences and requirements:
Node Selector
The simplest placement constraint - specify labels that must exist on target nodes. The Pod is eligible only for nodes matching all specified label key-value pairs.
nodeSelector:
  disktype: ssd
  zone: us-west-1a
Node selectors provide basic node targeting but lack expressiveness for complex requirements.
Node Affinity
A more powerful alternative to node selectors, supporting advanced operators and preference levels. Node affinity rules express constraints about node labels with operators like In, NotIn, Exists, DoesNotExist, and Gt (greater than).
Required rules are hard constraints - Pods won’t schedule if no nodes satisfy them.
Preferred rules are soft constraints - they increase node scoring but don’t prevent scheduling.
This flexibility enables expressing requirements like “prefer SSD nodes in zone A, but accept HDD nodes if necessary” - crucial for balancing performance goals with scheduling success.
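For example, a required zone combined with an SSD preference could be sketched in a Pod spec like this (the disktype label is an assumption; the zone key uses the well-known topology label):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:    # hard constraint
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          - us-west-1a
    preferredDuringSchedulingIgnoredDuringExecution:   # soft preference, adds to node score
    - weight: 80
      preference:
        matchExpressions:
        - key: disktype
          operator: In
          values:
          - ssd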
Pod Affinity and Anti-Affinity
These rules base placement decisions on other Pods already running on nodes, enabling:
Pod Affinity (Colocation) - Schedule Pods near related Pods to minimize network latency. For example, placing web frontend Pods on nodes already running cache Pods reduces round-trip times.
Pod Anti-Affinity (Spreading) - Schedule Pods away from each other to avoid single points of failure. Spreading replicas across failure domains (zones, racks, nodes) ensures that infrastructure failures don’t take down all instances simultaneously.
Pod affinity and anti-affinity are essential for building high-availability topologies. They work with label selectors to identify related Pods and topology keys to define failure domains.
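A sketch of both rules in a frontend Pod spec, assuming hypothetical app=cache and app=frontend labels on the related Pods:

affinity:
  podAffinity:                    # colocate with cache Pods on the same node
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: cache              # hypothetical label on the cache Pods
      topologyKey: kubernetes.io/hostname
  podAntiAffinity:                # keep frontend replicas in different zones
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: frontend           # hypothetical label on this Pod's replicas
      topologyKey: topology.kubernetes.io/zone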
Topology Spread Constraints
Topology spread constraints ensure even Pod distribution across topology domains like zones, nodes, or racks. This achieves:
Better cluster utilization - Avoiding hotspots where some nodes are packed while others sit idle
High availability - Distributing replicas across failure domains so that zone or node failures don’t eliminate all instances
Performance consistency - Preventing resource contention from over-packing nodes
Spread constraints complement Pod anti-affinity by providing fine-grained control over distribution patterns.
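A minimal sketch, assuming the replicas carry an app=frontend label, that keeps each zone's replica count within one of every other zone:

topologySpreadConstraints:
- maxSkew: 1                          # zone replica counts may differ by at most one
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule    # hard constraint; ScheduleAnyway makes it a preference
  labelSelector:
    matchLabels:
      app: frontend                   # hypothetical label selecting the replicas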
Taints and Tolerations
This mechanism reverses the control flow - instead of Pods selecting nodes (like affinity), nodes control which Pods they accept. Nodes can have taints that repel Pods unless the Pod has a matching toleration.
NoSchedule - Hard constraint preventing scheduling without matching toleration
PreferNoSchedule - Soft constraint that discourages but doesn’t prevent scheduling
NoExecute - Evicts already-running Pods that lack the toleration
Taints are useful for:
- Dedicated nodes - Reserve nodes for specific workloads (GPU nodes for ML jobs)
- Node problems - Automatically taint unhealthy nodes to prevent new placements
- Special hardware - Control access to nodes with specialized capabilities
Tolerations integrate with Pod priority and QoS classes - high-priority Pods might tolerate taints that exclude lower-priority workloads.
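As a sketch, a node dedicated to GPU workloads could be tainted (for example with kubectl taint nodes <node> dedicated=gpu:NoSchedule), and only Pods carrying a matching toleration would be scheduled onto it; the key and value are illustrative:

tolerations:
- key: dedicated          # matches the hypothetical dedicated=gpu:NoSchedule taint
  operator: Equal
  value: gpu
  effect: NoSchedule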
Descheduler
While the scheduler handles initial placement, the descheduler improves cluster utilization by evicting and rescheduling poorly placed Pods. This optional component defragments clusters by:
- Evicting Pods with poor QoS classes from overloaded nodes
- Moving low priority Pods to free capacity for higher-priority workloads
- Rebalancing after node capacity changes or policy updates
The descheduler ensures placement remains optimal as cluster state evolves, complementing the scheduler’s initial decisions.
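As one hedged example, the descheduler project's v1alpha1 policy format can enable a LowNodeUtilization strategy; the thresholds below are placeholders:

apiVersion: descheduler/v1alpha1
kind: DeschedulerPolicy
strategies:
  LowNodeUtilization:
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:           # nodes below these percentages count as underutilized
          cpu: 20
          memory: 20
          pods: 20
        targetThresholds:     # Pods are evicted from nodes above these percentages
          cpu: 50
          memory: 50
          pods: 50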
Best Practices
Start with resource profiles - Accurate resource requests and limits enable resource-consumption-driven scheduling, the foundation of effective placement.
Label strategically - Apply labels describing node properties (hardware type, zone, environment) and Pod characteristics (application tier, version) to enable flexible selection.
Use constraints sparingly - Let the scheduler work automatically when possible. Add affinity, anti-affinity, and taints only for specific requirements like data locality or high availability.
Balance requirements and preferences - Use required rules for hard constraints (security boundaries, hardware requirements) and preferred rules for optimization (performance, cost).
Test scheduling policies - Verify that Pods can schedule under various failure scenarios. Overly restrictive policies may prevent scheduling when nodes fail or during maintenance.
Integration with Other Patterns
Automated Placement interacts with other foundational patterns:
Managed Lifecycle - Proper SIGTERM handling and PreStop hooks enable graceful Pod eviction during rescheduling operations.
Rolling Deployment - Placement policies affect how update Pods are distributed, influencing availability during deployments.
ResourceQuota and LimitRange - Namespace-level resource governance works with scheduler placement to control total cluster consumption.
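For instance, a namespace-level quota capping aggregate requests and limits might be sketched as follows (the namespace and values are placeholders):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota        # hypothetical name
  namespace: team-a          # hypothetical namespace
spec:
  hard:
    requests.cpu: "4"        # total CPU requests allowed across the namespace
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi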
By combining automated placement with proper resource declaration and strategic use of placement policies, Kubernetes efficiently manages the complexity of distributed system topology while maintaining application performance and availability requirements.