Kubernetes is a sophisticated distributed system that demonstrates the core principles of modern container orchestration at scale. The platform implements fundamental distributed-systems concepts (coordination, consistency, fault tolerance) through a carefully designed architecture that separates control logic from execution.
When you run kubectl apply -f deployment.yaml, you trigger a coordinated sequence across multiple distributed components, each implementing specific distributed-systems patterns.
Two Planes
Kubernetes splits cleanly into a control plane (decision-making) and a data plane (execution).
Control plane
The brain
Runs on control plane nodes (historically called master nodes). Manages cluster state. API server, etcd (state store), scheduler, controller managers. Stateless except for etcd.
Data plane
The muscle
Runs on every worker node. Executes the control plane's decisions: kubelet starts containers, kube-proxy routes traffic, the container runtime does the work.
This split enables horizontal scaling, fault tolerance, and clear separation of concerns: the same architectural principles you'd use for any large distributed system.
Control Plane Components
kube-apiserver
The single source of truth interface. All cluster state changes go through here. Validates, authenticates, persists to etcd. Horizontally scalable.
etcd
Distributed KV store backed by Raft consensus. Holds all cluster state. The reliability of etcd determines the reliability of Kubernetes.
kube-scheduler
Decides which node a new Pod should run on. Considers resource requests, affinity rules, taints/tolerations. Pluggable.
kube-controller-manager
Runs the reconciliation loops (Deployment controller, ReplicaSet controller, Node controller, etc.). Watches state, makes it match desired state.
cloud-controller-manager
Cloud-specific controllers (Load balancers, persistent volumes, node lifecycle). Lets the rest of K8s stay cloud-agnostic.
Data Plane Components
kubelet
Node agent. Watches the API server for Pods assigned to its node. Calls the container runtime to start/stop containers. Reports node and Pod status back.
kube-proxy
Service networking on each node. Maintains iptables/IPVS rules that route Service traffic to backend Pods. Some clusters replace it entirely with eBPF-based datapaths (Cilium) or layer service meshes (Linkerd) on top.
Container runtime
Actually runs containers. containerd (the default), CRI-O. Implements the Container Runtime Interface (CRI) so K8s doesn't care which runtime you use.
The Reconciliation Loop
The pattern that makes Kubernetes work: declare desired state, the system continuously moves actual state toward it.
1. User declares desired state (Deployment YAML).
2. Controller watches the API for changes.
3. Controller compares desired vs. actual state.
4. Controller takes action to reduce the difference.
5. Loop forever.
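The loop above can be sketched in a few lines of Python. This is a toy simulation of the pattern, not the real controller machinery, which watches streams of events from the API server rather than polling dicts:

```python
# Toy reconciliation loop: desired and actual state are maps of
# workload name -> replica count. One pass computes the actions
# needed to move actual toward desired.

def reconcile(desired: dict, actual: dict) -> list:
    """One pass of the loop: compare states, return corrective actions."""
    actions = []
    for name, want in desired.items():
        have = actual.get(name, 0)
        if have < want:
            actions.append(("scale_up", name, want - have))
        elif have > want:
            actions.append(("scale_down", name, have - want))
    return actions

desired_state = {"web": 3}   # user declared: 3 replicas
actual_state = {"web": 1}    # observed: only 1 running

for op, name, delta in reconcile(desired_state, actual_state):
    print(op, name, delta)   # prints: scale_up web 2
```

A real controller runs this comparison continuously, so even if an action fails or a node dies mid-change, the next iteration observes the drift and corrects it.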
Scheduling
When a Pod is created, the scheduler picks a node. The decision involves:
Resource requests
Pod says "I need 100m CPU and 256Mi memory." Scheduler finds nodes with enough free capacity.
Node selectors
Pin pods to nodes with specific labels (e.g., GPU nodes, region-specific nodes).
Affinity / anti-affinity
Pods that should run together (data + worker) or apart (replicas of the same service).
Taints and tolerations
Nodes can repel pods unless the pod explicitly tolerates the taint. Used for dedicated node pools.
Topology spread
Distribute pod replicas across zones for HA.
Custom plugins
Scheduler is pluggable. Custom scoring functions for specialized needs.
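Several of these constraints compose in a single Pod spec. A sketch (the name, labels, image, and taint key are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: worker
  labels:
    app: worker
spec:
  containers:
    - name: worker
      image: example/worker:1.0      # placeholder image
      resources:
        requests:
          cpu: 100m                  # scheduler finds a node with capacity
          memory: 256Mi
  nodeSelector:
    accelerator: gpu                 # only nodes labeled accelerator=gpu
  tolerations:
    - key: dedicated                 # tolerate the dedicated pool's taint
      operator: Equal
      value: gpu-pool
      effect: NoSchedule
  topologySpreadConstraints:
    - maxSkew: 1                     # spread replicas across zones
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app: worker
```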
Workload Resources
A few of the most-used resource types:
Pod
Smallest deployable unit. One or more containers that share network and storage. Usually managed by higher-level resources.
Deployment
Manages ReplicaSets for stateless workloads. Provides rolling updates and rollbacks. The default for stateless services.
DaemonSet
One pod per node (or per node matching a selector). Used for log collectors, monitoring agents, network plugins.
Job / CronJob
Run-to-completion workloads. Job for one-off; CronJob for scheduled.
HPA / VPA
Horizontal Pod Autoscaler (more replicas) and Vertical Pod Autoscaler (more resources per pod). Auto-scaling based on metrics.
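A minimal Deployment tying these ideas together (name and image are hypothetical; the rolling-update settings shown are one reasonable choice, not the defaults):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                  # desired state; the controller reconciles toward it
  selector:
    matchLabels:
      app: web
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1        # at most 1 pod down during a rollout
      maxSurge: 1              # at most 1 extra pod above replicas
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: example/web:1.0   # placeholder image
          ports:
            - containerPort: 8080
```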
Service and Networking
Pods are ephemeral and have changing IPs. Services provide stable endpoints:
ClusterIP
Internal-only virtual IP. Default service type. Other pods reach it by DNS name (my-service.namespace.svc.cluster.local).
NodePort
Exposes service on each node's IP at a static port. For dev/testing or specific use cases.
LoadBalancer
Provisions a cloud LB (ELB on AWS, etc.). Standard for public services in cloud environments.
Ingress
HTTP/HTTPS routing. Path-based, host-based. nginx-ingress, Traefik, AWS ALB Controller. The right way to expose multiple services through one LB.
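A ClusterIP Service selecting pods labeled app: web, plus an Ingress routing a hypothetical host to it (names and host are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: ClusterIP          # default; internal-only virtual IP
  selector:
    app: web               # route to pods with this label
  ports:
    - port: 80             # the Service's stable port
      targetPort: 8080     # the container's port
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service
                port:
                  number: 80
```

Other pods reach the Service by DNS as my-service.namespace.svc.cluster.local; the Ingress controller terminates HTTP at one load balancer and fans out by host and path.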
Configuration and Secrets
ConfigMap
Non-sensitive config
Key-value pairs. Mounted as files or environment variables. Application config, feature flags, public URLs.
Secret
Sensitive data
Same shape as ConfigMap, but base64-encoded and (optionally) encrypted at rest. Database passwords, API keys, certs.
For production, integrate with a real secret manager (AWS Secrets Manager, Vault) via External Secrets Operator. Native Secrets are not enough on their own.
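A sketch of both resources (names and values are hypothetical). The stringData field lets you write Secret values in plain text; the API server stores them base64-encoded:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: info            # non-sensitive application config
  FEATURE_X: "true"
---
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
type: Opaque
stringData:
  DB_PASSWORD: change-me     # sensitive; stored base64-encoded
```

A container can consume both at once with envFrom, listing a configMapRef and a secretRef, or mount them as files in the pod spec.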
Storage
Volume
Storage attached to a Pod. Many types: emptyDir (ephemeral), configMap, secret, hostPath, NFS, cloud volumes.
PersistentVolume / PersistentVolumeClaim
Decoupled storage. Cluster admin provisions PVs; users claim them via PVCs. Durable across pod restarts.
StorageClass
Defines how PVs are dynamically provisioned. EBS gp3, NFS, etc. Each cluster typically has multiple StorageClasses for different needs.
CSI (Container Storage Interface)
Standard interface for storage drivers. Cloud providers, vendors implement CSI; Kubernetes stays storage-agnostic.
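A sketch of dynamic provisioning, assuming the AWS EBS CSI driver; the provisioner name and parameters differ per cloud and per driver:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com   # CSI driver (AWS EBS assumed here)
parameters:
  type: gp3                    # driver-specific volume type
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: fast-ssd   # triggers dynamic PV provisioning
  resources:
    requests:
      storage: 100Gi
```

The user only writes the PVC; the StorageClass and CSI driver create a matching PV on demand, and the volume survives pod restarts.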
Operators and CRDs
Custom Resource Definitions (CRDs) let you extend the Kubernetes API. Operators package CRDs together with controllers, exposing higher-level abstractions specific to your domain.
```yaml
# A custom resource managed by an operator
apiVersion: postgres.example.com/v1
kind: PostgresCluster
metadata:
  name: my-db
spec:
  replicas: 3
  storage: 100Gi
  version: "15"
```
The operator watches PostgresCluster resources and reconciles them: provisioning Postgres clusters, handling failover, managing backups. The user thinks at the level of "I want a Postgres cluster" rather than "I want StatefulSets, PVCs, ConfigMaps…". Operators are how complex stateful systems get well-managed in K8s.
Multi-Tenancy in Kubernetes
For multi-tenant SaaS, K8s supports multiple isolation strategies:
Namespace per tenant
Logical separation, RBAC scoping, ResourceQuotas. Cheapest. Suits cooperative multi-tenancy where you trust your tenants.
Cluster per tenant
Strongest isolation. Higher operational cost. Required for hard compliance (HIPAA per-tenant, certain regulated industries).
Virtual clusters
vcluster, Capsule. Each tenant gets a virtualized control plane on a shared host cluster. Middle ground.
Hybrid
Big enterprise tenants on dedicated clusters; everyone else on shared with namespace isolation. Common in B2B SaaS.
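For the namespace-per-tenant model, the building blocks are a Namespace plus a ResourceQuota capping its footprint (tenant name and limits are hypothetical):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-acme
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-acme-quota
  namespace: tenant-acme
spec:
  hard:
    requests.cpu: "4"        # total CPU the tenant may request
    requests.memory: 8Gi     # total memory the tenant may request
    pods: "50"               # cap on pod count
```

RBAC Roles and NetworkPolicies scoped to the namespace round out the isolation; the quota prevents one tenant from starving the shared cluster.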
What Makes Kubernetes Hard
A steep learning curve: many interacting abstractions, YAML sprawl, and failures that span distributed components. The flip side: when you do master it, you have a portable, declarative, self-healing, automation-friendly platform that the entire cloud-native ecosystem builds on.
Recap
Kubernetes splits into control plane (etcd, API server, scheduler, controllers) and data plane (kubelet, kube-proxy, runtime).
The reconciliation loop pattern: declare desired state, controllers continuously move actual state toward it.