Deployments and ReplicaSets
Use Deployments to manage replicas and rolling updates.
A Deployment is the controller you will use most often. It creates and manages ReplicaSets to keep replica counts and version updates under control. Deployments give you declarative updates, rollback history, and predictable scaling for stateless workloads.
This quick start expands on the basic flow with rollout strategy, probes, scaling, and common troubleshooting patterns.
Key points
- ReplicaSet owns replica counts.
- Deployment owns version updates and rollbacks.
- Rolling updates allow zero-downtime deployments when configured correctly.
Example Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: nginx:1.25
Label and selector rules
The Deployment selector must match the Pod template labels. If they drift, the controller will not manage its Pods correctly. Avoid editing selectors after creation.
Declarative updates
Deployments are designed for declarative workflow: you describe the desired state and let Kubernetes converge. Avoid manual Pod deletion as a deployment mechanism; instead, update the Deployment spec and let it roll out.
This also means you should avoid editing ReplicaSets directly; changes should flow from the Deployment.
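A minimal sketch of that workflow, assuming the Deployment manifest lives in a local deployment.yaml:
# Preview what would change against the live object
kubectl diff -f deployment.yaml
# Apply the desired state and let the controller converge
kubectl apply -f deployment.yaml
kubectl rollout status deploy/api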
Rollout and rollback
kubectl rollout status deploy/api
kubectl rollout history deploy/api
kubectl rollout undo deploy/api
You can also pause a rollout, inspect state, and then resume:
kubectl rollout pause deploy/api
kubectl rollout resume deploy/api
Rollout history details
You can inspect specific revisions:
kubectl rollout history deploy/api --revision=2
This helps identify which change introduced an issue.
Triggering updates
You can update images imperatively, which is useful for quick tests:
kubectl set image deploy/api api=nginx:1.26
kubectl rollout status deploy/api
For production, prefer editing manifests and applying them so changes are tracked.
Deployment annotations
Add annotations for ownership, team, or change tracking. This helps with auditability and incident response.
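For example, kubernetes.io/change-cause is what kubectl rollout history shows in its CHANGE-CAUSE column; the ownership keys below are illustrative, not standard:
metadata:
  name: api
  annotations:
    kubernetes.io/change-cause: "bump nginx to 1.26"
    example.com/owner: "platform-team"   # illustrative ownership annotation
    example.com/ticket: "OPS-1234"       # illustrative change-tracking annotation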
Scaling
Scale the Deployment up or down:
kubectl scale deploy/api --replicas=5
kubectl get deploy api
Autoscaling is handled by HPA, but you should set CPU requests first so scaling is meaningful.
Horizontal Pod Autoscaler (HPA) basics
Once requests are set, you can enable HPA:
kubectl autoscale deploy/api --min=2 --max=6 --cpu-percent=60
Ensure metrics-server is installed; otherwise, the HPA cannot read resource metrics.
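The same settings can also live in version control as an autoscaling/v2 manifest; a minimal sketch:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 6
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60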
Readiness and liveness
Health probes keep bad pods out of service and help recovery:
livenessProbe:
  httpGet:
    path: /healthz
    port: 80
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 5
PreStop and graceful shutdown
If your service needs time to drain requests, add a preStop hook and increase terminationGracePeriodSeconds:
terminationGracePeriodSeconds: 30
containers:
- name: api
  lifecycle:
    preStop:
      exec:
        command: ["sh", "-c", "sleep 10"]
This reduces 502s during rollouts.
Rolling update tuning
Use maxSurge and maxUnavailable to balance speed and stability. For critical APIs, set maxUnavailable: 0 so the full set of existing replicas stays available throughout the update.
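As one illustration, a less conservative strategy that trades a small capacity dip for a faster rollout might look like this; the exact percentages are a judgment call for your traffic:
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 25%        # extra Pods allowed above the desired replica count
    maxUnavailable: 25%  # Pods allowed to be unavailable during the update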
minReadySeconds and progress deadlines
You can enforce a minimum ready time and a rollout deadline:
minReadySeconds: 10
progressDeadlineSeconds: 600
minReadySeconds requires a new Pod to stay ready for that many seconds before it counts as available. progressDeadlineSeconds marks the rollout as failed (Progressing=False) if it makes no progress within that window; it does not roll back automatically.
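Both fields sit at the top level of the Deployment spec, alongside replicas and strategy:
spec:
  replicas: 3
  minReadySeconds: 10
  progressDeadlineSeconds: 600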
Resource requests and limits
Set requests so the scheduler can place Pods correctly:
resources:
  requests:
    cpu: "200m"
    memory: "256Mi"
  limits:
    cpu: "1"
    memory: "512Mi"
Without requests, the HPA cannot compute utilization and nodes can become overloaded.
PodDisruptionBudget (PDB)
For critical services, add a PDB so voluntary disruptions do not take down all replicas:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api
This works alongside rolling updates to keep enough pods serving traffic.
Deployment strategy options
- RollingUpdate (default)
- Recreate (stop all old pods, then start new ones)
Use Recreate only when you cannot run old and new versions side by side.
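Switching strategies is a one-field change:
strategy:
  type: Recreate   # all old Pods are stopped before any new Pod starts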
Inspect ReplicaSets
Deployments create ReplicaSets on each update. You can inspect them to understand rollout history:
kubectl get rs
kubectl describe rs <replicaset>
Old ReplicaSets are kept up to revisionHistoryLimit (10 by default); older ones are garbage-collected.
You can safely delete old ReplicaSets once you are sure you will not roll back to them.
Pod template changes
Any change to the Pod template triggers a new ReplicaSet. Changes to labels outside spec.template do not. Keep this in mind when you want to force a rollout.
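If the template has not changed but you still need fresh Pods (for example, to pick up a rotated Secret), kubectl can trigger a rollout for you:
kubectl rollout restart deploy/api   # stamps a restartedAt annotation into the Pod template
kubectl rollout status deploy/api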
Revision history control
Limit how many old ReplicaSets are kept:
revisionHistoryLimit: 5
This keeps rollback history without leaving too many old objects.
Image tags and rollouts
Avoid using latest in production. Use versioned tags so rollouts are deterministic. If you must reuse a tag, set imagePullPolicy: Always or force a rollout to pick up changes.
Treat image tag changes as the trigger for new ReplicaSets. This keeps deployments traceable.
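In the container spec, that looks like:
containers:
- name: api
  image: nginx:1.26              # pinned, versioned tag rather than latest
  imagePullPolicy: IfNotPresent  # use Always only if you must reuse a mutable tag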
Canary pattern
For sensitive services, use canary updates by creating a second Deployment with a small replica count and routing a portion of traffic to it. This is safer than updating all replicas at once.
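A minimal sketch, assuming the Service selects only on app: api so both Deployments receive traffic in proportion to their replica counts; in practice, also give the stable Deployment a track: stable label so the two selectors do not overlap:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-canary
spec:
  replicas: 1                # small slice of traffic next to the stable replicas
  selector:
    matchLabels:
      app: api
      track: canary
  template:
    metadata:
      labels:
        app: api             # matched by the shared Service selector
        track: canary
    spec:
      containers:
      - name: api
        image: nginx:1.26    # the candidate version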
Blue/green alternative
Another safe pattern is blue/green: deploy a new version alongside the old one, run validation, then switch traffic by updating a Service selector. This makes rollback as simple as pointing the Service back to the old Deployment.
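A minimal sketch of the traffic switch, assuming the blue and green Deployments label their Pods with a version label:
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  selector:
    app: api
    version: green   # switch back to blue to roll back
  ports:
  - port: 80
    targetPort: 80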
Common issues
- Pods not updating: check image tag and pull policy.
- Rollout stuck: readiness probes failing or maxUnavailable too strict.
- Unexpected restarts: liveness probe too aggressive.
- Image pull errors: check registry credentials and image tags.
Service compatibility
Make sure the Service selector matches the Deployment labels. If they differ, your Pods will be healthy but receive no traffic. This is a common reason for 503s right after a rollout.
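A quick way to confirm the Service actually selects your Pods:
kubectl get endpoints api            # should list the Pod IPs backing the Service
kubectl get pods -l app=api -o wide  # compare against the Pod IPs above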
Debug workflow
kubectl describe deploy api
kubectl describe pod <pod-name>
kubectl logs <pod-name>
Pod template hashes
Each ReplicaSet is labeled with a pod-template-hash. This is how the Deployment distinguishes old and new ReplicaSets during rollouts. You can use it to trace which Pods belong to which revision.
This is also useful when debugging mixed-version traffic during a rollout.
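You can surface the hash directly when listing objects:
kubectl get pods -l app=api -L pod-template-hash   # -L adds the label as a column
kubectl get rs -l app=api                          # each ReplicaSet name embeds the hash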
Status conditions
Check Deployment conditions to see why a rollout is blocked:
kubectl get deploy api -o jsonpath='{.status.conditions}'
Use kubectl get events to see recent failures, especially image pull or probe errors.
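Sorting events by time makes the most recent failures easy to spot:
kubectl get events --sort-by=.lastTimestamp | tail -n 20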
Cleanup and scaling down
Scale down during low traffic to save resources:
kubectl scale deploy/api --replicas=1
For stateless services, scaling to zero is acceptable if you have an external wake-up or cron job to bring it back.
Promotion checklist
Before a major rollout, verify that:
- Probes are correct.
- Resource requests fit node capacity.
- Rollback is tested.
- Logs and metrics are visible.
- The rollback path is documented.
Practical notes
- Start with a quick inventory: kubectl get nodes, kubectl get pods -A, and kubectl get events -A.
- Compare desired vs. observed state; kubectl describe usually explains drift or failed controllers.
- Keep names, labels, and selectors consistent so Services and controllers can find Pods.
- Keep deployment manifests in version control to track changes over time.
Quick checklist
- The resource matches the intent you described in YAML.
- Namespaces, RBAC, and images are correct for the target environment.
- Health checks and logs are in place before promotion.
- Rollout strategy is appropriate for traffic patterns.
Field checklist
When you move from a quick lab to real traffic, confirm the basics every time. Check resource requests, readiness behavior, log coverage, alerting, and clear rollback steps. A checklist prevents skipping the boring steps that keep services stable. Keep it short, repeatable, and stored with the repo so it evolves with the service and stays close to the code.
Troubleshooting flow
Start from symptoms, not guesses. Review recent events for scheduling, image, or probe failures, then scan logs for application errors. If traffic is failing, confirm readiness, verify endpoints, and trace the request path hop by hop. When data looks wrong, validate the active version and configuration against the release plan. Always record what you changed so a rollback is fast and a postmortem is accurate.
Small exercises to build confidence
Practice common operations in a safe environment. Scale the workload up and down and observe how quickly it stabilizes. Restart a single Pod and watch how the service routes around it. Change one configuration value and verify that the change is visible in logs or metrics. These small drills teach how the system behaves under real operations without waiting for an outage.
Production guardrails
Introduce limits gradually. Resource quotas, PodDisruptionBudgets, and network policies should be tested in staging before production. Keep backups and restore procedures documented, even for stateless services, because dependencies often are not stateless. Align monitoring with user outcomes so you catch regressions before they become incidents.
Documentation and ownership
Write down who owns the service, what success looks like, and which dashboards to use. Include the on-call rotation, escalation path, and basic runbooks for common failures. A small amount of documentation removes a lot of guesswork during incidents and helps new team members ramp up quickly.
Quick validation
After any change, validate the system the same way a user would. Hit the main endpoint, check latency, and watch for error spikes. Confirm that new pods are ready, old ones are gone, and metrics are stable. If the change touched storage, verify disk usage and cleanup behavior. If it touched networking, confirm DNS names and endpoint lists are correct.
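A rough sketch of that routine, assuming the Service is reachable at a hypothetical api.example.internal host:
kubectl rollout status deploy/api    # new Pods ready, old ones gone
kubectl get pods -l app=api          # no restarts or pending Pods
curl -s -o /dev/null -w "%{http_code} %{time_total}s\n" http://api.example.internal/healthz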
Release notes
Write a short note with what changed, why it changed, and how to roll back. This is not bureaucracy; it prevents confusion during incidents. Even a few bullets help future you remember intent and context.
Capacity check
Compare current usage to requests and limits. If the service is close to limits, plan a small scaling adjustment before traffic grows. Capacity planning is easier when it is incremental rather than reactive.
Final reminder
Keep changes small and observable. If a release is risky, reduce scope and validate in staging first. Prefer frequent small updates over rare large ones. When in doubt, pick the option that simplifies rollback and reduces time to detect issues. The goal is not perfect config, but predictable operations.