# Kubernetes Probes Explained: Liveness, Readiness, and Startup Checks
Learn how liveness, readiness, and startup probes work in Kubernetes, what each one should check, and how to avoid restart loops and false failures.
Probes are the difference between “the process exists” and “the workload is actually safe to serve traffic”. They control whether a Pod receives traffic, whether it gets restarted, and whether slow startup is treated as a failure.
## The three probe types
- readiness: should this Pod receive traffic now?
- liveness: should this container be restarted?
- startup: should Kubernetes wait longer before applying the other checks?
Readiness is a traffic gate. Liveness is a restart decision. Startup is protection for slow boot paths.
## Minimal example

```yaml
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /livez
    port: 8080
  periodSeconds: 10
startupProbe:
  httpGet:
    path: /startupz
    port: 8080
  periodSeconds: 5
  failureThreshold: 24
```
## What each probe should really test

### Readiness
Test whether the workload can safely handle traffic right now.
Good readiness signals:
- app listener is ready
- required dependencies are reachable enough for serving
- migrations or warm-up work is complete
Bad readiness signals:
- deep health checks that fail on every minor dependency wobble
- checks that are so expensive they become a source of load themselves
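As a sketch, a readiness probe can tolerate a brief dependency wobble by allowing a few consecutive failures before the Pod leaves the endpoints. The endpoint and all numeric values below are illustrative assumptions, not recommendations for every workload:

```yaml
readinessProbe:
  httpGet:
    path: /readyz        # assumed cheap readiness endpoint
    port: 8080
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 3    # ~15s of failures before leaving endpoints
  successThreshold: 1    # one success puts the Pod back
```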
### Liveness
Use liveness only when restart is the right recovery action.
Good liveness signals:
- deadlock detection
- event loop stuck
- process is alive but functionally wedged
Bad liveness signals:
- temporary downstream outage
- startup delays
- short CPU spikes or GC pauses
If restarting does not help, liveness is often the wrong tool.
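One hedged sketch of deadlock detection: assume the application touches a heartbeat file such as `/tmp/heartbeat` from its main loop (the path and timing are assumptions). If the loop wedges, the file goes stale and the exec probe fails:

```yaml
livenessProbe:
  exec:
    command:
    - sh
    - -c
    # succeed only if the heartbeat file was modified in the last minute;
    # a wedged main loop stops updating it, and the kubelet restarts the container
    - test "$(find /tmp/heartbeat -mmin -1)"
  periodSeconds: 30
  failureThreshold: 3
```

This keeps liveness process-local: it fails when restart actually helps, not when a downstream dependency blips.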
### Startup

Use a startup probe when boot is slow or noisy. It holds off the liveness and readiness checks until the app has started, so they cannot fire too early.
This is usually better than setting a huge initialDelaySeconds and hoping it fits every environment.
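With the values from the minimal example above, the startup window works out to failureThreshold × periodSeconds = 24 × 5s = up to 120 seconds of boot time before the container is restarted:

```yaml
startupProbe:
  httpGet:
    path: /startupz
    port: 8080
  periodSeconds: 5
  failureThreshold: 24   # 24 × 5s = up to 120s allowed for startup
# Once the startup probe succeeds, liveness and readiness take over.
```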
## Common misconfigurations

### Using liveness as a traffic switch
This is one of the most expensive mistakes. If the goal is to stop traffic temporarily, readiness should do that. Liveness may turn a transient problem into a restart storm.
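As a sketch of the fix, assuming `/readyz` and `/livez` endpoints like those in the minimal example: keep any dependency-sensitive check on the readiness path, and keep liveness strictly process-local, so a flaky dependency removes the Pod from endpoints instead of restarting it:

```yaml
readinessProbe:
  httpGet:
    path: /readyz    # may consider downstream health (assumption about the endpoint)
    port: 8080
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /livez     # process-local check only; no dependency calls
    port: 8080
  periodSeconds: 10
```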
### Making readiness too strict
If readiness depends on every non-critical downstream dependency being perfect, you may keep healthy Pods out of endpoints for the wrong reason.
### Tiny timeouts and failure thresholds
Short timeouts can work in quiet tests and fail badly under real load. Probe tuning should reflect actual latency and startup behavior, not ideal conditions.
## Probe tuning strategy
Start from workload behavior, not from examples copied off the internet.
- `periodSeconds`: how often to check
- `timeoutSeconds`: how long to wait for a response
- `failureThreshold`: how many consecutive failures before acting
- `successThreshold`: how many consecutive successes before readiness recovers
For busy services, more forgiving thresholds often produce a more stable system than aggressive restarts.
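The useful arithmetic here is reaction time ≈ failureThreshold × periodSeconds. A hedged sketch for a busy service (all values are assumptions to be replaced with your measured latency and tolerance):

```yaml
livenessProbe:
  httpGet:
    path: /livez
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 3      # should exceed the endpoint's real p99 latency under load
  failureThreshold: 6    # reaction time ≈ 6 × 10s = about a minute before restart
```

A one-minute reaction time sounds slow, but it rides out GC pauses and short CPU spikes that an aggressive 2×5s configuration would turn into restarts.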
## Readiness and rollouts
Readiness has a direct effect on rollouts. A Pod that never becomes ready can block Deployment progress, keep old ReplicaSets alive, or leave Services with too few healthy endpoints.
That is why many rollout failures are actually probe design failures.
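To make "stuck because never Ready" visible instead of silent, a Deployment can set `progressDeadlineSeconds` (default 600), after which the rollout reports `ProgressDeadlineExceeded` in its conditions. The names and values below are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app                   # illustrative name
spec:
  progressDeadlineSeconds: 300     # flag a rollout that makes no progress for 5 minutes
  replicas: 3
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: app
        image: demo-app:latest     # illustrative image
        readinessProbe:
          httpGet:
            path: /readyz
            port: 8080
```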
## Probes and graceful shutdown
Probe behavior does not live in isolation. It has to fit your shutdown path too.
- use readiness to stop new traffic first
- allow enough `terminationGracePeriodSeconds`
- handle SIGTERM properly
- avoid killing a Pod while it is still draining requests
Without graceful shutdown, a good readiness probe still may not save you from connection errors during rollout.
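A common pattern is a short `preStop` sleep that bridges the gap between endpoint removal and traffic actually stopping, because the kubelet sends SIGTERM only after the `preStop` hook completes. The sleep length and image name below are assumptions:

```yaml
spec:
  terminationGracePeriodSeconds: 30      # must cover the preStop sleep plus request draining
  containers:
  - name: app
    image: demo-app:latest               # illustrative image
    lifecycle:
      preStop:
        exec:
          command: ["sh", "-c", "sleep 5"]   # give endpoint removal time to propagate
```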
## A practical debugging workflow

```shell
kubectl describe pod demo-app -n demo
kubectl logs demo-app -n demo --tail=200
kubectl exec -it demo-app -n demo -- sh
```
When debugging probes, always ask:
- What is the probe actually checking?
- Can I reach that endpoint or command inside the container?
- Is the failure caused by startup timing, app logic, or dependency health?
## Probe decision table
| Situation | Better choice |
|---|---|
| App is booting slowly | startupProbe |
| App should stop receiving traffic temporarily | readinessProbe |
| Process is wedged and restart helps | livenessProbe |
| Downstream dependency is flaky | usually readiness, not liveness |
## What not to probe directly
Avoid probe logic that is too broad.
Examples:
- full end-to-end business transactions
- very expensive DB queries
- checks that depend on unrelated external systems
Probes should be informative, not dramatic.
## FAQ

**Q: Should readiness and liveness use the same endpoint?**
A: Usually no. Readiness and liveness answer different questions, so reusing one endpoint often mixes traffic routing with restart logic.

**Q: When should I add a startup probe?**
A: Add one when the service can take long enough to initialize that readiness or liveness would fire too early.

**Q: Why is my Pod running but receiving no traffic?**
A: A common cause is failing readiness. The container is alive, but Kubernetes is intentionally keeping it out of endpoints.
## Next reading

- Continue with `kubernetes-quickstart-deployment-replicaset.md` for rollout behavior.
- Read `kubernetes-quickstart-service.md` to understand how readiness affects traffic.
- For production guidance, continue into the probes tips article.
## Wrap-up
Good probes should feel boring. If a probe is too clever, too deep, or too aggressive, it usually ends up being part of the outage instead of part of the protection.