CFN Cloud
2025-10-15

Kubernetes Probes Explained: Liveness, Readiness, and Startup Checks

Learn how liveness, readiness, and startup probes work in Kubernetes, what each one should check, and how to avoid restart loops and false failures.

Probes are the difference between “the process exists” and “the workload is actually safe to serve traffic”. They control whether a Pod receives traffic, whether it gets restarted, and whether slow startup is treated as a failure.

The three probe types

  • readiness: should this Pod receive traffic now?
  • liveness: should this container be restarted?
  • startup: should Kubernetes wait longer before applying the other checks?

Readiness is a traffic gate. Liveness is a restart decision. Startup is protection for slow boot paths.

Minimal example

readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /livez
    port: 8080
  periodSeconds: 10
startupProbe:
  httpGet:
    path: /startupz
    port: 8080
  periodSeconds: 5
  failureThreshold: 24   # allows up to 24 × 5s = 120s of startup time

What each probe should really test

Readiness

Test whether the workload can safely handle traffic right now.

Good readiness signals:

  • app listener is ready
  • required dependencies are reachable enough for serving
  • migrations or warm-up work is complete

Bad readiness signals:

  • deep health checks that fail on every minor dependency wobble
  • checks that are so expensive they become a source of load themselves
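As a sketch, a readiness endpoint can be this simple: a cheap flag check that warm-up code flips, rather than a deep dependency probe. The handler and the `app_ready` name below are illustrative, not a library API:

```python
# Minimal /readyz handler sketch: readiness is a cheap flag check.
# app_ready would be set once migrations and warm-up complete.
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading

app_ready = threading.Event()  # cleared = not ready, set = ready

class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/readyz":
            # 200 puts the Pod into Service endpoints; 503 keeps it out.
            code = 200 if app_ready.is_set() else 503
        else:
            code = 404
        self.send_response(code)
        self.end_headers()

    def log_message(self, *args):
        # Keep high-frequency probe traffic out of the application logs.
        pass
```

Note that the check costs almost nothing per call, which matters when the kubelet hits it every few seconds.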

Liveness

Use liveness only when restart is the right recovery action.

Good liveness signals:

  • deadlock detection
  • event loop stuck
  • process is alive but functionally wedged

Bad liveness signals:

  • temporary downstream outage
  • startup delays
  • short CPU spikes or GC pauses

If restarting does not help, liveness is often the wrong tool.

Startup

Use a startup probe when boot is slow or noisy. It prevents liveness and readiness from firing too early.

This is usually better than setting a huge initialDelaySeconds and hoping it fits every environment.
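The startup-probe approach can be sketched like this; the endpoint, port, and thresholds are illustrative, not prescriptive:

```yaml
# Gives the app up to 30 × 10s = 300s to boot; liveness checking
# only begins once the startup probe has succeeded.
startupProbe:
  httpGet:
    path: /livez      # a startup probe may reuse the liveness endpoint
    port: 8080
  periodSeconds: 10
  failureThreshold: 30
livenessProbe:
  httpGet:
    path: /livez
    port: 8080
  periodSeconds: 10   # no oversized initialDelaySeconds needed
```

This keeps startup tolerance separate from steady-state sensitivity, which a single initialDelaySeconds value cannot do.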

Common misconfigurations

Using liveness as a traffic switch

This is one of the most expensive mistakes. If the goal is to stop traffic temporarily, readiness should do that. Liveness may turn a transient problem into a restart storm.

Making readiness too strict

If readiness depends on every non-critical downstream dependency being perfect, you may keep healthy Pods out of endpoints for the wrong reason.

Tiny timeouts and failure thresholds

Short timeouts can work in quiet tests and fail badly under real load. Probe tuning should reflect actual latency and startup behavior, not ideal conditions.

Probe tuning strategy

Start from workload behavior, not from examples copied off the internet.

  • periodSeconds: how often to check
  • timeoutSeconds: how long to wait
  • failureThreshold: how many failures before acting
  • successThreshold: how many successes before readiness recovers

For busy services, more forgiving thresholds often produce a more stable system than aggressive restarts.
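The interaction of these fields is simple arithmetic. This sketch (the field names mirror the probe spec; the function itself is illustrative) gives a rough upper bound on detection time:

```python
# Back-of-envelope probe math: roughly how long Kubernetes waits
# before acting on consecutive probe failures.
def worst_case_detection_seconds(period, timeout, failure_threshold):
    # Failures must be consecutive; attempts are periodSeconds apart
    # and each attempt can take up to timeoutSeconds before it fails.
    return failure_threshold * period + timeout

# periodSeconds=10, timeoutSeconds=1, failureThreshold=3
print(worst_case_detection_seconds(10, 1, 3))  # roughly 31 seconds
```

Running the numbers like this makes the trade-off explicit: tighter values detect faster but also punish ordinary latency spikes.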

Readiness and rollouts

Readiness has a direct effect on rollouts. A Pod that never becomes ready can block Deployment progress, keep old ReplicaSets alive, or leave Services with too few healthy endpoints.

That is why many rollout failures are actually probe design failures.

Probes and graceful shutdown

Probe behavior does not live in isolation. It has to fit your shutdown path too.

  • use readiness to stop new traffic first
  • allow enough terminationGracePeriodSeconds
  • handle SIGTERM properly
  • avoid killing a Pod while it is still draining requests

Without graceful shutdown, even a good readiness probe may not save you from connection errors during a rollout.
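A minimal sketch of that ordering in signal-handler form; the `ready` flag and its wiring to /readyz are assumed, not shown:

```python
# Shutdown sketch: fail readiness first, then drain, then exit.
import signal
import threading

ready = threading.Event()
ready.set()  # serving normally; /readyz would return 200 while set

def handle_sigterm(signum, frame):
    # Step 1: fail readiness so the Pod is removed from Service
    # endpoints and new traffic stops arriving.
    ready.clear()
    # Step 2: the server would then drain in-flight requests within
    # terminationGracePeriodSeconds before exiting.

signal.signal(signal.SIGTERM, handle_sigterm)
```

The key property is ordering: the endpoint removal happens before the process starts dying, not as a side effect of it.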

A practical debugging workflow

# Probe config and recent events; failing probes show up as "Unhealthy"
kubectl describe pod demo-app -n demo
# Application output around the time of the probe failures
kubectl logs demo-app -n demo --tail=200
# Shell into the container to hit the probe endpoint directly
kubectl exec -it demo-app -n demo -- sh

When debugging probes, always ask:

  1. What is the probe actually checking?
  2. Can I reach that endpoint or command inside the container?
  3. Is the failure caused by startup timing, app logic, or dependency health?

Probe decision table

  Situation                                        Better choice
  App is booting slowly                            startupProbe
  App should stop receiving traffic temporarily    readinessProbe
  Process is wedged and restart helps              livenessProbe
  Downstream dependency is flaky                   usually readiness, not liveness

What not to probe directly

Avoid probe logic that is too broad.

Examples:

  • full end-to-end business transactions
  • very expensive DB queries
  • checks that depend on unrelated external systems

Probes should be informative, not dramatic.

FAQ

Q: Should readiness and liveness use the same endpoint?
A: Usually no. Readiness and liveness answer different questions, so reusing one endpoint often mixes traffic routing with restart logic.

Q: When should I add a startup probe?
A: Add one when the service can take long enough to initialize that readiness or liveness would fire too early.

Q: Why is my Pod running but receiving no traffic?
A: A common cause is failing readiness. The container is alive, but Kubernetes is intentionally keeping it out of endpoints.

Next reading

  • Continue with kubernetes-quickstart-deployment-replicaset.md for rollout behavior.
  • Read kubernetes-quickstart-service.md to understand how readiness affects traffic.
  • For production guidance, continue into the probes tips article.

Wrap-up

Good probes should feel boring. If a probe is too clever, too deep, or too aggressive, it usually ends up being part of the outage instead of part of the protection.
