CFN Cloud
2025-10-15

Kubernetes Probes Explained: Liveness, Readiness, and Startup Checks

Learn how liveness, readiness, and startup probes work in Kubernetes, what each one should check, and how to avoid restart loops and false failures.

Probes are the difference between “the process exists” and “the workload is actually safe to serve traffic”. They control whether a Pod receives traffic, whether it gets restarted, and whether slow startup is treated as a failure.

The three probe types

  • readiness: should this Pod receive traffic now?
  • liveness: should this container be restarted?
  • startup: should Kubernetes wait longer before applying the other checks?

Readiness is a traffic gate. Liveness is a restart decision. Startup is protection for slow boot paths.

Minimal example

readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /livez
    port: 8080
  periodSeconds: 10
startupProbe:
  httpGet:
    path: /startupz
    port: 8080
  periodSeconds: 5
  failureThreshold: 24   # allows up to 24 × 5s = 120s of startup time

What each probe should really test

Readiness

Test whether the workload can safely handle traffic right now.

Good readiness signals:

  • app listener is ready
  • required dependencies are reachable enough for serving
  • migrations or warm-up work is complete

Bad readiness signals:

  • deep health checks that fail on every minor dependency wobble
  • checks that are so expensive they become a source of load themselves
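As a sketch, a readiness endpoint can be this simple: a cheap flag check that warm-up code flips, rather than a deep dependency probe. The handler and the `app_ready` name below are illustrative, not a library API:

```python
# Minimal /readyz handler sketch: readiness is a cheap flag check.
# app_ready would be set once migrations and warm-up complete.
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading

app_ready = threading.Event()  # cleared = not ready, set = ready

class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/readyz":
            # 200 puts the Pod into Service endpoints; 503 keeps it out.
            code = 200 if app_ready.is_set() else 503
        else:
            code = 404
        self.send_response(code)
        self.end_headers()

    def log_message(self, *args):
        # Keep high-frequency probe traffic out of the application logs.
        pass
```

Note that the check costs almost nothing per call, which matters when the kubelet hits it every few seconds.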

Liveness

Use liveness only when restart is the right recovery action.

Good liveness signals:

  • deadlock detection
  • event loop stuck
  • process is alive but functionally wedged

Bad liveness signals:

  • temporary downstream outage
  • startup delays
  • short CPU spikes or GC pauses

If restarting does not help, liveness is often the wrong tool.

Startup

Use a startup probe when boot is slow or noisy. It prevents liveness and readiness from firing too early.

This is usually better than setting a huge initialDelaySeconds and hoping it fits every environment.
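The startup-probe approach can be sketched like this; the endpoint, port, and thresholds are illustrative, not prescriptive:

```yaml
# Gives the app up to 30 × 10s = 300s to boot; liveness checking
# only begins once the startup probe has succeeded.
startupProbe:
  httpGet:
    path: /livez      # a startup probe may reuse the liveness endpoint
    port: 8080
  periodSeconds: 10
  failureThreshold: 30
livenessProbe:
  httpGet:
    path: /livez
    port: 8080
  periodSeconds: 10   # no oversized initialDelaySeconds needed
```

This keeps startup tolerance separate from steady-state sensitivity, which a single initialDelaySeconds value cannot do.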

Common misconfigurations

Using liveness as a traffic switch

This is one of the most expensive mistakes. If the goal is to stop traffic temporarily, readiness should do that. Liveness may turn a transient problem into a restart storm.

Making readiness too strict

If readiness depends on every non-critical downstream dependency being perfect, you may keep healthy Pods out of endpoints for the wrong reason.

Tiny timeouts and failure thresholds

Short timeouts can work in quiet tests and fail badly under real load. Probe tuning should reflect actual latency and startup behavior, not ideal conditions.

Probe tuning strategy

Start from workload behavior, not from examples copied off the internet.

  • periodSeconds: how often to check
  • timeoutSeconds: how long to wait
  • failureThreshold: how many failures before acting
  • successThreshold: how many successes before readiness recovers

For busy services, more forgiving thresholds often produce a more stable system than aggressive restarts.
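The interaction of these fields is simple arithmetic. This sketch (the field names mirror the probe spec; the function itself is illustrative) gives a rough upper bound on detection time:

```python
# Back-of-envelope probe math: roughly how long Kubernetes waits
# before acting on consecutive probe failures.
def worst_case_detection_seconds(period, timeout, failure_threshold):
    # Failures must be consecutive; attempts are periodSeconds apart
    # and each attempt can take up to timeoutSeconds before it fails.
    return failure_threshold * period + timeout

# periodSeconds=10, timeoutSeconds=1, failureThreshold=3
print(worst_case_detection_seconds(10, 1, 3))  # roughly 31 seconds
```

Running the numbers like this makes the trade-off explicit: tighter values detect faster but also punish ordinary latency spikes.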

Readiness and rollouts

Readiness has a direct effect on rollouts. A Pod that never becomes ready can block Deployment progress, keep old ReplicaSets alive, or leave Services with too few healthy endpoints.

That is why many rollout failures are actually probe design failures.

Probes and graceful shutdown

Probe behavior does not live in isolation. It has to fit your shutdown path too.

  • use readiness to stop new traffic first
  • allow enough terminationGracePeriodSeconds
  • handle SIGTERM properly
  • avoid killing a Pod while it is still draining requests

Without graceful shutdown, even a good readiness probe may not save you from connection errors during a rollout.
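A minimal sketch of that ordering in signal-handler form; the `ready` flag and its wiring to /readyz are assumed, not shown:

```python
# Shutdown sketch: fail readiness first, then drain, then exit.
import signal
import threading

ready = threading.Event()
ready.set()  # serving normally; /readyz would return 200 while set

def handle_sigterm(signum, frame):
    # Step 1: fail readiness so the Pod is removed from Service
    # endpoints and new traffic stops arriving.
    ready.clear()
    # Step 2: the server would then drain in-flight requests within
    # terminationGracePeriodSeconds before exiting.

signal.signal(signal.SIGTERM, handle_sigterm)
```

The key property is ordering: the endpoint removal happens before the process starts dying, not as a side effect of it.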

A practical debugging workflow

# Probe config and recent events; failing probes show up as "Unhealthy"
kubectl describe pod demo-app -n demo
# Application output around the time of the probe failures
kubectl logs demo-app -n demo --tail=200
# Shell into the container to hit the probe endpoint directly
kubectl exec -it demo-app -n demo -- sh

When debugging probes, always ask:

  1. What is the probe actually checking?
  2. Can I reach that endpoint or command inside the container?
  3. Is the failure caused by startup timing, app logic, or dependency health?

Probe decision table

  Situation                                        Better choice
  App is booting slowly                            startupProbe
  App should stop receiving traffic temporarily    readinessProbe
  Process is wedged and restart helps              livenessProbe
  Downstream dependency is flaky                   usually readiness, not liveness

What not to probe directly

Avoid probe logic that is too broad.

Examples:

  • full end-to-end business transactions
  • very expensive DB queries
  • checks that depend on unrelated external systems

Probes should be informative, not dramatic.

FAQ

Q: Should readiness and liveness use the same endpoint?
A: Usually no. Readiness and liveness answer different questions, so reusing one endpoint often mixes traffic routing with restart logic.

Q: When should I add a startup probe?
A: Add one when the service can take long enough to initialize that readiness or liveness would fire too early.

Q: Why is my Pod running but receiving no traffic?
A: A common cause is failing readiness. The container is alive, but Kubernetes is intentionally keeping it out of endpoints.

Next reading

  • Continue with kubernetes-quickstart-deployment-replicaset.md for rollout behavior.
  • Read kubernetes-quickstart-service.md to understand how readiness affects traffic.
  • For production guidance, continue into the probes tips article.

Wrap-up

Good probes should feel boring. If a probe is too clever, too deep, or too aggressive, it usually ends up being part of the outage instead of part of the protection.
