CFN Cloud
Cloud Future New Life
2025-10-15

Probes

Tell Kubernetes when an app is ready or needs a restart.

Probes decide whether a Pod should receive traffic or be restarted.

Three types

  • readiness: is the container ready to receive traffic?
  • liveness: should the container be restarted?
  • startup: protect a slow-starting container before the other probes run

Example

readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  periodSeconds: 5     # failures remove the Pod from Service endpoints
livenessProbe:
  httpGet:
    path: /livez
    port: 8080
  periodSeconds: 10    # failures here restart the container

Common pitfalls

  • Using a liveness probe as a traffic switch (gating traffic is readiness's job)
  • Timeouts that are too short, causing false failures under load
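As a hedge against both pitfalls, keep liveness timing forgiving. The numbers below are illustrative starting points, not recommendations:

```yaml
livenessProbe:
  httpGet:
    path: /livez
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 3        # tolerate slow responses under load
  failureThreshold: 3      # several consecutive failures before a restart
```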

Practical notes

  • Start with a quick inventory: kubectl get nodes, kubectl get pods -A, and kubectl get events -A.
  • Compare desired vs. observed state; kubectl describe usually explains drift or failed controllers.
  • Keep names, labels, and selectors consistent so Services and controllers can find Pods.

Quick checklist

  • The resource matches the intent you described in YAML.
  • Namespaces, RBAC, and images are correct for the target environment.
  • Health checks and logs are in place before promotion.

Probes in a workload-oriented view

Whether you are defining a Pod, tuning probes, or organizing namespaces, the goal is to make workloads predictable. Probes are part of how Kubernetes manages lifecycle, scheduling, and isolation: a tool for turning application intent into a repeatable unit of operation.

Labels, selectors, and ownership

Every workload should be discoverable. Labels are the primary index, and selectors are how controllers and Services find what they manage. Use consistent keys like app, component, and env. Ownership links, such as controller references, determine which objects are recreated when something is deleted. Without a consistent label strategy, even simple troubleshooting becomes slow.
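A minimal sketch of the idea (names like demo-app are hypothetical): the Service selector must match the Pod template labels, or the Service finds nothing.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
spec:
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app           # the primary index
        component: web
        env: staging
    spec:
      containers:
        - name: web
          image: demo/web:1.0   # hypothetical image
---
apiVersion: v1
kind: Service
metadata:
  name: demo-app
spec:
  selector:
    app: demo-app               # must match the Pod labels above
  ports:
    - port: 80
      targetPort: 8080
```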

Scheduling and resource requests

Schedulers rely on requests to place Pods. If requests are missing, the cluster cannot make fair placement decisions, and overload becomes likely. For small services, start with conservative requests and measure. For batch jobs, set limits to protect critical workloads. At the namespace level, quotas and limit ranges are how you enforce these rules.
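As a sketch, a small service might start with something like this; the numbers are placeholders to measure against, not recommendations:

```yaml
resources:
  requests:            # what the scheduler uses for placement
    cpu: 100m
    memory: 128Mi
  limits:
    memory: 256Mi      # exceeding this OOM-kills the container
    # no CPU limit here, to avoid throttling; consider one for batch workloads
```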

Startup, readiness, and termination

Probes should reflect true readiness, not just process liveness. A container can be running and still not ready to serve traffic. Use readiness probes to gate traffic, and make liveness probes forgiving to avoid restart loops. Shutdown matters too: define terminationGracePeriodSeconds and handle SIGTERM so the app can flush work and release locks.
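A sketch of the shutdown side (values illustrative): give the app time to drain after SIGTERM, and optionally delay exit so endpoint updates propagate first.

```yaml
spec:
  terminationGracePeriodSeconds: 30          # SIGKILL only after this window
  containers:
    - name: web
      image: demo/web:1.0                    # hypothetical image
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 5"] # let load balancers stop sending traffic
```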

Isolation and security basics

Namespaces separate teams and environments, but they are not a hard boundary. Combine them with RBAC, NetworkPolicy, and Pod security settings. SecurityContext settings like runAsNonRoot, readOnlyRootFilesystem, and drop capabilities are small changes that reduce risk. If a workload needs extra permissions, document why.
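A minimal container-level securityContext along these lines covers the basics mentioned above. This is a sketch; some images require a writable filesystem or a specific UID:

```yaml
securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]
```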

Resource isolation and noisy neighbors

CPU limits can cause throttling, and memory limits can trigger OOM kills. When a Pod is latency-sensitive, prefer realistic requests and avoid overly tight limits. For batch workloads, use limits to prevent them from crowding out interactive services. This balance is part of everyday cluster operations.
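These guardrails can be enforced per namespace with a LimitRange, so containers that declare nothing still get sane defaults. The values here are illustrative:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: defaults
  namespace: demo
spec:
  limits:
    - type: Container
      defaultRequest:      # applied as requests when a container sets none
        cpu: 100m
        memory: 128Mi
      default:             # applied as limits when a container sets none
        cpu: "1"
        memory: 512Mi
```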

Config and secret lifecycle

ConfigMaps and Secrets should be treated as part of the workload contract. Decide whether configuration changes should trigger a rollout or be hot reloaded. Keep sensitive data in Secrets and limit access with RBAC. Document how config changes are promoted between environments.
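As a sketch (names are hypothetical), configuration can be injected via envFrom. Note that editing the ConfigMap alone does not restart running Pods:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: demo-config        # hypothetical name
  namespace: demo
data:
  LOG_LEVEL: "info"
---
# In the container spec of the consuming workload:
# envFrom:
#   - configMapRef:
#       name: demo-config
# Env vars are fixed at container start; roll the Deployment to pick up changes.
```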

Debugging workflow

A steady workflow saves time. Start with describe for events, then logs, then exec into a container if needed. For probes, check the endpoint directly from inside the Pod to confirm it works. For namespace or quota issues, inspect ResourceQuota and LimitRange objects to see why a Pod was rejected.

kubectl get pods -n demo
kubectl describe pod demo-app -n demo
kubectl logs demo-app -n demo --tail=200
kubectl exec -it demo-app -n demo -- sh
# check the probe endpoint from inside the Pod (if wget or curl is in the image):
kubectl exec demo-app -n demo -- wget -qO- http://localhost:8080/readyz

Observability signals

Events explain scheduling and startup failures. Logs tell you application behavior. Metrics show trends like CPU spikes or memory growth. Combine these three signals before guessing. A short habit of checking all three saves long debugging cycles.

Practical stability checklist

Make sure each workload has labels, requests, and probes. Ensure Services can find Pods via selectors. Verify that namespaces have the right RBAC bindings and quotas. Finally, confirm that termination and startup behavior matches your real traffic patterns.

Wrap-up: probes should be boring

If probes are “clever”, they usually end up being fragile.

  • Keep readiness conservative (it’s your traffic gate).
  • Use liveness only when restart truly helps.
  • For slow startups, prefer startupProbe over a huge initialDelaySeconds.
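A startupProbe along these lines (illustrative values) keeps liveness checks paused until the app has booted:

```yaml
startupProbe:
  httpGet:
    path: /livez
    port: 8080
  periodSeconds: 5
  failureThreshold: 30     # up to 30 × 5s = 150s of startup before restarts begin
```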
