StatefulSet Basics

Provide stable identity and storage for stateful workloads.

StatefulSet is built for databases and queues. The key idea is stable identity plus stable storage. Each Pod gets a predictable name and its own persistent volume claim (PVC), which makes it suitable for workloads that expect durable data and stable network identities.

This quick start expands on the core concepts with a minimal example, headless Service usage, and operational tips.

When to use StatefulSet

  • Databases, message queues, and stateful caches.
  • Workloads that rely on stable hostnames or ordered startup.
  • Services that need one PVC per replica.

Typical features

  • Fixed Pod names (like mysql-0).
  • One PVC per Pod.
  • Ordered startup, scaling, and rolling updates.
  • Stable DNS names via a headless Service.

Headless Service

StatefulSets require a headless Service for stable DNS:

apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  clusterIP: None
  selector:
    app: mysql
  ports:
  - port: 3306
    targetPort: 3306

This enables DNS names like mysql-0.mysql.default.svc.cluster.local.
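
To spot-check per-Pod DNS resolution, you can run a throwaway debug Pod; the busybox image and the default namespace here are assumptions:

kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- nslookup mysql-0.mysql.default.svc.cluster.local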

Core snippet

spec:
  serviceName: mysql
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql   # mounts the PVC from volumeClaimTemplates
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi

Scaling behavior

With the default OrderedReady policy, StatefulSets scale in order: scaling up creates Pods from ordinal 0 upward, one at a time, and scaling down removes the highest ordinal first. This is safer for databases but slower than Deployments.
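
A minimal sketch of scaling with kubectl, using the mysql names from the examples above:

kubectl scale statefulset mysql --replicas=5
kubectl get pods -l app=mysql -w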

Ordered startup and readiness

StatefulSet waits for each Pod to become Ready before moving to the next when using ordered management. If your readiness probe is too strict, the rollout can stall. Make sure probes reflect true readiness without blocking forever.

Rolling updates

By default, updates roll from the highest ordinal down, one Pod at a time. You control this with updateStrategy; note that podManagementPolicy (covered below) affects startup and scaling order, not rolling updates:

updateStrategy:
  type: RollingUpdate

You can also use partition to update only a subset:

updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    partition: 1

This keeps ordinals below the partition on the old version. With the example above and three replicas, mysql-1 and mysql-2 are updated while mysql-0 stays on the old version; setting the partition to replicas - 1 lets you canary a single Pod first.

Pod management policy

For faster parallel startup, set podManagementPolicy: Parallel, but use this only if your app can tolerate unordered startup.
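
The policy is set on the StatefulSet spec:

spec:
  podManagementPolicy: Parallel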

Storage class and size

Pick a StorageClass that matches your IO needs. For databases, SSD-backed classes are typical. Each replica gets its own PVC, so plan storage capacity accordingly.
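
The class is chosen per claim template; fast-ssd here is a hypothetical class name:

volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      storageClassName: fast-ssd
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi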

Storage access modes

Most stateful apps use ReadWriteOnce because each Pod writes to its own volume. If you need shared storage across replicas, verify your storage backend supports ReadWriteMany.

Shared access modes are uncommon for databases and can introduce contention.

PVC naming and expansion

PVCs are named <claim-template>-<statefulset-name>-<ordinal> (for example, data-mysql-0). If your StorageClass supports expansion, you can grow the PVC size, but check whether filesystem expansion is required.
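
If the StorageClass has allowVolumeExpansion: true, you can patch the claim directly; the sizes here are examples:

kubectl patch pvc data-mysql-0 -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'
kubectl describe pvc data-mysql-0   # watch conditions for pending filesystem resize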

Anti-affinity and spreading

Use pod anti-affinity to spread replicas across nodes and reduce correlated failures:

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: mysql
        topologyKey: kubernetes.io/hostname

This is especially important when you have more than one replica.

Probes and readiness

Stateful workloads need careful probes. Set readiness probes to ensure traffic only reaches healthy replicas, and be conservative with liveness to avoid restart loops during recovery.
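
A sketch for a MySQL container, assuming the image ships mysqladmin and MYSQL_ROOT_PASSWORD is set in the environment; exact commands depend on your image:

readinessProbe:
  exec:
    command:
    - sh
    - -c
    - mysqladmin ping -h 127.0.0.1 -p"$MYSQL_ROOT_PASSWORD"
  initialDelaySeconds: 10
  periodSeconds: 5
livenessProbe:
  exec:
    command:
    - sh
    - -c
    - mysqladmin ping -h 127.0.0.1 -p"$MYSQL_ROOT_PASSWORD"
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 6   # lenient on purpose, to avoid restart loops during recovery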

Example: MySQL StatefulSet

A minimal StatefulSet usually includes a headless Service, a Secret for credentials, and the StatefulSet itself. Use this pattern instead of a Deployment when you need stable storage per replica.
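
A minimal sketch combining the pieces above; the Secret name, key, and image tag are assumptions, and this is not a hardened production setup:

apiVersion: v1
kind: Secret
metadata:
  name: mysql-secret
type: Opaque
stringData:
  password: change-me
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql        # the headless Service defined earlier
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        ports:
        - containerPort: 3306
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: password
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi

Note that plain mysql:8.0 replicas do not replicate to each other automatically; real clusters typically use an operator or explicit replication configuration.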

Backup and restore

PVCs are not backups. Use logical dumps, snapshots, or operator-managed backups and test restores regularly. StatefulSets make identity stable, but they do not protect you from data corruption or accidental deletes.
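
If your CSI driver supports snapshots, a VolumeSnapshot can capture a single PVC; the class name here is hypothetical, and snapshots are crash-consistent rather than application-consistent unless you quiesce writes first:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: mysql-data-snap
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: data-mysql-0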

Network identity

Each Pod can be addressed by a stable DNS name. This helps clustered systems that rely on fixed peer addresses. Example:

mysql-0.mysql.default.svc.cluster.local

Use this for replication config or peer discovery when your app supports it.

Client access patterns

Applications can either connect through a Service for load-balanced reads or connect directly to a specific Pod for leader/follower roles. Choose the pattern that fits your topology.
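
A sketch of a normal, load-balanced Service alongside the headless one; mysql-read is an assumed name, and it only makes sense if every replica can serve reads:

apiVersion: v1
kind: Service
metadata:
  name: mysql-read
spec:
  selector:
    app: mysql
  ports:
  - port: 3306
    targetPort: 3306

Writes would still go directly to the leader, for example mysql-0.mysql.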

Common issues

  • Pod Pending: PVC not bound or StorageClass missing.
  • Stuck on init: app requires ordered startup or data bootstrap.
  • Rolling update blocked: readiness probe failing.

Migration from Deployment

If you started with a Deployment, moving to StatefulSet usually requires a new workload and data migration. Plan for a short maintenance window, migrate data to new PVCs, and switch clients to the new headless Service.

Resource requests

Stateful apps can be sensitive to throttling. Set requests/limits to keep them stable:

resources:
  requests:
    cpu: "500m"
    memory: "1Gi"
  limits:
    cpu: "2"
    memory: "2Gi"

Tune these based on workload and node size.

Security considerations

Restrict access to stateful Pods with NetworkPolicy and avoid exposing headless Services outside the cluster. Credentials should live in Secrets, and backups should be encrypted.
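
A NetworkPolicy sketch that only admits Pods labeled app: backend (an assumed label) to port 3306:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mysql-allow-backend
spec:
  podSelector:
    matchLabels:
      app: mysql
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: backend
    ports:
    - protocol: TCP
      port: 3306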

Troubleshooting notes

If you see unexpected data loss, verify that Pods are not sharing a single PVC. Each replica should have its own claim. Also check whether a node failure caused a reschedule and whether your app handles clean shutdowns.

Troubleshooting commands

kubectl get pods                    # Pod status, restarts, readiness
kubectl get pvc                     # claim binding status
kubectl describe pod <pod-name>     # events, probe failures, scheduling
kubectl get events -A               # cluster-wide events

Service discovery patterns

Client applications often connect through a Service, but internal replication or clustering may need direct Pod DNS names. Use the headless Service DNS when configuring replication peers to avoid hardcoding IPs.

Volume reclaim policy

If a StatefulSet is deleted, PVCs usually remain. This protects data but can leave unused volumes behind. Clean them up intentionally when you decommission a workload.
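
On recent Kubernetes versions, the StatefulSet can declare PVC handling explicitly via persistentVolumeClaimRetentionPolicy (values are Retain or Delete):

spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain
    whenScaled: Retain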

Practical tips

  • Start with a single replica, verify data, then scale out.
  • Backups are still required; PVCs are not backups.
  • Use anti-affinity to spread replicas across nodes.
  • Document recovery steps before scaling past one replica.
  • Monitor PVC growth to avoid running out of disk.

Scaling down safely

When scaling down, StatefulSet deletes the highest ordinal first. Ensure your database can remove replicas cleanly, and avoid scaling down during heavy write traffic.

Update strategy caveats

Rolling updates can cause brief unavailability if readiness probes are strict. Test updates in a staging namespace and watch the order of updates to avoid surprises.

Observability

Stateful workloads benefit from strong observability. Track storage usage, replication lag, and latency. Export metrics when possible and alert on disk pressure before it causes evictions.

Pod disruption budgets

Use a PDB to avoid losing too many replicas during maintenance:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: mysql-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: mysql
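
With replicas: 3, minAvailable: 2 permits at most one voluntary disruption at a time.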

Quick checklist

  • The resource matches the intent you described in YAML.
  • Namespaces, RBAC, and images are correct for the target environment.
  • Health checks and logs are in place before promotion.
  • Storage capacity is planned per replica.

Field checklist

When you move from a quick lab to real traffic, confirm the basics every time. Check resource requests, readiness behavior, log coverage, alerting, and clear rollback steps. A checklist prevents skipping the boring steps that keep services stable. Keep it short, repeatable, and stored with the repo so it evolves with the service and stays close to the code.

Troubleshooting flow

Start from symptoms, not guesses. Review recent events for scheduling, image, or probe failures, then scan logs for application errors. If traffic is failing, confirm readiness, verify endpoints, and trace the request path hop by hop. When data looks wrong, validate the active version and configuration against the release plan. Always record what you changed so a rollback is fast and a postmortem is accurate.

Small exercises to build confidence

Practice common operations in a safe environment. Scale the workload up and down and observe how quickly it stabilizes. Restart a single Pod and watch how the service routes around it. Change one configuration value and verify that the change is visible in logs or metrics. These small drills teach how the system behaves under real operations without waiting for an outage.
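
A few concrete drills, assuming the mysql example from above:

kubectl scale statefulset mysql --replicas=2
kubectl delete pod mysql-1
kubectl get pods -l app=mysql -w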

Production guardrails

Introduce limits gradually. Resource quotas, PodDisruptionBudgets, and network policies should be tested in staging before production. Keep backups and restore procedures documented, even for stateless services, because dependencies often are not stateless. Align monitoring with user outcomes so you catch regressions before they become incidents.

Documentation and ownership

Write down who owns the service, what success looks like, and which dashboards to use. Include the on-call rotation, escalation path, and basic runbooks for common failures. A small amount of documentation removes a lot of guesswork during incidents and helps new team members ramp up quickly.

Quick validation

After any change, validate the system the same way a user would. Hit the main endpoint, check latency, and watch for error spikes. Confirm that new pods are ready, old ones are gone, and metrics are stable. If the change touched storage, verify disk usage and cleanup behavior. If it touched networking, confirm DNS names and endpoint lists are correct.
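
For a StatefulSet, rollout status and endpoint checks cover most of this:

kubectl rollout status statefulset/mysql
kubectl get endpoints mysql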

Release notes

Write a short note with what changed, why it changed, and how to roll back. This is not bureaucracy; it prevents confusion during incidents. Even a few bullets help future you remember intent and context.

Capacity check

Compare current usage to requests and limits. If the service is close to limits, plan a small scaling adjustment before traffic grows. Capacity planning is easier when it is incremental rather than reactive.

Final reminder

Keep changes small and observable. If a release is risky, reduce scope and validate in staging first. Prefer frequent small updates over rare large ones. When in doubt, pick the option that simplifies rollback and reduces time to detect issues. The goal is not perfect config, but predictable operations.
