MySQL Replication
Use replication for read scaling and recovery.
In MySQL replication, all writes go to a primary, which records changes in its binary log; replicas replay that log, so read traffic can be offloaded.
Key ideas
- Primary writes and generates binlog
- Replicas pull and replay the binlog
- Stable network identity matters
In Kubernetes
- StatefulSet keeps instance order
- Headless Service provides stable DNS
- Separate read/write Services if needed
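The stable DNS piece comes from a headless Service. A minimal sketch, assuming the StatefulSet and its Pods are labeled `app: mysql` in a `demo` namespace (illustrative names, not from this setup):

```yaml
# Headless Service: clusterIP None gives each Pod its own DNS record,
# e.g. mysql-0.mysql.demo.svc.cluster.local
apiVersion: v1
kind: Service
metadata:
  name: mysql
  namespace: demo
spec:
  clusterIP: None        # headless: no load-balanced VIP, per-Pod DNS instead
  selector:
    app: mysql
  ports:
    - name: mysql
      port: 3306
```

Clients that must reach a specific instance (the primary, for writes) use the per-Pod DNS name rather than the Service name.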
Risks
- Replication lag can break consistency
- Primary failure needs a failover plan
Practical notes
- Start with a quick inventory: `kubectl get nodes`, `kubectl get pods -A`, and `kubectl get events -A`.
- Compare desired vs. observed state; `kubectl describe` usually explains drift or failed controllers.
- Keep names, labels, and selectors consistent so Services and controllers can find Pods.
Quick checklist
- The resource matches the intent you described in YAML.
- Namespaces, RBAC, and images are correct for the target environment.
- Health checks and logs are in place before promotion.
Data, identity, and steady state
Stateful workloads need stable identity and stable storage. For MySQL replication, Kubernetes provides that stability through persistent volumes, stable DNS names, and ordered lifecycle management. The goal is to keep data safe while still allowing automated rollouts and rescheduling.
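The pieces fit together in a StatefulSet. A sketch with illustrative names (MySQL configuration, credentials, and replication setup are omitted; tune image and storage for your environment):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: demo
spec:
  serviceName: mysql       # headless Service that provides stable per-Pod DNS
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          ports:
            - containerPort: 3306
  volumeClaimTemplates:    # one PVC per replica, reattached on reschedule
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi
```

`volumeClaimTemplates` is what ties each ordinal (`mysql-0`, `mysql-1`, …) to its own volume across restarts and rescheduling.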
Replication topology and consistency
For replicated systems, choose a topology that matches your consistency needs. Single-leader with followers is common for relational databases, while quorum-based systems require a majority to make progress. Understand how your database elects a leader and how clients discover it. Kubernetes can schedule Pods, but it does not solve consensus for you.
Storage planning and isolation
Each replica should have its own PVC. Shared volumes can cause corruption unless the application is built for it. Plan storage capacity per replica and budget for growth. Use anti-affinity to spread replicas across nodes so a single failure does not drop the entire cluster.
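The spreading can be expressed as a Pod-template fragment (assuming the `app: mysql` label from earlier; `requiredDuringScheduling…` makes the rule hard, switch to `preferred…` if you have fewer nodes than replicas):

```yaml
# Fragment of the StatefulSet's Pod spec: never co-locate two mysql Pods
# on the same node.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: mysql
        topologyKey: kubernetes.io/hostname
```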
Backup and restore discipline
Persistent volumes are not backups. Use logical dumps or snapshots and test restores regularly. Document the recovery sequence, especially for systems with replication, because the order of restore can determine which node becomes the primary. Disaster recovery is a process, not a file.
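One way to enforce a backup schedule is a CronJob around a logical dump. A sketch only: the host, claim name, and schedule are illustrative, and credential handling (normally via a Secret) is omitted:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mysql-backup
  namespace: demo
spec:
  schedule: "0 3 * * *"          # daily, during a low-traffic window
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: dump
              image: mysql:8.0
              command: ["sh", "-c"]
              args:
                # --single-transaction: consistent dump without locking InnoDB tables
                - mysqldump -h mysql-0.mysql --all-databases
                  --single-transaction > /backup/dump.sql
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: mysql-backup-pvc
```

The dump lands on a separate PVC; the restore path, and which node you restore first, is what you rehearse.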
Upgrades and failure handling
Stateful upgrades are slower and require more care. Use partitions or staged rollouts, and ensure readiness probes reflect real availability. When a node fails, pods may reschedule, but volumes may take time to attach. Monitor for stuck attachments and design for longer recovery windows.
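Staged rollouts map to the StatefulSet's update strategy. A fragment, assuming three replicas:

```yaml
# Fragment of the StatefulSet spec: only Pods with ordinal >= partition
# are updated. Lowering the partition step by step stages the rollout.
updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    partition: 2     # start by updating only mysql-2, verify, then lower
```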
Observability and tuning
Track replication lag, storage latency, and disk usage. These are early warning signals. Resource limits that are too tight can cause throttling and timeouts, so set realistic requests and leave headroom. For databases, IO latency is often a better signal than CPU usage.
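"Realistic requests with headroom" translates to a resources fragment like this (numbers are placeholders, not a recommendation):

```yaml
# Container resources fragment: requests sized for steady state,
# limits above requests so bursts aren't throttled immediately.
resources:
  requests:
    cpu: "1"
    memory: 4Gi
  limits:
    cpu: "2"
    memory: 6Gi
```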
Leader routing and client behavior
Clients often need to send writes to a leader and reads to replicas. Use stable DNS names for direct access and Services for balanced reads. If your system supports read only replicas, make that separation explicit in client config.
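A sketch of the read side of that split, reusing the illustrative labels from earlier: writes go straight to the primary's per-Pod DNS name (e.g. `mysql-0.mysql.demo.svc.cluster.local`), while a normal Service balances reads across every replica:

```yaml
# Read Service: selects all replicas, so connections are spread for reads.
# Pair it with direct per-Pod DNS for writes to the primary.
apiVersion: v1
kind: Service
metadata:
  name: mysql-read
  namespace: demo
spec:
  selector:
    app: mysql
  ports:
    - port: 3306
```

If your setup labels Pods by role (e.g. `role: primary`), a second Service with that selector can route writes without hard-coding an ordinal; that labeling is an assumption, not something Kubernetes does for you.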
Maintenance and automation
Schedule compaction, vacuum, or defragmentation during low traffic windows. Operators or automation tools can enforce backup schedules and safe rollouts, reducing human error. Treat stateful maintenance as a regular task, not an emergency.
A quick status pass for the demo namespace:

```
kubectl get pods -n demo
kubectl get pvc -n demo
kubectl describe pod db-0 -n demo
```
Operational checklist
Verify anti-affinity, PodDisruptionBudgets (PDBs), and backup jobs. Confirm that each replica has its own volume, and that failover procedures are rehearsed. Stateful reliability comes from consistent operational habits as much as from configuration.
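The PDB half of that checklist can be sketched as follows (assuming three replicas and the `app: mysql` label; pick `minAvailable` to preserve replication capacity during voluntary disruptions like node drains):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: mysql-pdb
  namespace: demo
spec:
  minAvailable: 2        # never drain below two running replicas
  selector:
    matchLabels:
      app: mysql
```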
Wrap-up: replication is easy, failover is not
The scary part isn’t setting up replicas. It’s knowing what happens when the primary dies.
Do at least one controlled failover test in staging and measure:
- time to promote
- time to rejoin
- how clients behave during the gap