Ephemeral Volumes
Ephemeral volumes share the Pod's lifecycle: they are created with the Pod and deleted with it. That makes them lightweight, fast, and well suited to caches and temporary files that do not need to survive restarts.
This quick start expands on use cases, volume types, sizing, and the operational caveats you need to know before relying on ephemeral storage.
What counts as an ephemeral volume
The most common ephemeral volume is emptyDir. Kubernetes also supports generic ephemeral volumes that can use StorageClasses for short-lived claims.
Typical use cases
- Runtime cache and compiled assets
- Intermediate artifacts during data processing
- Shared files between the main container and a sidecar
- Scratch space for temp files
emptyDir basics
emptyDir is created when the Pod is scheduled and deleted when the Pod is removed:
volumes:
- name: scratch
  emptyDir:
    sizeLimit: 1Gi
You can mount it into multiple containers to share data:
volumeMounts:
- name: scratch
  mountPath: /tmp
Memory-backed emptyDir
For high-speed temp data, you can use memory as the medium:
emptyDir:
  medium: Memory
  sizeLimit: 512Mi
This uses node memory (tmpfs). It is fast, but files written there are charged against the container's memory limit, so an oversized tmpfs can contribute to OOM kills.
Generic ephemeral volume
You can request ephemeral storage from a StorageClass for workloads that need slightly more structure:
volumes:
- name: cache
  ephemeral:
    volumeClaimTemplate:
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 5Gi
The PVC is created and deleted with the Pod, which is useful for batch jobs or CI workloads.
Sidecar pattern example
Use a sidecar to ship logs or process files from a shared volume:
containers:
- name: app
  image: my-app:latest
  volumeMounts:
  - name: shared
    mountPath: /var/log/app
- name: shipper
  image: busybox
  command: ["sh", "-c", "tail -F /var/log/app/app.log"]
  volumeMounts:
  - name: shared
    mountPath: /var/log/app
Scheduling and disk pressure
Pods with heavy ephemeral usage can be evicted when nodes experience disk pressure. Kubernetes tracks ephemeral storage usage and may evict Pods that exceed limits. Keep temp data small and use limits where possible.
To view disk pressure and eviction signals:
kubectl describe node <node>
Resource requests and limits
You can set ephemeral-storage requests and limits in container resources:
resources:
  requests:
    cpu: "100m"
    memory: "256Mi"
    ephemeral-storage: "1Gi"
  limits:
    cpu: "500m"
    memory: "512Mi"
    ephemeral-storage: "2Gi"
This helps the scheduler place Pods on nodes with enough local storage.
Lifecycle and cleanup
Ephemeral volumes are tied to Pod lifecycle. If a Pod is rescheduled to another node, its emptyDir contents are lost. This is fine for caches, but it can break workflows that assume persistence. If you need to keep data across restarts, use PVCs.
Plan for data loss by making caches rebuildable and idempotent.
Ephemeral storage and logs
Container logs are stored on the node and count toward ephemeral storage usage. A chatty container can trigger eviction even if its emptyDir usage is small. Set log rotation and avoid excessive debug logging in production.
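Log rotation is configured at the kubelet level rather than per Pod. A minimal sketch of the relevant KubeletConfiguration fields (the values here are illustrative, not recommendations):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Rotate a container's log file once it reaches this size...
containerLogMaxSize: 10Mi
# ...and keep at most this many files per container.
containerLogMaxFiles: 5
```

Tighter rotation caps how much node disk a chatty container can consume before the kubelet reclaims it.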
Best practices
- Set size limits on emptyDir whenever possible.
- Keep temp data small and clean it periodically.
- Separate cache from critical data to avoid accidental loss.
- Use memory-backed emptyDir only for small, latency-sensitive files.
- Prefer predictable paths so cleanup scripts are reliable.
Size limits and behavior
emptyDir.sizeLimit is not a hard quota: the kubelet checks usage periodically and evicts the Pod once it exceeds the limit, so you see eviction events rather than write failures. Always budget ephemeral storage alongside CPU and memory so the scheduler can place Pods correctly.
Pods without explicit ephemeral-storage limits can consume more node disk than expected. Setting requests and limits improves predictability and avoids surprise evictions.
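One way to make this the default rather than a per-Pod chore is a namespace-level LimitRange. A sketch with illustrative values; adjust the sizes to your workloads:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: ephemeral-storage-defaults
spec:
  limits:
  - type: Container
    # Applied to containers that do not set their own values.
    defaultRequest:
      ephemeral-storage: "500Mi"
    default:
      ephemeral-storage: "1Gi"
```

With this in place, containers that omit ephemeral-storage settings still get bounded, schedulable defaults.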
Example: init container for cache warm-up
Use an init container to pre-populate a cache into an ephemeral volume:
initContainers:
- name: warm-cache
  image: busybox
  command: ["sh", "-c", "echo warm > /cache/seed.txt"]
  volumeMounts:
  - name: cache
    mountPath: /cache
containers:
- name: app
  image: my-app:latest
  volumeMounts:
  - name: cache
    mountPath: /cache
Example: build workspace
CI jobs often need a scratch directory to compile code. An emptyDir works well:
volumes:
- name: workspace
  emptyDir: {}
volumeMounts:
- name: workspace
  mountPath: /workspace
Keep artifacts you care about by uploading them to object storage before the Pod exits.
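A sketch of that upload step, assuming a hypothetical build image that bundles an S3-compatible client and a hypothetical my-artifacts bucket:

```yaml
containers:
- name: build
  # Hypothetical image with build tooling and the AWS CLI installed.
  image: my-ci-image:latest
  # Build in the ephemeral workspace, then copy artifacts out before exit.
  command: ["sh", "-c", "make -C /workspace build && aws s3 cp /workspace/dist s3://my-artifacts/ --recursive"]
  volumeMounts:
  - name: workspace
    mountPath: /workspace
```

Chaining the upload with `&&` ensures nothing is pushed if the build fails, and the workspace itself can be discarded with the Pod.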
Eviction thresholds
Kubernetes evicts Pods when node disk pressure exceeds thresholds. Ephemeral volumes and container logs both contribute. If you see frequent evictions, check node disk usage and consider adding larger nodes or spreading workloads.
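The thresholds themselves are kubelet settings. A sketch of the relevant KubeletConfiguration fields (the percentages are illustrative):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  # Evict when free node filesystem space drops below this fraction.
  nodefs.available: "10%"
  # Evict when free image filesystem space drops below this fraction.
  imagefs.available: "15%"
```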
Generic ephemeral vs PVC
Generic ephemeral volumes are created from a StorageClass and live only for the Pod lifetime. They are useful for batch jobs that need temporary space larger than emptyDir but still do not need persistence. In contrast, PVCs are designed for long-lived data and should be used for databases or stateful services.
Local SSD and performance
On cloud platforms, nodes often have local SSD or ephemeral disks. These are fast but not durable. Use them for caches or build artifacts, and always expect data loss during node replacement or upgrade.
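If you want generic ephemeral volumes (such as the fast-ssd class used earlier) backed by SSDs, you define a matching StorageClass. A sketch for GKE; the provisioner and parameters differ per platform:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
# GKE's persistent disk CSI driver; substitute your platform's driver.
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
# Delay binding until a Pod is scheduled, so the disk lands in the right zone.
volumeBindingMode: WaitForFirstConsumer
```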
Observability
To understand ephemeral usage, inspect node allocatable storage and Pod consumption:
kubectl describe node <node> | rg -n "ephemeral-storage|Allocatable"
kubectl top pod -A
Inside a Pod, you can check filesystem usage:
kubectl exec -it <pod-name> -- df -h
If you see low free space, reduce cache sizes or move data to PVCs.
Security considerations
Ephemeral volumes live on the node filesystem. Avoid writing secrets or sensitive data into emptyDir unless you are sure the node is trusted. If you must, use memory-backed storage and keep it small.
If you handle regulated data, consider encrypting data before writing to ephemeral storage or avoiding it entirely. Always sanitize temporary data.
When not to use ephemeral storage
- Databases or any data that must survive Pod restarts.
- Audit logs that must be retained.
- User uploads or business-critical files.
Troubleshooting tips
- Pod Pending: node lacks ephemeral storage capacity.
- Evicted: node disk pressure, reduce temp usage.
- Slow IO: node disk is saturated by other workloads.
If evictions keep happening, reduce cache size or move the workload to nodes with more disk. For batch jobs, consider spreading Pods across more nodes to avoid hotspots.
Also check image cache growth on the node. Old container images can consume significant disk space and trigger pressure.
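Image garbage collection is also tunable at the kubelet. A sketch of the relevant fields (the percentages are illustrative):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Start deleting unused images when disk usage exceeds this percentage...
imageGCHighThresholdPercent: 80
# ...and stop once usage falls back below this percentage.
imageGCLowThresholdPercent: 70
```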
Diagnostic commands:
kubectl describe pod <pod-name>
kubectl get events -A
Practical notes
- Start with a quick inventory: kubectl get nodes, kubectl get pods -A, and kubectl get events -A.
- Compare desired vs. observed state; kubectl describe usually explains drift or failed controllers.
- Keep names, labels, and selectors consistent so Services and controllers can find Pods.
- Document cache locations so engineers know what data is safe to delete.
Quick checklist
- The resource matches the intent you described in YAML.
- Namespaces, RBAC, and images are correct for the target environment.
- Health checks and logs are in place before promotion.
- Ephemeral usage is bounded with limits.
- Eviction behavior is understood and monitored.
Field checklist
When you move from a quick lab to real traffic, confirm the basics every time. Check resource requests, readiness behavior, log coverage, alerting, and clear rollback steps. A checklist prevents skipping the boring steps that keep services stable. Keep it short, repeatable, and stored with the repo so it evolves with the service and stays close to the code.
Troubleshooting flow
Start from symptoms, not guesses. Review recent events for scheduling, image, or probe failures, then scan logs for application errors. If traffic is failing, confirm readiness, verify endpoints, and trace the request path hop by hop. When data looks wrong, validate the active version and configuration against the release plan. Always record what you changed so a rollback is fast and a postmortem is accurate.
Small exercises to build confidence
Practice common operations in a safe environment. Scale the workload up and down and observe how quickly it stabilizes. Restart a single Pod and watch how the service routes around it. Change one configuration value and verify that the change is visible in logs or metrics. These small drills teach how the system behaves under real operations without waiting for an outage.
Production guardrails
Introduce limits gradually. Resource quotas, PodDisruptionBudgets, and network policies should be tested in staging before production. Keep backups and restore procedures documented, even for stateless services, because dependencies often are not stateless. Align monitoring with user outcomes so you catch regressions before they become incidents.
Documentation and ownership
Write down who owns the service, what success looks like, and which dashboards to use. Include the on-call rotation, escalation path, and basic runbooks for common failures. A small amount of documentation removes a lot of guesswork during incidents and helps new team members ramp up quickly.
Quick validation
After any change, validate the system the same way a user would. Hit the main endpoint, check latency, and watch for error spikes. Confirm that new pods are ready, old ones are gone, and metrics are stable. If the change touched storage, verify disk usage and cleanup behavior. If it touched networking, confirm DNS names and endpoint lists are correct.
Release notes
Write a short note with what changed, why it changed, and how to roll back. This is not bureaucracy; it prevents confusion during incidents. Even a few bullets help future you remember intent and context.
Capacity check
Compare current usage to requests and limits. If the service is close to limits, plan a small scaling adjustment before traffic grows. Capacity planning is easier when it is incremental rather than reactive.
Final reminder
Keep changes small and observable. If a release is risky, reduce scope and validate in staging first. Prefer frequent small updates over rare large ones. When in doubt, pick the option that simplifies rollback and reduces time to detect issues. The goal is not perfect config, but predictable operations.