Ephemeral Volumes
Ephemeral volumes share the Pod's lifecycle: they are created with the Pod and deleted with it. That makes them lightweight, fast, and well suited to caches and temporary files that do not need to survive restarts.
This quick start expands on use cases, volume types, sizing, and the operational caveats you need to know before relying on ephemeral storage.
What counts as an ephemeral volume
The most common ephemeral volume is emptyDir. Kubernetes also supports generic ephemeral volumes that can use StorageClasses for short-lived claims.
Typical use cases
- Runtime cache and compiled assets
- Intermediate artifacts during data processing
- Shared files between the main container and a sidecar
- Scratch space for temp files
emptyDir basics
emptyDir is created when the Pod is scheduled and deleted when the Pod is removed:
volumes:
- name: scratch
  emptyDir:
    sizeLimit: 1Gi
You can mount it into multiple containers to share data:
volumeMounts:
- name: scratch
  mountPath: /tmp
Memory-backed emptyDir
For high-speed temp data, you can use memory as the medium:
emptyDir:
  medium: Memory
  sizeLimit: 512Mi
This uses node memory (tmpfs). It is fast, but files written there are charged against the container's memory limit, so an oversized tmpfs can contribute to OOM kills.
Generic ephemeral volume
You can request ephemeral storage from a StorageClass for workloads that need slightly more structure:
volumes:
- name: cache
  ephemeral:
    volumeClaimTemplate:
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 5Gi
The PVC is created and deleted with the Pod, which is useful for batch jobs or CI workloads.
Sidecar pattern example
Use a sidecar to ship logs or process files from a shared volume:
containers:
- name: app
  image: my-app:latest
  volumeMounts:
  - name: shared
    mountPath: /var/log/app
- name: shipper
  image: busybox
  command: ["sh", "-c", "tail -F /var/log/app/app.log"]
  volumeMounts:
  - name: shared
    mountPath: /var/log/app
Scheduling and disk pressure
Pods with heavy ephemeral usage can be evicted when nodes experience disk pressure. Kubernetes tracks ephemeral storage usage and may evict Pods that exceed limits. Keep temp data small and use limits where possible.
To view disk pressure and eviction signals:
kubectl describe node <node>
Resource requests and limits
You can set ephemeral-storage requests and limits in container resources:
resources:
  requests:
    cpu: "100m"
    memory: "256Mi"
    ephemeral-storage: "1Gi"
  limits:
    cpu: "500m"
    memory: "512Mi"
    ephemeral-storage: "2Gi"
This helps the scheduler place Pods on nodes with enough local storage.
Lifecycle and cleanup
Ephemeral volumes are tied to Pod lifecycle. If a Pod is rescheduled to another node, its emptyDir contents are lost. This is fine for caches, but it can break workflows that assume persistence. If you need to keep data across restarts, use PVCs.
Plan for data loss by making caches rebuildable and idempotent.
Ephemeral storage and logs
Container logs are stored on the node and count toward ephemeral storage usage. A chatty container can trigger eviction even if its emptyDir usage is small. Set log rotation and avoid excessive debug logging in production.
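Log rotation is configured at the kubelet level rather than per Pod. A minimal sketch of the relevant KubeletConfiguration fields (the values here are illustrative, not recommendations):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Rotate a container's log file once it reaches this size...
containerLogMaxSize: 10Mi
# ...and keep at most this many files per container.
containerLogMaxFiles: 5
```

Tighter rotation caps how much node disk a chatty container can consume before the kubelet reclaims it.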
Best practices
- Set size limits on emptyDir whenever possible.
- Keep temp data small and clean it periodically.
- Separate cache from critical data to avoid accidental loss.
- Use memory-backed emptyDir only for small, latency-sensitive files.
- Prefer predictable paths so cleanup scripts are reliable.
Size limits and behavior
emptyDir.sizeLimit is not a hard quota: the kubelet checks usage periodically and evicts the Pod once it exceeds the limit, so you see eviction events rather than write failures. Always budget ephemeral storage alongside CPU and memory so the scheduler can place Pods correctly.
Pods without explicit ephemeral-storage limits can consume more node disk than expected. Setting requests and limits improves predictability and avoids surprise evictions.
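One way to make this the default rather than a per-Pod chore is a namespace-level LimitRange. A sketch with illustrative values; adjust the sizes to your workloads:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: ephemeral-storage-defaults
spec:
  limits:
  - type: Container
    # Applied to containers that do not set their own values.
    defaultRequest:
      ephemeral-storage: "500Mi"
    default:
      ephemeral-storage: "1Gi"
```

With this in place, containers that omit ephemeral-storage settings still get bounded, schedulable defaults.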
Example: init container for cache warm-up
Use an init container to pre-populate a cache into an ephemeral volume:
initContainers:
- name: warm-cache
  image: busybox
  command: ["sh", "-c", "echo warm > /cache/seed.txt"]
  volumeMounts:
  - name: cache
    mountPath: /cache
containers:
- name: app
  image: my-app:latest
  volumeMounts:
  - name: cache
    mountPath: /cache
Example: build workspace
CI jobs often need a scratch directory to compile code. An emptyDir works well:
volumes:
- name: workspace
  emptyDir: {}
volumeMounts:
- name: workspace
  mountPath: /workspace
Keep artifacts you care about by uploading them to object storage before the Pod exits.
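A sketch of that upload step, assuming a hypothetical build image that bundles an S3-compatible client and a hypothetical my-artifacts bucket:

```yaml
containers:
- name: build
  # Hypothetical image with build tooling and the AWS CLI installed.
  image: my-ci-image:latest
  # Build in the ephemeral workspace, then copy artifacts out before exit.
  command: ["sh", "-c", "make -C /workspace build && aws s3 cp /workspace/dist s3://my-artifacts/ --recursive"]
  volumeMounts:
  - name: workspace
    mountPath: /workspace
```

Chaining the upload with `&&` ensures nothing is pushed if the build fails, and the workspace itself can be discarded with the Pod.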
Eviction thresholds
Kubernetes evicts Pods when node disk pressure exceeds thresholds. Ephemeral volumes and container logs both contribute. If you see frequent evictions, check node disk usage and consider adding larger nodes or spreading workloads.
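The thresholds themselves are kubelet settings. A sketch of the relevant KubeletConfiguration fields (the percentages are illustrative):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  # Evict when free node filesystem space drops below this fraction.
  nodefs.available: "10%"
  # Evict when free image filesystem space drops below this fraction.
  imagefs.available: "15%"
```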
Generic ephemeral vs PVC
Generic ephemeral volumes are created from a StorageClass and live only for the Pod lifetime. They are useful for batch jobs that need temporary space larger than emptyDir but still do not need persistence. In contrast, PVCs are designed for long-lived data and should be used for databases or stateful services.
Local SSD and performance
On cloud platforms, nodes often have local SSD or ephemeral disks. These are fast but not durable. Use them for caches or build artifacts, and always expect data loss during node replacement or upgrade.
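If you want generic ephemeral volumes (such as the fast-ssd class used earlier) backed by SSDs, you define a matching StorageClass. A sketch for GKE; the provisioner and parameters differ per platform:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
# GKE's persistent disk CSI driver; substitute your platform's driver.
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
# Delay binding until a Pod is scheduled, so the disk lands in the right zone.
volumeBindingMode: WaitForFirstConsumer
```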
Observability
To understand ephemeral usage, inspect node allocatable storage and Pod consumption:
kubectl describe node <node> | rg -n "ephemeral-storage|Allocatable"
kubectl top pod -A
Inside a Pod, you can check filesystem usage:
kubectl exec -it <pod-name> -- df -h
If you see low free space, reduce cache sizes or move data to PVCs.
Security considerations
Ephemeral volumes live on the node filesystem. Avoid writing secrets or sensitive data into emptyDir unless you are sure the node is trusted. If you must, use memory-backed storage and keep it small.
If you handle regulated data, consider encrypting data before writing to ephemeral storage or avoiding it entirely. Always sanitize temporary data.
When not to use ephemeral storage
- Databases or any data that must survive Pod restarts.
- Audit logs that must be retained.
- User uploads or business-critical files.
Troubleshooting tips
- Pod Pending: node lacks ephemeral storage capacity.
- Evicted: node disk pressure, reduce temp usage.
- Slow IO: node disk is saturated by other workloads.
If evictions keep happening, reduce cache size or move the workload to nodes with more disk. For batch jobs, consider spreading Pods across more nodes to avoid hotspots.
Also check image cache growth on the node. Old container images can consume significant disk space and trigger pressure.
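Image garbage collection is also tunable at the kubelet. A sketch of the relevant fields (the percentages are illustrative):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Start deleting unused images when disk usage exceeds this percentage...
imageGCHighThresholdPercent: 80
# ...and stop once usage falls back below this percentage.
imageGCLowThresholdPercent: 70
```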
Diagnostic commands:
kubectl describe pod <pod-name>
kubectl get events -A
Practical notes
- Start with a quick inventory: kubectl get nodes, kubectl get pods -A, and kubectl get events -A.
- Compare desired vs. observed state; kubectl describe usually explains drift or failed controllers.
- Keep names, labels, and selectors consistent so Services and controllers can find Pods.
- Document cache locations so engineers know what data is safe to delete.
Quick checklist
- The resource matches the intent you described in YAML.
- Namespaces, RBAC, and images are correct for the target environment.
- Health checks and logs are in place before promotion.
- Ephemeral usage is bounded with limits.
- Eviction behavior is understood and monitored.
Field checklist
When you move from a quick lab to real traffic, confirm the basics every time. Check resource requests, readiness behavior, log coverage, alerting, and clear rollback steps. A checklist prevents skipping the boring steps that keep services stable. Keep it short, repeatable, and stored with the repo so it evolves with the service and stays close to the code.
Troubleshooting flow
Start from symptoms, not guesses. Review recent events for scheduling, image, or probe failures, then scan logs for application errors. If traffic is failing, confirm readiness, verify endpoints, and trace the request path hop by hop. When data looks wrong, validate the active version and configuration against the release plan. Always record what you changed so a rollback is fast and a postmortem is accurate.
Small exercises to build confidence
Practice common operations in a safe environment. Scale the workload up and down and observe how quickly it stabilizes. Restart a single Pod and watch how the service routes around it. Change one configuration value and verify that the change is visible in logs or metrics. These small drills teach how the system behaves under real operations without waiting for an outage.
Production guardrails
Introduce limits gradually. Resource quotas, PodDisruptionBudgets, and network policies should be tested in staging before production. Keep backups and restore procedures documented, even for stateless services, because dependencies often are not stateless. Align monitoring with user outcomes so you catch regressions before they become incidents.
Documentation and ownership
Write down who owns the service, what success looks like, and which dashboards to use. Include the on-call rotation, escalation path, and basic runbooks for common failures. A small amount of documentation removes a lot of guesswork during incidents and helps new team members ramp up quickly.
Quick validation
After any change, validate the system the same way a user would. Hit the main endpoint, check latency, and watch for error spikes. Confirm that new pods are ready, old ones are gone, and metrics are stable. If the change touched storage, verify disk usage and cleanup behavior. If it touched networking, confirm DNS names and endpoint lists are correct.
Release notes
Write a short note with what changed, why it changed, and how to roll back. This is not bureaucracy; it prevents confusion during incidents. Even a few bullets help future you remember intent and context.
Capacity check
Compare current usage to requests and limits. If the service is close to limits, plan a small scaling adjustment before traffic grows. Capacity planning is easier when it is incremental rather than reactive.
Final reminder
Keep changes small and observable. If a release is risky, reduce scope and validate in staging first. Prefer frequent small updates over rare large ones. When in doubt, pick the option that simplifies rollback and reduces time to detect issues. The goal is not perfect config, but predictable operations.