GPU
Posts under GPU.
Featured in this category
How Startups Should Choose: Serverless GPU vs Dedicated GPU
A practical guide to choosing between serverless GPUs and dedicated GPUs for startups, based on cost structure, delivery speed, performance predictability, operations burden, and team maturity.
KAI-Scheduler vs HAMi: Two Ways to Share GPUs in Kubernetes (Soft vs Hard Isolation)
An engineering-oriented comparison of KAI-Scheduler’s Reservation Pod approach and HAMi’s hard isolation path, including trade-offs, failure modes (noisy neighbor), and how the two layers can complement each other.
hetGPU: Chasing Cross-Vendor GPU Binary Compatibility
An engineering-oriented guide to hetGPU: how a compiler + runtime stack can make one GPU binary run across NVIDIA/AMD/Intel/Tenstorrent, including SIMT vs MIMD, memory model gaps, and live kernel migration.
Kubernetes GPU Virtualization Explained Through gpu-manager Startup Flow
A deep dive into Kubernetes GPU virtualization through the gpu-manager startup flow, including device interception, topology awareness, scheduling, and allocation mechanics.
All posts in this category
Browse the full archive in reverse chronological order.
GPU Overprovisioning Solutions: From Oversubscription and Sharing to Isolation
A practical guide to GPU overprovisioning strategies, including scheduler-level oversubscription, time slicing, memory controls, MIG, vGPU, queue backfill, and operational guardrails.
How Startups Should Choose: Serverless GPU vs Dedicated GPU
A practical guide to choosing between serverless GPUs and dedicated GPUs for startups, based on cost structure, delivery speed, performance predictability, operations burden, and team maturity.
KAI-Scheduler vs HAMi: Two Ways to Share GPUs in Kubernetes (Soft vs Hard Isolation)
An engineering-oriented comparison of KAI-Scheduler’s Reservation Pod approach and HAMi’s hard isolation path, including trade-offs, failure modes (noisy neighbor), and how the two layers can complement each other.
hetGPU: Chasing Cross-Vendor GPU Binary Compatibility
An engineering-oriented guide to hetGPU: how a compiler + runtime stack can make one GPU binary run across NVIDIA/AMD/Intel/Tenstorrent, including SIMT vs MIMD, memory model gaps, and live kernel migration.
Kubernetes GPU Virtualization Explained Through gpu-manager Startup Flow
A deep dive into Kubernetes GPU virtualization through the gpu-manager startup flow, including device interception, topology awareness, scheduling, and allocation mechanics.