Introducing Kthena: LLM inference for the cloud native era
Posted on January 28, 2026 by Volcano Maintainers

The Volcano community is proud to announce the launch of Kthena, a new sub-project designed for global developers and MLOps engineers. Kthena is a cloud native, high-performance system for Large Language Model (LLM) inference routing, orchestration, and scheduling, built specifically for Kubernetes.

Engineered to address the complexity of serving LLMs at production scale, Kthena delivers granular control and enhanced flexibility. Through features such as topology-aware scheduling, KV Cache-aware routing, and Prefill-Decode (PD) disaggregation, it significantly improves GPU/NPU utilization and throughput while minimizing latency. As a sub-project of Volcano, Kthena extends Volcano's capabilities beyond AI training, creating a unified, end-to-end solution for the entire AI lifecycle.

While LLMs are reshaping industries, deploying them efficiently on Kubernetes remains a complex systems engineering challenge.
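The post does not show Kthena's router internals, but the core idea behind KV Cache-aware routing is simple to sketch: requests that share a prompt prefix are steered to the same replica, so the attention keys/values that replica computed during prefill can be reused rather than recomputed. The minimal Python sketch below (class name, replica names, and the whitespace-token affinity key are all illustrative assumptions, not Kthena's API) shows one way such prefix-affinity routing can work:

```python
import hashlib

# Illustrative sketch only; Kthena's actual KV Cache-aware router is more
# sophisticated (e.g. tracking real cache contents and load per replica).
class PrefixAffinityRouter:
    def __init__(self, replicas, prefix_tokens=16):
        self.replicas = replicas          # e.g. ["pod-a", "pod-b", "pod-c"]
        self.prefix_tokens = prefix_tokens

    def route(self, prompt: str) -> str:
        # Use the first N whitespace-separated tokens as the affinity key.
        # A production router would hash model token IDs in fixed-size blocks.
        key = " ".join(prompt.split()[: self.prefix_tokens])
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        # Consistent mapping from prefix key to replica index.
        idx = int.from_bytes(digest[:8], "big") % len(self.replicas)
        return self.replicas[idx]

router = PrefixAffinityRouter(["pod-a", "pod-b", "pod-c"])
# Two requests sharing a long system prompt land on the same replica,
# so the second one can hit that replica's warm KV cache:
shared = "You are a helpful assistant. " * 4
assert router.route(shared + "What is Volcano?") == router.route(shared + "What is Kthena?")
```

The trade-off this sketch ignores is load balance: pure prefix affinity can hot-spot one replica, which is why cache-aware routers typically combine affinity with per-replica load signals.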
Open the original post ↗ https://www.cncf.io/blog/2026/01/28/introducing-kthena-llm-inference-for-the-cloud-native-era/