Docs

Curated Kubernetes content from AKS, EKS, GKE, OpenShift, Rancher/K3s and more—auto‑aggregated daily.

2025-12-30
Kubernetes Blog
Kubernetes v1.35: Watch Based Route Reconciliation in the Cloud Controller Manager
Up to and including Kubernetes v1.34, the route controller in Cloud Controller Manager (CCM) implementations built using the k8s.io/cloud-provider library reconciles routes at a fixed interval.
#kubernetes
2025-12-29
Kubernetes Blog
Kubernetes v1.35: Introducing Workload Aware Scheduling
Scheduling large workloads is a much more complex and fragile operation than scheduling a single Pod, as it often requires considering all Pods together instead of scheduling each one independently. For example, when scheduling a machine learning batch job, you often need to place each worker strategically, such as on the same rack, to make the entire process as efficient as possible.
#kubernetes
2025-12-29
AWS Containers Blog (EKS)
Streamline your containerized CI/CD with GitLab Runners and Amazon EKS Auto Mode
Although Kubernetes offers scalability for GitLab Runner deployments, the operational overhead can’t be ignored. Organizations starting with containerized continuous integration/continuous development (CI/CD) without mature container practices often underestimate both the financial implications and security complexities.
#eks #aws
2025-12-29
AWS Containers Blog (EKS)
Part 2: Observing and scaling MLOps infrastructure on Amazon EKS
In part 1 of this series, Introduction to observing machine learning workloads on Amazon EKS, we established several key foundational concepts. We explored the fundamental differences between monitoring machine learning (ML) and traditional workloads, emphasizing how ML systems require more specialized metrics and granular monitoring.
#eks #aws
2025-12-29
AWS Containers Blog (EKS)
Efficient image and model caching strategies for AI/ML and generative AI workloads on Amazon EKS
When organizations deploy generative AI and machine learning (ML) workloads on Amazon Elastic Kubernetes Service (Amazon EKS), implementing efficient caching strategies becomes crucial for both performance and cost optimization. Storage and caching play major roles throughout the lifecycle of any AI, ML, or generative AI workloads on Amazon EKS.
#eks #aws
2025-12-29
AWS Containers Blog (EKS)
Implementing assurance pipeline for Amazon EKS Platform
#eks #aws
2025-12-29
CNCF
How to integrate Kairos architecturally into an edge AI platform
Posted on December 29, 2025 by Jordan Karapanagiotis, Software Engineer - Aurea Imaging, and Mauro Morales, Staff Engineer & Kairos Maintainer - Spectro Cloud. Remote sensing in agriculture requires complex systems that are able to communicate with various external devices like GPS and cameras, and use machine learning and AI inference to provide insights to the grower regarding their orchard, down to tree and crop-level precision. Aurea Imaging, a Dutch startup company, specializes in remote sensing solutions for agriculture using an embedded device with a powerful GPU-enabled NVIDIA Jetson on board.
#cncf
2025-12-29
VMware Cloud Foundation Blog
NVMe Memory Tiering Design and Sizing on VMware Cloud Foundation 9 Part 5: Deployment Scenarios (Greenfield, Brownfield, Nested Lab)
In this part of the blog series, I want to provide some information about the differences when enabling Memory Tiering in different scenarios. Although the core process remains the same, there are things that may require extra attention and planning to save some time and effort.
#vmware #cloud-foundation #kubernetes
2025-12-24
Tigera
Do You Need a Service Mesh? Understanding the Role of CNI vs. Service Mesh
The world of Kubernetes networking can sometimes be confusing. What’s a CNI? A service mesh? Do I need one? Both? And how do they interact in my cluster? The questions can go on and on.
#tigera
2025-12-24
CNCF
Bringing sustainability back into the conversation: CNCF Cloud Native Sustainability Month Tokyo
#cncf