Node health for EKS Hybrid Nodes

Release
2025-03-31 ~1 min read docs.aws.amazon.com #eks

⚡ TL;DR

Node health refers to the operational status and capability of a node to effectively run workloads.

📝 Summary

Node health refers to the operational status and capability of a node to effectively run workloads. A healthy node maintains expected connectivity, has sufficient resources, and can successfully run Pods without disruption. For information on getting details about your nodes, see "View the health status of your nodes" and "Retrieve node logs for a managed node using kubectl and S3".

To help maintain healthy nodes, Amazon EKS offers the node monitoring agent and node auto repair. Both features are available only on Linux; they aren’t available on Windows.

The node monitoring agent automatically parses node logs to detect failures and certain health issues, and it surfaces status information about worker nodes. For each category of detected issue, such as storage or networking, a dedicated NodeCondition is applied to the affected worker nodes. Descriptions of detected health issues are available in the observability dashboard. For more information, see "Node health issues".
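Because detected issues surface as dedicated conditions on the node, one way to look for them is to filter a Node object's `status.conditions` for types outside the standard Kubernetes set. The following is a minimal sketch, not an official tool: the helper name `dedicated_conditions`, the sample node, and the condition type `StorageReady` with its reason and message are illustrative assumptions rather than values taken from this page.

```python
import json

# Condition types that Kubernetes itself sets on nodes.
STANDARD_CONDITION_TYPES = {
    "Ready",
    "MemoryPressure",
    "DiskPressure",
    "PIDPressure",
    "NetworkUnavailable",
}


def dedicated_conditions(node: dict) -> list[dict]:
    """Return the node's conditions whose type is not a standard Kubernetes one."""
    conditions = node.get("status", {}).get("conditions", [])
    return [c for c in conditions if c.get("type") not in STANDARD_CONDITION_TYPES]


# Hypothetical, trimmed Node object. "StorageReady", its reason, and its
# message are made-up values used only to exercise the helper.
SAMPLE_NODE_JSON = """
{
  "metadata": {"name": "example-node"},
  "status": {
    "conditions": [
      {"type": "Ready", "status": "True"},
      {"type": "MemoryPressure", "status": "False"},
      {"type": "StorageReady", "status": "False",
       "reason": "ExampleReason", "message": "illustrative message"}
    ]
  }
}
"""

node = json.loads(SAMPLE_NODE_JSON)
for cond in dedicated_conditions(node):
    print(f'{cond["type"]}: status={cond["status"]} reason={cond.get("reason", "")}')
```

In practice, the input could come from `kubectl get node <node-name> -o json` instead of the embedded sample. The sketch only lists the extra conditions; how to interpret each status value depends on the condition and is not assumed here.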