-
Hello @nitinatgh, thanks for opening this discussion. The main focus of Node Termination Handler is to cordon and drain a node before it is terminated following a spot interruption notice. We do not have an immediate action item to support the feature you suggested. NTH has already become complex because of the many configurations and use cases requested by customers, and we do not want to increase that complexity with new enhancements. We will update this thread if we plan to pick up this enhancement. Thanks again for opening the discussion.
-
Hi Team,
I have a question/idea for metrics relating to availability for NTH.
Currently there are only two metrics available, and each has a different action associated with it.
For our purposes, we'd only like to focus on the tasks that NTH itself needs to execute to handle the spot interruption, so everything from receiving the notice until letting the kube API know. We are not concerned with anything after that. For example, we have seen errors for cordon-and-drain because the node can't be drained in time, for reasons such as a PDB or `terminationGracePeriodSeconds` >= 120s. We believe that part is down to kube to handle and not NTH.

Is it possible to have an action added that only covers the work up to that point? That way we can have a proper availability metric for NTH on our clusters and not get false alerts as a result of the above.
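To illustrate the kind of calculation we have in mind, here is a minimal sketch. It assumes the metric could be read as per-action success/error counts; the action names `receive-notice` and `notify-kube-api`, the `ActionCount` type and the `availability` helper are all hypothetical and are not part of NTH today. Only `cordon-and-drain` is an action we actually see.

```python
from dataclasses import dataclass

# Hypothetical action names, for illustration only; not NTH's actual labels.
IN_SCOPE_ACTIONS = frozenset({"receive-notice", "notify-kube-api"})


@dataclass
class ActionCount:
    action: str   # name of the action the handler performed
    status: str   # "success" or "error"
    count: int    # number of times this action/status pair was observed


def availability(samples, in_scope=IN_SCOPE_ACTIONS):
    """Success ratio over in-scope actions only; None if nothing was observed."""
    relevant = [s for s in samples if s.action in in_scope]
    total = sum(s.count for s in relevant)
    if total == 0:
        return None
    succeeded = sum(s.count for s in relevant if s.status == "success")
    return succeeded / total


# Made-up sample counts: drain errors are present but left out of the ratio.
samples = [
    ActionCount("receive-notice", "success", 10),
    ActionCount("notify-kube-api", "success", 9),
    ActionCount("notify-kube-api", "error", 1),
    ActionCount("cordon-and-drain", "error", 4),
]
ratio = availability(samples)
```

With counts like these, the drain failures caused by a PDB or a long grace period would not pull the ratio down, which is the behaviour we are after for alerting.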
Please let me know your thoughts around this.
Thanks!