We recently ran into two separate issues with the Nomad Autoscaler: in one, it failed to describe AWS Auto Scaling groups because of an expired AWS token; in the other, it failed to evaluate a scaling policy because of an issue reaching the APM (Prometheus).
{"@level":"warn","@message":"failed to get target status","@module":"policy_manager.policy_handler","@timestamp":"2023-07-06T16:21:26.029652Z","error":"failed to describe AWS Autoscaling Group: operation error Auto Scaling: DescribeAutoScalingGroups, https response error StatusCode: 403, RequestID: c674bc86-1234-4fb1-5678-b264741176bc, api error ExpiredToken: The security token included in the request is expired","policy_id":"613aeb80-xs23-8f4e-1234-ef2ca2748d8a"}
It would be great if the autoscaler exported a couple of extra Prometheus metrics that we could monitor to detect simple failures like these.
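Until such metrics exist, one possible stopgap is to derive failure counts from the autoscaler's JSON logs. The following is only a minimal sketch, assuming log lines shaped like the one above (with `@level`, `@module`, and `@message` fields); the function name `count_failures` is hypothetical and not part of the autoscaler.

```python
import json
from collections import Counter


def count_failures(lines):
    """Count warn/error log entries, keyed by (module, message).

    Assumes each line is a JSON object with "@level", "@module" and
    "@message" fields, as in the log line quoted in this issue.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        if not isinstance(entry, dict):
            continue  # skip JSON values that are not objects
        if entry.get("@level") in ("warn", "error"):
            key = (entry.get("@module", "unknown"),
                   entry.get("@message", "unknown"))
            counts[key] += 1
    return counts
```

The resulting counts could be fed into whatever alerting is already in place, for example by alerting when the count for `("policy_manager.policy_handler", "failed to get target status")` increases. Native metrics from the autoscaler would still be preferable, since log-derived counters depend on message wording staying stable.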