Set job group count to the number of nodes in a node pool #681

Open
josegonzalez opened this issue Jul 26, 2023 · 1 comment
Labels
stage/accepted theme/policy Policy source, parsing and validation type/enhancement

Comments

@josegonzalez

For jobs where the scaling method is to match the number of client nodes, node pools offer an effective, native way to describe a cluster of resources. It would be great to automatically match the job group count with the number of instances in a node pool. This would allow users to scale the underlying cluster based on metrics such as incoming request count or CPU utilization, and have service jobs placed on each node.

Note that for effective usage, one would have to ensure allocations are on distinct hosts, and that scaling down infrastructure doesn't impact running allocations but simply removes what was allocated on the now removed nodes.

@lgfa29
Contributor

lgfa29 commented Dec 22, 2023

Hi @josegonzalez 👋

Interestingly I think I just answered a question like this #797 (comment) 😄

The tricky part is that the job_summary metric I used in the query doesn't have a node_pool label, and I don't think it's even possible to add one, given that queued allocs are not running on any client. We could read the value from the job, but the all node pool would need special consideration.

We could also simplify things quite a bit by adding a new query operation to the Nomad APM to just return client counts.
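
A minimal sketch of what such a client-count operation might compute, written as standalone Go. Everything here is an assumption for illustration: the `Node` struct and the `CountClients` function are hypothetical and are not part of the Nomad or Nomad Autoscaler APIs. The sketch counts only ready, eligible clients and treats the built-in all node pool as matching every node, which is the special case mentioned above.

```go
package main

import "fmt"

// Node is a hypothetical, simplified view of a Nomad client node.
// It is not the real Nomad API type.
type Node struct {
	ID       string
	NodePool string
	Status   string // for example "ready" or "down"
	Eligible bool   // whether the node accepts new allocations
}

// allNodePool is the built-in pool that contains every node,
// so it has to be handled separately from named pools.
const allNodePool = "all"

// CountClients returns how many ready, eligible nodes belong to pool.
// Passing "all" counts every ready, eligible node regardless of pool.
func CountClients(nodes []Node, pool string) int {
	count := 0
	for _, n := range nodes {
		if n.Status != "ready" || !n.Eligible {
			continue
		}
		if pool == allNodePool || n.NodePool == pool {
			count++
		}
	}
	return count
}

func main() {
	nodes := []Node{
		{ID: "n1", NodePool: "gpu", Status: "ready", Eligible: true},
		{ID: "n2", NodePool: "gpu", Status: "down", Eligible: true},
		{ID: "n3", NodePool: "default", Status: "ready", Eligible: true},
		{ID: "n4", NodePool: "default", Status: "ready", Eligible: false},
	}
	fmt.Println(CountClients(nodes, "gpu"))
	fmt.Println(CountClients(nodes, "all"))
}
```

Whether down or ineligible clients should count toward the target is a design choice; excluding them here keeps the group count aligned with the nodes that can actually receive an allocation.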

Then there's also the problem mentioned in the comment linked above:

Unfortunately this doesn't work as well because the group policy will not be able to take into consideration the number of queued allocations. So you will be able to scale up the number of clients, but not down 😅

So lots to improve, but node pools do open up interesting points of exploration.

Thanks for the suggestion!

@lgfa29 lgfa29 added stage/accepted type/enhancement theme/policy Policy source, parsing and validation labels Dec 22, 2023