dynamic host volumes: node selection via constraints #24518
Conversation
LGTM! With one long thought about potential disk utilization.
```go
for {
	raw := iter.Next()
```
I don't recall how our binpacking algorithm works for allocs. Is it like this, where it's just whatever order comes out of state? I suspect, based on no real evidence, that folks won't want to binpack volumes the same way, unless they're the kind of volume that has a disk space limit and we placed them based on available disk space.

Basically, if I'm reading this right, this feels like a recipe for a full-disk alert waking someone up.

I suppose their main mechanisms to avoid this would be to:
- use careful explicit constraints, which seems a little IaC-unfriendly if they'd need a lot of specs
- reuse the same volume name a lot, so each instance lands on a distinct host

Any other considerations I'm missing?
For allocs in the general scheduler (batch/service), we:
- find all the nodes in the node pool and DC
- shuffle them
- iterate over them until we find 2 that are feasible (or a lot more than 2 for jobs with `spread`)
- pick the best of the 2

When using `spread`, we iterate over enough nodes to guarantee we're not putting allocs for the same job on the same host, which is effectively what we're doing here. Operators are going to want to spread volumes with the same "purpose" out because of failure domains. If the node is full, then the plugin will tell us that and we'll get an error back.
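
For illustration, here's a minimal sketch of the placement approach described above: shuffle the candidate nodes, stop after finding two feasible ones, and pick the better-scoring of the two. The `node`, `feasibilityChecker`, `scorer`, and `pickNode` names are hypothetical stand-ins, not Nomad's actual scheduler API.

```go
// A minimal sketch (not Nomad's actual scheduler code) of the placement
// approach described above.
package placement

import "math/rand"

type node struct{ ID string }

type feasibilityChecker func(node) bool
type scorer func(node) float64

// pickNode returns a placement target, or false if no node is feasible.
func pickNode(nodes []node, feasible feasibilityChecker, score scorer) (node, bool) {
	// Shuffle so repeated placements don't always land on the same hosts.
	rand.Shuffle(len(nodes), func(i, j int) { nodes[i], nodes[j] = nodes[j], nodes[i] })

	// Collect at most two feasible candidates; jobs using spread would
	// keep iterating over many more nodes than this.
	candidates := make([]node, 0, 2)
	for _, n := range nodes {
		if feasible(n) {
			candidates = append(candidates, n)
		}
		if len(candidates) == 2 {
			break
		}
	}
	if len(candidates) == 0 {
		return node{}, false
	}

	// Pick the best of the (at most) two feasible candidates.
	best := candidates[0]
	for _, c := range candidates[1:] {
		if score(c) > score(best) {
			best = c
		}
	}
	return best, true
}
```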
When making a request to create a dynamic host volume, users can pass a node pool and constraints instead of a specific node ID.

This changeset implements node scheduling logic by instantiating a node pool filter and a constraint checker borrowed from the scheduler package. Because host volumes with the same name can't land on the same host, we don't need to support `distinct_hosts`/`distinct_property`; this would be challenging anyway without building out a much larger node iteration mechanism to keep track of usage across multiple hosts.

Ref: #24479
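
As a rough sketch under assumed types, the flow the description above outlines looks roughly like this: filter candidate nodes by node pool, run each through a simple constraint check, and place the volume on a feasible node. The `Node`, `Constraint`, `constraintsMet`, and `selectNodeForVolume` names are illustrative, not Nomad's actual internal API; the real implementation reuses the scheduler package's constraint checker.

```go
// A rough illustrative sketch of node selection for a dynamic host volume:
// node pool filtering followed by constraint checking.
package placement

import "errors"

type Node struct {
	ID         string
	NodePool   string
	Attributes map[string]string
}

type Constraint struct {
	Attribute string
	Operator  string // only "=" is handled in this sketch
	Value     string
}

// constraintsMet reports whether a node satisfies every constraint.
func constraintsMet(n Node, cs []Constraint) bool {
	for _, c := range cs {
		if c.Operator == "=" && n.Attributes[c.Attribute] != c.Value {
			return false
		}
	}
	return true
}

// selectNodeForVolume picks the first node in the requested node pool that
// satisfies all of the volume's constraints.
func selectNodeForVolume(nodes []Node, pool string, cs []Constraint) (Node, error) {
	for _, n := range nodes {
		if pool != "" && n.NodePool != pool {
			continue // node pool filter
		}
		if !constraintsMet(n, cs) {
			continue // constraint checker
		}
		return n, nil
	}
	return Node{}, errors.New("no feasible node for volume")
}
```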