dynamic host volumes: node selection via constraints #24518
Conversation
LGTM! With one long thought about potential disk utilization.
```go
for {
	raw := iter.Next()
```
I don't recall how our binpacking algorithm works for allocs. Is it like this, where it's just whatever order comes out of state? I suspect, based on no real evidence, that folks won't want to binpack volumes the same way, unless they're the kind of volume that has a disk space limit and we placed them based on available disk space.

Basically, if I'm reading this right, this feels like a recipe for a full-disk alert waking someone up.

I suppose their main mechanisms to avoid this would be to:
- use careful explicit constraints, which seems a little IaC-unfriendly if they'd need a lot of specs
- reuse the same volume name a lot, so each instance lands on a distinct host

Any other considerations I'm missing?
For allocs in the general scheduler (batch/service), we:
- find all the nodes in the node pool and DC
- shuffle them
- iterate over them until we find 2 that are feasible (or a lot more than 2 for jobs with `spread`)
- pick the best of the 2

When using `spread`, we iterate over enough nodes to guarantee we're not putting allocs for the same job on the same host, which is effectively what we're doing here. Operators are going to want to spread volumes with the same "purpose" out because of failure domains. If the node is full, then the plugin will tell us that and we'll get an error back.
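
For illustration, here's a minimal sketch of the placement approach described above: shuffle the candidate nodes, stop after finding two feasible ones, and pick the better-scoring of the two. The `node`, `feasibilityChecker`, `scorer`, and `pickNode` names are hypothetical stand-ins, not Nomad's actual scheduler API.

```go
// A minimal sketch (not Nomad's actual scheduler code) of the placement
// approach described above.
package placement

import "math/rand"

type node struct{ ID string }

type feasibilityChecker func(node) bool
type scorer func(node) float64

// pickNode returns a placement target, or false if no node is feasible.
func pickNode(nodes []node, feasible feasibilityChecker, score scorer) (node, bool) {
	// Shuffle so repeated placements don't always land on the same hosts.
	rand.Shuffle(len(nodes), func(i, j int) { nodes[i], nodes[j] = nodes[j], nodes[i] })

	// Collect at most two feasible candidates; jobs using spread would
	// keep iterating over many more nodes than this.
	candidates := make([]node, 0, 2)
	for _, n := range nodes {
		if feasible(n) {
			candidates = append(candidates, n)
		}
		if len(candidates) == 2 {
			break
		}
	}
	if len(candidates) == 0 {
		return node{}, false
	}

	// Pick the best of the (at most) two feasible candidates.
	best := candidates[0]
	for _, c := range candidates[1:] {
		if score(c) > score(best) {
			best = c
		}
	}
	return best, true
}
```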
When making a request to create a dynamic host volume, users can pass a node pool and constraints instead of a specific node ID.

This changeset implements node scheduling logic by instantiating a node pool filter and a constraint checker borrowed from the scheduler package. Because host volumes with the same name can't land on the same host, we don't need to support `distinct_hosts`/`distinct_property`; this would be challenging anyway without building out a much larger node iteration mechanism to keep track of usage across multiple hosts.

Ref: #24479
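
As a rough sketch under assumed types, the flow the description above outlines looks roughly like this: filter candidate nodes by node pool, run each through a simple constraint check, and place the volume on a feasible node. The `Node`, `Constraint`, `constraintsMet`, and `selectNodeForVolume` names are illustrative, not Nomad's actual internal API; the real implementation reuses the scheduler package's constraint checker.

```go
// A rough illustrative sketch of node selection for a dynamic host volume:
// node pool filtering followed by constraint checking.
package placement

import "errors"

type Node struct {
	ID         string
	NodePool   string
	Attributes map[string]string
}

type Constraint struct {
	Attribute string
	Operator  string // only "=" is handled in this sketch
	Value     string
}

// constraintsMet reports whether a node satisfies every constraint.
func constraintsMet(n Node, cs []Constraint) bool {
	for _, c := range cs {
		if c.Operator == "=" && n.Attributes[c.Attribute] != c.Value {
			return false
		}
	}
	return true
}

// selectNodeForVolume picks the first node in the requested node pool that
// satisfies all of the volume's constraints.
func selectNodeForVolume(nodes []Node, pool string, cs []Constraint) (Node, error) {
	for _, n := range nodes {
		if pool != "" && n.NodePool != pool {
			continue // node pool filter
		}
		if !constraintsMet(n, cs) {
			continue // constraint checker
		}
		return n, nil
	}
	return Node{}, errors.New("no feasible node for volume")
}
```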