dra-evolution: restore minimal support for "classic DRA" #26

Closed
wants to merge 4 commits

Conversation

pohly
Contributor

@pohly pohly commented Jun 6, 2024

"Classic DRA" has a control plane controller which handles allocation and deallocation of a claim in cooperation with the scheduler. That is the part which gets added back to the API.

What doesn't get restored compared to v1.30.0 is:

  • immediate allocation
  • using a CRD as user-facing API
  • the opaque "resource handle" in the allocation result

Drivers which want to support this mode of operation have to use the in-tree types. The "config" fields support that, albeit perhaps with a slightly less natural API.

While at it, some occurrences of "resource" get replaced with "device" or "DRA".

/assign @thockin @klueska

"Classic DRA" has a control plane controller which handles allocation and
deallocation of a claim in cooperation with the scheduler. That is the part
which gets added back to the API.

What doesn't get restored compared to v1.30.0 is:
- immediate allocation
- using a CRD as user-facing API
- the opaque "resource handle" in the allocation result

Drivers which want to support this mode of operation have to use the in-tree
types. The "config" fields support that, albeit perhaps with a slightly less
natural API.

While at it, some occurrences of "resource" get replaced with "device" or
"DRA".
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 6, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 6, 2024
Data *StructuredDriverData `json:"data,omitempty" protobuf:"bytes,2,opt,name=data"`

// Alternative to "Data", not describe further here for the sake of brevity:
// OpaqueData *string // Set by control plane controller when using "classic DRA".
Contributor Author

I came to the conclusion that we don't need this opaque data and thus StructuredDriverData can be inlined.

A control plane controller can and should use instead the in-tree RequestAllocationResult. Then selecting a specific request for a container works the same way in both allocation modes.
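To make this concrete, here is a hedged, self-contained sketch in Go with stand-in types (the field names Request, DriverName and DeviceName are assumptions for illustration, not the prototype's exact API) showing why a per-request result list keeps container-level selection identical in both allocation modes:

// Stand-in for the in-tree per-request allocation result; a control plane
// controller would populate entries like these instead of an opaque blob.
type requestAllocationResult struct {
	Request    string // name of the request in the claim spec
	DriverName string // driver that allocated the device
	DeviceName string // device chosen for this request
}

// devicesForRequest works the same regardless of whether the results were
// written by the scheduler (structured parameters) or by a control plane
// controller ("classic DRA").
func devicesForRequest(results []requestAllocationResult, request string) []string {
	var names []string
	for _, r := range results {
		if r.Request == request {
			names = append(names, r.DeviceName)
		}
	}
	return names
}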

Contributor

So if I'm reading this diff correctly, the only fields in this struct now are DriverName and Config? Is this because the new Config section is (under the hood) just a RawExtension, so that can now be used to encapsulate any opaque data that needs to be passed through?

Contributor Author

Yes, that was the idea.

This moves the "results" list up in the AllocationResult, which matches the
"requests" in the spec. The claim.spec.onfig directly corresponds to
allocationResult.config.
Comment on lines 404 to 406
// This node name together with the driver name and
// the device name field identify which device was allocated.
NodeName string `json:"nodeName"`
Contributor Author

Let's generalize this a bit more so that it also works for network-attached devices, whether they get allocated through a control plane controller (this would be possible in 1.31) or with some future extension of structured parameters.

Suggested change
// This node name together with the driver name and
// the device name field identify which device was allocated.
NodeName string `json:"nodeName"`
// This name together with the driver name and
// the device name field identify which device was allocated.
//
// For a driver with node-local devices, it is the name of the node
// and the overall ID is `<driver name>/<segment name = node name>/<device name>`.
//
// For a driver where device names are unique in a certain subset of
// the cluster (region, rack, etc.), the name identifies that segment.
//
// For a driver where device names are unique in the cluster, this
// name may be empty.
//
// +optional
SegmentName string `json:"segmentName,omitempty"`

Contributor Author

I'm torn between allowing just one intermediate segment and many. It could make sense to allow <driver name>/<region>/<data center>/<device name> as the string ID for a device.

We want to be sure that there is a driver name and device name, but what the driver puts into the middle doesn't really matter.

Naming, oh why do you have to be so difficult?!

Using "PoolName" is tempting ("names the pool in which the device name must be unique"), but it clashes with ResourcePool.Name.

DevicePoolName? Max length 128 characters, DNS sub-domains separated by slashes. Optional.

That goes well together with DeviceName.

Once we get to structured parameters for network-attached devices, a ResourcePool object for those would have:

  • a `NodeSelector *v1.NodeSelector` instead of `NodeName string`.
  • such a DevicePoolName that serves as part of the unique ID
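As a side note, a minimal sketch (not part of the proposal, and using only the standard strings package) of how such a string ID could be composed and split, assuming the driver name and device name never contain slashes while the middle part may:

// deviceID composes the string form <driver name>/<device pool name>/<device name>;
// the pool part is optional and may itself contain slashes (e.g. region/data-center).
func deviceID(driverName, devicePoolName, deviceName string) string {
	if devicePoolName == "" {
		return driverName + "/" + deviceName
	}
	return driverName + "/" + devicePoolName + "/" + deviceName
}

// splitDeviceID reverses deviceID. It assumes the ID has at least the form
// "<driver name>/<device name>".
func splitDeviceID(id string) (driverName, devicePoolName, deviceName string) {
	parts := strings.Split(id, "/")
	driverName = parts[0]
	deviceName = parts[len(parts)-1]
	if len(parts) > 2 {
		devicePoolName = strings.Join(parts[1:len(parts)-1], "/")
	}
	return
}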

Contributor Author

We have to allow a node name, so the max length has to be 253.

Contributor Author

I've pushed this as a third commit.

Contributor Author

And here's how this would work together with a ResourcePool (but note my question about that name):

// ResourcePool represents a collection of devices managed by a given driver. How
// devices are divided into pools is driver-specific, but typically the
// expectation would be a pool per identical collection of devices, per node.
// It is fine to have more than one pool for a given node, for the same driver.
//
// Where a device gets published may change over time. The unique identifier
// for a node-local device is the tuple `<driver name>/<node name>/<device name>`. Each
// of these names is a DNS label or domain, so it is okay to concatenate them
// like this in a string with a slash as separator.
//
// For non-local devices, the driver can either make all device names globally
// unique (`<driver name>/<device name>`) or provide a device pool name
// (`<driver name>/<device pool name>/<device name>`).

type ResourcePoolSpec struct {
    // NodeName identifies the node which provides the devices. All devices
    // are local to that node.
    //
    // Node name and device pool are mutually exclusive. At least one must be set.
    NodeName string `json:"nodeName,omitempty"`

    // DevicePool can be used for non-node-local devices instead of a NodeName
    // to define where the devices are available.
    //
    // Node name and device pool are mutually exclusive. At least one must be set.
    DevicePool *DevicePool `json:"devicePool,omitempty"`

    // POTENTIAL FUTURE EXTENSION: NodeSelector *v1.NodeSelector

    // DriverName identifies the DRA driver providing the capacity information.
    // A field selector can be used to list only ResourcePool
    // objects with a certain driver name.
    //
    // Must be a DNS subdomain and should end with a DNS domain owned by the
    // vendor of the driver.
    DriverName string `json:"driverName" protobuf:"bytes,3,name=driverName"`

    // Devices lists all available devices in this pool.
    //
    // Must not have more than 128 entries.
    Devices []Device `json:"devices,omitempty"`

    // FUTURE EXTENSION: some other kind of list, should we ever need it.
    // Old clients seeing an empty Devices field can safely ignore the (to
    // them) empty pool.
}

const ResourcePoolMaxDevices = 128

// DevicePool is a more general description for a set of devices.
// A single node is a special case of this.
type DevicePool struct {
    // This name together with the driver name and the device name
    // identify a device.
    //
    // Must not be longer than 253 characters and may contain one or more
    // DNS sub-domains separated by slashes.
    //
    // +optional
    Name string `json:"name,omitempty"`

    // This identifies nodes where the devices may be used. If unset,
    // all nodes have access.
    //
    // +optional
    NodeSelector *v1.NodeSelector `json:"nodeSelector,omitempty"`
}
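For clarity, a minimal sketch of the validation implied by the "mutually exclusive, at least one must be set" comments above, assuming the ResourcePoolSpec type from this snippet and the standard errors package (not the actual validation code):

func validatePoolLocation(spec *ResourcePoolSpec) error {
	switch {
	case spec.NodeName != "" && spec.DevicePool != nil:
		return errors.New("nodeName and devicePool are mutually exclusive")
	case spec.NodeName == "" && spec.DevicePool == nil:
		return errors.New("either nodeName or devicePool must be set")
	}
	return nil
}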

Contributor

I find it super confusing to have both ResourcePool and DevicePool. I think we can find a better name.

Based on the discussion in the other comment, the purpose of this is to allow the controller to identify the "owner" of the device. In the case of node-local, that is always the node so we didn't need anything but node name. With networked devices, this is controller-specific and effectively an opaque (to K8s) identifier. I can see where you are going with "device pool" in that sense, but I think it's too confusing. So, let's brainstorm some other ideas. What about something of this form: {"",Device,Resource} + {Manager,Owner,Driver} + {Id,Identifier,Key,Token,OpaqueId}. So, like, DriverOpaqueId would be one possible name based on that. Other ideas?

Contributor

Actually I really like DriverOpaqueId. It makes it clear it is 1) driver-specific; 2) opaque to k8s (not a resource name for example); 3) an identifier

Contributor Author
@pohly pohly Jun 8, 2024

I agree, the naming of ResourcePool needs further thought. The current name implies "this is one pool" but it's really just a subset of some other, larger pool, the one which contains all devices provided by the same driver name and published with the same "DriverOpaqueID" (or whatever we call it).

I would stay with the ResourceSlice type and define it like this:

Devices are identified by the name of the driver, a pool name, and a unique name inside that pool. A driver publishes information about its resources in one or more ResourceSlice objects. Each slice is part of one pool and contains the common attributes of a pool (the pool name, which nodes have access to the devices in the pool) and some of the devices in the pool. The common attributes are provided in each slice to enable correlating them (via the driver name and pool name) and to provide all relevant information without having to look up other objects.

Another common attribute is the total number of devices in the pool. This information enables consumers to determine whether they have seen all slices of a pool.

All slices belonging to the same pool must share the same generation number (TODO: can we use ObjectMeta.Generation for this or do we have to define our own field?). When a consumer sees slices of different generations, it should ignore all old slices. This can happen while the driver is updating the pool.

Pools that are local to a node are special. They get published with a node name, which then also serves as pool name. There are special permission checks that limit access from a node to slices with the same node name. Provisioning a new node of the same type is likely to provide the same set of devices as some old node.

With this definition, we can safely implement "give me access to all devices" because the scheduler knows when it lacks information. The "pool name" is meaningful and won't get confused with the name of a ResourcePool.

This raises the question of whether we should have a separate "pool" type. I don't think we need it; some small amount of duplicated information IMHO is fine.
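A hedged sketch of the consumer-side rule described above ("ignore all old slices"), using a hypothetical, minimal slice shape; the field names are assumptions for illustration only:

// Hypothetical, minimal shape of a published slice.
type slice struct {
	DriverName string
	PoolName   string
	Generation int64
	Devices    []string
}

// keepLatest drops slices that belong to an older generation of their pool,
// which is what a consumer would do while a driver is updating the pool.
func keepLatest(slices []slice) []slice {
	latest := map[string]int64{} // key: "<driver name>/<pool name>"
	for _, s := range slices {
		key := s.DriverName + "/" + s.PoolName
		if s.Generation > latest[key] {
			latest[key] = s.Generation
		}
	}
	var out []slice
	for _, s := range slices {
		if s.Generation == latest[s.DriverName+"/"+s.PoolName] {
			out = append(out, s)
		}
	}
	return out
}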

Contributor

Ahh...yes, this makes sense to me. What if we took this a little further, and we do have two types: DevicePool and ResourceSlice. Instead of "Pools that are local to a node are special. They get published with a node name, which then also serves as pool name.", we could have the idea of a "DriverInstance" which could be the node name for node-local devices, but could vary with other types of devices. So, it would be the four values: (DriverName, DriverInstance, DevicePoolName, DeviceName) that identifies an individual device. As you say, ResourceSlice would be the mechanism by which we publish the elements of the DevicePool. We could publish a DevicePool resource as well, and it would include a total device count and/or a slice count, and a generation ID as you say, so that we have this one object that we can use to ensure consistency across all the slices for that pool. We could also normalize common attributes up to the pool, as we have discussed before; I don't think we would need to normalize them up to the slice, just to the pool.

So - I think this is a good model. What I really like is that it weakens the "node" boundary in a way I think will be important in the long run (it's even in our charter...). What I don't like is that it requires a lot of explanation. I think if we omit the DevicePool object itself, it will be harder for people to understand. So I would want to have that in there. I think it also helps with revision control - it lets us know what the current generation is. Otherwise, I imagine some race conditions when we just say it is "the latest one we have seen".
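For illustration, a hedged sketch of what the pool-level object in this two-type model might carry; the type and field names are assumptions, not proposal text:

// Hypothetical pool-level object; ResourceSlice objects would publish the
// actual devices and point back to it via driver name, instance, and pool name.
type DevicePoolSpec struct {
	DriverName     string // e.g. "dra.example.com"
	DriverInstance string // node name for node-local devices, driver-defined otherwise
	Generation     int64  // lets consumers detect and discard stale slices
	DeviceCount    int    // total devices across all slices of this generation
	SliceCount     int    // consumers know when they have seen every slice
}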

Contributor Author

I've added these ideas with a different wording to kubernetes/enhancements#4709.

Let's discuss there.

@johnbelamaric johnbelamaric mentioned this pull request Jun 6, 2024
Calling this "NodeName" didn't quite capture the purpose. It was correct for
node-local devices, but only for those.
- deviceClassName: foozer.example.com
- deviceClassName: sriov-nic-example.org
- name: gpu
deviceClassName: foozer.example.com
Contributor Author

We decided to make request names mandatory. Not all examples have been updated, so I am doing it here now because I am touching it.

pohly added a commit to pohly/enhancements that referenced this pull request Jun 7, 2024
This is the result of the prototyping in
https://github.com/kubernetes-sigs/wg-device-management/tree/main/dra-evolution.

The key changes compared to 1.30 are:
- Removal of CRD support. Vendor configuration parameters
  are still supported as raw extensions of the in-tree types.
- Removal of the notion of a "device model" and a selection mechanism
  that only works for one such model. This simplifies the API.
- Instead of one claim referencing a ResourceClass, individual
  requests inside a claim now reference a DeviceClass.
- Support for "match attributes".
- Support for "management access".
- Support for selecting devices from a request in a claim for a
  container instead of exposing all devices of a claim.

Editorial changes:
- Reduced usage of the word "resource" in favor of "devices" and
  "DRA drivers".
- Removal of some out-dated text.

This revision of the KEP roughly corresponds to
https://github.com/kubernetes-sigs/wg-device-management/tree/258d109f54d3baaa48719519dec37a450a1dd5a1/dra-evolution
(= kubernetes-sigs/wg-device-management#14 with some
minor edits).

What is missing:
- renaming of ResourceSlice ->
  ResourcePool (https://github.com/kubernetes-sigs/wg-device-management/pull/28/files#r1630964004)

What was added from kubernetes-sigs/wg-device-management#26:
- ResourceClaim.Status.Allocation.NodeName -> NodeSelector
- flattening of the allocation result

Undecided:
- kubernetes-sigs/wg-device-management#26
- kubernetes-sigs/wg-device-management#16
- kubernetes-sigs/wg-device-management#18
Comment on lines +382 to +388
// DriverName specifies the name of the DRA driver whose kubelet
// plugin should be invoked to process the allocation once the claim is
// needed on a node.
//
// Must be a DNS subdomain and should end with a DNS domain owned by the
// vendor of the driver.
DriverName string `json:"driverName" protobuf:"bytes,1,name=driverName"`
Contributor

Why do we need this both here and in DriverData? Speaking of which, I don't see DriverData embedded in any of the other types here. Remind me where it gets referenced again?

Contributor Author

DriverName became obsolete because I moved all of its fields directly into AllocationResult. I've removed it now.

// DNS sub-domains separated by slashes.
//
// +optional
DevicePoolName string `json:"devicePoolName,omitempty"`
Contributor

When I see this I think of our whole ResourcePool and DevicePool discussion and it makes me think this is referring to the name of the ResourcePool that the device in this result was allocated from. I think it is probably meant to be something different though, but the previous discussions are carrying too much baggage for me to understand exactly what that is.


Comment on lines +436 to +443
// DriverConfigurationParameters must have one and only one field set.
//
// In contrast to ConfigurationParameters, the driver name is
// not included and has to be inferred from the context.
type DriverConfigurationParameters struct {
Opaque *runtime.RawExtension `json:"opaque,omitempty" protobuf:"bytes,1,opt,name=opaque"`
}

Contributor

I don't quite understand this comment. The ConfigurationParameters and DriverConfigurationParameters structs look identical to me.

Contributor

Edit: I see the difference now -- the other struct has an embedded OpaqueConfigurationParameters with an embedded DriverName field, rather than a direct field for runtime.RawExtension.

It's still confusing to look at, but I at least see the difference now...
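Sketched side by side (a paraphrase of the difference as described in this thread, not a copy of the proposal), the two shapes look roughly like this, with runtime.RawExtension coming from k8s.io/apimachinery:

type OpaqueConfigurationParameters struct {
	DriverName string                // names the driver these parameters belong to
	Parameters *runtime.RawExtension // arbitrary, driver-defined payload
}

// ConfigurationParameters carries the driver name explicitly.
type ConfigurationParameters struct {
	Opaque *OpaqueConfigurationParameters
}

// DriverConfigurationParameters leaves the driver implicit in its context.
type DriverConfigurationParameters struct {
	Opaque *runtime.RawExtension
}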

Contributor Author

Would it make more sense to repeat the driver name, i.e. use ConfigurationParameters?

That could look odd in YAML because the driver name will be more obviously redundant:

results:
- driverName: dra.example.com
  config:
  - opaque:
      driverName: dra.example.com
      parameters:
        ...

Contributor Author

Perhaps it actually makes sense? One driver could understand the parameters of another.

We don't do that with structured parameters (it's defined as "only configs for the driver are passed down to it"), but perhaps that could change and a control plane controller could already do it now.

Contributor Author

Not added yet to kubernetes/enhancements#4709 but perhaps we should?

If we do, we probably should make it consistent and also pass all configurations down when using structured parameters.

pohly added a commit to pohly/enhancements that referenced this pull request Jun 11, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 7, 2024
@pohly pohly closed this Sep 10, 2024