
KEP 4381: DRA structured parameters: updates, promotion to GA #5333


Open
wants to merge 1 commit into base: master from dra-structured-parameters-ga

Conversation

pohly
Contributor

@pohly pohly commented May 23, 2025

  • One-line PR description: DRA structured parameters: updates, promotion to GA
  • Other comments:

Some of the existing content was a bit stale, for example the v1beta2 API changes were missing. Seamless upgrades were already added in 1.33.

New for 1.34 are the `dra_resource_claims_in_use` metric and the Filter timeout. They are part of filling gaps identified for GA. With those gaps closed, the criteria for GA should be satisfied, so promotion to GA is proposed for 1.34.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 23, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pohly
Once this PR has been reviewed and has the lgtm label, please assign derekwaynecarr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from dchen1107 and mrunalp May 23, 2025 06:30
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 23, 2025
in this KEP.
in this KEP. Its documentation and code describe best practices for
developing a DRA driver. It is used by the
[example DRA driver](https://github.com/kubernetes-sigs/dra-example-driver)
Contributor Author

For reference: we are considering renaming the repository to "dra-driver-example" (kubernetes/org#5597 (comment)).

1.32. Only the `v1` version is served by default. `v1beta1` must remain
supported for encoding/decoding. Other betas remain available
as long as required by the [deprecation policy](https://kubernetes.io/docs/reference/using-api/deprecation-policy/)
but need to be enabled explicitly.
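For illustration, a client that should keep working across these version changes can target the served `v1` API directly. A minimal sketch using the standard client-go dynamic client; only the group/version/resource triple is DRA-specific, and it assumes a cluster where `resource.k8s.io/v1` is served:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Standard kubeconfig-based client setup.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// resource.k8s.io/v1 is served by default; the beta versions have to be
	// enabled explicitly (e.g. via --runtime-config on the apiserver).
	gvr := schema.GroupVersionResource{
		Group:    "resource.k8s.io",
		Version:  "v1",
		Resource: "resourceclaims",
	}
	claims, err := client.Resource(gvr).Namespace("default").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, claim := range claims.Items {
		fmt.Println(claim.GetName())
	}
}
```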
Contributor Author

I don't see a good path towards removing v1beta1, which is unfortunate because the conversion is ugly due to the different structure. If anyone has suggestions then I am all ears...

Not a blocker for GA.

Member

What blocks us from removing the v1beta1 API once we have switched the serialization version to v1beta2 (ref kubernetes/kubernetes#129889)?

Contributor Author

Existing, upgraded clusters may have objects stored with v1beta1. There is https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/2330-migrating-api-objects-to-latest-storage-version but it seems stuck at alpha.

depends on the number of requests per ResourceClaim, number of ResourceClaims,
number of published devices in ResourceSlices, and the complexity of the
requests. Other checks besides CEL evaluation also take time (usage checks,
match attributes, etc.).
Contributor Author

This is the important part for GA. I think we want the mitigation with a timeout, but on the other hand that is new functionality.

The default duration is also entirely up for debate.

I left it open if and how the timeout can be disabled. I suppose a timeout <= 0 should mean "no timeout". Do we also want a separate feature gate (beta, on by default)? Modifying a feature gate is easier than modifying the scheduler configuration.
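For context, one way this could surface is as a plugin argument in the scheduler configuration. A hypothetical sketch; the type name, field name, and defaulting are placeholders, not the decided API:

```go
package config

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// DynamicResourcesArgs sketches how the Filter timeout could be exposed
// through KubeSchedulerConfiguration plugin args. Everything here is an
// assumption for illustration.
type DynamicResourcesArgs struct {
	metav1.TypeMeta

	// FilterTimeout bounds a single per-node Filter call.
	// nil would select the built-in default; per the comment above,
	// a duration <= 0 could be defined to mean "no timeout".
	FilterTimeout *metav1.Duration
}
```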

Member

I also do think we need the timeout. As described in kubernetes/kubernetes#131730, there are certain situations where the allocation can end up taking a very long time, which would block scheduling of all pods. The situations where this happens probably aren't common, but also not completely crazy.

Contributor Author

@pohly pohly May 30, 2025

@mortent: what's your opinion about having a separate feature gate?

@sanposhiho: does the proposal look correct to you? We discussed this before on Slack.

I now also have an implementation ready for review: kubernetes/kubernetes#132033

It doesn't have an extra feature gate yet.

Member

Timeout seems like a good approach to me.

If this timeout could cause some pods to become unschedulable, we should probably have a feature gate. However, I don't have a strong opinion on it.

Contributor Author

I'm also leaning towards having a feature gate.


A DRA driver may use
[seamless upgrades](https://github.com/kubernetes/kubernetes/blob/582b421393d0fad2ad4a83feba88977ac4434662/pkg/kubelet/pluginmanager/pluginwatcher/README.md#seamless-upgrade)
to ensure that there is always a running driver instance on a node.
Contributor Author

I decided against copying the content from that document into this KEP. I think having it in one place is enough.


The following metric mirrors the `csi_operations_seconds`:
The following metric mirrors the `csi_operations_seconds`, i.e.
provides information about gRPC calls issued by the kubelet:
Contributor Author

We did not promote any of the metrics from alpha to beta when graduating the feature to beta. I think I had asked SIG Instrumentation about the metrics stability lifecycle and whether it needs to be tied to the feature lifecycle, and I believe the answer was "no", but I can no longer find where that was.

cc @dgrisonnet @richabanker

This makes sense to me: adoption of metrics collection probably trails adoption of the feature itself, so blocking graduation on feedback for metrics makes the chicken-and-egg problem even worse.

Many of our existing metrics are still alpha, so this wouldn't be unusual. But perhaps that is just a (very?!) common oversight?

This is also how it has been handled in other recent features.

Member

We had some discussions in the past to potentially have metric graduation lag one release behind feature graduation for the exact reason you mentioned. I think we left the discussion there and never added a requirement to graduate metrics.

But having most metrics alpha is definitely a problem, and it looks like leaving graduation up to component owners didn't really work. Maybe we should add a new criterion that all metrics should be graduated to beta before a feature can be graduated to GA. Then we could have a job catching metrics that are perma-beta, similar to what apimachinery has for APIs. wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a reasonable policy.

For DRA, I think we could graduate the existing metrics to beta in 1.34. The new one I'd like to keep in alpha and then graduate later, which is not quite according to that proposed policy, but it's just a proposal at this time, right? 😅

Either way, I'll create tracking issues for the metrics so that we don't forget.

- Metric name: `dra_resource_claims_in_use`
- Description: The number of ResourceClaims that are currently in use on the node, by driver name (`driver_name` label value) and across all drivers (special value `<any>` for `driver_name`). Note that the sum of all by-driver counts is not the total number of in-use ResourceClaims because the same ResourceClaim might use devices from different drivers. Instead, use the count for the `<any>` driver_name.
- Type: Gauge
- Labels: `driver_name`
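As a rough illustration of the `<any>` aggregation described above, a sketch using prometheus/client_golang; the kubelet actually registers its metrics through k8s.io/component-base/metrics, so the mechanics here are illustrative only and the helper is hypothetical:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

var resourceClaimsInUse = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "dra_resource_claims_in_use",
		Help: "Number of ResourceClaims currently in use on the node, per driver.",
	},
	[]string{"driver_name"},
)

func init() {
	prometheus.MustRegister(resourceClaimsInUse)
}

// recordClaimsInUse publishes the per-driver counts plus the cross-driver
// total under the special driver_name value "<any>". The separate total is
// needed because one ResourceClaim can use devices from several drivers,
// so summing the per-driver series would overcount.
func recordClaimsInUse(perDriver map[string]int, total int) {
	resourceClaimsInUse.Reset()
	for driver, count := range perDriver {
		resourceClaimsInUse.WithLabelValues(driver).Set(float64(count))
	}
	resourceClaimsInUse.WithLabelValues("<any>").Set(float64(total))
}
```

A consumer would then query the node-wide total as `dra_resource_claims_in_use{driver_name="<any>"}` rather than summing over `driver_name`.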
Contributor Author

The implementation is ready, see kubernetes/kubernetes#131641. I'm holding it for review of the KEP update.

The design and implementation of this metric were reviewed by @richabanker.

It starts out as alpha, which is consistent with the other metrics (see above).

@@ -2706,7 +2565,7 @@ calls happens inside the E2E test.

All tests that don't involve actually running a Pod can become part of
conformance testing. Those tests that run Pods cannot be because CDI support in
runtimes is not required.
runtimes and plugin support in the kubelet are not required for conformance.
Contributor Author

- When zero, remove the driver.

- Testing: An E2E test covers the expected retry mechanism in kubelet when
`NodeUnprepareResources` fails intermittently.
Contributor Author

I think a reference to a separate feature is useful here and does not make that feature a blocker for GA, because it just improves how admins deal with this; it's not required.

Contributor Author

@kubernetes/sig-scheduling-leads: who is going to take over as reviewer and approver?

Should @MaciekPytel still be listed for SIG Autoscaling?


DRA drivers should implement both because support for v1alpha4 might get
removed.
Versions v1beta1 and v1 are supported by kubelet. Both are identical.
DRA drivers should implement both because support for v1beta will get
Member

Should this be v1beta1 (instead of just v1beta)?

Contributor Author

Yes.

@pohly pohly mentioned this pull request May 28, 2025
@pohly pohly force-pushed the dra-structured-parameters-ga branch from 8f98b4b to 90f13dd Compare May 30, 2025 09:17
When started with DRA enabled, the scheduler should check whether DRA is also
enabled in the API server. Without such an explicit check, syncing the informer
caches would fail when the feature gate is enabled but the API group is
disabled. How to implement such a check reliably still needs to be determined.
Contributor Author

@pohly pohly May 30, 2025

We decided against implementing this. Instead the usual policy applies: admins must ensure that a feature is enabled in the apiserver before enabling it elsewhere. This is not specific to this KEP and thus not documented explicitly.

@mrunalp
Contributor

mrunalp commented May 31, 2025

/milestone 1.34
/label lead-opted-in

@k8s-ci-robot
Contributor

@mrunalp: The provided milestone is not valid for this repository. Milestones in this repository: [v1.25, v1.27, v1.28, v1.29, v1.30, v1.31, v1.32, v1.33, v1.34, v1.35]

Use /milestone clear to clear the milestone.

In response to this:

/milestone 1.34
/label lead-opted-in

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mrunalp
Contributor

mrunalp commented May 31, 2025

/milestone v1.34

@k8s-ci-robot k8s-ci-robot added this to the v1.34 milestone May 31, 2025
@k8s-ci-robot k8s-ci-robot added the lead-opted-in Denotes that an issue has been opted in to a release label May 31, 2025
@pohly
Contributor Author

pohly commented Jun 3, 2025

/wg device-management

@k8s-ci-robot k8s-ci-robot added the wg/device-management Categorizes an issue or PR as relevant to WG Device Management. label Jun 3, 2025
@pohly pohly moved this from 🆕 New to 👀 In review in Dynamic Resource Allocation Jun 3, 2025
Comment on lines +2025 to +2032
Therefore the scheduler plugin supports a configurable timeout that is
applied to the entire Filter call for each node. In case of a timeout,
Filter returns Unschedulable. If Filter succeeds for some other node(s),
scheduling continues with those. If Filter fails for all of them,
the Pod is placed in the unschedulable queue. It will get checked again
if changes in e.g. ResourceSlices or ResourceClaims indicate that
another scheduling attempt might succeed. If this fails repeatedly,
exponential backoff slows down future attempts.
Member

Are there corner cases that would cause a pod that is schedulable on some node to be constantly rejected because of a timeout? Let's say Filter on some node X always takes a long time and eventually the pod could be scheduled there, but it won't be because of the timeout. If that could be the case, can you say so explicitly in the KEP?

Member

And it's probably worth adding a sentence to the KEP saying that if neither ResourceSlices nor ResourceClaims change, the pod won't be retried.

Contributor Author

Are there corner cases that would cause a pod that is schedulable on some node to be constantly rejected because of a timeout?

Yes, that's the risk here. The default timeout was chosen very high to make this unlikely, but it could still happen. I'll add some more words around this.

if neither ResourceSlices nor ResourceClaims change, the pod won't be retried.

Will add.
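To make the timeout/retry interplay concrete, here is a minimal sketch of the pattern under discussion, assuming a hypothetical `filterWithTimeout` helper and `allocate` callback for the per-node work; the real plugin code differs:

```go
package dynamicresources

import (
	"context"
	"errors"
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// filterWithTimeout runs the per-node allocation check under a deadline.
// Hitting the deadline marks only this node as unschedulable for the pod;
// if every node fails, the pod goes to the unschedulable queue and is
// retried (with exponential backoff) when ResourceSlices or ResourceClaims
// change.
func filterWithTimeout(ctx context.Context, timeout time.Duration, pod *v1.Pod, nodeName string,
	allocate func(ctx context.Context) error) *framework.Status {
	if timeout > 0 {
		// Per the discussion above, a timeout <= 0 could mean "no timeout".
		var cancel context.CancelFunc
		ctx, cancel = context.WithTimeout(ctx, timeout)
		defer cancel()
	}
	switch err := allocate(ctx); {
	case errors.Is(err, context.DeadlineExceeded):
		// Unschedulable, not an error: Filter may still succeed on other nodes.
		return framework.NewStatus(framework.Unschedulable,
			"timed out trying to allocate devices for pod "+pod.Name+" on node "+nodeName)
	case err != nil:
		return framework.AsStatus(err)
	default:
		return nil
	}
}
```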

