Add demo for DRA Resource Availability Visibility (KEP-5677) #206

Open
nmn3m wants to merge 1 commit into kubernetes-sigs:main from nmn3m:resource-pool-status-demo

Conversation

@nmn3m commented May 17, 2026

Summary

KEP-5677 introduced a new ResourcePoolStatusRequest API in Kubernetes 1.36 (alpha) that lets users query how many devices in each DRA pool are allocated vs. available, without needing read access to every namespace's ResourceClaims. The aggregation is computed by kube-controller-manager from the existing ResourceSlices and ResourceClaims, so the example driver itself needs no code changes, only a demo.

Fixes #187

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 17, 2026
@k8s-ci-robot
Contributor

Welcome @nmn3m!

It looks like this is your first PR to kubernetes-sigs/dra-example-driver 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/dra-example-driver has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot requested a review from klueska May 17, 2026 16:06
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: nmn3m
Once this PR has been reviewed and has the lgtm label, please assign nojnhuh for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested a review from nojnhuh May 17, 2026 16:06
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 17, 2026
@pohly pohly moved this from 🆕 New to 👀 In review in Dynamic Resource Allocation May 18, 2026
Comment thread README.md Outdated

This demonstration shows the end-to-end flow of the DRA AdminAccess feature. In a production environment, drivers could use this admin access indication to provide additional privileged capabilities or information to authorized workloads.

### Demo DRA Resource Availability Visibility Feature
Contributor

Overall it seems like there's a lot of overlap between the content here and the comment in the example manifest. Do you think we could consolidate to only the example manifest? I actually don't see anything critical here that isn't already mentioned in the example manifest, so maybe we can simply delete this?

If separate docs still make sense, could we instead make a directory under demo/ and add this to a new README.md there? That directory would contain the YAML manifests then too.

Comment thread README.md Outdated

This is particularly useful for non-admin users: `ResourceClaim`s are namespaced, so a user cannot ordinarily inspect claims in other namespaces. A cluster-scoped `ResourcePoolStatusRequest` lets them see aggregate consumption without that visibility.

#### Usage Example
Contributor

Could we add some lightweight verification to the e2e tests? We don't need to reimplement the k/k tests, just enough to make sure that the example isn't outright broken, like if the driver name is misspelled or anything.

Comment thread README.md Outdated
This demonstration shows the end-to-end flow of the DRA AdminAccess feature. In a production environment, drivers could use this admin access indication to provide additional privileged capabilities or information to authorized workloads.

### Demo DRA Resource Availability Visibility Feature
This example driver works with the [DRA Resource Availability Visibility feature](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/5677-dra-resource-availability-visibility) ([KEP-5677](https://github.com/kubernetes/enhancements/issues/5677)), which adds a `ResourcePoolStatusRequest` API for querying how many devices in each pool are allocated vs. available. The aggregation is computed by `kube-controller-manager` from the existing ResourceSlices and ResourceClaims — the driver itself needs no code changes. The feature reached Alpha in Kubernetes 1.36; the demo `kind` cluster created above already enables the `DRAResourcePoolStatus` feature gate and serves the `resource.k8s.io/v1alpha3` API.
Contributor

Could we split this raw markdown into separate lines? https://kubernetes.io/docs/contribute/style/style-guide/#line-breaks

Comment thread README.md Outdated
This demonstration shows the end-to-end flow of the DRA AdminAccess feature. In a production environment, drivers could use this admin access indication to provide additional privileged capabilities or information to authorized workloads.

### Demo DRA Resource Availability Visibility Feature
This example driver works with the [DRA Resource Availability Visibility feature](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/5677-dra-resource-availability-visibility) ([KEP-5677](https://github.com/kubernetes/enhancements/issues/5677)), which adds a `ResourcePoolStatusRequest` API for querying how many devices in each pool are allocated vs. available. The aggregation is computed by `kube-controller-manager` from the existing ResourceSlices and ResourceClaims — the driver itself needs no code changes. The feature reached Alpha in Kubernetes 1.36; the demo `kind` cluster created above already enables the `DRAResourcePoolStatus` feature gate and serves the `resource.k8s.io/v1alpha3` API.
Contributor

> This example driver works with the DRA Resource Availability Visibility feature (KEP-5677)

Could we lead with a link to the feature docs instead of the KEP? https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#resource-pool-status

A "for more information" link to the KEP is probably still useful, but the user-facing docs are probably a better starting point.

Comment thread README.md Outdated

#### Usage Example

Start by deploying one of the `basic-*` demos from earlier in this guide so that there is something to consume GPUs. For example, `demo/basic-resourceclaimtemplate.yaml` allocates 2 GPUs from the worker's pool of 8.
Contributor

Ideally this shouldn't depend on any specific combination of other examples already deployed in the cluster, so if a Pod+ResourceClaim consuming some resources is useful to illustrate the feature then let's add one dedicated to this example.

…ith new ResourcePoolStatusRequest API

Signed-off-by: Nour <nurmn3m@gmail.com>
@nmn3m nmn3m force-pushed the resource-pool-status-demo branch from 79f027d to 32a9a8e Compare May 20, 2026 21:15
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 20, 2026
@nmn3m nmn3m requested a review from nojnhuh May 20, 2026 21:16
Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Add example for resource availability visibility (KEP-5677)

5 participants