Add demo for DRA Resource Availability Visibility (KEP-5677) #206
| This demonstration shows the end-to-end flow of the DRA AdminAccess feature. In a production environment, drivers could use this admin access indication to provide additional privileged capabilities or information to authorized workloads. | ||
| ### Demo DRA Resource Availability Visibility Feature |
Overall it seems like there's a lot of overlap between the content here and the comment in the example manifest. Do you think we could consolidate to only the example manifest? I actually don't see anything critical here that isn't already mentioned in the example manifest, so maybe we can simply delete this?
If separate docs still make sense, could we instead make a directory under demo/ and add this to a new README.md there? That directory would contain the YAML manifests then too.
| This is particularly useful for non-admin users: `ResourceClaim`s are namespaced, so a user cannot ordinarily inspect claims in other namespaces. A cluster-scoped `ResourcePoolStatusRequest` lets them see aggregate consumption without that visibility. | ||
| #### Usage Example |
Could we add some lightweight verification to the e2e tests? We don't need to reimplement the k/k tests, just enough to make sure that the example isn't outright broken, like if the driver name is misspelled or anything.
| This example driver works with the [DRA Resource Availability Visibility feature](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/5677-dra-resource-availability-visibility) ([KEP-5677](https://github.com/kubernetes/enhancements/issues/5677)), which adds a `ResourcePoolStatusRequest` API for querying how many devices in each pool are allocated vs. available. The aggregation is computed by `kube-controller-manager` from the existing ResourceSlices and ResourceClaims — the driver itself needs no code changes. The feature reached Alpha in Kubernetes 1.36; the demo `kind` cluster created above already enables the `DRAResourcePoolStatus` feature gate and serves the `resource.k8s.io/v1alpha3` API. |
Could we split this raw markdown into separate lines? https://kubernetes.io/docs/contribute/style/style-guide/#line-breaks
> This example driver works with the DRA Resource Availability Visibility feature (KEP-5677)
Could we lead with a link to the feature docs instead of the KEP? https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#resource-pool-status
A "for more information" link to the KEP is probably still useful, but the user-facing docs are probably a better starting point.
| Start by deploying one of the `basic-*` demos from earlier in this guide so that there is something to consume GPUs. For example, `demo/basic-resourceclaimtemplate.yaml` allocates 2 GPUs from the worker's pool of 8. |
Ideally this shouldn't depend on any specific combination of other examples already deployed in the cluster, so if a Pod+ResourceClaim consuming some resources is useful to illustrate the feature then let's add one dedicated to this example.
…ith new ResourcePoolStatusRequest API Signed-off-by: Nour <nurmn3m@gmail.com>
Force-pushed from 79f027d to 32a9a8e
Summary
KEP-5677 introduced a new `ResourcePoolStatusRequest` API in Kubernetes 1.36 (alpha) that lets users query how many devices in each DRA pool are allocated vs. available, without needing read access to every namespace's `ResourceClaim`s. The aggregation runs in `kube-controller-manager` on existing ResourceSlice and ResourceClaim data, so the example driver itself needs no code changes, only a demo.

Fixes #187
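
To make the aggregation idea concrete, here is a minimal Go sketch of per-pool counting. It is not the `kube-controller-manager` implementation and it does not use the real `ResourcePoolStatusRequest` types: `Device`, `Allocation`, `PoolKey`, `PoolStatus`, and `Summarize` are all hypothetical names invented for illustration, and the pool name `worker` and the namespaces are assumptions. The numbers mirror the usage example in this PR (a pool of 8 GPUs with 2 allocated).

```go
package main

import "fmt"

// Device is a hypothetical stand-in for one device published in a ResourceSlice.
type Device struct {
	Driver, Pool, Name string
}

// Allocation is a hypothetical stand-in for one allocated device recorded in a
// ResourceClaim's status. Namespace is carried only to show that the claims
// being counted may live in namespaces the caller cannot read.
type Allocation struct {
	Namespace string
	Device    Device
}

// PoolKey identifies a pool by driver name and pool name.
type PoolKey struct {
	Driver, Pool string
}

// PoolStatus holds aggregate counts only; it exposes no claim names or namespaces.
type PoolStatus struct {
	Total, Allocated, Available int
}

// Summarize counts published devices per pool and subtracts the devices that
// appear in at least one allocation.
func Summarize(devices []Device, allocs []Allocation) map[PoolKey]PoolStatus {
	published := make(map[Device]bool)
	out := make(map[PoolKey]PoolStatus)
	for _, d := range devices {
		if published[d] {
			continue // ignore a device listed twice
		}
		published[d] = true
		k := PoolKey{d.Driver, d.Pool}
		s := out[k]
		s.Total++
		out[k] = s
	}

	counted := make(map[Device]bool)
	for _, a := range allocs {
		// Skip allocations for unknown devices and count each device once.
		if !published[a.Device] || counted[a.Device] {
			continue
		}
		counted[a.Device] = true
		k := PoolKey{a.Device.Driver, a.Device.Pool}
		s := out[k]
		s.Allocated++
		out[k] = s
	}

	for k, s := range out {
		s.Available = s.Total - s.Allocated
		out[k] = s
	}
	return out
}

func main() {
	// Eight GPUs in one pool; the driver and pool names are illustrative.
	var devices []Device
	for i := 0; i < 8; i++ {
		devices = append(devices, Device{"gpu.example.com", "worker", fmt.Sprintf("gpu-%d", i)})
	}
	// Two allocations held by claims in two different (hypothetical) namespaces.
	allocs := []Allocation{
		{Namespace: "team-a", Device: devices[0]},
		{Namespace: "team-b", Device: devices[1]},
	}
	for k, s := range Summarize(devices, allocs) {
		fmt.Printf("%s/%s: total=%d allocated=%d available=%d\n",
			k.Driver, k.Pool, s.Total, s.Allocated, s.Available)
	}
}
```

The result carries only counts per pool, which is the property the summary relies on: a caller can learn that 6 of 8 devices are free without seeing which namespaces hold the other 2.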