feat: add configurable binding conditions support #199

Open
ttsuuubasa wants to merge 3 commits into kubernetes-sigs:main from ttsuuubasa:dra-binding-conditions

@ttsuuubasa ttsuuubasa commented May 12, 2026

Summary

Add configurable Binding Conditions support to the DRA example driver, allowing the scheduler to defer final device binding decisions until the driver confirms device readiness on the node.

Motivation

The DRA example driver serves as a reference implementation for developers building their own DRA drivers. This PR demonstrates how to implement Binding Conditions, including the required ResourceSlice fields and end-to-end demo scripts, so that driver developers can see the expected behavior and integration patterns for this feature, which was introduced in Kubernetes 1.34.

Changes

Kubelet Plugin

  • Feature flag: Add --binding-conditions CLI flag (env: BINDING_CONDITIONS, default: false).
  • GPU Profile: When enabled, EnumerateDevices() sets BindingConditions and BindingFailureConditions on each device in the ResourceSlice.
  • Helm chart: Add kubeletPlugin.bindingConditions: false to values.yaml and wire it to the kubelet plugin container env in kubeletplugin.yaml.
  • Demo manifest: binding-conditions.yaml — sample manifest demonstrating the workflow.
  • Kind cluster config: Enable DRADeviceBindingConditions feature gate.
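
As a rough illustration of the GPU Profile change, the sketch below shows how device enumeration might attach the two condition lists when the flag is on. It is not the PR's code: the `Device` struct is a hypothetical, trimmed-down stand-in for the ResourceSlice device type, and the condition names and the `enumerateDevices` signature are assumptions made for this example.

```go
package main

import "fmt"

// Device is a hypothetical stand-in for a ResourceSlice device entry.
// Only the two fields discussed in this PR are modeled.
type Device struct {
	Name                     string
	BindingConditions        []string
	BindingFailureConditions []string
}

// Hypothetical condition type names, chosen for illustration only.
const (
	conditionReady  = "example.com/DeviceReady"
	conditionFailed = "example.com/DevicePrepareFailed"
)

// enumerateDevices builds n devices. When bindingConditions is true, each
// device advertises the condition that must become True before binding and
// the condition that signals a binding failure.
func enumerateDevices(n int, bindingConditions bool) []Device {
	devices := make([]Device, 0, n)
	for i := 0; i < n; i++ {
		d := Device{Name: fmt.Sprintf("gpu-%d", i)}
		if bindingConditions {
			d.BindingConditions = []string{conditionReady}
			d.BindingFailureConditions = []string{conditionFailed}
		}
		devices = append(devices, d)
	}
	return devices
}

func main() {
	for _, d := range enumerateDevices(2, true) {
		fmt.Printf("%s waits for %v, fails on %v\n",
			d.Name, d.BindingConditions, d.BindingFailureConditions)
	}
}
```

With the flag off, both lists stay nil, so the published devices look the same as they did before the feature was added.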

Controller & E2E Tests

  • dra-example-controller: New controller binary using controller-runtime with a plugin architecture.
    • plugins/bindingconditions.go: Watches allocated ResourceClaim objects and automatically satisfies binding conditions by setting the device condition to True.
    • controller.go: Generic ClaimReconciler that dispatches to registered plugins.
    • main.go: Plugin registry, flag parsing (--enable-plugin), manager setup.
  • Helm templates: controller-deployment.yaml, updated clusterrole.yaml with RBAC for ResourceClaims, new values.yaml fields (controller.plugins).
  • Dockerfile: Add dra-example-controller binary to the container image.
  • E2E tests: Gated by BINDING_CONDITIONS=true; verify ResourceSlice fields and pod readiness lifecycle.
  • Demo documentation: README.md with step-by-step walkthrough.
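
The controller structure described above (a plugin registry, a generic reconciler, and a binding conditions plugin) could look roughly like the following. This is a minimal sketch under stated assumptions, not the PR's implementation: it does not use controller-runtime, the `Claim`, `AllocatedDevice`, and `Condition` types are hypothetical stand-ins for the real API objects, the `Plugin` interface shape is assumed, and plugins are dispatched one after another for simplicity.

```go
package main

import (
	"fmt"
	"time"
)

// Condition, AllocatedDevice, and Claim are hypothetical stand-ins for the
// condition and allocated ResourceClaim status types.
type Condition struct {
	Type               string
	Status             string
	LastTransitionTime time.Time
}

type AllocatedDevice struct {
	Name              string
	BindingConditions []string // condition types the scheduler waits for
	Conditions        []Condition
}

type Claim struct {
	Name    string
	Devices []AllocatedDevice
}

// Plugin is an assumed interface; the PR's actual plugin contract may differ.
type Plugin interface {
	Name() string
	Reconcile(claim *Claim) error
}

type bindingConditionsPlugin struct {
	now func() time.Time
}

func newBindingConditionsPlugin() Plugin {
	return bindingConditionsPlugin{now: time.Now}
}

func (p bindingConditionsPlugin) Name() string { return "bindingconditions" }

// Reconcile marks every advertised binding condition as True.
func (p bindingConditionsPlugin) Reconcile(claim *Claim) error {
	for i := range claim.Devices {
		dev := &claim.Devices[i]
		for _, condType := range dev.BindingConditions {
			// A real driver would wait here for an actual readiness
			// signal (for example, device attachment) before proceeding.
			setConditionTrue(dev, condType, p.now())
		}
	}
	return nil
}

// setConditionTrue updates an existing condition or appends a new one.
func setConditionTrue(dev *AllocatedDevice, condType string, now time.Time) {
	for i := range dev.Conditions {
		if dev.Conditions[i].Type == condType {
			if dev.Conditions[i].Status != "True" {
				dev.Conditions[i].Status = "True"
				dev.Conditions[i].LastTransitionTime = now
			}
			return
		}
	}
	dev.Conditions = append(dev.Conditions,
		Condition{Type: condType, Status: "True", LastTransitionTime: now})
}

// reconcile dispatches the claim to each enabled plugin in order.
func reconcile(claim *Claim, enabled []string, registry map[string]Plugin) error {
	for _, name := range enabled {
		p, ok := registry[name]
		if !ok {
			return fmt.Errorf("unknown plugin %q", name)
		}
		if err := p.Reconcile(claim); err != nil {
			return fmt.Errorf("plugin %s: %w", name, err)
		}
	}
	return nil
}

func main() {
	p := newBindingConditionsPlugin()
	registry := map[string]Plugin{p.Name(): p}
	claim := &Claim{Name: "demo", Devices: []AllocatedDevice{
		{Name: "gpu-0", BindingConditions: []string{"example.com/DeviceReady"}},
	}}
	if err := reconcile(claim, []string{"bindingconditions"}, registry); err != nil {
		fmt.Println("reconcile failed:", err)
		return
	}
	for _, c := range claim.Devices[0].Conditions {
		fmt.Printf("%s=%s\n", c.Type, c.Status)
	}
}
```

Keeping the reconciler unaware of what each plugin does is what would let other features reuse the same binary by registering one more plugin.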

Testing

```shell
# E2E with binding conditions
BINDING_CONDITIONS=true make setup-e2e
BINDING_CONDITIONS=true make test-e2e

# Or run only binding conditions tests
BINDING_CONDITIONS=true go run github.com/onsi/ginkgo/v2/ginkgo --tags=e2e --focus="BindingConditions" ./test/e2e/...
```

Requirements

  • Kubernetes 1.34+
  • Feature gate: DRADeviceBindingConditions=true

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 12, 2026
@k8s-ci-robot k8s-ci-robot requested a review from byako May 12, 2026 09:27
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ttsuuubasa
Once this PR has been reviewed and has the lgtm label, please assign nojnhuh for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested a review from klueska May 12, 2026 09:27
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 12, 2026
@pohly pohly moved this from 🆕 New to 👀 In review in Dynamic Resource Allocation May 12, 2026
Comment thread README.md Outdated
...
```

4. **Set the binding condition**: In a real driver, an external controller or other component would update the `ResourceClaim` status to signal that the device is ready. In this demo, simulate that by editing the status directly. First get the current timestamp in UTC:
Contributor

The example driver should be applying the condition itself instead of relying on users to do this. This better reflects how the feature is implemented and used with real drivers. The example driver doesn't have to wait for anything interesting before applying the condition, but we should include a comment saying where a real driver would wait for some real signal before adding the condition.

Author

Given the nature of the BindingConditions feature, I believe that we need to implement a controller on the example driver side to handle this. Is my understanding correct?

Contributor

Yes, the example driver doesn't run any controller currently. Ideally we set it up to be fairly generic to accommodate other features too, e.g. #71.

Author

@nojnhuh
I have added a controller that sets BindingConditions. To keep it generic, each piece of logic is implemented as a plugin, so support for another feature can be added by writing a new plugin.

Comment thread README.md Outdated

This demonstration shows the end-to-end flow of the DRA AdminAccess feature. In a production environment, drivers could use this admin access indication to provide additional privileged capabilities or information to authorized workloads.

### Demo DRA Device Binding Conditions Feature
Contributor

Can we start putting feature-specific docs in their own directory? I think this would fit somewhere like demo/binding-conditions/README.md.

Author

I’m planning to create a new directory and move the content currently documented in the README there.

Author

@nojnhuh
The feature-specific docs for the Binding Conditions demo have been moved under demo/binding-conditions.

Comment thread demo/test-binding-conditions.sh Outdated
Contributor

Instead of a script like this, could we add validation to the e2e tests? If we update the driver to add the condition, I think all an e2e test would need to check is that the binding conditions are added to the devices in the ResourceSlices and that the Pods still become Ready and Running. We can assume that Kubernetes is behaving correctly otherwise.

Author

I’m trying to implement this in the e2e tests instead of using the current script.

Author

I have added an e2e test that verifies that binding conditions are set on the devices in the ResourceSlice and that the Pod reaches the Running state after the controller satisfies those conditions.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 14, 2026
Add --enable-binding-conditions flag and GPU profile support for
binding conditions processing in DRA driver.

Signed-off-by: Tsubasa Watanabe <w.tsubasa@fujitsu.com>
@ttsuuubasa ttsuuubasa force-pushed the dra-binding-conditions branch from 9c9be6b to eb2e2d9 Compare May 15, 2026 07:27
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 15, 2026
- Add dra-example-controller that automatically satisfies device binding
  conditions on allocated ResourceClaims using controller-runtime
- Add Helm templates, Dockerfile, and RBAC for the controller deployment
- Add e2e tests gated by BINDING_CONDITIONS=true to verify ResourceSlice
  fields and pod readiness

Signed-off-by: Tsubasa Watanabe <w.tsubasa@fujitsu.com>
@ttsuuubasa ttsuuubasa force-pushed the dra-binding-conditions branch from eb2e2d9 to 6d7d13d Compare May 15, 2026 07:40
- Change from sequential to parallel plugin execution using goroutines
- Collect all plugin errors instead of failing on first error

Signed-off-by: Tsubasa Watanabe <w.tsubasa@fujitsu.com>
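
The last commit's switch to parallel plugin execution with aggregated errors can be sketched as follows. This is an assumption-laden illustration rather than the code in the commit: `pluginFunc`, `runPlugins`, and `errNotReady` are hypothetical names, and the standard library's `errors.Join` (Go 1.20+) is used here as one way to collect every failure instead of returning on the first.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// pluginFunc is a hypothetical plugin signature used only in this sketch.
type pluginFunc func(claimName string) error

// errNotReady is a hypothetical failure a plugin might report.
var errNotReady = errors.New("device not ready")

// runPlugins runs every plugin in its own goroutine, waits for all of them,
// and returns the joined errors (nil when every plugin succeeded).
func runPlugins(claimName string, plugins map[string]pluginFunc) error {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex // guards errs
		errs []error
	)
	for name, run := range plugins {
		wg.Add(1)
		go func(name string, run pluginFunc) {
			defer wg.Done()
			if err := run(claimName); err != nil {
				mu.Lock()
				errs = append(errs, fmt.Errorf("plugin %s: %w", name, err))
				mu.Unlock()
			}
		}(name, run)
	}
	wg.Wait()
	return errors.Join(errs...)
}

func main() {
	err := runPlugins("claim-1", map[string]pluginFunc{
		"ok":  func(string) error { return nil },
		"bad": func(string) error { return errNotReady },
	})
	fmt.Println("result:", err)
}
```

Because goroutines finish in no fixed order, the order of the joined messages is not deterministic when more than one plugin fails.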

Labels

  • cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
  • size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

4 participants