19 commits by Mostafahassen1:

- d17aa9a (May 1, 2026): KEP: Introduce stepSize for MultiKueue Incremental Dispatcher (#9270)
- 00933d0 (May 1, 2026): rename customconfigs to sequential tests and split taxonomy
- d8312bf (May 5, 2026): Update keps/9270-multikueue-incremental-step-size/README.md
- f384da4 (May 5, 2026): Update keps/9270-multikueue-incremental-step-size/README.md
- 9c1bd75 (May 5, 2026): Update keps/9270-multikueue-incremental-step-size/README.md
- 5d015b5 (May 5, 2026): feat(multikueue): add incremental dispatcher step size
- 14b8064 (May 5, 2026): feat(multikueue): add incremental dispatcher step size
- c371136 (May 6, 2026): feat(multikueue): add incremental dispatcher step size
- 1d05665 (May 6, 2026): feat(multikueue): add incremental dispatcher step size
- 22eb663 (May 6, 2026): Update keps/9270-multikueue-incremental-step-size/kep.yaml
- 2ecc54c (May 6, 2026): Update keps/9270-multikueue-incremental-step-size/README.md
- 601b212 (May 7, 2026): feat(multikueue): add incremental dispatcher step size
- 038f1eb (May 8, 2026): update : kep.yml
- 5a299ea (May 8, 2026): update : kep.yml and readme files
- 98fe7a1 (May 8, 2026): docs: remove KEP template and update feature gate name
- 3224bcc (May 8, 2026): chore: restore NNNN-template README to clean up PR
- 2564722 (May 8, 2026): fix : format and update Kep.yml
- 022f1c3 (May 8, 2026): fix : format in readme file
- ece1621 (May 8, 2026): fix : format in readme file
`keps/9270-multikueue-incremental-step-size/README.md` (new file: 119 additions, 0 deletions)

# KEP-9270: MultiKueue Incremental Dispatcher Step Size
<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [API Changes](#api-changes)
  - [Execution Loop](#execution-loop)
  - [Test Plan](#test-plan)
    - [Prerequisite testing updates](#prerequisite-testing-updates)
    - [Unit tests](#unit-tests)
    - [Integration tests](#integration-tests)
    - [e2e tests](#e2e-tests)
  - [Graduation Criteria](#graduation-criteria)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
<!-- /toc -->

## Summary

This proposal introduces a new configuration field, `stepSize`, for the MultiKueue Incremental Dispatcher. It controls how many worker clusters the manager cluster dispatches a workload to concurrently in each step.

## Motivation

Currently, when MultiKueue dispatches a workload to worker clusters, the Incremental Dispatcher faces a trade-off. Checking clusters one by one is slow and delays job execution. Dispatching to all clusters simultaneously generates unnecessary API traffic, increases the load on the manager cluster, and can lead to race conditions in which multiple worker clusters try to admit the same job at the same time.

Introducing `stepSize` lets cluster administrators configure a batch size for this process, trading dispatch latency against API load and system stability.

### Goals

* Add a `stepSize` parameter to the `MultiKueueConfig` API.
* Update the Workload Controller's dispatching loop to process worker clusters in batches defined by the `stepSize`.
* Ensure that if a workload is admitted in a batch, the remaining pending proxy workloads in that batch are cleanly removed.

### Non-Goals

* We will not change the way worker clusters are scored or prioritized.
* We will not alter the `AllAtOnce` dispatcher behavior, as this proposal strictly targets the Incremental Dispatcher.

## Proposal

We propose updating the `MultiKueueConfig` API to accept an optional integer for the step size, and updating the Workload Controller to respect this batch limit when iterating through the list of available worker clusters.


### Risks and Mitigations

**Risk:** An administrator might set `stepSize` to a very large value (for example, equal to the total number of worker clusters). This effectively recreates the heavy load of the `AllAtOnce` dispatcher and increases the chance that multiple clusters admit the job at the same time.

**Mitigation:** Document recommended values in the API reference. The Workload Controller's existing conflict resolution logic, which deletes duplicate proxy workloads once one is admitted, handles such races, just as it already does for the `AllAtOnce` strategy.

## Design Details

### API Changes
We will add the new field to the `MultiKueueConfigSpec` struct. The change is backward compatible: `stepSize` defaults to `1`, which matches the current one-by-one behavior.

```go
type MultiKueueConfigSpec struct {
	// Other existing fields...

	// StepSize defines the number of worker clusters the Incremental
	// Dispatcher will query at the same time.
	// Minimum value is 1. If not set, it defaults to 1.
	// +optional
	// +kubebuilder:default=1
	// +kubebuilder:validation:Minimum=1
	StepSize *int32 `json:"stepSize,omitempty"`
}
```
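One way the controller could resolve the value it actually uses is sketched below. It is a hedged illustration, not part of the proposed API: `effectiveStepSize` is a hypothetical helper, and treating a disabled `MultiKueueStepSize` feature gate (listed in `kep.yaml`) as "use 1" is an assumption that this KEP does not state.

```go
package main

import "fmt"

// effectiveStepSize returns the batch size the dispatcher would use.
// Assumptions (not stated in the KEP): with the MultiKueueStepSize feature
// gate disabled the field is ignored and the one-by-one behavior (1) is kept;
// a nil pointer means the field was never set.
func effectiveStepSize(gateEnabled bool, stepSize *int32) int {
	if !gateEnabled || stepSize == nil {
		return 1
	}
	// The +kubebuilder:validation:Minimum=1 marker should already reject
	// values below 1; clamp defensively anyway.
	if *stepSize < 1 {
		return 1
	}
	return int(*stepSize)
}

func main() {
	three := int32(3)
	fmt.Println(effectiveStepSize(true, nil))     // 1
	fmt.Println(effectiveStepSize(true, &three))  // 3
	fmt.Println(effectiveStepSize(false, &three)) // 1
}
```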

### Execution Loop
The core logic change will take place inside the MultiKueue Workload Controller's reconciliation loop:
1. When a workload is ready for dispatch, the controller reads the `stepSize` from the `MultiKueueConfig`.
2. The controller takes the ordered list of available worker clusters and splits it into consecutive batches of `stepSize` clusters (the last batch may be smaller).
3. The controller loops through the first batch, creating proxy workloads on those specific worker clusters simultaneously.
4. If a cluster in the batch admits the job, the controller immediately deletes the proxy workloads from the other clusters in that same batch and stops the loop.
5. If no cluster in the batch admits the job, the controller moves on to the next batch of clusters.
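The batching steps above can be modeled in a few lines of Go. This is a simplified, synchronous sketch under stated assumptions: `batches` and `dispatch` are hypothetical helpers, cluster order is taken as given, and the `admits` callback stands in for the admission outcome that the real controller would observe asynchronously across reconciles.

```go
package main

import "fmt"

// batches splits clusters into consecutive groups of at most stepSize
// clusters, preserving the input order.
func batches(clusters []string, stepSize int) [][]string {
	if stepSize < 1 {
		stepSize = 1
	}
	var out [][]string
	for start := 0; start < len(clusters); start += stepSize {
		end := start + stepSize
		if end > len(clusters) {
			end = len(clusters)
		}
		out = append(out, clusters[start:end])
	}
	return out
}

// dispatch walks the batches in order. admits is a hypothetical stand-in for
// "this worker cluster admitted the proxy workload". It returns the admitting
// cluster ("" if none) plus the proxies created and deleted, in order.
func dispatch(clusters []string, stepSize int, admits func(string) bool) (string, []string, []string) {
	var created, deleted []string
	for _, batch := range batches(clusters, stepSize) {
		created = append(created, batch...) // step 3: proxies for the whole batch
		winner := ""
		for _, c := range batch {
			if admits(c) {
				winner = c
				break
			}
		}
		if winner == "" {
			continue // step 5: nothing admitted, try the next batch
		}
		for _, c := range batch {
			if c != winner {
				deleted = append(deleted, c) // step 4: drop the rest of this batch
			}
		}
		return winner, created, deleted
	}
	return "", created, deleted
}

func main() {
	clusters := []string{"w1", "w2", "w3", "w4", "w5"}
	fmt.Println(batches(clusters, 2)) // [[w1 w2] [w3 w4] [w5]]
	winner, created, deleted := dispatch(clusters, 2, func(c string) bool { return c == "w4" })
	fmt.Println(winner, created, deleted) // w4 [w1 w2 w3 w4] [w3]
}
```

Following step 4 literally, the sketch deletes only the other proxies in the winning batch. The steps do not say what happens to proxies from earlier batches, so they are left untouched here.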

### Test Plan

I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.

#### Prerequisite testing updates

No major prerequisite testing updates are required. Current mock worker cluster frameworks in EnvTest are sufficient.

#### Unit tests

- `pkg/controller/core/workload_controller_test.go`: `<date>` - Verify that the worker cluster list is correctly divided into chunks matching the `stepSize`.
- `pkg/controller/core/workload_controller_test.go`: `<date>` - Verify that the default value remains `1` if the user does not specify a `stepSize`.

#### Integration tests

- `TestIncrementalDispatcherStepSize`: Simulate a manager cluster with 5 worker clusters and set `stepSize` to `2`. Submit a workload and verify via the API server that exactly 2 proxy workloads are created in the first step. Simulate a rejection on those 2 clusters, and verify that the controller then creates the next 2 proxy workloads in the second step.
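The integration scenario can also be modeled as a sequence of reconciles, each of which nominates the next batch only after every previously nominated cluster has rejected the workload. This is a hypothetical model for illustration: `dispatchState` and `reconcileStep` are not Kueue types, and the "wait until all nominated clusters reject" rule is an assumption about how step 5 of the execution loop would be driven.

```go
package main

import "fmt"

// dispatchState is a hypothetical record of which worker clusters already
// have proxy workloads for one workload, and which of them rejected it.
type dispatchState struct {
	nominated []string
	rejected  map[string]bool
}

// reconcileStep models one reconcile: if every nominated cluster has rejected
// the workload (or nothing is nominated yet), nominate the next stepSize
// clusters. It returns the newly nominated clusters, or nil to keep waiting.
func reconcileStep(s *dispatchState, clusters []string, stepSize int) []string {
	if stepSize < 1 {
		stepSize = 1
	}
	for _, c := range s.nominated {
		if !s.rejected[c] {
			return nil // a nominated cluster may still admit; wait
		}
	}
	start := len(s.nominated)
	if start >= len(clusters) {
		return nil // every cluster has been tried
	}
	end := start + stepSize
	if end > len(clusters) {
		end = len(clusters)
	}
	next := clusters[start:end]
	s.nominated = append(s.nominated, next...)
	return next
}

func main() {
	clusters := []string{"w1", "w2", "w3", "w4", "w5"}
	s := &dispatchState{rejected: map[string]bool{}}
	fmt.Println(reconcileStep(s, clusters, 2)) // [w1 w2]
	fmt.Println(reconcileStep(s, clusters, 2)) // [] (w1 and w2 still pending)
	s.rejected["w1"], s.rejected["w2"] = true, true
	fmt.Println(reconcileStep(s, clusters, 2)) // [w3 w4]
}
```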

#### e2e tests

Standard E2E multi-cluster tests will be updated to include an iteration where `MultiKueueConfig` is deployed with a `stepSize > 1` to ensure workloads execute successfully end-to-end without duplication.

### Graduation Criteria

* **Alpha:**
  * API fields added.
  * Workload Controller updated to support batching.
  * Unit tests and EnvTest integration tests passing.
* **Beta:**
  * Gather feedback from the community.
  * E2E tests added and passing consistently.
* **Stable:**
  * Feature has been in Beta for at least one release cycle with no major bugs reported.

## Implementation History

- 2026-05-01: KEP proposed and initial draft created.

## Drawbacks

This adds slight code complexity to the existing Incremental Dispatcher loop and introduces one additional configuration parameter for administrators to manage and tune.
`keps/9270-multikueue-incremental-step-size/kep.yaml` (new file: 35 additions, 0 deletions)
title: MultiKueue Incremental Dispatcher Step Size
kep-number: 9270
authors:
- "@Mostafahassen1"
status: provisional
creation-date: "2026-05-01"
reviewers:
- TBD
approvers:
- TBD

# The target maturity stage in the current dev cycle for this KEP.
stage: alpha

# The most recent milestone for which work toward delivery of this KEP has been
# done. Kueue typically uses 0.x versioning.
latest-milestone: "v0.9"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
  alpha: "v0.9"
  beta: "v0.10"
  stable: "v1.0"

# The following PRR answers are required at alpha release
# List the feature gate name and the components for which it must be enabled
feature-gates:
- name: MultiKueueStepSize
  components:
    - kueue-controller-manager
disable-supported: true

# The following PRR answers are required at beta release
metrics:
- kueue_multikueue_dispatched_workloads_total