[Fix] Prevent remote wl creation with stale ClusterName by mszadkow · Pull Request #11378 · kubernetes-sigs/kueue

mszadkow · 2026-05-21T08:33:28Z

What type of PR is this?

/kind bug
/area multikueue

What this PR does / why we need it:

Fixed a race condition in the AllAtOnce MultiKueue dispatcher where a stale informer cache could cause remote workloads to be created before nomination was confirmed in etcd, leading to an infinite webhook rejection loop that prevented re-admission after worker eviction.

Which issue(s) this PR fixes:

Fixes #11062

Special notes for your reviewer:

Does this PR introduce a user-facing change?

MultiKueue: Fixed a bug in the AllAtOnce dispatcher where workloads evicted from a
worker cluster could fail to be re-admitted. Kueue now waits for the ongoing eviction to
complete before starting a new nomination and re-admission cycle.

netlify · 2026-05-21T08:33:34Z

✅ Deploy Preview for kubernetes-sigs-kueue canceled.

Name	Link
🔨 Latest commit	`bb4cbb4`
🔍 Latest deploy log	https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/6a0ed04915b313000807ad0d

mszadkow · 2026-05-21T08:34:34Z

/cc @vladikkuzn @reruno

mimowo · 2026-05-21T10:07:16Z

 		}
-		if group.local.Status.ClusterName == nil && !equality.Semantic.DeepEqual(group.local.Status.NominatedClusterNames, nominatedWorkers) {
+
+		if !equality.Semantic.DeepEqual(group.local.Status.NominatedClusterNames, nominatedWorkers) {


I have some questions:
0. Could you describe the state transitions for the happy vs. unhappy test executions, so that we can better understand the scenario? (That can also help with the follow-up questions.)
1. Is this fix only needed for the AllAtOnce dispatcher, or does the issue exist for all of them? I'm basically wondering whether the fix should be inside the "if" for the dispatcher type, or more generic, say before that, detecting "Eviction ongoing".
2. Is this state self-healing through follow-up requests, or does the system get stuck in a wrong state? It seems very fragile that proceeding with the patch request messes up the system state. I would expect the ongoing eviction to fix the system state by resetting status.ClusterName and status.NominatedClusterNames to null.

I left the analysis in the issue - #11062 (comment)
Just copying it here too so we can talk about it.

1. Stale cache: ClusterName = "worker1", NominatedClusterNames = nil
2. Nomination guard skipped - worker2 created unconditionally
3. Worker2 admitted → syncReservingRemoteState tries SSA: ClusterName = "worker2", NominatedClusterNames = nil
4. Webhook: oldObj.NominatedClusterNames = nil → doesn't contain "worker2" → rejected forever
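
To make step 4 concrete, here is a minimal sketch of the kind of check that would produce the permanent rejection. It is not the Kueue webhook code: `workloadStatus` and `validateClusterNameUpdate` are hypothetical names, and the rule itself (a newly set ClusterName must already be in the stored NominatedClusterNames) is only inferred from the steps above.

```go
package main

import (
	"fmt"
	"slices"
)

// workloadStatus is a hypothetical, trimmed-down stand-in for the two
// status fields discussed in this thread; it is not the Kueue API type.
type workloadStatus struct {
	ClusterName           *string
	NominatedClusterNames []string
}

// validateClusterNameUpdate models the rule implied by step 4 (an assumption):
// a newly set ClusterName must already appear in the stored
// NominatedClusterNames of the old object.
func validateClusterNameUpdate(oldStatus, newStatus workloadStatus) error {
	if newStatus.ClusterName == nil {
		return nil // nothing is being assigned, so there is nothing to check
	}
	if !slices.Contains(oldStatus.NominatedClusterNames, *newStatus.ClusterName) {
		return fmt.Errorf("clusterName %q is not among nominated clusters %v",
			*newStatus.ClusterName, oldStatus.NominatedClusterNames)
	}
	return nil
}

func main() {
	worker2 := "worker2"
	assigned := workloadStatus{ClusterName: &worker2}

	// Stale-cache path: the nomination patch was skipped, so the stored
	// object carries no nominations and every retry fails the same way.
	fmt.Println(validateClusterNameUpdate(workloadStatus{}, assigned))

	// Path with a persisted nomination: worker2 was nominated first.
	nominated := workloadStatus{NominatedClusterNames: []string{"worker1", "worker2"}}
	fmt.Println(validateClusterNameUpdate(nominated, assigned))
}
```

Because the stored object never gains a nomination in the stale-cache path, retrying the same update cannot succeed, which matches the "rejected forever" outcome in step 4.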

The issue is limited to AllAtOnce because that dispatcher is coupled with the controller.
The controller both decides on the nomination and immediately uses it to create remote workloads.
If a controller only updates NominatedClusterNames and does not create remote workloads in the same reconcile, then the stale ClusterName race cannot directly produce the bad side effect.

The worst outcome there would be that the nomination update is skipped or delayed because the cache still shows ClusterName != nil.
That is what the incremental dispatcher does: it bails out when ClusterName is set, and its only status write is the nomination patch in the later nomination path.

This is designed to self-heal and not get stuck.
Eviction/reservation eventually clears ClusterName and NominatedClusterNames, and reconcile waits for informer cache to reflect that before patching nominations or creating remote workloads.
The risky window was stale cache.
If you patch while ClusterName still appears set, you can act on old state and create wrong remotes.
The early return prevents that until the next reconcile after cache catch-up.

For Incremental or External nomination-only controllers, ClusterName != nil is already a conservative stop condition.

That means moving AllAtOnce to a separate controller should be safer in this respect.
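
The ordering described above can be sketched as a small decision function. This is an illustration under stated assumptions, not the code in this PR: `action` and `nextAction` are hypothetical names, and returning right after the nomination patch (instead of creating remotes in the same reconcile) is an assumption about how the dispatcher avoids acting on unconfirmed state.

```go
package main

import (
	"fmt"
	"slices"
)

// action is a hypothetical label for what one reconcile pass should do.
type action string

const (
	waitForCache     action = "wait for cache"
	patchNominations action = "patch nominations"
	createRemotes    action = "create remote workloads"
)

// nextAction mirrors the ordering discussed in this thread:
//  1. if the cached ClusterName is still set, return early and wait;
//  2. if the stored nominations differ from the desired ones, patch them;
//  3. only when the nominations are already stored, create remote workloads.
func nextAction(clusterName *string, nominated, desired []string) action {
	if clusterName != nil {
		return waitForCache
	}
	if !slices.Equal(nominated, desired) {
		return patchNominations
	}
	return createRemotes
}

func main() {
	worker1 := "worker1"
	desired := []string{"worker1", "worker2"}

	fmt.Println(nextAction(&worker1, nil, desired)) // stale or not-yet-cleared ClusterName
	fmt.Println(nextAction(nil, nil, desired))      // eviction observed, nothing nominated yet
	fmt.Println(nextAction(nil, desired, desired))  // nomination visible in the cache
}
```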

I'm not sure I follow the steps or the meaning of "stale" here, but I suppose there is something to it: the AllAtOnce dispatcher may be going directly for workload selection before the eviction completes.

The eviction is triggered by setting status.admissionChecks.state=Retry, which makes the workload controller set status.clusterName=nil and status.nominatedClusterNames=nil. Only at that point is it safe to start another round of nomination.

So the fix should be OK, since it makes sure that status.clusterName=nil before proceeding, but this is a fairly indirect verification. A more direct approach would be to check whether status.admissionChecks.state is Pending (no longer Retry), as the incremental dispatcher does: https://github.com/kubernetes-sigs/kueue/blob/main/pkg/controller/workloaddispatcher/incrementaldispatcher.go#L103-L106

Moreover, I think we should check this condition for safety independently of the dispatcher, and just defer the nomination phase.
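
A dispatcher-independent version of that check might look roughly like the sketch below. The type and function names are hypothetical and this is not the code at the link above; it only encodes the condition stated in this comment, that nomination may start once the admission check state is Pending and no longer Retry.

```go
package main

import "fmt"

// checkState is a hypothetical stand-in for the admission check state
// values mentioned in this thread.
type checkState string

const (
	statePending checkState = "Pending"
	stateRetry   checkState = "Retry"
	stateReady   checkState = "Ready" // included only for illustration
)

// canStartNomination reports whether a new nomination round may begin.
// While the state is Retry the eviction is still in progress, so the
// nomination phase is deferred regardless of the dispatcher in use.
func canStartNomination(multiKueueCheck checkState) bool {
	return multiKueueCheck == statePending
}

func main() {
	for _, s := range []checkState{stateRetry, statePending, stateReady} {
		fmt.Printf("state=%s canStartNomination=%t\n", s, canStartNomination(s))
	}
}
```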

So, I think the PR fixes the issue, but in a somewhat indirect way. I'm running the test 200 times in a loop to confirm:

without the fix as "control": WIP: Experiment1 for https://github.com/kubernetes-sigs/kueue/pull/11378 #11401

with the fix as "test": WIP: Experiment2 for https://github.com/kubernetes-sigs/kueue/pull/11378 #11402

I also suspect that we may sometimes be getting unnecessary updates to the list of nominatedClusterNames from the AllAtOnce dispatcher due to map-based ordering of the clusters. This is probably a separate issue, but I'm also testing it here: #11403
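
If that suspicion holds, one common remedy is to sort the names collected from the map before comparing or patching them, since Go does not guarantee map iteration order. The helper below is a hypothetical sketch of that idea and is not taken from #11403.

```go
package main

import (
	"fmt"
	"sort"
)

// nominatedNames returns the keys of the cluster set in sorted order, so
// two reconciles over the same set always yield an identical slice and an
// order-sensitive comparison does not report a spurious difference.
func nominatedNames(clusters map[string]struct{}) []string {
	names := make([]string, 0, len(clusters))
	for name := range clusters {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	clusters := map[string]struct{}{
		"worker3": {},
		"worker1": {},
		"worker2": {},
	}
	fmt.Println(nominatedNames(clusters))
}
```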

I'm pretty much ok with this fix as is. It will be refactored anyway as part of #10937

OK, I couldn't repro the flake despite 200 attempts, but I think the analysis makes sense. Certainly the check is harmless, as we shouldn't run nomination while a clusterName is already assigned.

We could just work on a more generic solution, but I think we can leave it for a follow up.

Opened: #11452

mimowo · 2026-05-21T10:07:34Z

cc @olekzabl ptal

mimowo · 2026-05-22T08:00:06Z

Thank you for fixing the flake, and a user-facing issue at the same time. I still believe there exists a more generic solution by just skipping nomination for all dispatchers, but let's consider this a follow up cleanup. I will open an issue.

/lgtm
/approve
/cherrypick release-0.17
/cherrypick release-0.16

k8s-infra-cherrypick-robot · 2026-05-22T08:00:11Z

@mimowo: once the present PR merges, I will cherry-pick it on top of release-0.16, release-0.17 in new PRs and assign them to you.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot · 2026-05-22T08:00:17Z

LGTM label has been added.

Details

Git tree hash: 19f95325e933e053c7c4f54515def611fab2c36b

k8s-ci-robot · 2026-05-22T08:00:19Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mimowo, mszadkow

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [mimowo]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-infra-cherrypick-robot · 2026-05-22T08:43:38Z

@mimowo: new pull request created: #11461

k8s-infra-cherrypick-robot · 2026-05-22T08:44:15Z

@mimowo: new pull request created: #11462

mimowo · 2026-05-22T08:49:23Z

/release-note-edit

MultiKueue: Fixed a bug in the AllAtOnce dispatcher where workloads evicted from a
worker cluster could fail to be re-admitted. Kueue now waits for the ongoing eviction to
complete before starting a new nomination and re-admission cycle.

k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. area/multikueue Issues or PRs related to MultiKueue labels May 21, 2026

k8s-ci-robot requested review from olekzabl and sohankunkerkar May 21, 2026 08:33

k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 21, 2026

k8s-ci-robot requested review from reruno and vladikkuzn May 21, 2026 08:34

mszadkow force-pushed the flake/11062-mk-preemption-stuck branch from e9e5bf8 to 39cc9a5 Compare May 21, 2026 08:42

k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels May 21, 2026

Prevent remote wl creation with stale ClusterName

bb4cbb4

mszadkow force-pushed the flake/11062-mk-preemption-stuck branch from 39cc9a5 to bb4cbb4 Compare May 21, 2026 09:28

mimowo reviewed May 21, 2026

View reviewed changes

k8s-ci-robot assigned mimowo May 22, 2026

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 22, 2026

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 22, 2026

mimowo mentioned this pull request May 22, 2026

MultiKueue: generalize skipping cluster nomination for all dispatchers while eviction is ongoing #11452

Open

k8s-ci-robot merged commit 7cb7df0 into kubernetes-sigs:main May 22, 2026
39 checks passed

k8s-ci-robot added this to the v0.18 milestone May 22, 2026

k8s-infra-cherrypick-robot mentioned this pull request May 22, 2026

[release-0.17] [Fix] Prevent remote wl creation with stale ClusterName #11461

Closed

k8s-infra-cherrypick-robot mentioned this pull request May 22, 2026

[release-0.16] [Fix] Prevent remote wl creation with stale ClusterName #11462

Closed

This was referenced May 22, 2026

Automated cherry pick of #11378: [Fix] Prevent remote wl creation with stale ClusterName #11472

Merged

Automated cherry pick of #11378: [Fix] Prevent remote wl creation with stale ClusterName #11473

Merged
