[Fix] Prevent remote wl creation with stale ClusterName #11378
Merged: k8s-ci-robot merged 1 commit into kubernetes-sigs:main from epam:flake/11062-mk-preemption-stuck on May 22, 2026 (+17 −1)
I have some questions:
0. Could you describe the state transitions for the happy vs. unhappy test executions, so that we can better understand the scenario? That would help with the follow-up questions.
I left the analysis in the issue: #11062 (comment)
I'm copying it here too so we can discuss it.
AllAtOnce, and that's because it's coupled with the controller: the controller both decides on nomination and immediately uses it to create remote workloads.
If a controller only updates `NominatedClusterNames` and does not create remote workloads in the same reconcile, then the stale `ClusterName` race cannot directly produce the bad side effect. The worst outcome could be that the nomination update is skipped or delayed because the cache still shows `ClusterName != nil`.
That is what the incremental dispatcher does: it bails out when `ClusterName` is set, and its only status write is the nomination patch in the later nomination path.
Eviction/reservation eventually clears `ClusterName` and `NominatedClusterNames`, and reconcile waits for the informer cache to reflect that before patching nominations or creating remote workloads. The risky window was the stale cache.
If you patch while `ClusterName` still appears set, you can act on old state and create the wrong remotes. The early return prevents that until the next reconcile after the cache catches up.
For Incremental or External nomination-only controllers, `ClusterName != nil` is already a conservative stop condition.
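To make the ordering concrete, here is a minimal Go sketch of the early-return guard described above. It is not the code from this PR: `WorkloadStatus`, `ReconcileResult` and `reconcileAllAtOnce` are hypothetical, heavily simplified stand-ins, and the "remote-wl@" naming is purely illustrative. It only shows that a reconcile which both nominates and creates remotes should do nothing while the cached status still carries a `ClusterName`.

```go
package main

import "fmt"

// WorkloadStatus is a hypothetical, heavily simplified stand-in for the Kueue
// Workload status; only the two fields discussed above are modeled.
type WorkloadStatus struct {
	ClusterName           *string
	NominatedClusterNames []string
}

// ReconcileResult records which side effects a single reconcile pass performed.
type ReconcileResult struct {
	Nominated      []string
	RemotesCreated []string
}

// reconcileAllAtOnce is a hypothetical AllAtOnce-style reconcile that both
// nominates clusters and creates remote workloads in the same pass.
func reconcileAllAtOnce(cached WorkloadStatus, clusters []string) ReconcileResult {
	// Guard: while the cached (possibly stale) status still shows a ClusterName,
	// perform no side effects and let a later reconcile retry once the informer
	// cache reflects the cleared fields.
	if cached.ClusterName != nil {
		return ReconcileResult{}
	}
	// Nominate every candidate cluster at once.
	nominated := append([]string(nil), clusters...)
	// In the same pass, create one (illustrative) remote workload per nominated cluster.
	remotes := make([]string, 0, len(nominated))
	for _, c := range nominated {
		remotes = append(remotes, "remote-wl@"+c)
	}
	return ReconcileResult{Nominated: nominated, RemotesCreated: remotes}
}

func main() {
	staleName := "worker1"
	clusters := []string{"worker1", "worker2"}
	// Stale cache: the ClusterName from the previous admission is still visible.
	fmt.Println(reconcileAllAtOnce(WorkloadStatus{ClusterName: &staleName}, clusters))
	// Cache caught up: ClusterName cleared, so nomination and remote creation proceed.
	fmt.Println(reconcileAllAtOnce(WorkloadStatus{}, clusters))
}
```

A nomination-only controller would drop the remote-creation loop entirely, which is why the same stale read is harmless there: at worst it delays the nomination patch.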
That means moving AllAtOnce to a separate controller should be safer in those terms.
I'm not sure I follow the steps or the meaning of "stale" here, but I suppose there is something to it: the AllAtOnce dispatcher may be going directly to workload selection before the eviction completes.
The eviction is triggered by setting `status.admissionChecks.state=Retry`, which makes the workload controller set `status.clusterName=nil` and `status.nominatedClusterNames=nil`. Only at that point is it safe to start another round of nomination.
So the fix should be OK, since it makes sure that `status.clusterName=nil` before proceeding, but this is a fairly indirect verification. A more direct approach would be to check whether `status.admissionChecks.state` is Pending (no longer Retry), as the incremental dispatcher does: https://github.com/kubernetes-sigs/kueue/blob/main/pkg/controller/workloaddispatcher/incrementaldispatcher.go#L103-L106
Moreover, I think we should check this condition for safety independently of the dispatcher, and simply defer the nomination phase.
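The "more direct" check suggested above could look roughly like the following Go sketch. It is an illustration under stated assumptions, not Kueue's API: `CheckState`, `AdmissionCheckState`, `Status`, `canStartNomination` and the check name `"mk-check"` are hypothetical, with state strings mirroring the Pending/Retry values mentioned in the discussion. Nomination is deferred unless the relevant admission check is back to Pending, and the existing `ClusterName` guard is kept alongside it.

```go
package main

import "fmt"

// CheckState mirrors the admission check state strings mentioned in the
// discussion; the type itself is hypothetical.
type CheckState string

const (
	CheckStatePending CheckState = "Pending"
	CheckStateRetry   CheckState = "Retry"
	CheckStateReady   CheckState = "Ready"
)

// AdmissionCheckState is a hypothetical, simplified admission check entry.
type AdmissionCheckState struct {
	Name  string
	State CheckState
}

// Status is a hypothetical subset of the Workload status.
type Status struct {
	ClusterName     *string
	AdmissionChecks []AdmissionCheckState
}

// canStartNomination defers nomination unless the named admission check is
// back to Pending (e.g. Retry means the eviction is still in progress) and,
// as an additional guard, ClusterName has already been cleared.
func canStartNomination(st Status, checkName string) bool {
	if st.ClusterName != nil {
		return false
	}
	for _, ac := range st.AdmissionChecks {
		if ac.Name == checkName {
			return ac.State == CheckStatePending
		}
	}
	// The check is not present at all; be conservative and defer.
	return false
}

func main() {
	evicting := Status{AdmissionChecks: []AdmissionCheckState{{Name: "mk-check", State: CheckStateRetry}}}
	settled := Status{AdmissionChecks: []AdmissionCheckState{{Name: "mk-check", State: CheckStatePending}}}
	fmt.Println(canStartNomination(evicting, "mk-check")) // false
	fmt.Println(canStartNomination(settled, "mk-check"))  // true
}
```

Because the state check does not depend on which dispatcher is configured, it could sit in front of the nomination phase for all of them, which matches the suggestion to defer nomination independently of the dispatcher.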
So, I think the PR fixes the issue, but in a somewhat indirect way. I'm running the test 200 times in a loop to confirm:
I also suspect that we may sometimes be getting unnecessary updates to the list of `nominatedClusterNames` from the AllAtOnce dispatcher, due to map-based ordering of the clusters. This is probably a separate issue, but I'm testing it here too: #11403
I'm pretty much OK with this fix as is. It will be refactored anyway as part of #10937.
OK, I couldn't repro the flake despite 200 attempts, but I think the analysis makes sense. The check is certainly harmless, as we shouldn't run nomination while a clusterName is already assigned.
We could work on a more generic solution, but I think we can leave it for a follow-up.
Opened: #11452