
Fix finished Workload GC with finalizers #11181

Merged
k8s-ci-robot merged 1 commit into kubernetes-sigs:main from ShaanveerS:gc-wl
May 18, 2026

Conversation

@ShaanveerS
Member

What type of PR is this?

/kind bug

What this PR does / why we need it:

When the retention period of a finished Workload expires and the Workload still has the resource-in-use finalizer attached, Kubernetes marks it for deletion but does not actually remove it.

This was already called out as a caveat in KEP-1618:

The deletion mechanism assumes that finished Workload objects do not have any finalizers attached to them. After an investigation, the exact place in the code was found that removes the resource-in-use finalizer when the Workload state transitions to finished. If that behavior ever changes, or if there is an external source of a finalizer being attached to the Workload, it will be marked for deletion by the Kubernetes client but will not actually be deleted until the finalizer is removed.
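The caveat above follows from how Kubernetes treats finalizers in general. A minimal sketch of that behavior, using a toy in-memory store rather than the real API server or client types (`Object`, `Store`, and the finalizer name are hypothetical and exist only for illustration):

```go
package main

import (
	"fmt"
	"time"
)

// Object is a simplified stand-in for Kubernetes object metadata.
// It is not the real metav1.ObjectMeta type.
type Object struct {
	Name              string
	Finalizers        []string
	DeletionTimestamp *time.Time
}

// Store is a toy in-memory stand-in for the API server.
type Store struct {
	objects map[string]*Object
}

func NewStore() *Store { return &Store{objects: map[string]*Object{}} }

func (s *Store) Create(o *Object) { s.objects[o.Name] = o }

func (s *Store) Exists(name string) bool {
	_, ok := s.objects[name]
	return ok
}

// Delete mimics the API server: while finalizers are present it only sets
// the deletion timestamp; without finalizers it removes the object.
func (s *Store) Delete(name string) {
	o, ok := s.objects[name]
	if !ok {
		return
	}
	if len(o.Finalizers) > 0 {
		if o.DeletionTimestamp == nil {
			now := time.Now()
			o.DeletionTimestamp = &now
		}
		return
	}
	delete(s.objects, name)
}

// RemoveFinalizer drops one finalizer. If the object is already marked for
// deletion and no finalizers remain, the pending deletion completes.
func (s *Store) RemoveFinalizer(name, finalizer string) {
	o, ok := s.objects[name]
	if !ok {
		return
	}
	kept := o.Finalizers[:0]
	for _, f := range o.Finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	o.Finalizers = kept
	if o.DeletionTimestamp != nil && len(o.Finalizers) == 0 {
		delete(s.objects, name)
	}
}

func main() {
	const fin = "example.com/resource-in-use" // hypothetical finalizer name
	s := NewStore()
	s.Create(&Object{Name: "wl", Finalizers: []string{fin}})

	s.Delete("wl")
	fmt.Println("exists after delete:", s.Exists("wl")) // true: only marked for deletion

	s.RemoveFinalizer("wl", fin)
	fmt.Println("exists after finalizer removal:", s.Exists("wl")) // false
}
```

In this model, a delete request on an object that still has a finalizer only marks it, which matches the "marked for deletion but not actually deleted" state the KEP describes.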

WorkloadSlices were added after the KEP.
Replaced WorkloadSlices are marked finished by the scheduler in replaceWorkloadSlice, and that path only calls workload.Finish, which sets the finished condition but does not remove the finalizer.

The fix is to remove the finalizer in the finished Workload GC path, so the Workload can actually be deleted once retention has elapsed.
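The difference between the old and new GC behavior can be sketched with a small model. This is not the code from this PR: the `Workload` struct, the helper names, and the ordering of the steps are hypothetical, and the finalizer string is assumed to match Kueue's resource-in-use finalizer name.

```go
package main

import "fmt"

// resourceInUse is assumed to match Kueue's resource-in-use finalizer name.
const resourceInUse = "kueue.x-k8s.io/resource-in-use"

// Workload is a hypothetical, trimmed-down model, not Kueue's API type.
type Workload struct {
	Finalizers  []string
	Finished    bool
	Terminating bool // deletion requested but blocked by finalizers
	Deleted     bool
}

// removeFinalizer drops the named finalizer and reports whether it was present.
func removeFinalizer(wl *Workload, name string) bool {
	kept := wl.Finalizers[:0]
	removed := false
	for _, f := range wl.Finalizers {
		if f == name {
			removed = true
			continue
		}
		kept = append(kept, f)
	}
	wl.Finalizers = kept
	return removed
}

// requestDelete mimics API-server semantics: finalizers block the removal.
func requestDelete(wl *Workload) {
	if len(wl.Finalizers) > 0 {
		wl.Terminating = true
		return
	}
	wl.Deleted = true
}

// gcWithoutFinalizerRemoval models the old behavior: only request deletion.
func gcWithoutFinalizerRemoval(wl *Workload, retentionElapsed bool) {
	if !wl.Finished || !retentionElapsed {
		return
	}
	requestDelete(wl)
}

// gcWithFinalizerRemoval models the fix: drop the finalizer, then delete.
func gcWithFinalizerRemoval(wl *Workload, retentionElapsed bool) {
	if !wl.Finished || !retentionElapsed {
		return
	}
	removeFinalizer(wl, resourceInUse)
	requestDelete(wl)
}

func main() {
	// A replaced WorkloadSlice: finished, but the finalizer was never removed.
	old := &Workload{Finished: true, Finalizers: []string{resourceInUse}}
	gcWithoutFinalizerRemoval(old, true)
	fmt.Println("old GC deleted:", old.Deleted, "terminating:", old.Terminating)

	fixed := &Workload{Finished: true, Finalizers: []string{resourceInUse}}
	gcWithFinalizerRemoval(fixed, true)
	fmt.Println("new GC deleted:", fixed.Deleted)
}
```

In this model only Kueue's own finalizer is removed, so a finalizer attached by an external source would still block deletion, as the KEP caveat notes.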

Which issue(s) this PR fixes:

Fixes #11130

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fix a bug where finished Workloads could remain stuck after object retention deletion if they still had Kueue's resource-in-use finalizer.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. labels May 14, 2026
@netlify

netlify Bot commented May 14, 2026

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit 2b33d90
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/6a0691aa4b2df50008b9ec1f

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 14, 2026
Comment thread pkg/controller/core/workload_controller.go Outdated
Comment thread pkg/controller/core/workload_controller.go Outdated
Comment thread pkg/controller/core/workload_controller.go Outdated
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 14, 2026
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 14, 2026
Comment thread pkg/workload/workload.go
Comment thread pkg/controller/core/workload_controller.go Outdated
Comment thread pkg/controller/core/workload_controller.go Outdated
@ShaanveerS
Member Author

The failed test seems to be another recurrence of #10298.

Comment thread pkg/controller/core/workload_controller_test.go
Comment on lines 278 to 282
if !deleteRequested {
	return ctrl.Result{}, nil
}

r.recorder.Eventf(&wl, nil, corev1.EventTypeNormal, "Deleted", "Deleted", "Deleted finished workload due to elapsed retention")
Contributor


nit, but let's move the record under if deleteRequested, because the return statements don't differ
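The suggested restructuring can be sketched as follows. This is an illustration under assumptions, not the PR's final code: `result` and `recorder` are hypothetical stand-ins for `ctrl.Result` and the event recorder, and the sketch assumes the quoted snippet is followed by the same `return` statement, as the comment implies.

```go
package main

import "fmt"

// result is a hypothetical stand-in for ctrl.Result.
type result struct{}

// recorder is a hypothetical stand-in for the event recorder.
type recorder struct{ events []string }

func (r *recorder) event(msg string) { r.events = append(r.events, msg) }

const deletedMsg = "Deleted finished workload due to elapsed retention"

// finishBefore mirrors the reviewed shape: early return, then the event.
func finishBefore(rec *recorder, deleteRequested bool) (result, error) {
	if !deleteRequested {
		return result{}, nil
	}
	rec.event(deletedMsg)
	return result{}, nil
}

// finishAfter mirrors the suggestion: record under the positive condition
// and keep a single return, since both branches return the same values.
func finishAfter(rec *recorder, deleteRequested bool) (result, error) {
	if deleteRequested {
		rec.event(deletedMsg)
	}
	return result{}, nil
}

func main() {
	for _, requested := range []bool{false, true} {
		before, after := &recorder{}, &recorder{}
		finishBefore(before, requested)
		finishAfter(after, requested)
		fmt.Println(requested, len(before.events), len(after.events))
	}
}
```

Both variants record the same events for either input; the second simply avoids a duplicated return.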


Member Author


done

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 14, 2026
Signed-off-by: Shaanveer Singh <shaanver.singh@gmail.com>
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 15, 2026
Contributor

@mimowo mimowo left a comment


/lgtm
/approve
/cherrypick release-0.17
/cherrypick release-0.16
Thank you for fixing the issue, and also for commonizing the code to make it more resilient in the future.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 18, 2026
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: ac6f38781f40487b2a44ff932637513762c01874

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mimowo, ShaanveerS

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 18, 2026
@k8s-ci-robot k8s-ci-robot merged commit af2d1c7 into kubernetes-sigs:main May 18, 2026
40 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.18 milestone May 18, 2026
@mimowo
Contributor

mimowo commented May 18, 2026

/cherrypick release-0.17
/cherrypick release-0.16

@k8s-infra-cherrypick-robot
Contributor

@mimowo: #11181 failed to apply on top of branch "release-0.16":

Applying: Fix finished Workload GC with finalizers
Using index info to reconstruct a base tree...
M	pkg/controller/core/workload_controller.go
M	pkg/controller/core/workload_controller_test.go
M	pkg/controller/jobframework/reconciler.go
M	pkg/controller/jobs/leaderworkerset/leaderworkerset_reconciler.go
M	pkg/workload/workload.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/workload/workload.go
Auto-merging pkg/controller/jobs/leaderworkerset/leaderworkerset_reconciler.go
Auto-merging pkg/controller/jobframework/reconciler.go
CONFLICT (content): Merge conflict in pkg/controller/jobframework/reconciler.go
Auto-merging pkg/controller/core/workload_controller_test.go
Auto-merging pkg/controller/core/workload_controller.go
CONFLICT (content): Merge conflict in pkg/controller/core/workload_controller.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0001 Fix finished Workload GC with finalizers


In response to this:

/cherrypick release-0.17
/cherrypick release-0.16

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-infra-cherrypick-robot
Contributor

@mimowo: #11181 failed to apply on top of branch "release-0.17":

Applying: Fix finished Workload GC with finalizers
Using index info to reconstruct a base tree...
M	pkg/controller/core/workload_controller.go
M	pkg/controller/core/workload_controller_test.go
M	pkg/controller/jobframework/reconciler.go
M	pkg/controller/jobs/leaderworkerset/leaderworkerset_reconciler.go
M	pkg/workload/workload.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/workload/workload.go
Auto-merging pkg/controller/jobs/leaderworkerset/leaderworkerset_reconciler.go
Auto-merging pkg/controller/jobframework/reconciler.go
Auto-merging pkg/controller/core/workload_controller_test.go
Auto-merging pkg/controller/core/workload_controller.go
CONFLICT (content): Merge conflict in pkg/controller/core/workload_controller.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0001 Fix finished Workload GC with finalizers


@mimowo
Contributor

mimowo commented May 18, 2026

@ShaanveerS please prepare the cherry-picks manually.


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
kind/bug Categorizes issue or PR as related to a bug.
lgtm "Looks good to me", indicates that a PR is ready to be merged.
release-note Denotes a PR that will be considered when it comes time to generate release notes.
size/M Denotes a PR that changes 30-99 lines, ignoring generated files.


Development

Successfully merging this pull request may close these issues.

Finished workload slices aren't garbage-collected

5 participants