[release-0.17] Finish old workload slice only after new slice is admitted#11557
[release-0.17] Finish old workload slice only after new slice is admitted#11557sohankunkerkar wants to merge 2 commits into
Conversation
…tes-sigs#11292) Signed-off-by: Sohan Kunkerkar <sohank2602@gmail.com>
✅ Deploy Preview for kubernetes-sigs-kueue ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: sohankunkerkar The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR ensures correct ordering during elastic Job workload-slice replacement: the old workload-slice should not be marked Finished before the new slice becomes Admitted.
Changes:
- Move workload-slice replacement/finishing to occur after the replacement slice is admitted (scheduler).
- Add a watch-based test helper to assert ordering of “new admitted” vs “old finished”.
- Add an integration test covering the ordering and adjust scheduler test event order.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| test/util/util.go | Adds a watch-based assertion helper for workload-slice ordering. |
| test/integration/singlecluster/controller/jobs/job/job_controller_test.go | Adds an integration test that sets up a Workload watch and verifies ordering on scale-up. |
| pkg/scheduler/scheduler.go | Changes scheduler flow to admit the new slice first, then finish the old slice. |
| pkg/scheduler/scheduler_test.go | Updates expected event ordering to match new scheduler behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| status, _ := evt.Object.(*metav1.Status) | ||
| gomega.Expect(evt.Type).ShouldNot(gomega.Equal(watch.Error), fmt.Sprintf("watch error: %v", status)) | ||
| } | ||
| if evt.Type != watch.Modified { |
| if wl.Name == oldWorkloadName && workload.IsFinished(wl) { | ||
| oldSliceFinished = true | ||
| } | ||
| if wl.Name != oldWorkloadName && workload.IsAdmitted(wl) { | ||
| gomega.Expect(oldSliceFinished).Should(gomega.BeFalse(), | ||
| "old workload slice was finished before new slice was admitted") | ||
| newSliceAdmitted = true | ||
| } |
| status, _ := evt.Object.(*metav1.Status) | ||
| gomega.Expect(evt.Type).ShouldNot(gomega.Equal(watch.Error), fmt.Sprintf("watch error: %v", status)) |
| if features.Enabled(features.ElasticJobsViaWorkloadSlices) && oldWorkloadSlice != nil { | ||
| if err := s.replaceWorkloadSlice(ctx, oldWorkloadSlice.WorkloadInfo.ClusterQueue, e.Obj, oldWorkloadSlice.WorkloadInfo.Obj.DeepCopy()); err != nil { | ||
| log.Error(err, "Failed to finish old workload slice after admitting replacement; job reconciler will handle recovery") | ||
| } | ||
| } |
This is a manual cherry-pick of #11195 and #11292
/assign sohankunkerkar