Skip to content

Workload Aware Scheduling proof of concept#4723

Open
marosset wants to merge 11 commits into
ray-project:masterfrom
marosset:workload-poc
Open

Workload Aware Scheduling proof of concept#4723
marosset wants to merge 11 commits into
ray-project:masterfrom
marosset:workload-poc

Conversation

@marosset
Copy link
Copy Markdown
Contributor

@marosset marosset commented Apr 16, 2026

Why are these changes needed?

This PR adds support for the Kubernetes native workload aware scheduling.

This is part of
#4344

And a design proposal is also available at

https://docs.google.com/document/d/1I9MtPkBMIj-67ee8abFK3_jS-JcosW1vnbzeDQ6MaKg

important note

Many of the changes in here are related to the golang / k8s client-go version bumps needed to use the new scheduling APIs introduced in K8s v1.36.

I have a seperate set of changes for this at #4703 and will drop the first several commits from this PR and rebase once that change merges.

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@marosset
Copy link
Copy Markdown
Contributor Author

@marosset
Copy link
Copy Markdown
Contributor Author

@mm4tt as well

@jackfrancis
Copy link
Copy Markdown

FYI we have some E2E tests scaffolded out here in the CAPZ project: kubernetes-sigs/cluster-api-provider-azure#6227

@andrewsykim
Copy link
Copy Markdown
Member

Thanks for the PR Mark, will find some time to review it soon.

Were there any notable friction points or instances where Workload API integration was incompatible or awkward to use in KubeRay?

Copy link
Copy Markdown
Member

@Future-Outlier Future-Outlier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @marosset,

Would you mind taking a look at @troychiu’s demo POC for the workload API here?
master...troychiu:kuberay:kubecon-demo-gang-scheduling-dra

I’m wondering why Troy only wrote around 500 lines of changes, while yours has more than 5,000 lines.

@marosset
Copy link
Copy Markdown
Contributor Author

marosset commented Apr 16, 2026

Hi, @marosset,

Would you mind taking a look at @troychiu’s demo POC for the workload API here? master...troychiu:kuberay:kubecon-demo-gang-scheduling-dra

I’m wondering why Troy only wrote around 500 lines of changes, while yours has more than 5,000 lines.

~3000 lines are just new tests (unit and e2e)
2000+ in ray-operator/controllers/ray/native_workload_scheduling_test.go
another 900 in ray-operator/test/e2enativescheduling/raycluster_nativescheduling_test.go
And the PR to do the go updates is also ~1200 lines (due to all of the ptr.to refs ->new as part of the golang v1.25 -> golang v1.26 golangci-lint --fix fixes)

Most of the actual logic is in ray-operator/controllers/ray/native_workload_scheduling.go which is ~500 lines.

@Future-Outlier do you have any concerns with specifics of the changes or shock from the PR size?

@Future-Outlier
Copy link
Copy Markdown
Member

Future-Outlier commented Apr 16, 2026

Hi, @marosset,
Would you mind taking a look at @troychiu’s demo POC for the workload API here? master...troychiu:kuberay:kubecon-demo-gang-scheduling-dra
I’m wondering why Troy only wrote around 500 lines of changes, while yours has more than 5,000 lines.

~3000 lines are just new tests (unit and e2e) 2000+ in ray-operator/controllers/ray/native_workload_scheduling_test.go another 900 in ray-operator/test/e2enativescheduling/raycluster_nativescheduling_test.go And the PR to do the go updates is also ~1200 lines (due to all of the ptr.to refs ->new as part of the golang v1.25 -> golang v1.26 golangci-lint --fix fixes)

Most of the actual logic is in ray-operator/controllers/ray/native_workload_scheduling.go which is ~500 lines.

@Future-Outlier do you have any concerns with specifics of the changes or shock from the PR size?

I think for POC we just need

  1. ray-operator/controllers/ray/native_workload_scheduling.go
  2. provide an example for use to reviewers on kind (k8s cluster) will be good enough
  3. since this PR is not going to be merge (just a POC) other files seem necessary

@andrewsykim
Copy link
Copy Markdown
Member

andrewsykim commented Apr 17, 2026

since this PR is not going to be merge (just a POC) other files seem necessary

The immediate high-priority goal of this PR is to prototype and prove that the new Workload API is compatible with KubeRay and identify any changes requried in Workload API. However, I do think we have the intention of merging this PR eventually. I think having the other files is ok with that in mind, I will probably not review all the tests in the near term though if I'm being honest :)

@andrewsykim
Copy link
Copy Markdown
Member

With that said, I'd like to really understand what (if any) feedback we have for upstream Kubernetes for Workload API after implementing the prototype

@marosset
Copy link
Copy Markdown
Contributor Author

I can move just the changes needed to get the POC working (plus instructions, kind config, etc) into a new branch and open a new PR for folks just wanting to test it out in a branch.

I'm putting together a list of feedback I have for the v1alpha2 api that I'll share here and with the K8s workload-aware-scheduling group.
I do think that introducing mutability (as planned for K8s v.137) will help immensely.
I did run into some issues where some of the resources weren't being cleaned up as expected that I was able to work around but i'll provide more details later (probably early next week)

@Future-Outlier
Copy link
Copy Markdown
Member

will put this in my todo list next week

@seanlaii
Copy link
Copy Markdown
Contributor

seanlaii commented Apr 18, 2026

Thanks for putting this together! I want to make sure I’m understanding the design and Troy’s PoC correctly, and also share a few thoughts.

My current understanding is that, aside from whether this should go through the batch scheduler interface for lifecycle management, the main difference between this PR and Troy’s PoC is the PodGroup granularity.

In this PR, it looks like each worker group gets its own PodGroup (plus one for the head), while in Troy’s implementation the whole RayCluster is treated as a single PodGroup. That feels closer to the existing co-scheduling / scheduler-plugins model, where the cluster is effectively scheduled as a single unit.

One thing I’m wondering about is whether the current Workload API semantics make one approach a better fit than the other. Based on my reading, gang scheduling seems to be applied per PodGroup, rather than across multiple PodGroups as a single all-or-nothing unit. If that understanding is correct, then if the goal is to schedule the whole RayCluster together and make this behavior available to users today, the single-PodGroup approach seems to map more directly to the current API behavior.

On the other hand, if the Workload API eventually supports gang scheduling across multiple PodGroups, then having one PodGroup per worker group would also make sense, especially if there are use cases where users may want gang scheduling semantics at the worker-group level rather than the whole cluster level or supporting Topology-Aware scheduling for each worker group.

Does that sound right? Please feel free to correct me if I’m missing anything. And if this tradeoff has already been discussed somewhere, I’d really appreciate a pointer. Thanks!

@marosset
Copy link
Copy Markdown
Contributor Author

Thanks for putting this together! I want to make sure I’m understanding the design and Troy’s PoC correctly, and also share a few thoughts.

My current understanding is that, aside from whether this should go through the batch scheduler interface for lifecycle management, the main difference between this PR and Troy’s PoC is the PodGroup granularity.

In this PR, it looks like each worker group gets its own PodGroup (plus one for the head), while in Troy’s implementation the whole RayCluster is treated as a single PodGroup. That feels closer to the existing co-scheduling / scheduler-plugins model, where the cluster is effectively scheduled as a single unit.

One thing I’m wondering about is whether the current Workload API semantics make one approach a better fit than the other. Based on my reading, gang scheduling seems to be applied per PodGroup, rather than across multiple PodGroups as a single all-or-nothing unit. If that understanding is correct, then if the goal is to schedule the whole RayCluster together and make this behavior available to users today, the single-PodGroup approach seems to map more directly to the current API behavior.

On the other hand, if the Workload API eventually supports gang scheduling across multiple PodGroups, then having one PodGroup per worker group would also make sense, especially if there are use cases where users may want gang scheduling semantics at the worker-group level rather than the whole cluster level or supporting Topology-Aware scheduling for each worker group.

Does that sound right? Please feel free to correct me if I’m missing anything. And if this tradeoff has already been discussed somewhere, I’d really appreciate a pointer. Thanks!

Right now the current workload API does not support gang scheduling across multiple PodGroups but that is planned for the very near future (I believe the next K8s release). A big motivation for this work (vs Troy's) was to help validate/shape the workload APIs and after some discussion (in a doc linked to in the issue) the Workload API authors and a few others decided to assign each workgroup to it's own PodGroup and then evolve the Workload APIs. There was also a fair bit of discussion on how autoscaling of work groups would be handled and improvements needed for this are planned for the next K8s release.

@marosset
Copy link
Copy Markdown
Contributor Author

@Future-Outlier I'm going to be out of office for a few days. I'll clean up the PR when I get back early next week. thanks

@Future-Outlier
Copy link
Copy Markdown
Member

Future-Outlier commented Apr 22, 2026

@Future-Outlier I'm going to be out of office for a few days. I'll clean up the PR when I get back early next week. thanks

sure no problem, thank you!
we can schedule a meeting to talk about this with kubernetes maintainers when you are back, tks!

@mm4tt
Copy link
Copy Markdown

mm4tt commented Apr 23, 2026

Hey, thanks for putting this together!

I'm putting together a list of feedback I have for the v1alpha2 api that I'll share here and with the K8s workload-aware-scheduling group.

Really looking forward to this!

Regarding PodGroup per RayCluster vs PodGroup per worker group, we definitely recommend the PodGroup per worker group approach. While it currently doesn't have a way to express "gang-of gangs", it's only a temporary limitation and this approach better aligns with future evolution of Workload/PodGroup APIs. Gang scheduling across multiple PodGroups will be available in 1.37, together with multi-level TAS and other features. We started working on that in kubernetes/enhancements#6017. Having PodGroup per worker group that are (in 1.37) grouped together by CompositePodGroup will allow you to fully utilize WAS capabilities. One example that may be interesting to you is TAS and ability to place different worker groups in different topology domains (e.g. different racks).

Regarding PodGroup per RayCluster vs. PodGroup per worker group, we recommend the PodGroup per worker group approach. While it currently lacks a way to express "gang-of-gangs", it's only a temporary limitation, and this approach better aligns with the future evolution of the Workload and PodGroup APIs. Gang scheduling across multiple PodGroups will be available in 1.37, together with multi-level TAS and other features. We started working on that in kubernetes/enhancements#6017. Having a PodGroup per worker group, which will be grouped together by a CompositePodGroup in 1.37, will allow you to fully utilize WAS capabilities. One example that may be interesting to you is TAS and the ability to place different worker groups in different topology domains (e.g., different racks).

@seanlaii
Copy link
Copy Markdown
Contributor

@marosset @mm4tt Thank you both for the detailed explanation and the context on the roadmap!
Given the upcoming support for CompositePodGroup and multi-level TAS in the new future, adopting the "PodGroup per worker group" approach definitely makes the most sense.


### Spec drift detection

If you change the RayCluster spec (add/remove worker groups, change replica counts), the operator detects the mismatch, deletes the stale Workload and PodGroups, and recreates them from the updated spec.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That also means the pods are recreated, right? This is likely fine for PoC/alpha, but it's not something we want to end up with.

We'll definitely support MinCount mutability in 1.37. The support for adding/removing PodGroupTemplates will likely come later but this shows it's an important one. @macsko who looks into that


### Suspend and resume

When a RayCluster is suspended, the operator deletes the Workload and PodGroups alongside the pods. On resume, fresh scheduling resources are created with the current spec.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think deleting PodGroups is the right thing to do. However, Workload is intended to represent a long living user intent (scheduling configuration) and it definitely doesn't have to be deleted on suspension. Right now this might be a workaround for lack of mutability, but I think once we have that solves it would be completely OK to couple Workload object lifecycle with RayCluster object

> **Note**: This feature is in early alpha. Both the Kubernetes `scheduling.k8s.io/v1alpha2` API and the KubeRay integration are under active development. Notably, autoscaling is not supported — only fixed-size worker groups are compatible.

- **No autoscaling support**: RayClusters with autoscaling enabled (`enableInTreeAutoscaling: true`) will skip native scheduling with a warning event. Fixed-size worker groups only.
- **Max 7 worker groups**: The `scheduling.k8s.io/v1alpha2` API allows at most 8 PodGroupTemplates per Workload (1 reserved for the head group).
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This limit is currently arbitrary and overall we're ok with increasing it.

What would be a reasonable limit for the RayCluster?


- **No autoscaling support**: RayClusters with autoscaling enabled (`enableInTreeAutoscaling: true`) will skip native scheduling with a warning event. Fixed-size worker groups only.
- **Max 7 worker groups**: The `scheduling.k8s.io/v1alpha2` API allows at most 8 PodGroupTemplates per Workload (1 reserved for the head group).
- **Per-worker-group atomicity only**: Each worker group is scheduled as an independent gang. There is no cross-worker-group atomicity (e.g., "schedule all GPU workers AND all CPU workers or none").
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're working on adding multi-level PodGroup hierarchy in 1.37 that solves exactly this problem in kubernetes/enhancements#6017

- **Max 7 worker groups**: The `scheduling.k8s.io/v1alpha2` API allows at most 8 PodGroupTemplates per Workload (1 reserved for the head group).
- **Per-worker-group atomicity only**: Each worker group is scheduled as an independent gang. There is no cross-worker-group atomicity (e.g., "schedule all GPU workers AND all CPU workers or none").
- **Mutually exclusive with batch schedulers**: Cannot be used together with `batchScheduler` configuration (Volcano, YuniKorn, etc.). The operator will refuse to start if both are enabled.
- **Immutable `schedulingGroup` on pods**: The `spec.schedulingGroup` field on pods is immutable. If you enable native scheduling on an already-running cluster, existing pods will not get `schedulingGroup` set. New pods (from scale-up, recreation, or suspend/resume) will be correctly configured.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, and this is not something we plan to change. If we want to support enabling "workload scheduling" on already-running cluster I think the only option we have is to recreate all the pods. Alternatively, we can disallow that and only allow setting this for new clusters.


### Pods stuck in PreEnqueue

If the kube-apiserver's `GenericWorkload` feature gate is enabled but the kube-scheduler's `GangScheduling` feature gate is **not**, the operator will successfully create Workload and PodGroup resources, but pods will remain stuck in the `PreEnqueue` scheduling gate.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This problem will eventually disappear once we promote the GenericWorkload feature gate to be enabled by default.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, i just wanted to call that out because I got stuck :P


const (
// Annotation used to opt-in a RayCluster to native workload scheduling.
NativeWorkloadSchedulingAnnotation = "ray.io/native-workload-scheduling"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Annotation for opt-in is fine in PoC/alpha but eventually we should converge on RayCluster side API for enabling/disabling and consuming more advanced WAS features (like topology, preemption, etc.). I started brainstorming that in https://docs.google.com/document/d/1VG7Zto9JYuPG4Anb01WMRryJlfV6met0jgob3T2NjZ4/edit?tab=t.0#heading=h.3s8c0yl3azl9. The doc is not really up2date with where we are, but I plan to open a KEP for that in the coming weeks.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, the plan is to have an API field on RayCluster to enable/disable this advanced scheduling

// podGroupProtectionFinalizer is the finalizer added by the Kubernetes scheduler to PodGroups
// when the GangScheduling feature gate is enabled on the scheduler (alpha in K8s 1.35).
// We remove it explicitly before deleting PodGroups so that deletion is immediate
// rather than waiting for the scheduler to process the finalizer removal.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have this finalizer so PodGroup cannot be removed until all pods pointing to it are removed. Why cannot you just first delete pods and then PodGroup?

@troychiu
Copy link
Copy Markdown
Collaborator

Given the current lack of gang scheduling across different PodGroups, I question the utility of pursuing a "PodGroup per worker group" approach, as the primary gang scheduling functionality still wouldn't be operational. Alternatively, we could adopt a "PodGroup per RayCluster" strategy and make necessary updates after the 1.37 enhancement is complete. What are your thoughts on this?

@mm4tt
Copy link
Copy Markdown

mm4tt commented Apr 28, 2026

/cc @tosi3k

@marosset
Copy link
Copy Markdown
Contributor Author

Given the current lack of gang scheduling across different PodGroups, I question the utility of pursuing a "PodGroup per worker group" approach, as the primary gang scheduling functionality still wouldn't be operational. Alternatively, we could adopt a "PodGroup per RayCluster" strategy and make necessary updates after the 1.37 enhancement is complete. What are your thoughts on this?

@seanlaii @troychiu Here is the KEP for gang of gang scheduling
kubernetes/enhancements#6017

PodGroup per RayCluster would probably make more sense for ray usage but might not be as useful in validating the workload APIs in Kubernetes.
I'm OK either way and it really depends which we want to prioritize (i was leaning more towards K8s API validation but can switch).
cc @mm4tt @andrewsykim

@marosset
Copy link
Copy Markdown
Contributor Author

Given the current lack of gang scheduling across different PodGroups, I question the utility of pursuing a "PodGroup per worker group" approach, as the primary gang scheduling functionality still wouldn't be operational. Alternatively, we could adopt a "PodGroup per RayCluster" strategy and make necessary updates after the 1.37 enhancement is complete. What are your thoughts on this?

@seanlaii @troychiu Here is the KEP for gang of gang scheduling kubernetes/enhancements#6017

PodGroup per RayCluster would probably make more sense for ray usage but might not be as useful in validating the workload APIs in Kubernetes. I'm OK either way and it really depends which we want to prioritize (i was leaning more towards K8s API validation but can switch). cc @mm4tt @andrewsykim

I just saw above that @mm4tt is also hoping to keep PodGroup per WorkerGroup

Regarding PodGroup per RayCluster vs. PodGroup per worker group, we recommend the PodGroup per worker group approach. While it currently lacks a way to express "gang-of-gangs", it's only a temporary limitation, and this approach better aligns with the future evolution of the Workload and PodGroup APIs. Gang scheduling across multiple PodGroups will be available in 1.37, together with multi-level TAS and other features. We started working on that in kubernetes/enhancements#6017. Having a PodGroup per worker group, which will be grouped together by a CompositePodGroup in 1.37, will allow you to fully utilize WAS capabilities. One example that may be interesting to you is TAS and the ability to place different worker groups in different topology domains (e.g., different racks).

@rueian
Copy link
Copy Markdown
Collaborator

rueian commented Apr 29, 2026

Given the current lack of gang scheduling across different PodGroups, I question the utility of pursuing a "PodGroup per worker group" approach, as the primary gang scheduling functionality still wouldn't be operational. Alternatively, we could adopt a "PodGroup per RayCluster" strategy and make necessary updates after the 1.37 enhancement is complete. What are your thoughts on this?

I feel like both approaches can be done concurrently, since the "PodGroup per RayCluster" approach can be implemented behind the batch scheduler interface and won't conflict with the work here.

@Future-Outlier
Copy link
Copy Markdown
Member

I'll try to find time take a look before the meeting in 12 hours.

@Future-Outlier
Copy link
Copy Markdown
Member

Future-Outlier commented May 7, 2026

  1. Quick question: will the Workload API be enabled by default in Kubernetes in the future? If so, this would break the backward compatibility of the current kuberay scheduler behavior.

UPDATE: we can let user configure the scheduling behavior
2. If we go with the "one PodGroup per worker group" integration, is this considered a best practice? Is any open source project actually using it this way?

UPDATE: this can cover most usecases, since the most common usecase is 1 raycluster = 1 head pod + 1 worker group

})

for _, wg := range instance.Spec.WorkerGroupSpecs {
minCount := utils.GetWorkerGroupDesiredReplicas(wg)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewsykim - the current PoC uses GetWorkerGroupDesiredReplias() which should account for numHosts

marosset added 7 commits May 11, 2026 21:24
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
…eduling

Signed-off-by: Mark Rossetti <marosset@microsoft.com>
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
marosset added 4 commits May 11, 2026 21:59
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
@marosset marosset marked this pull request as ready for review May 11, 2026 22:06
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Reviewed by Cursor Bugbot for commit 7cc892c. Configure here.

"RayCluster has %d worker groups, but native workload scheduling supports at most %d (%d PodGroupTemplates total, 1 reserved for head)",
len(instance.Spec.WorkerGroupSpecs), maxWorkerGroups, schedulingv1alpha2.WorkloadMaxPodGroupTemplates)
return fmt.Errorf("RayCluster %s/%s has %d worker groups, exceeding the maximum of %d for native workload scheduling",
instance.Namespace, instance.Name, len(instance.Spec.WorkerGroupSpecs), maxWorkerGroups)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too many worker groups error blocks all pod reconciliation

Medium Severity

When skipReasonTooManyWorkerGroups is triggered, reconcileNativeWorkloadScheduling returns an error rather than returning nil with a warning event like the other skip reasons (autoscaling, batch scheduler). Since this error propagates from reconcilePods, it blocks ALL pod reconciliation — including scaling, deletion, and health management — creating an infinite error-requeue loop that makes the cluster unmanageable. The cluster becomes stuck if a user enables the annotation but has more than 7 worker groups.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 7cc892c. Configure here.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7cc892c5b8

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +65 to +67
if skipReason == skipReasonDisabled {
return nil
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Remove scheduling artifacts when native scheduling is disabled

When nativeSchedulingSkipReason becomes skipReasonDisabled (for example, the user removes ray.io/native-workload-scheduling: "true" after previously enabling it), this path returns immediately and leaves existing Workload/PodGroup objects behind. Those stale PodGroups can keep the scheduler’s podgroup-protection finalizer and are no longer explicitly cleaned up, because other cleanup paths are gated on native scheduling being currently enabled. This creates orphaned scheduling resources and can leave deletion flows stuck in terminating states instead of fully converging.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants