TAS: Stop implicitly assuming ResrouceFlavor:Node = 1:N#11210
Conversation
|
Skipping CI for Draft Pull Request. |
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
|
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tenzen-y The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test all |
bcba589 to
f5595ef
Compare
|
/test all |
f5595ef to
f1c6d1d
Compare
|
/hold |
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
f1c6d1d to
4ab4a93
Compare
| if features.Enabled(features.TopologyAwareScheduling) { | ||
| for flavor, cache := range c.tasCache.Clone() { | ||
| tasSnapshots[flavor] = cache.snapshot(log, c.tasCache.nodesCache.find(cache.flavor.NodeLabels, cache.topology.Levels)) | ||
| flavorClone := c.tasCache.Clone() |
There was a problem hiding this comment.
Can we move it closer to its first usage?
| // One shared assumedUsage map per TopologyName for this workload. PodSets | ||
| // landing on different sibling flavors (same Topology, hostname leaf) | ||
| // reserve against the same map, preventing intra-workload self-overlap on | ||
| // a shared physical node. Cache-write-time aggregation does not cover | ||
| // in-flight reservations because pending workloads have not hit addUsage. |
There was a problem hiding this comment.
This is tested in the test "multi-PodSet workload across sibling flavors must not self-overlap on shared node", right?
| } | ||
| }) | ||
|
|
||
| ginkgo.It("should admit the pending workload to tas-flavor-b on the free node when tas-flavor-a is full", func() { |
There was a problem hiding this comment.
How would this test fail without the fix?
There was a problem hiding this comment.
This is repro PR: https://github.com/kubernetes-sigs/kueue/pull/10657/changes#r3118973170
| wantCalls: 3, | ||
| wantCollected: map[string]int{"a": 1, "b": 2, "c": 3}, | ||
| }, | ||
| "early termination: stops on first false": { |
There was a problem hiding this comment.
shouldn't we check the value of wantCollected in this test? This is the only test where it should differ from seed, no?
There was a problem hiding this comment.
Oh, good point. You're right.
There was a problem hiding this comment.
I fixed this in 70b8434
The seed is map. So, the expectation is not stable. Hence, I added a verification if the collected map key-value pairs match with seed's one.
| return old, existed | ||
| } | ||
|
|
||
| func (dwc *SyncMap[K, V]) Range(f func(key K, value V) bool) { |
There was a problem hiding this comment.
Since this function holds only rlock, if someone would try to call it with f(k,v) that modifies the underlying map, it could be dangerous (deadlock?). Shouldn't we add a comment warning about this?
There was a problem hiding this comment.
Yes, good point.
I'm wondering if we should take RW lock instead of RLock here.
Any preference adding comments vs taking RW lock?
There was a problem hiding this comment.
I just added a comment in 8bfa984
Because Writable Range will cause performance regressions for syncMap.
| for flavor, cache := range flavorClone { | ||
| nodes := c.tasCache.nodesCache.find(cache.flavor.NodeLabels, cache.topology.Levels) | ||
| var sharedUsage map[utiltas.TopologyDomainID]resources.Requests | ||
| if cache.topology.Usage != nil { |
There was a problem hiding this comment.
In the loop above we don't initialize the sharedSnapshotUsage[topologyName] if usageLength == 0. But here we don't check for that. But if a topology has 0 usage, it might still need the shared map (for example if the first workload has multiple podsets)?
There was a problem hiding this comment.
Good catch, we should initialize that when sharedSnapshotUsage[topologyName] was not initialized in the previous loop block due to the usageLength.
There was a problem hiding this comment.
I addressed this in 1be5bd2
The sharedSnapshotUsage[topologyName] is always initialized even when the length of usageLength is zero.
| tasFlavorCache := c.TASFlavors[tasFlavor] | ||
| flvResult := tasFlavorCache.FindTopologyAssignmentsForFlavor(flavorTASRequests, options...) | ||
| flvOpts := options | ||
| if tasFlavorCache != nil && tasFlavorCache.isLowestLevelNode { |
There was a problem hiding this comment.
if tasFlavorCache would be nil it would panic in the line below this if. The comment above says that "tasFlavor is already in the snapshot" so I think tasFlavorCache != nil should either be removed entirely or changed to a guard clause.
There was a problem hiding this comment.
That totally makes sense. Let's remove tasFlavorCache != nil because we can expect that it has already been checked as described in the above code comment.
…cMapRange UT Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
bb9dd0a to
3afdebb
Compare
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
|
@pajakd Thank you for your review, I addressed all your comments, PTAL, thank you 🙏 |
|
@tenzen-y please rebase and I think it would be good to squeeze the commits in case we cherrypick |
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
| c.tasCache.RLock() | ||
| for topologyName, info := range c.tasCache.topologies { | ||
| if info.Usage == nil { | ||
| continue | ||
| } | ||
| clonedUsage := make(map[utiltas.TopologyDomainID]resources.Requests, info.Usage.Len()) | ||
|
|
||
| info.Usage.Range(func(domainID utiltas.TopologyDomainID, req resources.Requests) bool { | ||
| clonedUsage[domainID] = req.Clone() | ||
| return true | ||
| }) | ||
| sharedSnapshotUsage[topologyName] = clonedUsage | ||
| } | ||
| c.tasCache.RUnlock() |
There was a problem hiding this comment.
Exract this code to a helper function so that we can use the Lock, defer UnLock pattern
mimowo
left a comment
There was a problem hiding this comment.
Given the amount of non-trivial changes I would consider a feature gate for the fix, like TASHandleOverlappingFlavors
|
cc @Ladicle maybe could also give it a pass who already have pretty solid understanding of TAS |
What type of PR is this?
/kind bug
/area tas
What this PR does / why we need it:
I fixed a TAS over-subscription bug where multiple
ResourceFlavorreferencing the same Topology and overlapping on physical nodes each independently trackusage, so sibling flavors believe a shared node is only partially used.This PR refined the TAS cache so that sibling flavors aggregate usage.
Which issue(s) this PR fixes:
Fixes #10659
Special notes for your reviewer:
#10657 is problem reproduced PR.
Does this PR introduce a user-facing change?