Add RegisterTargetsInterleaved for fair target allocation across target groups #4604
yykkibbb wants to merge 1 commit
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: yykkibbb
…et groups

This commit adds a new RegisterTargetsInterleaved method to TargetsManager that registers targets to multiple target groups in an interleaved manner.

When AWS quota limits are reached, the current sequential registration causes some target groups to be starved of targets while others are full. For example, with 250 nodes, 4 ports, and a quota of 500:
- Current: TG1=250, TG2=250, TG3=0, TG4=0 (ports 3,4 have no traffic)
- With interleaved: TG1=125, TG2=125, TG3=125, TG4=125 (all ports work)

The new method registers targets in chunks across all target groups before moving to the next chunk, ensuring fair distribution even when quotas are exceeded.

Related to: kubernetes-sigs#4025
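The chunked, round-robin registration described in the commit message can be sketched roughly as follows. This is a minimal illustration only, not the code in this PR: `targetGroup`, `registerInterleaved`, `chunkSize`, and `quota` are hypothetical names, and the real method would call the ELBv2 API rather than append to a map.

```go
package main

// targetGroup is a hypothetical stand-in for a TargetGroupBinding and the
// targets it wants registered; it is not the controller's real type.
type targetGroup struct {
	name    string
	pending []string // targets still waiting to be registered
}

// registerInterleaved hands out at most quota registrations in total. Each
// round takes up to chunkSize targets from every group before starting the
// next round, so no group is left empty while another is full.
// It returns the targets "registered" for each group name.
func registerInterleaved(groups []targetGroup, chunkSize, quota int) map[string][]string {
	registered := make(map[string][]string, len(groups))
	offsets := make([]int, len(groups)) // next unregistered index per group
	remaining := quota

	for remaining > 0 {
		progressed := false
		for i, g := range groups {
			if remaining == 0 {
				break
			}
			// Chunk size is bounded by what the group still needs,
			// by chunkSize, and by the quota that is left.
			n := len(g.pending) - offsets[i]
			if n > chunkSize {
				n = chunkSize
			}
			if n > remaining {
				n = remaining
			}
			if n <= 0 {
				continue
			}
			chunk := g.pending[offsets[i] : offsets[i]+n]
			registered[g.name] = append(registered[g.name], chunk...)
			offsets[i] += n
			remaining -= n
			progressed = true
		}
		if !progressed {
			break // every group is fully registered (or chunkSize <= 0)
		}
	}
	return registered
}
```

With four groups of 250 pending targets each, a chunk size of 25, and a quota of 500, every group ends up with 125 targets. In this sketch the fairness depends on the chunk size: a chunk larger than quota divided by the number of groups can still leave later groups short.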
0c8d9ca to 6f4eaf2
Thanks for opening this. The algorithm makes sense to me. I think there is some more discussion needed here, if we want to accept this.

1/ TargetGroupBindings only know of their TargetGroup.
Hi @zac-nixon, thank you for taking the time to review and for the insightful questions. They helped me reconsider some aspects of the design that I hadn't fully thought through.

After going through your feedback and revisiting the codebase (especially the existing maxTargetsPerTargetGroup mechanism in resource_manager.go and the IndexKeyServiceRefName index), I was thinking about a different approach that might fit the architecture better: sibling-aware self-limiting within the existing single-TGB reconciler.

The rough idea would be:

This would build on the existing maxTargetsPerTargetGroup mechanism rather than introducing a new registration path, and it stays within the current architecture where each TGB reconciles independently.

For the "new TG added" scenario: when numSiblings increases, each TGB's fair share would automatically decrease on the next reconcile cycle. Existing targets wouldn't be deregistered, but new registrations would respect the reduced limit, and natural node turnover (e.g., ASG replacements) would gradually rebalance the distribution over time.

Of course, this is just an initial thought. I'd love to hear if this direction makes sense or if there are concerns I'm still missing. Happy to revise the PR or take a completely different approach if needed. Thanks again for the guidance!
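To make the fair-share idea concrete, here is a minimal Go sketch of the arithmetic, assuming the quota is split evenly across sibling TGBs. `fairShare` and `allowedNewRegistrations` are hypothetical helpers written for illustration; they are not existing controller functions, and how numSiblings would be looked up is left out.

```go
package main

// fairShare returns the per-TGB limit when a shared quota is divided evenly
// among numSiblings TargetGroupBindings. Hypothetical helper: the even split
// is an assumption, not something the controller does today.
func fairShare(quota, numSiblings int) int {
	if numSiblings <= 1 {
		return quota
	}
	return quota / numSiblings
}

// allowedNewRegistrations caps how many of the desired new targets may be
// registered so the group does not grow past limit. Existing targets are
// never removed: if the group is already at or over the limit, it returns 0.
func allowedNewRegistrations(current, desired, limit int) int {
	room := limit - current
	if room <= 0 {
		return 0
	}
	if desired < room {
		return desired
	}
	return room
}
```

With a quota of 500, four siblings get a limit of 125 each. If a fifth sibling appears, the limit drops to 100: a TGB already holding 125 targets registers nothing new, and only shrinks toward 100 as nodes are replaced.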
Summary

This PR adds a new RegisterTargetsInterleaved method to TargetsManager that registers targets to multiple target groups in an interleaved manner, ensuring fair distribution when AWS quota limits are reached.

Problem (Issue #4025):

When replacing nodes in a cluster with an NLB serving multiple ports, the controller registers all targets to TG1 first, then TG2, etc. If the quota is exceeded midway, later target groups receive zero targets, causing service outages for those ports.
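Example with 250 nodes, 4 ports, quota=500:
- Current: TG1=250, TG2=250, TG3=0, TG4=0 (ports 3,4 have no traffic)
- With interleaved: TG1=125, TG2=125, TG3=125, TG4=125 (all ports work)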
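The starvation under sequential registration follows directly from the arithmetic. A small Go sketch of that baseline, assuming every target group wants the same number of targets and shares one quota (`sequentialCounts` is a hypothetical name used only for this illustration):

```go
package main

// sequentialCounts fills target groups one after another until the shared
// quota runs out, and returns how many targets each group ends up with.
// This models the behaviour described in the problem statement; it is not
// the controller's actual code.
func sequentialCounts(numGroups, targetsPerGroup, quota int) []int {
	counts := make([]int, numGroups)
	remaining := quota
	for i := range counts {
		n := targetsPerGroup
		if n > remaining {
			n = remaining // later groups only get what is left, possibly 0
		}
		counts[i] = n
		remaining -= n
	}
	return counts
}
```

Calling `sequentialCounts(4, 250, 500)` gives 250, 250, 0, 0: the first two groups consume the whole quota and the last two receive nothing.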
Changes

- TargetGroupTargets struct to represent a TGB and its targets
- RegisterTargetsInterleaved method that:

Note

This PR adds the foundational API. The method is not yet called from the reconciler. Integration with resource_manager.go to actually use this method for multi-TGB scenarios would be the next step. I'd appreciate feedback on the preferred approach for that integration.

Test plan

- RegisterTargetsInterleaved pass

Related to: #4025