WIP - CLB#9770

Draft
georgeedward2000 wants to merge 106 commits into kubernetes-sigs:master from georgeedward2000:eddie/dev/clb-difftracker-engine
Conversation

@georgeedward2000
Contributor

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?


Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


georgeedward2000 and others added 30 commits January 9, 2025 14:43
Co-authored-by: David Kowalski <50632861+david-kow@users.noreply.github.com>
@k8s-ci-robot
Contributor

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Contributor

Hi @georgeedward2000. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


k8s-ci-robot added the needs-ok-to-test label (Indicates a PR that requires an org member to verify it is safe to test.) on Dec 15, 2025
github-actions (bot) added the tide/merge-method-squash label (Denotes a PR that should be squashed by tide when it merges.) on Dec 15, 2025
k8s-ci-robot added the cncf-cla: no label (Indicates the PR's author has not signed the CNCF CLA.) on Dec 15, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: georgeedward2000
Once this PR has been reviewed and has the lgtm label, please assign bridgetkromhout for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the size/XXL label (Denotes a PR that changes 1000+ lines, ignoring generated files.) on Dec 15, 2025
…onfig struct

- Removed PRE_MERGE_CHECKLIST.md and REVIEW_SUMMARY.md as they are no longer needed.
- Updated ServiceOperationState to use ServiceConfig instead of IsInbound boolean.
- Refactored AddService, DeleteService, and related methods to utilize ServiceConfig for better configuration management.
- Enhanced validation logic in ServiceConfig to ensure service UID is not empty.
- Adjusted tests to accommodate changes in ServiceOperationState and ServiceConfig.
- Improved metrics tracking to reflect changes in service configuration.
- Cleaned up service updater logic to align with new configuration structure.
- Improved the logic for handling pod updates in `azure_servicegateway_pods.go` to better manage IP initialization, label changes, and pod validity states.
- Added comprehensive unit tests for `podInformerAddPod`, `podInformerRemovePod`, and `podInformerUpdateFunc` to ensure correct behavior under various scenarios, including edge cases.
- Introduced helper functions for building resource names in `resource_helpers.go` to streamline resource management.
- Enhanced error handling and logging in `initialization.go` and `service_updater.go` to improve clarity during resource creation and deletion processes.
- Updated the `DiffTracker` to ensure proper triggering of location updates and validation of network client initialization.
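
The move from an `IsInbound` boolean to a `ServiceConfig` with UID validation, described in the notes above, could look roughly like the sketch below. This is an illustration only: the field names, the `ErrEmptyServiceUID` sentinel, and the `newServiceOperationState` constructor are assumptions and are not taken from the PR's code.

```go
package main

import "errors"

// ErrEmptyServiceUID is a hypothetical sentinel error for this sketch.
var ErrEmptyServiceUID = errors.New("service config: UID must not be empty")

// ServiceConfig is a simplified stand-in for the struct named in the commit
// notes. The real struct likely carries more fields; these two are assumed.
type ServiceConfig struct {
	UID       string
	IsInbound bool
}

// Validate rejects a config whose service UID is empty.
func (c ServiceConfig) Validate() error {
	if c.UID == "" {
		return ErrEmptyServiceUID
	}
	return nil
}

// serviceOperationState is a simplified stand-in that holds a full config
// instead of a bare IsInbound flag.
type serviceOperationState struct {
	Config ServiceConfig
}

// newServiceOperationState validates the config before any state is created,
// so an invalid config never reaches the tracker.
func newServiceOperationState(cfg ServiceConfig) (*serviceOperationState, error) {
	if err := cfg.Validate(); err != nil {
		return nil, err
	}
	return &serviceOperationState{Config: cfg}, nil
}
```

Carrying a struct rather than a boolean lets later fields be added without changing every method signature that previously took `IsInbound`.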
…ion states, and fully unregister services from ServiceGateway
…ization improvements

- Added atomic counters and initialization tracking to the DiffTracker to manage in-flight updater triggers and check for initialization completion.
- Updated LocationsUpdater and ServiceUpdater to decrement the in-flight trigger counter and check for initialization completion asynchronously.
- Introduced filtering in service synchronization to only include services that are in the StateCreated state, preventing premature synchronization attempts.
- Improved logging for service synchronization and equality checks between K8s and NRP resources for better observability.
- Adjusted performance test parameters to reduce the number of services created in parallel, ensuring stability during tests.
- Updated the CLB performance test to create and delete 200 LoadBalancer services in parallel, improving scalability and performance measurement.
- Refactored service and pod creation logic for better concurrency handling.
- Implemented dynamic cleanup verification for Azure resources after service deletion.
- Added new utility functions for managing Cloud Controller Manager (CCM) pods, including creation, deletion, and readiness checks.
- Updated resource group name in the test suite to reflect the current environment.
- Introduced structured logging for better traceability during test execution.
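
The in-flight trigger counter and initialization check mentioned in the notes above can be sketched with `sync/atomic` as follows. The `initTracker` type and its method names are hypothetical, and the sketch assumes every initial trigger is registered before `MarkSeeded` is called; the PR's `DiffTracker` may be organized differently.

```go
package main

import "sync/atomic"

// initTracker is a hypothetical helper: it counts updater triggers that are
// still in flight and reports when initialization has completed.
type initTracker struct {
	inFlight atomic.Int64 // triggers handed to updaters but not yet finished
	seeded   atomic.Bool  // true once all initial triggers have been issued
	done     atomic.Bool  // true once seeded and inFlight has dropped to zero
}

// TriggerStarted records that an updater trigger has been issued.
func (t *initTracker) TriggerStarted() {
	t.inFlight.Add(1)
}

// MarkSeeded records that no further initial triggers will be issued.
func (t *initTracker) MarkSeeded() {
	t.seeded.Store(true)
	t.checkDone()
}

// TriggerFinished is called by an updater when it finishes one trigger.
func (t *initTracker) TriggerFinished() {
	t.inFlight.Add(-1)
	t.checkDone()
}

// checkDone flips done exactly once when the completion condition holds.
func (t *initTracker) checkDone() {
	if t.seeded.Load() && t.inFlight.Load() == 0 {
		t.done.CompareAndSwap(false, true)
	}
}

// Initialized reports whether initialization has completed.
func (t *initTracker) Initialized() bool {
	return t.done.Load()
}
```

In this arrangement the updaters only call `TriggerFinished`, so the completion check happens off the caller's path without holding a lock.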
… pod failure handling assertions

- Introduced a new test file `clb_deletion_crash_test.go` to validate the recovery of Container Load Balancer (CLB) services and egress pods after a crash of the Cloud Controller Manager (CCM) during deletion operations.
- Implemented three tests to cover scenarios of deleting inbound services, egress pods, and a mixed deletion of both, ensuring that resources are cleaned up correctly after a CCM crash.
- Enhanced assertions in `clb_lifecycle_test.go` to allow for temporary registration of crashing pods, ensuring that at least healthy pods are registered without exceeding the total pod count.
- Updated TODO comments to clarify the removal timeline for aks-rp support.
- Enhanced error handling and retry logic for adding and removing service gateway finalizers with exponential backoff.
- Introduced metrics for tracking pending pod deletions.
- Refactored CLB test configuration to utilize environment variables and AzureTestClient for better flexibility.
- Removed hardcoded values in CLB tests, replacing them with environment variable support.
- Improved test structure to ensure proper initialization of configuration before running tests.
- Implemented a suite of stress tests for the Azure Load Balancer, including:
  - Service churn tests to validate rapid create and delete cycles of services and pods.
  - Pod crash handling tests to ensure only healthy pods are registered during scaling operations.
  - Egress pod churn tests to verify the handling of rapid create and delete cycles for egress pods.

- Added upscale and downscale tests to validate dynamic scaling of pods, ensuring proper registration and deregistration in the Service Gateway.
- Created utility functions for querying Azure resources and verifying cleanup after tests.
- Established a test suite structure for organizing and executing the tests effectively.
- Add `extractAzureErrorInfo` function to map Azure SDK errors to canonical error codes.
- Enhance logging in `ServiceUpdater` methods to include correlation IDs and detailed error information.
- Introduce `CreatedAt`, `IsOrphan`, `CorrelationID`, `TriggeringPodNamespace`, and `TriggeringPodName` fields in `ServiceOperationState` for better tracking and management of service operations.
- Implement periodic updates for the oldest-age metric in `ServiceUpdater`.
- Update unit tests to cover various Azure error scenarios and ensure correct error handling.
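
Mapping SDK errors to canonical codes, as the `extractAzureErrorInfo` note above describes, might follow the pattern sketched below. Everything here is assumed for illustration: `responseError` is a local stand-in for an SDK error that exposes an HTTP status and error code, and the canonical code strings and retry flags are invented, not the PR's actual values.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// responseError is a hypothetical stand-in for an SDK response error.
type responseError struct {
	StatusCode int
	ErrorCode  string
}

func (e *responseError) Error() string {
	return fmt.Sprintf("status %d: %s", e.StatusCode, e.ErrorCode)
}

// errorInfo is the canonical form used for logging and metrics in this sketch.
type errorInfo struct {
	Code      string
	Retriable bool
}

// classifyAzureError maps an error to a canonical code. Wrapped errors are
// unwrapped with errors.As; anything unrecognized becomes "Unknown".
func classifyAzureError(err error) errorInfo {
	if err == nil {
		return errorInfo{Code: "OK"}
	}
	var re *responseError
	if !errors.As(err, &re) {
		return errorInfo{Code: "Unknown"}
	}
	switch {
	case re.StatusCode == http.StatusTooManyRequests:
		return errorInfo{Code: "Throttled", Retriable: true}
	case re.StatusCode == http.StatusNotFound:
		return errorInfo{Code: "NotFound"}
	case re.StatusCode == http.StatusConflict:
		return errorInfo{Code: "Conflict", Retriable: true}
	case re.StatusCode >= 500:
		return errorInfo{Code: "ServerError", Retriable: true}
	default:
		return errorInfo{Code: "ClientError"}
	}
}
```

Keeping the mapping in one function gives logs and metrics a small, fixed set of codes regardless of how the underlying error was wrapped.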
…name; update related configurations and tests

Labels

- cncf-cla: no (Indicates the PR's author has not signed the CNCF CLA.)
- do-not-merge/contains-merge-commits
- do-not-merge/needs-kind
- do-not-merge/release-note-label-needed (Indicates that a PR should not merge because it's missing one of the release note labels.)
- do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.)
- needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.)
- needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.)
- size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.)
- tide/merge-method-squash (Denotes a PR that should be squashed by tide when it merges.)

Projects

None yet

2 participants