WIP - CLB#9770
Co-authored-by: David Kowalski <50632861+david-kow@users.noreply.github.com>
…provider-azure into clb-backendpool
PR needs rebase.
Hi @georgeedward2000. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: georgeedward2000.
…onfig struct
- Removed PRE_MERGE_CHECKLIST.md and REVIEW_SUMMARY.md as they are no longer needed.
- Updated ServiceOperationState to use ServiceConfig instead of IsInbound boolean.
- Refactored AddService, DeleteService, and related methods to utilize ServiceConfig for better configuration management.
- Enhanced validation logic in ServiceConfig to ensure service UID is not empty.
- Adjusted tests to accommodate changes in ServiceOperationState and ServiceConfig.
- Improved metrics tracking to reflect changes in service configuration.
- Cleaned up service updater logic to align with new configuration structure.
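As a rough illustration of the `ServiceConfig` change described in this commit, the sketch below shows a config struct that carries the inbound flag and rejects an empty service UID before an operation state is built. The field names, the `Validate` method, the sentinel error, and the constructor are all assumptions made for illustration; the actual types in the PR may look different.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrEmptyServiceUID is a hypothetical sentinel error for this sketch.
var ErrEmptyServiceUID = errors.New("service UID must not be empty")

// ServiceConfig is a stand-in for the struct named in the commit message.
// The fields are assumptions, not the PR's real definition.
type ServiceConfig struct {
	UID       string
	IsInbound bool
}

// Validate rejects configs whose UID is empty or whitespace only.
func (c ServiceConfig) Validate() error {
	if strings.TrimSpace(c.UID) == "" {
		return ErrEmptyServiceUID
	}
	return nil
}

// ServiceOperationState holds a ServiceConfig instead of a bare boolean.
type ServiceOperationState struct {
	Config ServiceConfig
}

// NewServiceOperationState validates the config before storing it.
func NewServiceOperationState(cfg ServiceConfig) (*ServiceOperationState, error) {
	if err := cfg.Validate(); err != nil {
		return nil, fmt.Errorf("invalid service config: %w", err)
	}
	return &ServiceOperationState{Config: cfg}, nil
}

func main() {
	_, err := NewServiceOperationState(ServiceConfig{UID: ""})
	fmt.Println("empty UID:", err)

	state, err := NewServiceOperationState(ServiceConfig{UID: "svc-123", IsInbound: true})
	fmt.Println("valid UID:", state.Config.UID, err)
}
```

Validating in one place keeps `AddService`-style callers from each having to re-check the UID.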
- Improved the logic for handling pod updates in `azure_servicegateway_pods.go` to better manage IP initialization, label changes, and pod validity states.
- Added comprehensive unit tests for `podInformerAddPod`, `podInformerRemovePod`, and `podInformerUpdateFunc` to ensure correct behavior under various scenarios, including edge cases.
- Introduced helper functions for building resource names in `resource_helpers.go` to streamline resource management.
- Enhanced error handling and logging in `initialization.go` and `service_updater.go` to improve clarity during resource creation and deletion processes.
- Updated the `DiffTracker` to ensure proper triggering of location updates and validation of network client initialization.
…ion states, and fully unregister services from ServiceGateway
…ization improvements
- Added atomic counters and initialization tracking to the DiffTracker to manage in-flight updater triggers and check for initialization completion.
- Updated LocationsUpdater and ServiceUpdater to decrement the in-flight trigger counter and check for initialization completion asynchronously.
- Introduced filtering in service synchronization to only include services that are in the StateCreated state, preventing premature synchronization attempts.
- Improved logging for service synchronization and equality checks between K8s and NRP resources for better observability.
- Adjusted performance test parameters to reduce the number of services created in parallel, ensuring stability during tests.
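The in-flight counter idea from this commit can be sketched with the standard `sync/atomic` package. The type and method names below (`initTracker`, `TriggerStarted`, `TriggerFinished`, `Arm`) are hypothetical and are not taken from the DiffTracker. The sketch assumes every initial trigger is counted before `Arm` is called, and that initialization is complete once the counter returns to zero.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// initTracker is a hypothetical helper: it counts in-flight updater
// triggers and signals once initialization has finished.
type initTracker struct {
	inFlight int64 // accessed only through sync/atomic
	armed    int32 // set to 1 once every initial trigger has been enqueued
	once     sync.Once
	done     chan struct{}
}

func newInitTracker() *initTracker {
	return &initTracker{done: make(chan struct{})}
}

// TriggerStarted records one more in-flight trigger.
func (t *initTracker) TriggerStarted() { atomic.AddInt64(&t.inFlight, 1) }

// TriggerFinished is what an updater would call after handling a trigger.
func (t *initTracker) TriggerFinished() {
	atomic.AddInt64(&t.inFlight, -1)
	t.checkDone()
}

// Arm marks that no further initial triggers will be started.
func (t *initTracker) Arm() {
	atomic.StoreInt32(&t.armed, 1)
	t.checkDone()
}

// checkDone closes done exactly once when armed and nothing is in flight.
func (t *initTracker) checkDone() {
	if atomic.LoadInt32(&t.armed) == 1 && atomic.LoadInt64(&t.inFlight) == 0 {
		t.once.Do(func() { close(t.done) })
	}
}

// Done returns a channel that is closed when initialization completes.
func (t *initTracker) Done() <-chan struct{} { return t.done }

// Initialized reports completion without blocking.
func (t *initTracker) Initialized() bool {
	select {
	case <-t.done:
		return true
	default:
		return false
	}
}

func main() {
	t := newInitTracker()
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		t.TriggerStarted()
		wg.Add(1)
		go func() {
			defer wg.Done()
			t.TriggerFinished() // an updater finishing its work
		}()
	}
	t.Arm()
	<-t.Done()
	wg.Wait()
	fmt.Println("initialized:", t.Initialized())
}
```

Both `Arm` and `TriggerFinished` run the completion check, so whichever happens last closes the channel.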
…etion with ServiceGatewayEnabled
- Updated the CLB performance test to create and delete 200 LoadBalancer services in parallel, improving scalability and performance measurement.
- Refactored service and pod creation logic for better concurrency handling.
- Implemented dynamic cleanup verification for Azure resources after service deletion.
- Added new utility functions for managing Cloud Controller Manager (CCM) pods, including creation, deletion, and readiness checks.
- Updated resource group name in the test suite to reflect the current environment.
- Introduced structured logging for better traceability during test execution.
… pod failure handling assertions
- Introduced a new test file `clb_deletion_crash_test.go` to validate the recovery of Container Load Balancer (CLB) services and egress pods after a crash of the Cloud Controller Manager (CCM) during deletion operations.
- Implemented three tests to cover scenarios of deleting inbound services, egress pods, and a mixed deletion of both, ensuring that resources are cleaned up correctly after a CCM crash.
- Enhanced assertions in `clb_lifecycle_test.go` to allow for temporary registration of crashing pods, ensuring that at least healthy pods are registered without exceeding the total pod count.
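The relaxed assertion in the last bullet amounts to a range check: the registered count must be at least the number of healthy pods and at most the total pod count. A minimal sketch follows, with a hypothetical function name and plain integers standing in for whatever the test actually queries.

```go
package main

import "fmt"

// checkRegisteredCount is a hypothetical helper for this sketch. It accepts
// any registered count in [healthy, total], so crashing pods that are
// briefly registered do not fail the check.
func checkRegisteredCount(registered, healthy, total int) error {
	if registered < healthy {
		return fmt.Errorf("only %d pods registered, want at least %d healthy pods", registered, healthy)
	}
	if registered > total {
		return fmt.Errorf("%d pods registered, exceeds total pod count %d", registered, total)
	}
	return nil
}

func main() {
	// 3 healthy pods out of 5; one crashing pod is temporarily registered.
	fmt.Println(checkRegisteredCount(4, 3, 5))
	// A healthy pod is missing from the registration.
	fmt.Println(checkRegisteredCount(2, 3, 5))
}
```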
- Updated TODO comments to clarify the removal timeline for aks-rp support.
- Enhanced error handling and retry logic for adding and removing service gateway finalizers with exponential backoff.
- Introduced metrics for tracking pending pod deletions.
- Refactored CLB test configuration to utilize environment variables and AzureTestClient for better flexibility.
- Removed hardcoded values in CLB tests, replacing them with environment variable support.
- Improved test structure to ensure proper initialization of configuration before running tests.
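Retrying a finalizer update with exponential backoff could look roughly like the sketch below. It uses only the standard library so that it runs on its own. The PR may well rely on a Kubernetes helper such as `wait.ExponentialBackoff` instead, and the function name, parameters, and simulated failure here are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient simulates a retriable failure (for example an update
// conflict). It is a hypothetical stand-in, not an error from the PR.
var errTransient = errors.New("conflict updating finalizers")

// retryWithBackoff calls op up to attempts times, sleeping between failed
// tries and multiplying the delay by factor after each one.
func retryWithBackoff(attempts int, initial time.Duration, factor float64, op func() error) error {
	if attempts < 1 {
		return errors.New("attempts must be at least 1")
	}
	delay := initial
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
			delay = time.Duration(float64(delay) * factor)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// The closure stands in for "add the service gateway finalizer".
	err := retryWithBackoff(4, 10*time.Millisecond, 2.0, func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

In real controller code the operation would typically re-read the object before each attempt so that a conflict can actually resolve.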
- Implemented a suite of stress tests for the Azure Load Balancer, including:
  - Service churn tests to validate rapid create and delete cycles of services and pods.
  - Pod crash handling tests to ensure only healthy pods are registered during scaling operations.
  - Egress pod churn tests to verify the handling of rapid create and delete cycles for egress pods.
- Added upscale and downscale tests to validate dynamic scaling of pods, ensuring proper registration and deregistration in the Service Gateway.
- Created utility functions for querying Azure resources and verifying cleanup after tests.
- Established a test suite structure for organizing and executing the tests effectively.
…plement SLB reachability tests
- Add `extractAzureErrorInfo` function to map Azure SDK errors to canonical error codes.
- Enhance logging in `ServiceUpdater` methods to include correlation IDs and detailed error information.
- Introduce `CreatedAt`, `IsOrphan`, `CorrelationID`, `TriggeringPodNamespace`, and `TriggeringPodName` fields in `ServiceOperationState` for better tracking and management of service operations.
- Implement periodic updates for the oldest-age metric in `ServiceUpdater`.
- Update unit tests to cover various Azure error scenarios and ensure correct error handling.
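To make the error-mapping bullet concrete, here is a self-contained sketch of classifying an SDK-style error into a canonical code. It does not reproduce `extractAzureErrorInfo`. The local `responseError` type only imitates the shape of an Azure SDK response error (HTTP status plus service error code), and the canonical code strings and the `Retriable` flag are invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// responseError is a local stand-in for an SDK response error; it is an
// assumption for this sketch, not the Azure SDK type.
type responseError struct {
	StatusCode int
	ErrorCode  string
}

func (e *responseError) Error() string {
	return fmt.Sprintf("status %d: %s", e.StatusCode, e.ErrorCode)
}

// errorInfo carries a hypothetical canonical code and a retry hint.
type errorInfo struct {
	Canonical string
	Retriable bool
}

// classifyError unwraps err and maps it to a canonical code.
func classifyError(err error) errorInfo {
	if err == nil {
		return errorInfo{Canonical: "OK"}
	}
	var re *responseError
	if !errors.As(err, &re) {
		return errorInfo{Canonical: "Unknown"}
	}
	switch {
	case re.StatusCode == http.StatusTooManyRequests:
		return errorInfo{Canonical: "Throttled", Retriable: true}
	case re.StatusCode == http.StatusNotFound:
		return errorInfo{Canonical: "NotFound"}
	case re.StatusCode == http.StatusConflict:
		return errorInfo{Canonical: "Conflict", Retriable: true}
	case re.StatusCode >= 500:
		return errorInfo{Canonical: "ServerError", Retriable: true}
	default:
		return errorInfo{Canonical: "ClientError"}
	}
}

func main() {
	// Wrapped errors are still classified because errors.As unwraps them.
	wrapped := fmt.Errorf("update service: %w", &responseError{StatusCode: 429, ErrorCode: "TooManyRequests"})
	fmt.Printf("%+v\n", classifyError(wrapped))
	fmt.Printf("%+v\n", classifyError(errors.New("dial tcp: timeout")))
}
```

A small, fixed set of canonical codes keeps metric label cardinality bounded, which is one common reason for this kind of mapping.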
…name; update related configurations and tests
What type of PR is this?
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: