What happened
cloud-controller-manager repeatedly logs serviceOwnsFrontendIP warnings while reconciling an internal LoadBalancer Service with a private pinned IP:
serviceOwnsFrontendIP: unexpected error when finding match public IP of the service ingress-nginx-controller with loadBalancerIP 10.104.176.35:
findMatchedPIPByLoadBalancerIP: failed to listPIP force refresh: throttled due to too many requests
The Service is internal and uses:
service.beta.kubernetes.io/azure-load-balancer-internal: "true"
service.beta.kubernetes.io/azure-load-balancer-ipv4: "10.104.176.35"
During reconcile/cleanup, the ownership check can enter the external/public-PIP branch with this private IP and try to find a Public IP resource whose address is 10.104.176.35. That lookup is guaranteed to miss, and under ARM throttling each repeated attempt adds further listPIP / force-refresh calls.
Symptoms
- Repeated serviceOwnsFrontendIP warnings for the same Service/IP.
- Errors alternate between:
  cannot find public IP with IP address <private-ip>
  failed to listPIP
  failed to listPIP force refresh
- Reconcile latency increases while ARM is throttling.
- Logs become noisy around node churn and Service backend syncs.
Suspected code path
Relevant paths:
pkg/provider/azure_loadbalancer.go
  reconcileService
  reconcileLoadBalancer
  getServiceLoadBalancerStatus
  reconcileFrontendIPConfigs
  isFrontendIPChanged
  serviceOwnsFrontendIP
pkg/provider/azure_publicip_repo.go
  findMatchedPIP
  findMatchedPIPByLoadBalancerIP
  listPIP
pkg/cache/azure_cache.go
  TimedCache.get
Observed flow:
Service reconcile reaches frontend ownership checks
-> serviceOwnsFrontendIP sees loadBalancerIP="10.104.176.35"
-> external/public-PIP branch calls findMatchedPIP
-> listPIP(default)
-> on miss, listPIP(force refresh)
-> 429 or guaranteed not found
Why this is expensive
serviceOwnsFrontendIP is used like a cheap boolean predicate, but for external secondary Services it can trigger ARM-backed Public IP discovery.
One reconcile may call it multiple times:
- while scanning frontend IP configs for status
- while reconciling frontend IP configs
- while checking whether frontend IP changed
- while deriving owned frontend config IP version
When lookup fails, serviceOwnsFrontendIP logs and returns false. The error is not propagated as a reusable throttling/lookup state, so callers may continue scanning and re-enter the same PIP lookup.
findMatchedPIPByLoadBalancerIP force-refreshes the PIP cache on every miss. The cache does not memoize negative results or 429 failures, so repeated callers can repeatedly hit ARM.
Expected behavior
For a private loadBalancerIP, the public-PIP ownership path should avoid repeated Public IP list/force-refresh calls.
During ARM throttling, repeated ownership checks should avoid re-triggering identical PIP lookups within the same reconcile.
Proposed improvements
- Short-circuit impossible public-PIP lookups.
If serviceOwnsFrontendIP is on the external/public-PIP branch and loadBalancerIP is private/RFC1918, skip findMatchedPIP during ownership scanning.
For real external create/update paths where a user specifies a private IP, return a clear validation error instead of repeatedly listing PIPs.
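A minimal sketch of the short-circuit check, assuming a hypothetical helper named canMatchPublicIP that does not exist in cloud-provider-azure. It relies only on the standard library's net/netip package, and treating loopback, link-local, and unspecified addresses as impossible matches is an extra assumption beyond the RFC 1918 case described above.

```go
package main

import (
	"fmt"
	"net/netip"
)

// canMatchPublicIP reports whether loadBalancerIP could plausibly be the
// address of a Public IP resource. Hypothetical helper for illustration only.
func canMatchPublicIP(loadBalancerIP string) (bool, error) {
	addr, err := netip.ParseAddr(loadBalancerIP)
	if err != nil {
		return false, fmt.Errorf("invalid loadBalancerIP %q: %w", loadBalancerIP, err)
	}
	// Normalize IPv4-mapped IPv6 addresses so the checks below see plain IPv4.
	addr = addr.Unmap()
	// IsPrivate covers RFC 1918 (IPv4) and RFC 4193 (IPv6 unique local) ranges.
	if addr.IsPrivate() || addr.IsLoopback() || addr.IsLinkLocalUnicast() || addr.IsUnspecified() {
		return false, nil
	}
	return true, nil
}

func main() {
	for _, ip := range []string{"10.104.176.35", "20.50.1.7"} {
		ok, err := canMatchPublicIP(ip)
		fmt.Println(ip, ok, err)
	}
}
```

An ownership scan could call such a check before findMatchedPIP and skip the lookup when it returns false, while a create/update path could turn the same result into a validation error.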
- Add per-reconcile memoization for PIP lookup.
Memoize PIP lookup results within one reconcile, keyed by:
(resourceGroup, loadBalancerIP or pipName)
Cache both success and failure/throttle state for the reconcile duration.
- Stop flattening lookup errors into false.
Change ownership resolution to distinguish:
owned=false, err=nil // definitely not owned
owned=false, err=throttled // lookup unknown due to throttling
owned=false, err=notFound // lookup completed, no match
Then the top-level reconcile can log once and back off instead of continuing as if the frontend simply belongs to another Service.
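The three outcomes above could be modeled with sentinel errors, roughly as follows. This is a sketch under assumptions: errPIPLookupThrottled, errPIPNotFound, ownsFrontendIP, and nextAction are hypothetical names, and the action strings only illustrate what a top-level reconcile might decide.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors separating "lookup unknown" from "no match".
var (
	errPIPLookupThrottled = errors.New("public IP lookup throttled")
	errPIPNotFound        = errors.New("public IP not found")
)

// ownsFrontendIP stands in for an ownership check that returns the lookup
// error to the caller instead of logging it and returning false.
func ownsFrontendIP(lookup func() (bool, error)) (bool, error) {
	matched, err := lookup()
	if err != nil {
		// Wrap with %w so callers can still classify the cause with errors.Is.
		return false, fmt.Errorf("resolving frontend IP ownership: %w", err)
	}
	return matched, nil
}

// nextAction shows how a caller could branch on the three outcomes.
func nextAction(owned bool, err error) string {
	switch {
	case errors.Is(err, errPIPLookupThrottled):
		return "log once, back off, requeue"
	case errors.Is(err, errPIPNotFound):
		return "lookup completed: not owned"
	case err != nil:
		return "fail reconcile"
	case owned:
		return "owned"
	default:
		return "not owned"
	}
}

func main() {
	throttled := func() (bool, error) { return false, errPIPLookupThrottled }
	missing := func() (bool, error) { return false, errPIPNotFound }
	match := func() (bool, error) { return true, nil }

	fmt.Println(nextAction(ownsFrontendIP(throttled)))
	fmt.Println(nextAction(ownsFrontendIP(missing)))
	fmt.Println(nextAction(ownsFrontendIP(match)))
}
```

Wrapping with %w keeps the original cause inspectable, so a throttled lookup is never mistaken for a frontend that belongs to another Service.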
- Reduce NSG retain log amplification.
listAvailableSecurityGroupDestinations and RetainDestinationFromRules can log very large duplicated destination lists. De-duping retained destinations before retain/logging would reduce noise and CPU overhead, though it does not directly fix the PIP 429 loop.
Acceptance criteria
- Private loadBalancerIP values do not trigger Public IP list/force-refresh calls from ownership scanning.
- Repeated serviceOwnsFrontendIP checks in one reconcile do not repeat identical PIP list calls after a miss or 429.
- 429 from PIP list is propagated or memoized as an unknown/throttled lookup state, not silently converted to false.
- Unit tests cover:
  - public-PIP ownership scan with private loadBalancerIP skips findMatchedPIP
  - external Service with private pinned IP fails/skips early as designed
  - repeated ownership checks reuse memoized PIP lookup result
  - throttled PIP lookup is not retried multiple times in one reconcile