What happened:
When a LoadBalancer Service with ExternalTrafficPolicy=Local is created in a new namespace, the loadBalancerBackendPoolUpdater fires its first process() cycle before the EndpointSlice informer cache is populated. The updater finds 0 ready endpoints, skips EnsureHostsInPool,
and — critically — clears the entire operation queue with no retry. The Azure LB backend pool remains permanently empty and traffic never reaches the pod.
The failure is silent: no error is logged, no event is emitted, and no subsequent reconciliation is triggered.
Root cause (code pointers):
pkg/provider/azure_local_services.go:
- AddFunc (EndpointSlice informer) — only stores in cache, does NOT enqueue a backend pool operation. If process() has already run by the time the EndpointSlice is added, there is no re-trigger.
- UpdateFunc (EndpointSlice informer) — calls getLocalServiceInfo(key). If the service is not yet registered (timing race), it returns early with return — no re-queue.
- process() — iterates operations, hits continue for skipped ones (0 ready endpoints), then immediately executes: updater.operations = make([]batchOperation, 0)
This clears ALL queued operations — including the skipped ones — with no retry scheduling.
There is no periodic full-reconciliation sweep in the updater. It is purely event-driven, so once the queue is cleared there is no recovery path.
Workaround: Annotate the Service to force a re-reconciliation:
kubectl annotate svc -n
force-reconcile="$(date +%s)" --overwrite
What you expected to happen:
Skipped operations (due to empty/stale cache) should be re-queued for a future process() cycle, or the updater should trigger a re-sync when new EndpointSlice events arrive for a previously-skipped service.
Related: #9839 (covers the concurrent-write race between the updater and main reconcile path — a separate but related issue in the same code)
What happened:
When a LoadBalancer Service with ExternalTrafficPolicy=Local is created in a new namespace, the loadBalancerBackendPoolUpdater fires its first process() cycle before the EndpointSlice informer cache is populated. The updater finds 0 ready endpoints, skips EnsureHostsInPool,
and — critically — clears the entire operation queue with no retry. The Azure LB backend pool remains permanently empty and traffic never reaches the pod.
The failure is silent: no error is logged, no event is emitted, and no subsequent reconciliation is triggered.
Root cause (code pointers):
pkg/provider/azure_local_services.go:
This clears ALL queued operations — including the skipped ones — with no retry scheduling.
There is no periodic full-reconciliation sweep in the updater. It is purely event-driven, so once the queue is cleared there is no recovery path.
Workaround: Annotate the Service to force a re-reconciliation:
kubectl annotate svc -n
force-reconcile="$(date +%s)" --overwrite
What you expected to happen:
Skipped operations (due to empty/stale cache) should be re-queued for a future process() cycle, or the updater should trigger a re-sync when new EndpointSlice events arrive for a previously-skipped service.
Related: #9839 (covers the concurrent-write race between the updater and main reconcile path — a separate but related issue in the same code)