
loadBalancerBackendPoolUpdater silently discards backend pool update when EndpointSlice informer cache is stale at startup — no retry, backend pool stays empty permanently #10252

@ManasiAatave

What happened:

When a LoadBalancer Service with ExternalTrafficPolicy=Local is created in a new namespace, the loadBalancerBackendPoolUpdater fires its first process() cycle before the EndpointSlice informer cache is populated. The updater finds 0 ready endpoints, skips EnsureHostsInPool, and, critically, clears the entire operation queue with no retry. The Azure LB backend pool remains permanently empty and traffic never reaches the pod.

The failure is silent: no error is logged, no event is emitted, and no subsequent reconciliation is triggered.

Root cause (code pointers):

pkg/provider/azure_local_services.go:

  1. AddFunc (EndpointSlice informer) — only stores in cache, does NOT enqueue a backend pool operation. If process() has already run by the time the EndpointSlice is added, there is no re-trigger.
  2. UpdateFunc (EndpointSlice informer): calls getLocalServiceInfo(key). If the service is not yet registered (timing race), it returns early and nothing is re-queued.
  3. process() — iterates operations, hits continue for skipped ones (0 ready endpoints), then immediately executes: updater.operations = make([]batchOperation, 0)

This clears ALL queued operations — including the skipped ones — with no retry scheduling.

There is no periodic full-reconciliation sweep in the updater. It is purely event-driven, so once the queue is cleared there is no recovery path.
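The failure mode described above can be modeled in a few lines. This is a minimal sketch, not the provider's code: the struct fields, the applied slice, and the method name processDropping are hypothetical stand-ins, and locking and the actual EnsureHostsInPool call are omitted.

```go
package main

// batchOperation is a simplified stand-in for the updater's queued operation.
// The fields are hypothetical, not the provider's actual struct.
type batchOperation struct {
	serviceName    string
	readyEndpoints int // what the informer cache reported at processing time
}

// updater models only the operation queue of the backend pool updater.
type updater struct {
	operations []batchOperation
	applied    []string // services whose backend pool was actually updated
}

// processDropping mirrors the behavior reported in this issue: operations
// with zero ready endpoints are skipped, then the whole queue is replaced.
func (u *updater) processDropping() {
	for _, op := range u.operations {
		if op.readyEndpoints == 0 {
			continue // skipped, and nothing records that it was skipped
		}
		u.applied = append(u.applied, op.serviceName)
	}
	// Skipped operations are lost here along with the applied ones.
	u.operations = make([]batchOperation, 0)
}
```

Calling processDropping on a queue that holds one operation with 0 ready endpoints leaves both the queue and the applied list empty, which corresponds to the permanently empty backend pool.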

Workaround: Annotate the Service to force a re-reconciliation:

kubectl annotate svc <service-name> -n <namespace> force-reconcile="$(date +%s)" --overwrite

What you expected to happen:

Skipped operations (due to empty/stale cache) should be re-queued for a future process() cycle, or the updater should trigger a re-sync when new EndpointSlice events arrive for a previously-skipped service.
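One possible shape for the expected behavior is sketched below, under the same simplifying assumptions: the types, the readyEndpoints map (standing in for the EndpointSlice informer cache), and onEndpointSliceAdd are hypothetical names for illustration. It is not a proposed patch.

```go
package main

// batchOperation and updater are simplified, hypothetical stand-ins for the
// provider's types; only the queue handling is modeled.
type batchOperation struct {
	serviceName string
}

type updater struct {
	operations []batchOperation
	// readyEndpoints stands in for the EndpointSlice informer cache:
	// service key -> number of ready endpoints currently known.
	readyEndpoints map[string]int
	applied        []string
}

// process applies operations whose service has ready endpoints and keeps the
// rest queued for the next cycle instead of discarding them.
func (u *updater) process() {
	retry := make([]batchOperation, 0, len(u.operations))
	for _, op := range u.operations {
		if u.readyEndpoints[op.serviceName] == 0 {
			retry = append(retry, op) // cache may be stale: try again later
			continue
		}
		u.applied = append(u.applied, op.serviceName)
	}
	u.operations = retry
}

// onEndpointSliceAdd models an AddFunc that updates the cache. Because the
// skipped operation is still queued, the next process() call picks it up.
func (u *updater) onEndpointSliceAdd(serviceName string, ready int) {
	u.readyEndpoints[serviceName] = ready
}
```

In this model, a first process() call with an empty cache leaves the operation queued, and a second call after onEndpointSliceAdd applies it. A real fix would likely also need a retry limit or backoff so that services that genuinely have no local endpoints do not stay queued forever.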

Related: #9839 (covers the concurrent-write race between the updater and the main reconcile path; a separate but related issue in the same code)

Labels: kind/bug, triage/accepted
