backend: k8cache: Fix runWatcher goroutines leak when clusters are removed#5585
backend: k8cache: Fix runWatcher goroutines leak when clusters are removed#5585Joshna907 wants to merge 2 commits into
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Joshna907 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
8cb1298 to
76309bf
Compare
illume
left a comment
There was a problem hiding this comment.
Thanks for this PR.
A few of the commits don't quite follow the project guidelines. We use Linux kernel style for git commits — have a look at the contributing guide and previous commits with git log.
Commits that need attention
fix(backend): clean up watcher goroutines when clusters are removed— Missingarea: descriptionprefix — e.g.frontend: HomeButton: Fix so it navigates to homeorbackend: config: Add enable-dynamic-clusters flag.
Commit guidelines
- Use atomic commits focused on a single change.
- Use the title format
<area>: <Description of changes>— description must start with a capital letter. - Keep the title under 72 characters (soft requirement).
- Explain the intention and why the change is needed.
- Make commit titles meaningful and describe what changed.
- Do not add code that a later commit rewrites; squash or reorder commits instead.
- Do not include
Fixes #NNin commit messages.
Good examples:
frontend: HomeButton: Fix so it navigates to homebackend: config: Add enable-dynamic-clusters flag
There was a problem hiding this comment.
Pull request overview
This PR aims to prevent backend runWatcher goroutine/network-connection leaks by canceling watcher contexts when clusters (contexts) are removed from the active kubeconfig configuration, using a new ContextStore change-listener hook to trigger a watcher sync.
Changes:
- Added a listener mechanism to
ContextStoreto notify on context add/remove/set-with-ttl. - Introduced
k8cache.SyncWatchers(activeContexts)to cancel watchers whose context keys are no longer active. - Wired
SyncWatchersto run on ContextStore changes during backend initialization.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| backend/pkg/kubeconfig/contextStore.go | Adds listener registration + async notifications on context mutations. |
| backend/pkg/k8cache/cacheInvalidation.go | Adds SyncWatchers to cancel orphaned watcher contexts. |
| backend/cmd/server.go | Registers a ContextStore listener to call SyncWatchers when contexts change. |
86c2e0f to
1537a66
Compare
1537a66 to
9a1b8d1
Compare
cdacc74 to
d517f57
Compare
|
ptal now @illume now mostly, copilot will open nitpicking issues |
d517f57 to
82b28fb
Compare
82b28fb to
701f085
Compare
|
ptal @illume |
|
ptal @illume |
d7bdea9 to
79a9f07
Compare
|
ptal @illume |
1a04e99 to
b4ee6df
Compare
|
ptal @illume |
b4ee6df to
b58d7da
Compare
|
ptal @illume |
b58d7da to
dd1ba68
Compare
…moved
This commit addresses a critical resource leak in the backend where
runWatcher goroutines responsible for watching Kubernetes events were
never stopped when clusters were removed from the configuration.
By implementing an observable listener pattern in the ContextStore,
the backend now triggers a synchronization function to cancel invalid
watcher contexts, preventing indefinite goroutine persistence and
network connection leaks.
Key Changes:
1. ContextStore Change Notifications:
- Added listener registration (AddListener) and change notifications
to kubeconfig.ContextStore.
- Triggered notification upon context addition, removal, and TTL eviction.
- Handled nil listeners safely during registration.
2. Watcher Synchronization:
- Added SyncWatchers function to backend/pkg/k8cache/cacheInvalidation.go
which compares active contexts against current watchers and invokes the
corresponding cancel() function for removed contexts.
- Wired SyncWatchers to run automatically when the ContextStore changes.
- Redacted user-specific details from logged messages when canceling
watchers to protect sensitive/PII data.
3. Robust Cache Cleanup & Panic Protection:
- Wrapped TTL cache eviction callbacks with recover in backend/pkg/cache/cache.go
to protect the janitor goroutine from unexpected callback panics.
4. Test Coverage:
- Added comprehensive unit tests in pkg/kubeconfig/contextStore_test.go
to verify event listener triggers and TTL eviction notifications.
- Added panic recovery unit tests in pkg/cache/cache_internal_test.go.
- Fixed potential data race in TestCacheEvictionPanicRecovery by
replacing the un-synchronized panicked boolean flag with a clean
channel-based synchronization mechanism.
dd1ba68 to
d384bb3
Compare
|
ptal @illume |
illume
left a comment
There was a problem hiding this comment.
Thanks for working on this.
Would you mind addressing the open Copilot review comments? Please mark each comment as resolved after addressing it.
|
ptal @illume |
Summary: This PR addresses a critical resource leak in the backend where runWatcher goroutines responsible for watching Kubernetes events were never stopped when clusters were removed from the configuration. By implementing an observable listener pattern in the ContextStore, the backend now accurately triggers a synchronization function to cancel invalid watcher contexts, preventing indefinite goroutine persistence and network connection leaks.
Fixes: #5415
Changes:
backend/pkg/k8cache/cacheInvalidation.go:Added a SyncWatchers function that compares the active cluster list against the contextCancel registry.
Implemented logic to selectively invoke cancel() functions for context keys that are no longer present in the active configuration.
backend/pkg/kubeconfig/contextStore.go:Implemented an observer pattern (AddListener and notifyListeners) for the ContextStore to emit events upon configuration changes.
Updated AddContext, RemoveContext, and AddContextWithKeyAndTTL to safely notify listeners using a mutex without introducing circular dependencies.
backend/cmd/server.go:Wired k8cache.SyncWatchers to the kubeConfigStore listener during initialization, ensuring the cleanup of orphaned goroutines is triggered exactly when cluster configurations change.
backend/pkg/k8cache/authorization_test.go:Removed unused sync and sync/atomic imports to maintain clean builds and test parity.
Steps to Test:
Validation Results: