multiplexer: Improve WebSocket stability #5343
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: PrakharJain345
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Pull request overview
Improves session robustness for the Kubernetes WebSocket multiplexer (so one failing/malformed stream doesn’t drop all watches) and adjusts the Terminal component to more reliably fall back to alternate shells when initialization fails.
Changes:
- Backend: differentiate fatal vs non-fatal client WS read errors; avoid failing on non-JSON payloads when checking resourceVersion.
- Backend tests: update `readClientMessage` tests for the new `(msg, isFatal, err)` signature.
- Frontend: treat early `ServerError` channel output as an initialization failure to trigger shell fallback.
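The fatal versus non-fatal split in the backend change can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `Message`, `frameReader`, `sliceReader`, and `serve` are hypothetical names, and only the `(msg, isFatal, err)` return shape comes from the overview above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
)

// Message is a hypothetical stand-in for the multiplexer's client message type.
type Message struct {
	ClusterID string `json:"clusterId"`
	Path      string `json:"path"`
}

// frameReader is a hypothetical abstraction over the WebSocket connection,
// so the sketch runs without a WebSocket library.
type frameReader interface {
	ReadFrame() ([]byte, error)
}

// readClientMessage returns (msg, isFatal, err). Transport errors are fatal
// because the connection can no longer be read; decode errors are not.
func readClientMessage(r frameReader) (Message, bool, error) {
	raw, err := r.ReadFrame()
	if err != nil {
		return Message{}, true, fmt.Errorf("reading frame: %w", err)
	}
	var msg Message
	if err := json.Unmarshal(raw, &msg); err != nil {
		return Message{}, false, fmt.Errorf("decoding message: %w", err)
	}
	return msg, false, nil
}

// sliceReader replays canned frames, then reports io.EOF.
type sliceReader struct{ frames [][]byte }

func (s *sliceReader) ReadFrame() ([]byte, error) {
	if len(s.frames) == 0 {
		return nil, io.EOF
	}
	f := s.frames[0]
	s.frames = s.frames[1:]
	return f, nil
}

// serve counts handled and skipped messages until a fatal error ends the session.
func serve(r frameReader) (handled, skipped int) {
	for {
		_, isFatal, err := readClientMessage(r)
		switch {
		case err == nil:
			handled++
		case isFatal:
			return handled, skipped
		default:
			skipped++ // malformed message: skip it and keep the session alive
		}
	}
}

func main() {
	r := &sliceReader{frames: [][]byte{
		[]byte(`{"clusterId":"a","path":"/api/v1/pods"}`),
		[]byte(`not json`),
		[]byte(`{"clusterId":"b","path":"/api/v1/nodes"}`),
	}}
	handled, skipped := serve(r)
	fmt.Println(handled, skipped) // prints "2 1"
}
```

The malformed second frame is skipped while the third is still handled; only the end of the stream terminates the loop.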
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `frontend/src/components/common/Terminal.tsx` | Adjusts terminal init failure detection to trigger shell fallback earlier. |
| `backend/cmd/multiplexer.go` | Refactors the client WS loop to isolate non-fatal errors; makes resourceVersion detection tolerant of non-JSON messages. |
| `backend/cmd/multiplexer_test.go` | Updates tests for the new `readClientMessage` return signature. |
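A tolerant resourceVersion check might look something like the sketch below. The helper name `extractResourceVersion` is hypothetical, and the payload shape assumes a standard Kubernetes watch event (`object.metadata.resourceVersion`); the actual function in `multiplexer.go` may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// extractResourceVersion is a hypothetical helper. It returns the
// object.metadata.resourceVersion of a watch event payload, or ok=false when
// the payload is not JSON or carries no version. It never returns an error,
// so a non-JSON frame cannot abort the caller.
func extractResourceVersion(payload []byte) (version string, ok bool) {
	var event struct {
		Object struct {
			Metadata struct {
				ResourceVersion string `json:"resourceVersion"`
			} `json:"metadata"`
		} `json:"object"`
	}
	if err := json.Unmarshal(payload, &event); err != nil {
		return "", false
	}
	v := event.Object.Metadata.ResourceVersion
	return v, v != ""
}

func main() {
	payloads := []string{
		`{"type":"MODIFIED","object":{"metadata":{"resourceVersion":"42"}}}`,
		`plain text output`,
		`{"type":"ERROR"}`,
	}
	for _, p := range payloads {
		v, ok := extractResourceVersion([]byte(p))
		fmt.Printf("version=%q ok=%v\n", v, ok)
	}
}
```

Returning a boolean instead of an error keeps the "no version found" case out of the error path, so callers can forward the payload unchanged.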
illume
left a comment
Thanks for the contribution.
There are some open Copilot review comments — could you take a look at them?
illume
left a comment
Thanks for this PR.
The CI backend test job has failures. Running `cd backend && go test ./...` locally should show what's going wrong.
How to run the backend tests
Run `cd backend && go test ./...` to see all failures. Fix the failing tests and commit the result.
28a3b11 to 0744668
illume
left a comment
Thanks for these changes.
Can you please address the open review comments?
0744668 to 1308e49
illume
left a comment
Thanks for these changes.
Can you please have a look at the git commits to see if they meet the contribution guidelines? We use a Linux kernel style of git commits. See the contributing guide for general context, and please see previous git commits with `git log` for examples.
Commits that need attention
- `multiplexer: improve WebSocket stability and terminal shell fallback`: Description must start with a capital letter, e.g. `frontend: HomeButton: Fix the button`, not `frontend: HomeButton: fix the button`.
Commit guidelines
- Use atomic commits focused on a single change.
- Use the title format `<area>: <Description of changes>`; the description must start with a capital letter.
- Keep the title under 72 characters (soft requirement).
- Explain the intention and why the change is needed.
- Make commit titles meaningful and describe what changed.
- Do not add code that a later commit rewrites; squash or reorder commits instead.
- Do not include `Fixes #NN` in commit messages.
Good examples:
- `frontend: HomeButton: Fix so it navigates to home`
- `backend: config: Add enable-dynamic-clusters flag`
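For anyone who wants to check titles locally before pushing, the title rules above could be approximated with a small helper like this one. `checkTitle` is a hypothetical function written for illustration, not part of the project's tooling, and it assumes the description is whatever follows the last `": "` in the title.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// checkTitle is a hypothetical helper that reports problems with a commit
// title of the form "<area>: <Description of changes>". It assumes the
// description is the text after the last ": " separator.
func checkTitle(title string) []string {
	var problems []string
	idx := strings.LastIndex(title, ": ")
	if idx < 0 {
		return append(problems, `missing "<area>: " prefix`)
	}
	desc := title[idx+2:]
	first, _ := utf8.DecodeRuneInString(desc)
	if desc == "" || !unicode.IsUpper(first) {
		problems = append(problems, "description must start with a capital letter")
	}
	// Soft requirement from the guidelines: keep the title under 72 characters.
	if utf8.RuneCountInString(title) >= 72 {
		problems = append(problems, "title should be under 72 characters")
	}
	return problems
}

func main() {
	for _, title := range []string{
		"frontend: HomeButton: Fix the button",
		"frontend: HomeButton: fix the button",
	} {
		fmt.Printf("%q -> %v\n", title, checkTitle(title))
	}
}
```

The check is deliberately narrow: it covers only the title format, capitalization, and length rules, and leaves the judgment-based guidelines (atomic commits, meaningful descriptions) to review.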
illume
left a comment
Thanks for these changes.
Can you please address the open review comments? Once you've resolved each one, please mark it as resolved.
3ca8cba to fa23420
illume
left a comment
Thanks for the contribution.
It looks like there's a merge-main commit in this PR. Could you rebase onto main instead?
Why this matters
Merge commits from main make the PR history harder to review. Please rebase your branch on top of the latest main instead, then update the PR with the rebased commits.
ebbb423 to e13183f
e13183f to 14c1cb4
illume
left a comment
Thanks for the contribution.
It looks like there's a merge-main commit in this PR. Could you rebase onto main instead?
Why this matters
Merge commits from main make the PR history harder to review. Please rebase your branch on top of the latest main instead, then update the PR with the rebased commits.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
backend/cmd/multiplexer.go:297
- `conn.closed` is written under `c.mu` here, but there are unsynchronized reads elsewhere (e.g., `reconnect()` checks `conn.closed` without taking `conn.mu`). That creates a potential data race once `safeClose` is called concurrently with monitor/reconnect logic. Suggest making `closed` an atomic (`atomic.Bool`) or enforcing that all reads/writes go through the same mutex-protected accessor.
```go
c.mu.Lock()
c.closed = true
c.mu.Unlock()
```
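The `atomic.Bool` option suggested in this comment could look roughly like the sketch below. `Connection` here is a hypothetical, trimmed-down stand-in for the multiplexer's connection type, and `isClosed` is an assumed accessor name; only `closed` and `safeClose` appear in the review comment.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Connection is a hypothetical, trimmed-down stand-in for the multiplexer's
// connection type; only the closed flag is modelled.
type Connection struct {
	closed atomic.Bool // read and written without holding a mutex
}

// safeClose marks the connection closed and reports whether this call was the
// one that closed it, so cleanup can run exactly once.
func (c *Connection) safeClose() bool {
	return c.closed.CompareAndSwap(false, true)
}

// isClosed is an assumed accessor that is safe to call from any goroutine,
// for example from reconnect or monitor logic.
func (c *Connection) isClosed() bool {
	return c.closed.Load()
}

func main() {
	conn := &Connection{}
	var wg sync.WaitGroup
	var winners atomic.Int32

	// Several goroutines read the flag and try to close concurrently.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = conn.isClosed()
			if conn.safeClose() {
				winners.Add(1)
			}
		}()
	}
	wg.Wait()
	fmt.Println(conn.isClosed(), winners.Load()) // prints "true 1"
}
```

Using `CompareAndSwap` rather than a plain `Store` also makes the close idempotent, which matters if teardown can be triggered from more than one path. `atomic.Bool` requires Go 1.19 or later.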
* Guaranteed final status propagation to client during connection teardowns
* Streamlined error routing payloads enabling decoupled client channel parsing
* Safeguarded concurrent write locks ensuring isolated backend delivery
* Re-aligned verification suites matching modular error response standards
* Purged customized ping handler interceptors restoring native low-latency framing
* Upgraded deferred cleanup closures using synchronized lock containers for concurrency
* Hardened request telemetry handlers globally utilizing bulletproof nil-receiver guards
illume
left a comment
Thanks for the contribution.
There are some open Copilot review comments — could you take a look at them? Please mark each one as resolved once you've addressed it.
Summary
Fixes #5224 by making the WebSocket multiplexer tolerate per-request failures without dropping the whole client session. Malformed client messages, token lookup failures, and cluster dial errors are isolated to the affected request/cluster while the multiplexer keeps serving other active watches.
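One way to picture the per-request isolation described in this summary is the sketch below. It is an illustration under stated assumptions: the `Frame` field names, the `STATUS` type value, and the `dial` and `handleRequests` helpers are all hypothetical; only the idea of answering a failed request with an `ERROR` frame while continuing to serve the others comes from the PR description.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Frame is a hypothetical wire format; the field names are assumptions.
type Frame struct {
	ClusterID string `json:"clusterId"`
	Path      string `json:"path"`
	Type      string `json:"type"`
	Error     string `json:"error,omitempty"`
}

// request is a hypothetical watch request from the client.
type request struct {
	ClusterID string
	Path      string
}

// dial is a hypothetical stand-in for connecting to a cluster.
func dial(req request) error {
	if req.ClusterID == "broken" {
		return errors.New("cluster unreachable")
	}
	return nil
}

// handleRequests processes every request. A failure produces an ERROR frame
// for that request only; the remaining requests are still served.
func handleRequests(reqs []request) []Frame {
	frames := make([]Frame, 0, len(reqs))
	for _, req := range reqs {
		f := Frame{ClusterID: req.ClusterID, Path: req.Path, Type: "STATUS"}
		if err := dial(req); err != nil {
			f.Type = "ERROR"
			f.Error = err.Error()
		}
		frames = append(frames, f)
	}
	return frames
}

func main() {
	frames := handleRequests([]request{
		{ClusterID: "prod", Path: "/api/v1/pods"},
		{ClusterID: "broken", Path: "/api/v1/pods"},
		{ClusterID: "staging", Path: "/api/v1/nodes"},
	})
	for _, f := range frames {
		b, _ := json.Marshal(f)
		fmt.Println(string(b))
	}
}
```

Because each frame carries its own cluster and path, a client can route an error to the affected watch without touching the others.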
Changes
- `ERROR` frames for token and cluster connection failures.
- `ERROR` frames through `useWebSocket`'s `onError` callback instead of fabricating Kubernetes watch update events.
- `safeClose`.
Steps to Test
Local Validation
- `go test ./cmd` - passed
- `npm test -- src/lib/k8s/api/v2/multiplexer.test.ts --run` - passed
- `npm run frontend:tsc` - passed
- `npm run frontend:lint` - passed
- `npm run backend:test` - passed
Screenshots
N/A - backend/frontend protocol stability fix.
Notes for the Reviewer
Rebased onto latest `main` and squashed the PR to one guideline-compliant commit: `multiplexer: Improve WebSocket stability`.