Skip to content

multiplexer: Improve WebSocket stability#5343

Open
PrakharJain345 wants to merge 9 commits into
kubernetes-sigs:mainfrom
PrakharJain345:fix/multiplexer-stability-5224
Open

multiplexer: Improve WebSocket stability#5343
PrakharJain345 wants to merge 9 commits into
kubernetes-sigs:mainfrom
PrakharJain345:fix/multiplexer-stability-5224

Conversation

@PrakharJain345
Copy link
Copy Markdown
Contributor

@PrakharJain345 PrakharJain345 commented May 2, 2026

Summary

Fixes #5224 by making the WebSocket multiplexer tolerate per-request failures without dropping the whole client session. Malformed client messages, token lookup failures, and cluster dial errors are isolated to the affected request/cluster while the multiplexer keeps serving other active watches.

Changes

  • Refactor the client read loop to distinguish fatal socket read errors from non-fatal message parsing errors.
  • Send routable backend ERROR frames for token and cluster connection failures.
  • Route frontend multiplexer ERROR frames through useWebSocket's onError callback instead of fabricating Kubernetes watch update events.
  • Avoid resourceVersion parsing failures for non-watch/non-JSON cluster messages from tearing down the session.
  • Synchronize connection close-state reads so reconnect checks do not race with safeClose.
  • Add backend and frontend regression coverage for multiplexer error handling.

Steps to Test

  1. Open Headlamp with multiple resource watches active.
  2. Trigger a malformed multiplexer request or a token/cluster connection error for one subscription.
  3. Verify the affected subscription receives an error while the multiplexer connection and other watches remain active.

Local Validation

  • go test ./cmd - passed
  • npm test -- src/lib/k8s/api/v2/multiplexer.test.ts --run - passed
  • npm run frontend:tsc - passed
  • npm run frontend:lint - passed
  • npm run backend:test - passed

Screenshots

N/A - backend/frontend protocol stability fix.

Notes for the Reviewer

Rebased onto latest main and squashed the PR to one guideline-compliant commit: multiplexer: Improve WebSocket stability.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 2, 2026
@k8s-ci-robot k8s-ci-robot requested review from sniok and vyncent-t May 2, 2026 15:23
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: PrakharJain345
Once this PR has been reviewed and has the lgtm label, please assign sniok for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 2, 2026
@illume illume requested a review from Copilot May 2, 2026 17:40
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Improves session robustness for the Kubernetes WebSocket multiplexer (so one failing/malformed stream doesn’t drop all watches) and adjusts the Terminal component to more reliably fall back to alternate shells when initialization fails.

Changes:

  • Backend: differentiate fatal vs non-fatal client WS read errors; avoid failing on non-JSON payloads when checking resourceVersion.
  • Backend tests: update readClientMessage tests for the new (msg, isFatal, err) signature.
  • Frontend: treat early ServerError channel output as an initialization failure to trigger shell fallback.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
frontend/src/components/common/Terminal.tsx Adjusts terminal init failure detection to trigger shell fallback earlier.
backend/cmd/multiplexer.go Refactors the client WS loop to isolate non-fatal errors; makes resourceVersion detection tolerant of non-JSON messages.
backend/cmd/multiplexer_test.go Updates tests for the new readClientMessage return signature.

Comment thread frontend/src/components/common/Terminal.tsx Outdated
Comment thread backend/cmd/multiplexer.go
Comment thread backend/cmd/multiplexer.go Outdated
Comment thread backend/cmd/multiplexer.go
Copy link
Copy Markdown
Contributor

@illume illume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution.

There are some open Copilot review comments — could you take a look at them?

Copy link
Copy Markdown
Contributor

@illume illume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR.

The CI backend test job has failures. Running cd backend && go test ./... locally should show what's going wrong.

How to run the backend tests

Run cd backend && go test ./... to see all failures. Fix the failing tests and commit the result.

@PrakharJain345 PrakharJain345 force-pushed the fix/multiplexer-stability-5224 branch from 28a3b11 to 0744668 Compare May 2, 2026 18:41
@PrakharJain345 PrakharJain345 requested review from illume May 2, 2026 19:03
@illume illume requested a review from Copilot May 2, 2026 21:46
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Comment thread backend/cmd/multiplexer_test.go
Comment thread backend/cmd/multiplexer.go
Comment thread backend/cmd/multiplexer.go
Comment thread backend/cmd/multiplexer.go Outdated
Copy link
Copy Markdown
Contributor

@illume illume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for these changes.

Can you please address the open review comments?

@illume illume marked this pull request as draft May 3, 2026 06:15
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 3, 2026
@PrakharJain345 PrakharJain345 force-pushed the fix/multiplexer-stability-5224 branch from 0744668 to 1308e49 Compare May 4, 2026 05:11
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 4, 2026
@PrakharJain345 PrakharJain345 marked this pull request as ready for review May 4, 2026 05:12
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 4, 2026
@k8s-ci-robot k8s-ci-robot requested review from kahirokunn and skoeva May 4, 2026 05:12
@illume illume requested a review from Copilot May 4, 2026 08:24
Copy link
Copy Markdown
Contributor

@illume illume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for these changes.

Can you please have a look at the git commits to see if they meet the contribution guidelines? We use a Linux kernel style of git commits. See the contributing guide for general context, and please see previous git commits with git log for examples.

Commits that need attention
  • multiplexer: improve WebSocket stability and terminal shell fallback — Description must start with a capital letter — e.g. frontend: HomeButton: Fix the button not frontend: HomeButton: fix the button.
Commit guidelines
  • Use atomic commits focused on a single change.
  • Use the title format <area>: <Description of changes> — description must start with a capital letter.
  • Keep the title under 72 characters (soft requirement).
  • Explain the intention and why the change is needed.
  • Make commit titles meaningful and describe what changed.
  • Do not add code that a later commit rewrites; squash or reorder commits instead.
  • Do not include Fixes #NN in commit messages.

Good examples:

  • frontend: HomeButton: Fix so it navigates to home
  • backend: config: Add enable-dynamic-clusters flag

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

Comment thread backend/pkg/spa/embeddedSPAHandler.go Outdated
Comment thread backend/cmd/headlamp.go Outdated
Comment thread backend/pkg/telemetry/requesthandler.go
Comment thread backend/pkg/plugins/plugins.go
Comment thread backend/cmd/multiplexer.go Outdated
Comment thread backend/pkg/spa/spaHandler.go Outdated
Copy link
Copy Markdown
Contributor

@illume illume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for these changes.

Can you please address the open review comments? Once you've resolved each one, please mark it as resolved.

@PrakharJain345 PrakharJain345 force-pushed the fix/multiplexer-stability-5224 branch from 3ca8cba to fa23420 Compare May 16, 2026 15:02
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 16, 2026
@PrakharJain345 PrakharJain345 requested a review from illume May 16, 2026 15:07
@illume illume requested a review from Copilot May 17, 2026 06:42
Copy link
Copy Markdown
Contributor

@illume illume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution.

it looks like there's a merge-main commit in this PR — could you rebase onto main instead?

Why this matters

Merge commits from main make the PR history harder to review. Please rebase your branch on top of the latest main instead, then update the PR with the rebased commits.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comment thread frontend/src/lib/k8s/api/v2/multiplexer.ts
Comment thread backend/cmd/multiplexer.go
Comment thread backend/cmd/multiplexer.go
@PrakharJain345 PrakharJain345 force-pushed the fix/multiplexer-stability-5224 branch from ebbb423 to e13183f Compare May 17, 2026 08:49
@PrakharJain345 PrakharJain345 changed the title multiplexer: improve WebSocket stability and terminal shell fallback multiplexer: Improve WebSocket stability May 17, 2026
@PrakharJain345 PrakharJain345 requested a review from illume May 17, 2026 08:52
@illume illume requested a review from Copilot May 17, 2026 10:37
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comment thread frontend/src/lib/k8s/api/v2/multiplexer.ts
Comment thread backend/cmd/multiplexer.go
Comment thread backend/cmd/multiplexer.go
@PrakharJain345 PrakharJain345 force-pushed the fix/multiplexer-stability-5224 branch from e13183f to 14c1cb4 Compare May 17, 2026 16:39
@k8s-ci-robot k8s-ci-robot removed the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 17, 2026
Copy link
Copy Markdown
Contributor

@illume illume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution.

it looks like there's a merge-main commit in this PR — could you rebase onto main instead?

Why this matters

Merge commits from main make the PR history harder to review. Please rebase your branch on top of the latest main instead, then update the PR with the rebased commits.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

backend/cmd/multiplexer.go:297

  • conn.closed is written under c.mu here, but there are unsynchronized reads elsewhere (e.g., reconnect() checks conn.closed without taking conn.mu). That creates a potential data race once safeClose is called concurrently with monitor/reconnect logic. Suggest making closed an atomic (atomic.Bool) or enforcing that all reads/writes go through the same mutex-protected accessor.
		c.mu.Lock()
		c.closed = true
		c.mu.Unlock()

Comment thread backend/cmd/multiplexer.go
* Guaranteed final status propagation to client during connection teardowns
* Streamlined error routing payloads enabling decoupled client channel parsing
* Safeguarded concurrent write locks ensuring isolated backend delivery
* Re-aligned verification suites matching modular error response standards
* Purged customized ping handler interceptors restoring native low-latency framing
* Upgraded deferred cleanup closures using synchronized lock containers for concurrency
* Hardened request telemetry handlers globally utilizing bulletproof nil-receiver guards
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comment thread backend/cmd/multiplexer.go
Comment thread backend/cmd/multiplexer.go Outdated
Comment thread backend/cmd/multiplexer.go
Copy link
Copy Markdown
Contributor

@illume illume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution.

There are some open Copilot review comments — could you take a look at them? Please mark each one as resolved once you've addressed it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Multiplexer drops cluster WebSocket when exec fails due to missing shell (bash not found)

4 participants