Skip to content

fix(websocket): avoid asyncSpawn and make ws shutdown cleanup robust#2471

Open
Copilot wants to merge 16 commits into
masterfrom
copilot/fix-flaky-test-dial-cancellation
Open

fix(websocket): avoid asyncSpawn and make ws shutdown cleanup robust#2471
Copilot wants to merge 16 commits into
masterfrom
copilot/fix-flaky-test-dial-cancellation

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 15, 2026

Summary

Reworks WebSocket connection cleanup to avoid untracked asyncSpawn usage and to eliminate flaky shutdown behavior where a connection can remain open after transport stop.

In addition to tracking per-connection cleanup futures, this updates WsTransport.stop() to wait for all close/cleanup futures via allFinished, so one failing close does not short-circuit remaining cleanup work.

Affected Areas

  • Gossipsub

  • Transports

    WebSocket transport cleanup/shutdown flow in libp2p/transports/wstransport.nim.

  • Peer Management / Discovery

  • Protocol Logic

  • Build / Tooling

  • Other

Compatibility & Downstream Validation

Reference PRs / branches / commits demonstrating successful integration:

  • Nimbus:
    N/A

  • Waku:
    N/A

  • Codex:
    N/A

Impact on Library Users

No public API changes.

Behavioral impact is internal to WebSocket transport shutdown: stop now drains cleanup more reliably under dial-cancellation/close-error timing, reducing flaky leaked-stream outcomes in tests and CI.

Risk Assessment

Low to medium risk.

Changes are limited to WebSocket transport stop/cleanup sequencing. Main risk is shutdown-path timing differences, but logic is constrained to teardown and intended to improve determinism and resource cleanup.

References

Additional Notes

Follow-up fix after CI feedback: handle dial cancellation was still failing with tracker mismatch (Opened stream.transport vs Closed stream.transport) on macOS CI. This was addressed by replacing allFutures with allFinished in the stop cleanup path.

Copilot AI and others added 2 commits May 15, 2026 10:35
Agent-Logs-Url: https://github.com/vacp2p/nim-libp2p/sessions/f5e4aeda-1c01-4025-aee8-039743b1ecd8

Co-authored-by: vladopajic <4353513+vladopajic@users.noreply.github.com>
Agent-Logs-Url: https://github.com/vacp2p/nim-libp2p/sessions/f5e4aeda-1c01-4025-aee8-039743b1ecd8

Co-authored-by: vladopajic <4353513+vladopajic@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix flaky test for WebSocket transport dial cancellation Stabilize WebSocket dial-cancellation test by waiting for transport tracker cleanup May 15, 2026
Copilot AI requested a review from vladopajic May 15, 2026 10:38
@richard-ramos
Copy link
Copy Markdown
Member

ah interesting! I had seen this happening yesterday in #2468 and was wondering what was that about 🤔

Copilot AI and others added 5 commits May 15, 2026 13:27
Agent-Logs-Url: https://github.com/vacp2p/nim-libp2p/sessions/e106e78e-c8b9-447e-935d-33b1f6d2359c

Co-authored-by: vladopajic <4353513+vladopajic@users.noreply.github.com>
Agent-Logs-Url: https://github.com/vacp2p/nim-libp2p/sessions/e106e78e-c8b9-447e-935d-33b1f6d2359c

Co-authored-by: vladopajic <4353513+vladopajic@users.noreply.github.com>
…:vacp2p/nim-libp2p into copilot/fix-flaky-test-dial-cancellation
@vladopajic vladopajic changed the title Stabilize WebSocket dial-cancellation test by waiting for transport tracker cleanup chore(weboscket): avoid asyncSpawn May 15, 2026
@vladopajic vladopajic marked this pull request as ready for review May 15, 2026 13:41
@vladopajic vladopajic requested review from a team, Copilot and richard-ramos May 15, 2026 13:41
@vladopajic vladopajic enabled auto-merge (squash) May 15, 2026 13:41
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the WebSocket transport to track per-connection cleanup futures instead of using untracked asyncSpawn, aligning the transport with repository async task management rules and improving shutdown behavior.

Changes:

  • Adds connectionCleanupFuts to WsTransport.
  • Tracks WebSocket connection cleanup futures created in connHandler.
  • Updates stop() to close active connections and wait for pending cleanup futures.

Comment thread libp2p/transports/wstransport.nim Outdated
@vladopajic
Copy link
Copy Markdown
Member

ah interesting! I had seen this happening yesterday in #2468 and was wondering what was that about 🤔

yeah, i took it form your pr, but it was #2469

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.47%. Comparing base (6e876df) to head (3ce82d9).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2471      +/-   ##
==========================================
- Coverage   84.48%   84.47%   -0.01%     
==========================================
  Files         154      154              
  Lines       26393    26396       +3     
  Branches       12       12              
==========================================
+ Hits        22298    22299       +1     
- Misses       4095     4097       +2     
Files with missing lines Coverage Δ
libp2p/transports/wstransport.nim 81.69% <100.00%> (-0.11%) ⬇️

... and 3 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot AI review requested due to automatic review settings May 15, 2026 15:34
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

@vladopajic vladopajic disabled auto-merge May 15, 2026 18:11
@vladopajic
Copy link
Copy Markdown
Member

not fixed: https://github.com/vacp2p/nim-libp2p/actions/runs/25933002749/job/76231265652
failed again with current code

@vladopajic
Copy link
Copy Markdown
Member

@copilot tests have failed with same error; please analyze current code and provide details why it fails

Copilot AI requested review from Copilot and removed request for Copilot May 15, 2026 18:25
Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 15, 2026

not fixed: https://github.com/vacp2p/nim-libp2p/actions/runs/25933002749/job/76231265652
failed again with current code

I tracked this to the same flaky path in handle dial cancellation and pushed a fix in 3ce82d9: WsTransport.stop() now uses allFinished for both connection closes and cleanup futures so one close error cannot abort the remaining shutdown cleanup.

@copilot tests have failed with same error; please analyze current code and provide details why it fails

Analysis from run 25933002749 / job 76231265652: only test failure was WebSocket transport -> handle dial cancellation, with tracker mismatch Opened stream.transport: 1096 | Closed stream.transport: 1095. The shutdown path in WsTransport.stop() was using allFutures, so a single close failure could short-circuit the rest of cleanup and leave one stream unclosed. I fixed that in 3ce82d9 by switching those waits to allFinished so shutdown still drains all close/cleanup futures.

Copilot AI changed the title chore(weboscket): avoid asyncSpawn fix(websocket): avoid asyncSpawn and make ws shutdown cleanup robust May 15, 2026
@github-project-automation github-project-automation Bot moved this from new to In Progress in nim-libp2p May 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

flaky test: WebSocket transport::handle dial cancellation

6 participants