fix(websocket): avoid asyncSpawn and make ws shutdown cleanup robust #2471
Copilot wants to merge 16 commits
Agent-Logs-Url: https://github.com/vacp2p/nim-libp2p/sessions/f5e4aeda-1c01-4025-aee8-039743b1ecd8 Co-authored-by: vladopajic <4353513+vladopajic@users.noreply.github.com>
ah interesting! I had seen this happening yesterday in #2468 and was wondering what that was about 🤔
Agent-Logs-Url: https://github.com/vacp2p/nim-libp2p/sessions/e106e78e-c8b9-447e-935d-33b1f6d2359c Co-authored-by: vladopajic <4353513+vladopajic@users.noreply.github.com>
…:vacp2p/nim-libp2p into copilot/fix-flaky-test-dial-cancellation
Pull request overview
This PR updates the WebSocket transport to track per-connection cleanup futures instead of using untracked asyncSpawn, aligning the transport with repository async task management rules and improving shutdown behavior.
Changes:
- Adds `connectionCleanupFuts` to `WsTransport`.
- Tracks WebSocket connection cleanup futures created in `connHandler`.
- Updates `stop()` to close active connections and wait for pending cleanup futures.
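The pattern in these changes (keep a handle to each cleanup task, then wait for the handles during stop) can be sketched in Python's asyncio. This is only an analogy under stated assumptions: `Transport`, `on_connection`, and `cleanup_tasks` are hypothetical names for illustration, not the `WsTransport` code or the chronos API.

```python
import asyncio


class Transport:
    """Hypothetical stand-in for a transport that owns per-connection cleanup tasks."""

    def __init__(self) -> None:
        self.cleanup_tasks: set[asyncio.Task] = set()
        self.closed: list[str] = []

    async def _cleanup(self, conn_id: str) -> None:
        # Simulate asynchronous teardown of one connection.
        await asyncio.sleep(0.01)
        self.closed.append(conn_id)

    def on_connection(self, conn_id: str) -> None:
        # Keep a reference to the task instead of spawning it and forgetting it.
        task = asyncio.create_task(self._cleanup(conn_id))
        self.cleanup_tasks.add(task)
        # Drop finished tasks so the set does not grow without bound.
        task.add_done_callback(self.cleanup_tasks.discard)

    async def stop(self) -> None:
        # Wait for every pending cleanup before the transport counts as stopped.
        pending = list(self.cleanup_tasks)
        if pending:
            await asyncio.gather(*pending, return_exceptions=True)


async def demo() -> list[str]:
    transport = Transport()
    for conn_id in ("a", "b", "c"):
        transport.on_connection(conn_id)
    await transport.stop()
    return sorted(transport.closed)
```

Because `stop()` awaits the tracked tasks, every connection's cleanup has run by the time it returns; with fire-and-forget tasks the caller would have no such guarantee.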
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2471      +/-   ##
==========================================
- Coverage   84.48%   84.47%   -0.01%
==========================================
  Files         154      154
  Lines       26393    26396       +3
  Branches       12       12
==========================================
+ Hits        22298    22299       +1
- Misses       4095     4097       +2
not fixed: https://github.com/vacp2p/nim-libp2p/actions/runs/25933002749/job/76231265652
@copilot tests have failed with the same error; please analyze the current code and provide details on why it fails
Agent-Logs-Url: https://github.com/vacp2p/nim-libp2p/sessions/e35a77e2-d18b-40c8-8121-68e61898c16d Co-authored-by: vladopajic <4353513+vladopajic@users.noreply.github.com>
I tracked this to the same flaky path in
Analysis from run
Summary
Reworks WebSocket connection cleanup to avoid untracked `asyncSpawn` usage and to eliminate flaky shutdown behavior where a connection can remain open after transport stop.
In addition to tracking per-connection cleanup futures, this updates `WsTransport.stop()` to wait for all close/cleanup futures via `allFinished`, so one failing close does not short-circuit remaining cleanup work.
Affected Areas
- Gossipsub
- Transports
  WebSocket transport cleanup/shutdown flow in `libp2p/transports/wstransport.nim`.
- Peer Management / Discovery
- Protocol Logic
- Build / Tooling
- Other
Compatibility & Downstream Validation
Reference PRs / branches / commits demonstrating successful integration:
- Nimbus: N/A
- Waku: N/A
- Codex: N/A
Impact on Library Users
No public API changes.
Behavioral impact is internal to WebSocket transport shutdown: stop now drains cleanup more reliably under dial-cancellation/close-error timing, reducing flaky leaked-stream outcomes in tests and CI.
Risk Assessment
Low to medium risk.
Changes are limited to WebSocket transport stop/cleanup sequencing. Main risk is shutdown-path timing differences, but logic is constrained to teardown and intended to improve determinism and resource cleanup.
References
Additional Notes
Follow-up fix after CI feedback: `handle dial cancellation` was still failing with a tracker mismatch (`Opened stream.transport` vs `Closed stream.transport`) on macOS CI. This was addressed by replacing `allFutures` with `allFinished` in the stop cleanup path.
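The goal stated in this PR, that one failing close should not short-circuit the remaining cleanup, can be illustrated with an asyncio analogy. This is a sketch under assumptions: `close_conn` and both `stop_*` functions are hypothetical, and the exact semantics of chronos's `allFutures` and `allFinished` may differ from the `asyncio.gather` modes shown here.

```python
import asyncio


async def close_conn(name: str, delay: float, fail: bool, closed: list[str]) -> None:
    # Simulate closing one connection; some closes fail.
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"close failed for {name}")
    closed.append(name)


def start_closes(closed: list[str]) -> list[asyncio.Task]:
    # One close fails quickly, the other succeeds a little later.
    return [
        asyncio.create_task(close_conn("fast-bad", 0.01, True, closed)),
        asyncio.create_task(close_conn("slow-ok", 0.05, False, closed)),
    ]


async def stop_short_circuit() -> list[str]:
    closed: list[str] = []
    tasks = start_closes(closed)
    try:
        # The first failure is raised to the caller right away.
        await asyncio.gather(*tasks)
    except RuntimeError:
        pass
    # Record what had been closed when the wait returned.
    snapshot = list(closed)
    # Let the remaining task finish so the demo exits cleanly.
    await asyncio.gather(*tasks, return_exceptions=True)
    return snapshot


async def stop_drain_all() -> list[str]:
    closed: list[str] = []
    tasks = start_closes(closed)
    # Failures are collected as results, so the wait covers every task.
    await asyncio.gather(*tasks, return_exceptions=True)
    return list(closed)
```

In the short-circuit variant the wait returns while `slow-ok` is still open, which resembles the opened-versus-closed tracker mismatch described above; the drain-all variant returns only after every close attempt has finished.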