perf: avoid throw-as-control-flow in SessionManager hot path (#7756)#7775
perf: avoid throw-as-control-flow in SessionManager hot path (#7756)#7775JohnMcLear wants to merge 1 commit into
Conversation
CPU profile of the SUT at the 100-400 author dive sweep
(load-test workflow run 25956384097) attributed about 6% of total
process CPU to the throw + catch around getSessionInfo:
- ~1.82% to `new CustomError('sessionID does not exist', 'apierror')`
construction (stack trace capture)
- ~4.12% downstream, via the catch block's `console.debug(...)`
routed through log4js -> sendToListeners -> sendLogEventToAppender
Both call sites (`findAuthorID` on every CLIENT_READY, and
`listSessionsWithDBKey` on session listing) immediately caught
`apierror` and discarded it. The public `exports.getSessionInfo`
contract still has to throw for the HTTP API (returning code:1 for
missing sessionID), so introduce a private `getSessionInfoOrNull`
helper that returns null and have the hot-path callers use it
directly. `exports.getSessionInfo` is kept as a thin wrapper that
preserves the existing throw semantics.
No behaviour change for the HTTP API — sessionsAndGroups.ts test
file (32 cases, including "getSessionInfo of deleted session"
expecting code:1) passes unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
Review Summary by QodoAvoid throw-as-control-flow in SessionManager hot path
WalkthroughsDescription• Eliminate throw-as-control-flow in hot-path callers of getSessionInfo • Introduce private getSessionInfoOrNull helper returning null • Refactor findAuthorID and listSessionsWithDBKey to use null checks • Reduce CPU overhead by ~6% (1.8% CustomError construction + 4% logging) Diagramflowchart LR
A["getSessionInfoOrNull<br/>returns null"] --> B["findAuthorID<br/>null check"]
A --> C["listSessionsWithDBKey<br/>null check"]
D["exports.getSessionInfo<br/>wrapper"] --> E["throws apierror<br/>for HTTP API"]
A --> D
File Changes1. src/node/db/SessionManager.ts
|
CPU profile of develop at the 100-400 author dive sweep (load-test run 25956384097) identified a ~6% process-CPU win in SessionManager: throw-as-control-flow on every CLIENT_READY session lookup. Add lever 9 section with the profile evidence, link the open PR (#7775), and add a "Other CPU hotspots surfaced" subsection documenting findings not yet acted on (Changeset internals, appendRevision, ueberdb/dirty backing as test-harness artifact, esbuild __name overhead). Update Recommendation to include #7775 as the highest-priority merge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the "score pending" placeholder under lever 9 with the actual numbers from runs 25957107195/25957108328/25957109418 (perf branch) vs 25954537767/25954538807/25954540108 (develop), both at authors=100..500:step=50:dwell=8s:warmup=2s. Result: consistent -1.4% to -5.3% CPU reduction across all 9 steps, matching profile direction at 2-5% (vs 6% profile-attributed upper bound). Latency delta sits inside the noise envelope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
N=3 measured impactThree perf-branch runs (25957107195, 25957108328, 25957109418) compared against three develop baselines (25954537767, 25954538807, 25954540108) — same Medians:
ΔCPU% is consistently negative (-1.4% to -5.3%) across all 9 steps. The realised magnitude is below the profile-attributed 6% upper bound because some of the log4js cost the profile attributed to the throw path was unrelated info logging — but the direction is exactly what the profile predicted. Latency delta sits inside the noise envelope across all steps (raw p95 triples overlap heavily run-to-run). |
Three combined-branch runs (perf/dive-combined = #7776 cherry-picked onto #7775 base; runs 25960003164/25960004223/25960005248) vs the same three develop baselines: -12% to -20% CPU% across all 9 sweep steps, with the p95 cliff effectively moving from ~400 to ~500 authors (at step 400, two of three combined runs land below the cliff at 45ms and 112ms p95 vs develop [1758, 2275, 2463]). Adds: - Lever 10 section for #7776 with its own N=3 numbers (-3.6 to -8.9% alone). - "Stacking" section showing super-additive interaction. - Local vCPU experiment showing the cliff is single-event-loop-bound, not total-CPU-bound: 4-core and 8-core pinned SUTs hit the same cliff at the same step. - Updated TL;DR, Recommendation order (merge both #7775+#7776 first), and "Where to take this next" with worker-thread offload as the smallest next architectural step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#7775 + #7776 stacked — N=3 combined measurementI cherry-picked #7775 onto #7776 (branch
The stacked impact (-12% to -20% CPU%) is well above the sum of the individual gains. Both fixes remove call sites feeding the same log4js cluster-mode dispatch chain ( Latency: cliff has effectively moved past step 400. Raw p95 triples:
At step 400, two of three combined runs land below the cliff entirely. Recommend landing these two together to capture the full effect. |
Post-#7775/#7776 profile shows applyToAText splits cleanly: - applyToText (Changeset.ts:404) is pure (cs, text) -> text; trivially offloadable to a worker via worker_threads structured-clone postMessage. - applyToAttribution (Changeset.ts:684) mutates AttributePool; not trivially offloadable. Document the obvious first-pass design (run them in parallel via Promise.all inside applyToAText) and the realistic estimate (~6-8% CPU moved off the main event loop). putAttrib is only 0.26% in the post-fix profile, confirming the bulk of applyToAText's cost is in the string-manipulation half. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a "Roadmap for future effort" section ahead of Reproducing, ranking the next concrete options by impact-per-time-spent. Tier 1 (mechanical / <1 day each): - merge ready perf PRs (#7775+#7776+#7774) - implement #7780 room-broadcast fan-out - additional post-fix profile pass Tier 2 (medium, real cliff moves): - selective fan-out / viewport-based broadcast (~2 weeks; cliff ~500 → 1000-1500) - per-pad worker isolation PoC (~1-2 weeks PoC, 1-2 months prod) Tier 3 (large bets): - sticky-session cluster mode (~2-4 weeks PoC) - CRDT migration (months; anti-recommended) Tier 4 (operational): - production telemetry hookup (~3-5 days) - nightly dive in CI (~1 day) Records the recommended sequence (Tier 1.2 → Tier 2.4) so the next person picking this up doesn't need to re-derive it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
CPU profile of the SUT at the 100→400 author dive sweep (load-test workflow run 25956384097) attributed ~6% of total process CPU to the throw + catch around
getSessionInfo:new CustomError('sessionID does not exist', 'apierror')(stack-trace capture)console.debug(...)routed through log4js →sendToListeners→sendLogEventToAppenderBoth internal callers (
findAuthorIDon every CLIENT_READY,listSessionsWithDBKeyon session listing) caughtapierrorand discarded it. The publicexports.getSessionInfostill has to throw for the HTTP API (returnscode: 1for missing sessionID), so this PR introduces a privategetSessionInfoOrNullhelper that returnsnulland switches the two hot-path callers to it.exports.getSessionInfobecomes a thin wrapper that preserves the throw semantics.Profile evidence
Inverted callers of
CustomErrorconstructor in the profile:Inverted (non-log4js) callers of log4js
Logger.<computed>:(
checkAccessis the entry point that drivesfindAuthorID→getSessionInfo→console.debug.)Behavior
getSessionInfostill throws `apierror` when the session doesn't exist;RestAPI.ts/APIHandler.tstranslate that tocode: 1.findAuthorIDreturnsundefinedfor unknown sessions,listSessionsWithDBKeycontinues to log a warning and setsessions[sessionID] = null.Test plan
Part of #7756. Profile capture pipeline is in etherpad-load-test#109/110/111.