Skip to content

perf: avoid throw-as-control-flow in SessionManager hot path (#7756)#7775

Draft
JohnMcLear wants to merge 1 commit into
developfrom
perf/session-getinfo-no-throw
Draft

perf: avoid throw-as-control-flow in SessionManager hot path (#7756)#7775
JohnMcLear wants to merge 1 commit into
developfrom
perf/session-getinfo-no-throw

Conversation

@JohnMcLear
Copy link
Copy Markdown
Member

Summary

CPU profile of the SUT at the 100→400 author dive sweep (load-test workflow run 25956384097) attributed ~6% of total process CPU to the throw + catch around getSessionInfo:

  • ~1.82% to new CustomError('sessionID does not exist', 'apierror') (stack-trace capture)
  • ~4.12% downstream, via the catch block's console.debug(...) routed through log4js → sendToListenerssendLogEventToAppender

Both internal callers (findAuthorID on every CLIENT_READY, listSessionsWithDBKey on session listing) caught apierror and discarded it. The public exports.getSessionInfo still has to throw for the HTTP API (returns code: 1 for missing sessionID), so this PR introduces a private getSessionInfoOrNull helper that returns null and switches the two hot-path callers to it. exports.getSessionInfo becomes a thin wrapper that preserves the throw semantics.

Profile evidence

Inverted callers of CustomError constructor in the profile:

 1.82% exports.getSessionInfo [src/node/db/SessionManager.ts]

Inverted (non-log4js) callers of log4js Logger.<computed>:

 4.12% exports.checkAccess [src/node/db/SecurityManager.ts]

(checkAccess is the entry point that drives findAuthorIDgetSessionInfoconsole.debug.)

Behavior

  • No behavior change for the HTTP API. getSessionInfo still throws `apierror` when the session doesn't exist; RestAPI.ts/APIHandler.ts translate that to code: 1.
  • The two internal callers preserve their existing semantics: findAuthorID returns undefined for unknown sessions, listSessionsWithDBKey continues to log a warning and set sessions[sessionID] = null.

Test plan

  • `pnpm exec mocha tests/backend/specs/api/sessionsAndGroups.ts` — all 32 cases pass, including "getSessionInfo of deleted session" (still expects code:1).
  • Re-run the dive sweep with this branch as core_ref and confirm steady-state CPU% drops at the cliff (300-400 authors).

Part of #7756. Profile capture pipeline is in etherpad-load-test#109/110/111.

CPU profile of the SUT at the 100-400 author dive sweep
(load-test workflow run 25956384097) attributed about 6% of total
process CPU to the throw + catch around getSessionInfo:

  - ~1.82% to `new CustomError('sessionID does not exist', 'apierror')`
    construction (stack trace capture)
  - ~4.12% downstream, via the catch block's `console.debug(...)`
    routed through log4js -> sendToListeners -> sendLogEventToAppender

Both call sites (`findAuthorID` on every CLIENT_READY, and
`listSessionsWithDBKey` on session listing) immediately caught
`apierror` and discarded it. The public `exports.getSessionInfo`
contract still has to throw for the HTTP API (returning code:1 for
missing sessionID), so introduce a private `getSessionInfoOrNull`
helper that returns null and have the hot-path callers use it
directly. `exports.getSessionInfo` is kept as a thin wrapper that
preserves the existing throw semantics.

No behaviour change for the HTTP API — sessionsAndGroups.ts test
file (32 cases, including "getSessionInfo of deleted session"
expecting code:1) passes unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@qodo-code-review
Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

Review Summary by Qodo

Avoid throw-as-control-flow in SessionManager hot path

✨ Enhancement

Grey Divider

Walkthroughs

Description
• Eliminate throw-as-control-flow in hot-path callers of getSessionInfo
• Introduce private getSessionInfoOrNull helper returning null
• Refactor findAuthorID and listSessionsWithDBKey to use null checks
• Reduce CPU overhead by ~6% (1.8% CustomError construction + 4% logging)
Diagram
flowchart LR
  A["getSessionInfoOrNull<br/>returns null"] --> B["findAuthorID<br/>null check"]
  A --> C["listSessionsWithDBKey<br/>null check"]
  D["exports.getSessionInfo<br/>wrapper"] --> E["throws apierror<br/>for HTTP API"]
  A --> D
Loading

Grey Divider

File Changes

1. src/node/db/SessionManager.ts Performance optimization +20/-22

Replace throw-catch with null-check pattern

• Introduce private getSessionInfoOrNull helper that returns null instead of throwing
• Refactor findAuthorID to use null-returning helper with simple || check
• Refactor listSessionsWithDBKey to use null-returning helper with explicit null check
• Keep exports.getSessionInfo as thin wrapper preserving throw semantics for HTTP API

src/node/db/SessionManager.ts


Grey Divider

Qodo Logo

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

qodo-free-for-open-source-projects Bot commented May 16, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

Grey Divider

Great, no issues found!

Qodo reviewed your code and found no material issues that require review

Grey Divider

Qodo Logo

JohnMcLear added a commit that referenced this pull request May 16, 2026
CPU profile of develop at the 100-400 author dive sweep (load-test
run 25956384097) identified a ~6% process-CPU win in SessionManager:
throw-as-control-flow on every CLIENT_READY session lookup.

Add lever 9 section with the profile evidence, link the open PR
(#7775), and add a "Other CPU hotspots surfaced" subsection
documenting findings not yet acted on (Changeset internals,
appendRevision, ueberdb/dirty backing as test-harness artifact,
esbuild __name overhead). Update Recommendation to include #7775
as the highest-priority merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JohnMcLear added a commit that referenced this pull request May 16, 2026
Replace the "score pending" placeholder under lever 9 with the
actual numbers from runs 25957107195/25957108328/25957109418
(perf branch) vs 25954537767/25954538807/25954540108 (develop),
both at authors=100..500:step=50:dwell=8s:warmup=2s.

Result: consistent -1.4% to -5.3% CPU reduction across all 9 steps,
matching profile direction at 2-5% (vs 6% profile-attributed upper
bound). Latency delta sits inside the noise envelope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JohnMcLear
Copy link
Copy Markdown
Member Author

N=3 measured impact

Three perf-branch runs (25957107195, 25957108328, 25957109418) compared against three develop baselines (25954537767, 25954538807, 25954540108) — same authors=100..500:step=50:dwell=8s:warmup=2s sweep on each.

Medians:

step dev CPU% perf CPU% ΔCPU%
100 4.76 4.67 -1.7%
150 9.09 8.95 -1.5%
200 15.21 14.60 -4.0%
250 21.51 21.32 -0.9%
300 30.46 29.68 -2.6%
350 41.58 39.36 -5.3%
400 56.26 54.23 -3.6%
450 72.33 70.49 -2.5%
500 88.38 87.14 -1.4%

ΔCPU% is consistently negative (-1.4% to -5.3%) across all 9 steps. The realised magnitude is below the profile-attributed 6% upper bound because some of the log4js cost the profile attributed to the throw path was unrelated info logging — but the direction is exactly what the profile predicted.

Latency delta sits inside the noise envelope across all steps (raw p95 triples overlap heavily run-to-run).

JohnMcLear added a commit that referenced this pull request May 16, 2026
Three combined-branch runs (perf/dive-combined = #7776 cherry-picked
onto #7775 base; runs 25960003164/25960004223/25960005248) vs the
same three develop baselines: -12% to -20% CPU% across all 9 sweep
steps, with the p95 cliff effectively moving from ~400 to ~500
authors (at step 400, two of three combined runs land below the
cliff at 45ms and 112ms p95 vs develop [1758, 2275, 2463]).

Adds:
- Lever 10 section for #7776 with its own N=3 numbers (-3.6 to -8.9%
  alone).
- "Stacking" section showing super-additive interaction.
- Local vCPU experiment showing the cliff is single-event-loop-bound,
  not total-CPU-bound: 4-core and 8-core pinned SUTs hit the same
  cliff at the same step.
- Updated TL;DR, Recommendation order (merge both #7775+#7776 first),
  and "Where to take this next" with worker-thread offload as the
  smallest next architectural step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JohnMcLear
Copy link
Copy Markdown
Member Author

#7775 + #7776 stacked — N=3 combined measurement

I cherry-picked #7775 onto #7776 (branch perf/dive-combined) and ran 3 cliff sweeps (25960003164, 25960004223, 25960005248) against the same 3 develop baselines used for the individual PR scoring. The two fixes stack super-additively:

step dev CPU% #7775 alone #7776 alone both Δboth
100 4.76 4.67 4.51 3.99 -16.1%
200 15.21 14.60 14.33 12.48 -17.9%
300 30.46 29.68 28.50 24.39 -19.9%
350 41.58 39.36 37.87 33.04 -20.5%
400 56.26 54.23 53.67 44.78 -20.4%
450 72.33 70.49 68.80 61.18 -15.4%
500 88.38 87.14 85.17 77.70 -12.1%

The stacked impact (-12% to -20% CPU%) is well above the sum of the individual gains. Both fixes remove call sites feeding the same log4js cluster-mode dispatch chain (sendToListeners → sendLogEventToAppender); halving the LogEvent allocation rate appears to relieve queue/GC pressure beyond what either fix accounts for in isolation.

Latency: cliff has effectively moved past step 400. Raw p95 triples:

step develop p95 [3 runs] combined p95 [3 runs]
400 [1758, 2275, 2463] [45, 112, 634]
450 [5415, 6167, 6611] [3297, 3719, 3897] (-40%)
500 [10655, 11759, 12183] [8091, 8711, 9127] (-26%)

At step 400, two of three combined runs land below the cliff entirely. Recommend landing these two together to capture the full effect.

JohnMcLear added a commit that referenced this pull request May 16, 2026
Post-#7775/#7776 profile shows applyToAText splits cleanly:
- applyToText (Changeset.ts:404) is pure (cs, text) -> text; trivially
  offloadable to a worker via worker_threads structured-clone postMessage.
- applyToAttribution (Changeset.ts:684) mutates AttributePool; not
  trivially offloadable.

Document the obvious first-pass design (run them in parallel via
Promise.all inside applyToAText) and the realistic estimate (~6-8%
CPU moved off the main event loop). putAttrib is only 0.26% in the
post-fix profile, confirming the bulk of applyToAText's cost is in
the string-manipulation half.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JohnMcLear added a commit that referenced this pull request May 16, 2026
Add a "Roadmap for future effort" section ahead of Reproducing,
ranking the next concrete options by impact-per-time-spent.

Tier 1 (mechanical / <1 day each):
- merge ready perf PRs (#7775+#7776+#7774)
- implement #7780 room-broadcast fan-out
- additional post-fix profile pass

Tier 2 (medium, real cliff moves):
- selective fan-out / viewport-based broadcast (~2 weeks; cliff ~500 → 1000-1500)
- per-pad worker isolation PoC (~1-2 weeks PoC, 1-2 months prod)

Tier 3 (large bets):
- sticky-session cluster mode (~2-4 weeks PoC)
- CRDT migration (months; anti-recommended)

Tier 4 (operational):
- production telemetry hookup (~3-5 days)
- nightly dive in CI (~1 day)

Records the recommended sequence (Tier 1.2 → Tier 2.4) so the
next person picking this up doesn't need to re-derive it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JohnMcLear JohnMcLear marked this pull request as draft May 18, 2026 12:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant