lexer: step past unknown byte on TSyntaxError in next_inside_jsx_element#30969
Conversation
Parsing `<r L=((` in a debug build panicked with "Scope location must
be greater than previous" in push_scope_for_parse_pass.
In the JSX prop-value recovery path, next_inside_jsx_element returns
TSyntaxError on the first '(' without stepping past it. The subsequent
expect(TOpenBrace) logs the error and calls next(), which then
mis-dispatches on the still-in-code_point '(' and produces a
zero-length TOpenParen at the same source offset. parse_paren_expr
then pushes two FunctionArgs scopes at that offset, tripping the
strict-monotonicity debug assertion.
Release builds already reported both parse errors and recovered
cleanly; the debug check was stricter than release. Skip the
scope-location monotonicity assertion once an error has been logged,
so debug matches release on error-recovery paths.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughLexer now advances past an invalid byte after emitting a JSX syntax error to avoid reprocessing the same input. Tests updated: one snapshot extended and one new concurrent regression test added for issue ChangesJSX Lexer Error Recovery Fix
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
|
Updated 1:42 AM PT - May 18th, 2026
✅ @robobun, your commit e49af9ce3f0d07044d959d929fec5ba80fd67fee passed in 🧪 To try this PR locally: bunx bun-pr 30969That installs a local version of the PR into your bun-30969 --bun |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/regression/issue/jsx-template-string-crash.test.ts`:
- Around line 32-67: This regression test named 'test("`#30959` JSX attribute with
invalid '(' value parses cleanly in debug builds"...' belongs in the
issue-numbered regression tests; move the test block out of
jsx-template-string-crash.test.ts and place it into a new issue-numbered test
file named 30959.test.ts (or split accordingly) following the
test/regression/issue/* convention, keeping the test body, assertions
(normalizeBunSnapshot, expect stdout/exitCode) and Bun.spawn invocation
unchanged so CI recognizes it as a targeted `#30959` regression test.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 96219220-44d8-43d6-b3fb-525157d5a4f4
📒 Files selected for processing (2)
src/js_parser/p.rstest/regression/issue/jsx-template-string-crash.test.ts
Review feedback (#30969): - `has_errors()` checks the absolute error count on the shared `Log`, which may already carry errors from earlier files in the same parse session. Capture the baseline at `P::init` (`orig_error_count`) and compare `log.errors > self.orig_error_count` instead — mirrors the `orig_error_count` pattern at `parse_entry.rs:762`/`:876` that already gates the visit-pass bailout. Without this, an unrelated error from an earlier file would silently disable the debug-only scope-location invariant for every subsequent file in the session. - Mark the new #30959 regression test as `test.concurrent` per `test/CLAUDE.md` — it spawns an async subprocess and shares no state with the pre-existing test in the file.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/regression/issue/jsx-template-string-crash.test.ts`:
- Line 54: The expectation calls .toString() on stderr which is already a string
(from proc.stderr.text()); remove the redundant .toString() call in the expect
line that invokes normalizeBunSnapshot so it operates directly on stderr (e.g.,
change the argument to normalizeBunSnapshot to use stderr.replace(...)) while
keeping the replace regex and toMatchInlineSnapshot intact; update the test line
around the call to normalizeBunSnapshot/expect to use stderr directly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: aa55a4ee-1152-45d1-95ef-6d8925e5912d
📒 Files selected for processing (2)
src/js_parser/p.rstest/regression/issue/jsx-template-string-crash.test.ts
There was a problem hiding this comment.
LGTM — both nits from the previous pass are addressed (delta comparison against orig_error_count, test.concurrent), and the change is debug-only.
Extended reasoning...
Overview
Relaxes the strictly-increasing scope-location debug assertion in push_scope_for_parse_pass (src/js_parser/p.rs) so it no longer fires once the current file has logged a parse error. Adds a u32 orig_error_count field on P, initialized at P::init from the shared Log, and gates the assertion on log.errors > orig_error_count. A regression test in test/regression/issue/jsx-template-string-crash.test.ts asserts the release-mode error output for the <r L=(( repro.
Security risks
None. The entire modified block is inside if cfg!(debug_assertions); release builds are byte-for-byte unaffected. No input handling, auth, or data-exposure surface is touched.
Level of scrutiny
Low. The change only relaxes a debug assertion, and only on the error-recovery path — the worst possible regression is marginally reduced first-pass assertion coverage after an error is already logged, and any real scope mismatch would still be caught by push_scope_for_visit_pass. The new orig_error_count field is a plain u32 read once at construction; the unsafe { log.as_ref() } mirrors the existing P::log() accessor pattern in the same file and the identical read at parse_entry.rs:762.
Other factors
My two prior nits (compare against a per-parse baseline rather than has_errors(); mark the new test test.concurrent) were both applied in bc00e5d and are visible in the final diff. CodeRabbit's suggestion to move the test to an issue-numbered file was declined with a reasonable rationale (fuzzer-found, never-worked bug ≠ regression per test/CLAUDE.md) and resolved. The fix mirrors the established orig_error_count delta pattern used at the visit-pass bailout (parse_entry.rs:876, parse_entry.zig:439), and the regression test pins the expected error output.
Jarred-Sumner
left a comment
There was a problem hiding this comment.
I don't think this fix is correct. Diff the control flow compared to the Zig version and figure out why it's broken here. My guess is its related to restoring lexer state when parsing typescript fails.
Jarred pointed out (#30969) that the previous fix — skipping the debug assertion once the file logs an error — was hiding a real bug rather than fixing it. The root cause is in next_inside_jsx_element's TSyntaxError branch. On an unknown character it sets `end = current` but leaves `code_point` holding that same byte, so `current` never advances past it. When the caller recovers via `expect().next()`, the main lexer's next() re-dispatches on that still-in-code_point byte, using the updated `end` as the new `start`. step_with() then reads the *following* byte (the second '(' in `<r L=((`) and assigns it to end/code_point, producing a zero-length TOpenParen at `start=end=6`. pfx_t_open_paren captures `loc=6`, calls next() once more which advances end to 7 and re-tokenises the same '(' as a real TOpenParen at `start=6` again, and parse_paren_expr ends up pushing two FunctionArgs scopes at offset 6 — tripping the strict-monotonicity debug assertion in push_scope_for_parse_pass. Fix: call step() immediately after setting TSyntaxError so code_point/current advance past the bad byte. The error range `[start, end)` is unchanged (end was already set a line above), so the "Expected { but found (" diagnostic still points at the right source position. Subsequent `next()` now starts from the *next* byte and emits one correct token instead of two at the same offset. Release behaviour is identical on valid input (the TSyntaxError branch is only taken when the JSX lexer sees a character it can't tokenise). On the fuzzer-found input `<r L=((` debug now matches release and emits `Expected "{" but found "("` followed by `Unexpected end of file`. The pre-existing `<div a=\`\`/>` regression snapshot in jsx-template-string-crash.test.ts shifts from `Unexpected >` at col 37 to `Unterminated string literal` at col 35 — a more accurate diagnostic (the second backtick is what the recovery lexer treats as a template opener, which never terminates) rather than a behavioural regression. Reverts the `orig_error_count` workaround on P.
|
Thanks for the nudge — you were right, the assertion workaround was masking the real bug. Diffed the control flow step-by-step against the Zig version and the divergence isn't in snapshot/restore around self.end = self.current;
self.token = T::TSyntaxError;After this the lexer reports
d1a40fc fixes it at the right layer: call Reverted the The pre-existing Still passes the local transpiler / jsx-bundler / jsx-template suites. Test still fails with a 5s timeout on |
There was a problem hiding this comment.
♻️ Duplicate comments (1)
test/regression/issue/jsx-template-string-crash.test.ts (1)
59-59: 🧹 Nitpick | 🔵 Trivial | 💤 Low valueRemove redundant
.toString()call.
stderris already a string fromproc.stderr.text()(line 57), so calling.toString()is unnecessary.♻️ Proposed fix
- expect(normalizeBunSnapshot(stderr.toString().replace(/(Bun v.*)$/gm, ""))).toMatchInlineSnapshot(` + expect(normalizeBunSnapshot(stderr.replace(/(Bun v.*)$/gm, ""))).toMatchInlineSnapshot(`🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/regression/issue/jsx-template-string-crash.test.ts` at line 59, The test calls .toString() on stderr unnecessarily—stderr is already a string (from proc.stderr.text()); update the assertion in the expect call that uses normalizeBunSnapshot so it calls replace directly on stderr (e.g., change normalizeBunSnapshot(stderr.toString().replace(...)) to normalizeBunSnapshot(stderr.replace(...))) while keeping the same regex and the normalizeBunSnapshot invocation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@test/regression/issue/jsx-template-string-crash.test.ts`:
- Line 59: The test calls .toString() on stderr unnecessarily—stderr is already
a string (from proc.stderr.text()); update the assertion in the expect call that
uses normalizeBunSnapshot so it calls replace directly on stderr (e.g., change
normalizeBunSnapshot(stderr.toString().replace(...)) to
normalizeBunSnapshot(stderr.replace(...))) while keeping the same regex and the
normalizeBunSnapshot invocation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3f8e0149-2119-45df-8baa-810e7e6fdfff
📒 Files selected for processing (1)
test/regression/issue/jsx-template-string-crash.test.ts
stderr is already a string from proc.stderr.text(). Per coderabbit.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/regression/issue/jsx-template-string-crash.test.ts (1)
17-30: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winMove
exitCodeassertion to the end of the first test.
expect(exitCode).toBe(1)is currently asserted before output checks, which weakens failure diagnostics compared to the house style used elsewhere in this file.♻️ Proposed fix
- expect(exitCode).toBe(1); expect(normalizeBunSnapshot(stderr.toString().replace(/(Bun v.*)$/gm, ""))).toMatchInlineSnapshot(` @@ `); expect(normalizeBunSnapshot(stdout.toString())).toMatchInlineSnapshot(`""`); + expect(exitCode).toBe(1);As per coding guidelines: “When spawning processes in tests, expect stdout before expecting exit code for more useful error messages” and “Assert the exit code last in tests - this gives a more useful error message on test failure”.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/regression/issue/jsx-template-string-crash.test.ts` around lines 17 - 30, The exitCode assertion is placed before the output assertions; move the line "expect(exitCode).toBe(1)" to the end of the first test so stdout/stderr checks run first — specifically, after the two expect(normalizeBunSnapshot(...stderr...)).toMatchInlineSnapshot(...) and expect(normalizeBunSnapshot(stdout.toString())).toMatchInlineSnapshot(`""`) assertions; update the test containing those calls so exitCode is asserted last to follow the file's house style.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@test/regression/issue/jsx-template-string-crash.test.ts`:
- Around line 17-30: The exitCode assertion is placed before the output
assertions; move the line "expect(exitCode).toBe(1)" to the end of the first
test so stdout/stderr checks run first — specifically, after the two
expect(normalizeBunSnapshot(...stderr...)).toMatchInlineSnapshot(...) and
expect(normalizeBunSnapshot(stdout.toString())).toMatchInlineSnapshot(`""`)
assertions; update the test containing those calls so exitCode is asserted last
to follow the file's house style.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 53e6e9fd-fbb7-4f18-82bd-815243fe30fa
📒 Files selected for processing (1)
test/regression/issue/jsx-template-string-crash.test.ts
There was a problem hiding this comment.
The one-line step() fix is well-reasoned and all prior feedback is addressed, but since this changes lexer error-recovery behavior (visible in the shifted <div a=\`/>` snapshot you flagged as a judgment call) and Build #55696 reported failures before the retrigger, I'd like a human to sign off.
Extended reasoning...
Overview
This PR adds a single self.step() call (plus an explanatory comment) to the TSyntaxError fallthrough arm of next_inside_jsx_element in src/js_parser/lexer.rs, fixing a debug-only panic where the lexer failed to advance past an invalid byte and the parser subsequently pushed two scopes at the same offset. The test file gains a new #30959 regression case and updates the existing <div a=\`/>` snapshot, whose diagnostic shifted from "Unexpected >" at col 37 to "Unterminated string literal" at col 35.
Security risks
None. This is an error-recovery path in the JSX lexer that only fires on already-malformed input; the change advances a cursor by one codepoint. No auth, crypto, permissions, or untrusted-data handling is involved.
Level of scrutiny
Moderate-to-high. The Rust JS lexer is a performance-critical, correctness-critical state machine that every parsed file flows through. Although the change is one line on an error path and release builds were already unaffected, it produces a user-visible behavioral shift in error diagnostics for a different malformed input (the pre-existing backtick test). The author explicitly flagged this as a judgment call ("if you'd prefer to preserve the old error I can scope the step to only fire when the caller is inside JSX-attribute recovery") — that's exactly the kind of design tradeoff a human maintainer should weigh in on rather than a bot.
Other factors
- All three of my prior inline nits were addressed (orig_error_count → reverted in favor of the proper lexer fix; test.concurrent; stale comment wording). My remaining 05:47 comment about the parallel pattern at lexer.rs:2234 is explicitly pre-existing/non-blocking.
- The PR went through a significant approach change mid-review (assertion-relaxation → root-cause lexer fix), which is the right outcome but means the final shape hasn't had human eyes on it.
- robobun reported failures in Build #55696 against b3e0a97; the most recent commit (0aad3ff) is a CI retrigger whose result I can't see. I'd want green CI confirmed before this lands.
- No CODEOWNERS entry covers
src/js_parser/. - I verified
step()→next_codepoint_with()setsend = currentbefore advancing, so the new call leaves the error range[start, end)unchanged as the comment claims, including at EOF and for multibyte codepoints.
|
CI status: diff is green on all lanes that exercise the parser (debian-13 x64/aarch64/asan/baseline, ubuntu-25-aarch64, all musl test lanes, all windows/darwin build lanes, the verify-baseline x64-musl + windows variants). The two still-red lanes are unrelated to the JSX lexer:
I used my one re-roll on the Rust-CDN failure; per the farm policy I should not push another |
Mirrors d1a40fc at the parallel site in the main `next()` fallthrough (lexer.rs:2234-2235). Per Jarred on PR #30969 — the same pattern lived at both sites and the Rust port diverged from the Zig sibling at one of them, leaving the main lexer with `current == end` after the error instead of the `current > end` invariant every other branch maintains. The concrete duplicate-scope panic the JSX site triggered isn't currently reachable from the main lexer (a byte that falls through to this arm is invalid in main-lexer context too, so recovery stays in `TSyntaxError` rather than manufacturing a real token), but keeping the invariant consistent across both dispatch tables means future recovery paths don't have to reason about one arm that leaves the lexer half-advanced.
|
CI status update: all parser-exercising lanes green (72/73 pass). The one red lane on build #55717 is I already spent my re-roll on the previous Rust-CDN hiccup + darwin-14-x64 timeout; per the farm policy I'm not pushing another |
…ent (#30969) Fixes #30959. ## Repro ```console $ echo '<r L=((' > /tmp/bug.tsx $ bun-debug build /tmp/bug.tsx panic: Scope location must be greater than previous 6 must be greater than 6 (src/bun_core/output.rs:1053:9) ``` Release builds print the two parse errors and recover cleanly; only debug builds trip the assertion. ## Root cause In `next_inside_jsx_element` the `TSyntaxError` branch sets `end = current` but never advances `current` / `code_point` past the bad byte — every *other* branch in that switch calls `step()` which leaves `current = end + 1`, the invariant the rest of `next()` depends on. The `TSyntaxError` branch violates it. For `<r L=((` the lexer reports `start=5, end=6, current=6, code_point=0x28` (= the first `(`) after tokenising it as `TSyntaxError`. The subsequent `p.lexer.expect(T::TOpenBrace)` in `parse_jsx_prop_value_identifier`: 1. logs `Expected "{" but found "("` via `add_range_error` (returns `Ok`), 2. calls `self.next()`, which re-dispatches on the un-stepped `code_point=0x28`, matches `(` → `TOpenParen`, and `step_with` reads `contents[current=6]` (= the **second** `(`), updates `end = 6`, `current = 7`, `code_point = 0x28`. Final state: `start=6 end=6 current=7 tok=TOpenParen` — a **zero-length** `TOpenParen` at offset 6. `pfx_t_open_paren` then captures `loc=6`, calls `next()` one more time which advances `end` to 7 and re-tokenises the same `(` as a real `TOpenParen` also at `start=6`. `parse_paren_expr` pushes two `FunctionArgs` scopes at offset 6 and the debug assertion in `push_scope_for_parse_pass` fires. ## Fix Call `self.step()` immediately after setting `TSyntaxError` in `next_inside_jsx_element`. `end` was already advanced on the line above, so the error range `[start, end)` is unchanged and the `Expected "{" but found "("` diagnostic still points at the right source position. Subsequent `next()` dispatches on the *next* byte and emits one correct token, so the parser never sees a duplicate scope offset and the `push_scope_for_parse_pass` assertion keeps its original strict form. ## Verification `test/regression/issue/jsx-template-string-crash.test.ts` — new `#30959` case asserts that the release-mode error output (`Expected "{" but found "("` + `Unexpected }`) is produced in debug mode too. Fails with a 5s timeout on main (the panic hangs the process); passes with the fix. The existing `JSX lexer should not crash with slice bounds issues` regression snapshot in the same file shifts from `Unexpected >` at col 37 to `Unterminated string literal` at col 35 for `<div a=``/>`. That shift is behavioural: the recovery lexer no longer re-dispatches on the first backtick, so the second backtick is now what gets tokenised as the template opener (which then runs to EOF — "unterminated"). The new diagnostic points at the actual unterminated string rather than the later `>`; both are valid parse errors for the same malformed input. Transpiler / bundler JSX suites (`test/bundler/transpiler/transpiler.test.js`, `test/bundler/bundler_jsx.test.ts`) still pass. --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Fixes #30959.
Repro
Release builds print the two parse errors and recover cleanly; only debug builds trip the assertion.
Root cause
In
next_inside_jsx_elementtheTSyntaxErrorbranch setsend = currentbut never advancescurrent/code_pointpast the bad byte — every other branch in that switch callsstep()which leavescurrent = end + 1, the invariant the rest ofnext()depends on. TheTSyntaxErrorbranch violates it.For
<r L=((the lexer reportsstart=5, end=6, current=6, code_point=0x28(= the first() after tokenising it asTSyntaxError. The subsequentp.lexer.expect(T::TOpenBrace)inparse_jsx_prop_value_identifier:Expected "{" but found "("viaadd_range_error(returnsOk),self.next(), which re-dispatches on the un-steppedcode_point=0x28, matches(→TOpenParen, andstep_withreadscontents[current=6](= the second(), updatesend = 6,current = 7,code_point = 0x28. Final state:start=6 end=6 current=7 tok=TOpenParen— a zero-lengthTOpenParenat offset 6.pfx_t_open_parenthen capturesloc=6, callsnext()one more time which advancesendto 7 and re-tokenises the same(as a realTOpenParenalso atstart=6.parse_paren_exprpushes twoFunctionArgsscopes at offset 6 and the debug assertion inpush_scope_for_parse_passfires.Fix
Call
self.step()immediately after settingTSyntaxErrorinnext_inside_jsx_element.endwas already advanced on the line above, so the error range[start, end)is unchanged and theExpected "{" but found "("diagnostic still points at the right source position. Subsequentnext()dispatches on the next byte and emits one correct token, so the parser never sees a duplicate scope offset and thepush_scope_for_parse_passassertion keeps its original strict form.Verification
test/regression/issue/jsx-template-string-crash.test.ts— new#30959case asserts that the release-mode error output (Expected "{" but found "("+Unexpected }) is produced in debug mode too. Fails with a 5s timeout on main (the panic hangs the process); passes with the fix.The existing
JSX lexer should not crash with slice bounds issuesregression snapshot in the same file shifts fromUnexpected >at col 37 toUnterminated string literalat col 35 for<div a=``/>. That shift is behavioural: the recovery lexer no longer re-dispatches on the first backtick, so the second backtick is now what gets tokenised as the template opener (which then runs to EOF — "unterminated"). The new diagnostic points at the actual unterminated string rather than the later>; both are valid parse errors for the same malformed input.Transpiler / bundler JSX suites (
test/bundler/transpiler/transpiler.test.js,test/bundler/bundler_jsx.test.ts) still pass.