Skip to content

fix(schema): make codegen idempotent so drift detection surfaces only real Riot drift#21

Merged
wardbox merged 2 commits into
mainfrom
fix/codegen-idempotent-regen
Apr 27, 2026
Merged

fix(schema): make codegen idempotent so drift detection surfaces only real Riot drift#21
wardbox merged 2 commits into
mainfrom
fix/codegen-idempotent-regen

Conversation

@wardbox
Copy link
Copy Markdown
Owner

@wardbox wardbox commented Apr 27, 2026

Why

Per yesterday's local test of `pnpm generate-schema`, the schema-drift workflow is currently unusable:

  • 407 lines of JSDoc wiped on every regen (84% of the diff PR chore: update schemas for Riot API changes #18 produced)
  • Invalid TypeScript emitted: `ar-AE: string` for hyphenated locale codes
  • Item shapes dropped whenever a captured array sample happens to be empty (e.g., `lol-status-v4.maintenances`)

These were three distinct codegen bugs, each previously hand-patched in main without fixing the codegen — meaning every regen reproduced the bugs and produced a 600+ line destructive diff. The actual Riot drift was buried.

Three commits in main illustrate the pattern:

  • `5eeab6f` "feat: add JSDoc to generated types" — hand-added 100+ JSDoc blocks
  • `e9803e8` "fix: quote invalid TS property names" — hand-quoted hyphenated keys
  • (no commit yet for the empty-array issue — the workflow has never produced a clean run)

What

Three targeted codegen fixes plus the supporting data:

  1. JSDoc emission from a structured source

    • `codegen.ts` now reads `scripts/descriptions.json` and emits `/** ... */` blocks for interfaces and fields
    • JSON is keyed by `${game}.${interfaceName}` so identical names across games (e.g. `BannedChampion` in LoL and TFT) don't collide
    • `scripts/extract-descriptions.mjs` is the one-shot tool that seeded the corpus from existing hand-edits — kept for re-extraction
    • Result: 66 interfaces (339 fields) preserved exactly, regenerable
  2. Quote non-identifier property keys (`formatPropertyKey()`)

    • Identifiers (`[A-Za-z_$][A-Za-z0-9_$]*`) and integer literals (`[0-9]+`) unquoted
    • Everything else (hyphens, dots, etc.) double-quoted to match committed convention
    • Result: `"ar-AE": string` instead of invalid `ar-AE: string`
  3. Preserve array item shape on empty samples (`patchUnknownArrayItems`)

    • Walks new schema and prior committed schema in parallel
    • When new sample is empty (`items: { type: 'unknown' }`) but prior has a concrete shape, restore it
    • Recurses through nested objects/arrays
    • Doesn't mask top-level field additions/removals — only restores nested item info we couldn't sample this run

Validation

Local idempotency check: regenerated `.ts` files against unchanged schemas — diff is now scoped to inconsistencies in main itself (single-vs-double-quote drift between hand-edited interfaces, `{x}[]` vs `({x})[]` style drift, one extra blank line in some files), not codegen bugs. The committed `generated/*.ts` files in this PR are the codegen's actual output, so future regens are no-ops modulo real Riot drift.

Comparison vs. previous behavior on a clean regen against committed schemas:

Known limitation

Hand-edited inline JSDoc inside anonymous object types — e.g. `({ /** Title */ content: string })` — is not preserved. Those anonymous nested types have no stable name to key descriptions against in the JSON corpus, and adding one would require a more elaborate descriptions schema. Top-level interface descriptions and field-level descriptions are fully preserved.

Tests

  • 32 new unit tests across `codegen.test.ts` and `schema.test.ts`:
    • `formatPropertyKey`: identifiers, integers, hyphens, leading digits, escape handling
    • `formatJsDoc`: single-line vs multi-line, indentation
    • `generateInterface`: with/without descriptions, non-identifier keys
    • `patchUnknownArrayItems`: replace, preserve, nested objects, nested arrays, missing source
  • All 506 tests pass
  • Build clean (`pnpm build` produces same dist sizes ±)

Next steps after merge

  • Schema-drift workflow can finally surface real Riot drift (e.g., the `soloTurretsLategame` add and `teleportTakedowns` removal we found in match-v5 yesterday)
  • Enable TFT/VAL/LoR product access on the dev API key for full coverage
  • Future hand-edits to `generated/*.ts` will still be wiped on regen — descriptions belong in `scripts/descriptions.json` now

🤖 Generated with Claude Code

Fixes nondeterministic codegen to enable idempotent schema generation

Addresses codegen nondeterminism so future regenerations only reflect real Riot API drift. Implements three targeted fixes and supporting data.

JSDoc emission

  • Adds scripts/descriptions.json: catalog of human-readable type and field descriptions (66 interfaces, 339 fields seeded).
  • Adds scripts/extract-descriptions.mjs to extract JSDoc from existing generated TS and produce the catalog.
  • Updates codegen to load descriptions and emit JSDoc for interfaces and documented fields (inline JSDoc inside anonymous nested object types remains a known limitation).

Property key formatting

  • Adds formatPropertyKey() to leave identifiers and integer literals unquoted and double-quote/escape all other keys (hyphens, dots, etc.), preventing invalid TypeScript like ar-AE: string.

Array item shape preservation

  • Adds patchUnknownArrayItems(target, source) to restore concrete array item shapes from prior schemas when new samples yield empty/unknown items; recurses into nested arrays/objects and does not mask top-level additions/removals.

Validation

  • 32 new unit tests across codegen.test.ts and schema.test.ts; all 506 tests pass.
  • Local idempotency checks: previous regen produced large deletions (~614 lines); with these fixes regen yields only small internal diffs and subsequent regens become no-ops absent real schema drift.
  • Build is clean.

Key files changed

  • New: scripts/descriptions.json
  • New: scripts/extract-descriptions.mjs
  • Modified: scripts/generate-schema/codegen.ts (load/descriptions, formatJsDoc, formatPropertyKey API additions)
  • Modified: scripts/generate-schema/index.ts (apply patching and pass descriptions path)
  • New tests: scripts/generate-schema/codegen.test.ts, scripts/generate-schema/schema.test.ts
  • Modified: scripts/generate-schema/schema.ts (export patchUnknownArrayItems)

… real Riot drift

The schema-drift workflow was unusable: every regen wiped 407 lines of
JSDoc, broke property names with hyphens (`ar-AE: string` is invalid TS),
and dropped item shapes whenever an array sample was empty. Three hand-
patched commits in main papered over codegen output instead of fixing
the codegen, so any rerun produced 600+ line destructive diffs.

This PR fixes the codegen so its output is reproducible:

1. **JSDoc emission**. `codegen.ts` now reads `scripts/descriptions.json`
   and emits `/** ... */` blocks for interfaces and fields. The JSON is
   keyed by `${game}.${interfaceName}` so identical names across games
   (e.g. `BannedChampion` in both LoL and TFT) don't collide.
   `scripts/extract-descriptions.mjs` is the one-shot tool that seeded
   the corpus from the existing hand-edits — kept around for re-extraction.

2. **Quote non-identifier property keys**. `formatPropertyKey()` accepts
   identifiers and integer literals unquoted, double-quotes everything
   else. Eliminates the `ar-AE: string` invalid-TS bug. Match the double-
   quote convention already in committed types.

3. **Preserve array item shape on empty samples**. `patchUnknownArrayItems`
   walks the new schema and the previously-committed schema in parallel;
   when the new sample produced `items: { type: 'unknown' }` (empty array)
   but the prior schema captured a concrete shape, retain the prior shape.
   Doesn't mask top-level field additions/removals — only restores nested
   item info that this run couldn't sample.

The committed `generated/*.ts` files are the codegen's actual output
against the existing schemas, so future regens are now no-ops modulo
real Riot drift. Validated locally: regen against unchanged schemas
produces zero unexpected diffs.

Known limitation: hand-edited inline JSDoc *inside* anonymous object
types (e.g. `({ /** Title */ content: string })`) is not preserved —
those nested anonymous types have no stable name to key descriptions
against. Top-level interface and field descriptions are fully preserved.

Tests: 32 new tests (formatPropertyKey, formatJsDoc, generateInterface
with descriptions/non-identifier keys, patchUnknownArrayItems edge cases).
All 506 tests pass; build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d36a2d03-fba6-4c37-b4de-db8855423be3

📥 Commits

Reviewing files that changed from the base of the PR and between 9b2f72a and 24b588d.

📒 Files selected for processing (1)
  • scripts/generate-schema/codegen.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • scripts/generate-schema/codegen.ts

Walkthrough

Adds a new declarative catalog scripts/descriptions.json mapping fully-qualified Riot API type names to interface- and field-level human-readable descriptions; introduces scripts/extract-descriptions.mjs to parse TypeScript JSDoc and emit that JSON; extends schema generation tooling to load descriptions, emit JSDoc into generated TypeScript interfaces, and improve property-key quoting; introduces patchUnknownArrayItems to preserve concrete array item schemas when regenerating; and adds/updates tests covering JSDoc formatting, property-key handling, and the array-item patching behavior.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • feat: schema generation pipeline #4: Related earlier work on the codegen and schema generation pipeline that this changeset extends with descriptions extraction and JSDoc emission.

Poem

🐰
I hopped through types and JSDoc lines,
Collected notes from exported signs,
I stitched them into JSON song,
Made interfaces sing along,
🥕✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(schema): make codegen idempotent so drift detection surfaces only real Riot drift' directly and specifically describes the main objective of the PR: making schema code generation idempotent to avoid false drift detection signals.
Docstring Coverage ✅ Passed Docstring coverage is 91.67% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/generate-schema/codegen.ts`:
- Around line 19-35: The JSDoc for formatPropertyKey incorrectly says keys "get
wrapped in single quotes" while the implementation (and tests) wrap keys in
double quotes; update the comment to state "double quotes" (or "wrapped in
double quotes") and adjust any wording referencing single quotes to mention
double quotes, keeping references to VALID_IDENTIFIER and INTEGER_LITERAL intact
so the doc matches the actual behavior of formatPropertyKey.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 140e1aed-0b34-4dec-b5d8-7f79096d69ef

📥 Commits

Reviewing files that changed from the base of the PR and between d9c12ef and 9b2f72a.

⛔ Files ignored due to path filters (5)
  • packages/whisper/src/types/generated/lol.ts is excluded by !**/generated/**
  • packages/whisper/src/types/generated/lor.ts is excluded by !**/generated/**
  • packages/whisper/src/types/generated/riftbound.ts is excluded by !**/generated/**
  • packages/whisper/src/types/generated/tft.ts is excluded by !**/generated/**
  • packages/whisper/src/types/generated/val.ts is excluded by !**/generated/**
📒 Files selected for processing (7)
  • scripts/descriptions.json
  • scripts/extract-descriptions.mjs
  • scripts/generate-schema/codegen.test.ts
  • scripts/generate-schema/codegen.ts
  • scripts/generate-schema/index.ts
  • scripts/generate-schema/schema.test.ts
  • scripts/generate-schema/schema.ts

Comment thread scripts/generate-schema/codegen.ts
1. scripts/generate-schema/codegen.ts:19-35 — JSDoc on formatPropertyKey
   said keys "get wrapped in single quotes" but implementation (and tests)
   use double quotes.
   Updated comment to say "double quotes" and explain the rationale
   (matches convention already in committed types).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@wardbox wardbox merged commit 172555e into main Apr 27, 2026
6 checks passed
@wardbox wardbox deleted the fix/codegen-idempotent-regen branch April 27, 2026 21:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant