Skip to content

fix: make recording completion idempotent#1836

Open
richiemcilroy wants to merge 1 commit into
mainfrom
upload-reliability-finalization
Open

fix: make recording completion idempotent#1836
richiemcilroy wants to merge 1 commit into
mainfrom
upload-reliability-finalization

Conversation

@richiemcilroy
Copy link
Copy Markdown
Member

@richiemcilroy richiemcilroy commented May 18, 2026

Greptile Summary

This PR makes the recording-completion flow idempotent end-to-end. The server now claims the mux slot before generating presigned URLs (reducing wasted work on concurrent requests), returns HTTP 200 with typed alreadyProcessing/alreadyComplete responses instead of 409 for duplicate calls, and the Rust desktop client is updated to treat the legacy 409 "Muxing already in progress" response as a success for backward compatibility with older server deployments.

  • Server (recording-complete.ts): Claim step moved before the expensive presigned-URL/manifest work; getRecordingCompleteIdempotentResult handles already-processing and already-complete states gracefully, with a nested insert path and race-condition re-check for the no-prior-record case.
  • Client (api.rs, upload.rs): 409 with the old error message is now silently treated as success; the last retry error is now appended to the terminal failure message.
  • Utilities & tests (recording-complete-utils.ts, *.test.ts): Shared phase logic extracted into a typed utility module with unit test coverage.

Confidence Score: 4/5

The change is safe to merge; the idempotency logic is well-structured and the existing test suite covers the new utility functions.

The core idempotency logic is sound: claim-before-work is the correct ordering, concurrent-request races are handled at multiple checkpoints, and the catch block correctly resets the phase to error on failure. Two minor rough edges exist — a hard-coded server error message string in the Rust backward-compat check, and an opaque 500 branch reachable via a narrow race condition — but neither affects correctness in normal operation.

The else-branch 500 path in recording-complete.ts (line 132) and the string-match guard in api.rs are worth a second look, but they do not block the change.

Important Files Changed

Filename Overview
apps/desktop/src-tauri/src/api.rs Adds backward-compat handling: HTTP 409 "Muxing already in progress" is now treated as success. The string match is brittle but only bridges older server versions.
apps/desktop/src-tauri/src/upload.rs Stores the last signal error and appends it to the final failure message; clean improvement to error observability.
apps/web/app/api/upload/[...route]/recording-complete-utils.ts New utility file extracting phase type, idempotent result logic, and source-type predicates; well-typed and straightforward.
apps/web/tests/unit/recording-complete-utils.test.ts Unit tests cover the idempotent-result helper, unclaimable phase list, and both source-type predicates; good coverage of the new utilities.
apps/web/app/api/upload/[...route]/recording-complete.ts Claim step moved before presigned-URL generation and the idempotent-result path is now checked at multiple concurrency boundaries; logic is sound but the else-500 branch on line 132 is opaque.
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
apps/desktop/src-tauri/src/api.rs:342-348
**Brittle backward-compat string match**

The check `error_body.contains("Muxing already in progress")` hard-codes the exact server error message. If that message is ever edited or the server returns a localized/structured JSON body instead of plain text, the condition silently stops matching: the response is treated as an ordinary error, the client retries three times, and the upload fails. Since the updated server now returns HTTP 200 for this case, this path is only exercised against older deployments — but it would be worth at minimum adding a comment that documents what server version this bridges, so a future maintainer knows when it can be removed.

### Issue 2 of 2
apps/web/app/api/upload/[...route]/recording-complete.ts:131-133
**Unreachable-in-theory branch returns an opaque 500**

This `else` arm fires only when the `UPDATE` returned 0 affected rows yet the follow-up `SELECT` found a row whose phase is neither idempotent (`processing` / `generating_thumbnail` / `complete`) nor absent. In normal operation that means the row transitioned from `processing``error` between the UPDATE and the SELECT (a tight race with the media-server webhook). Because `error` is claimable, the Rust client's next retry attempt would successfully claim and re-mux — but the 500 response here is indistinguishable from a genuine server failure. Adding a log with the actual `existing.phase` would make this edge case much easier to diagnose in production.

Reviews (1): Last reviewed commit: "fix: make recording completion idempoten..." | Re-trigger Greptile

Greptile also left 2 inline comments on this PR.

@richiemcilroy richiemcilroy marked this pull request as ready for review May 18, 2026 13:44
@superagent-security superagent-security Bot added contributor:verified Contributor passed trust analysis. pr:verified PR passed security analysis. labels May 18, 2026
Comment on lines +342 to +348
let status = resp.status();
let error_body = resp
.text()
.await
.unwrap_or_else(|_| "<no response body>".to_string());
if status == reqwest::StatusCode::CONFLICT
&& error_body.contains("Muxing already in progress")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Brittle backward-compat string match

The check error_body.contains("Muxing already in progress") hard-codes the exact server error message. If that message is ever edited or the server returns a localized/structured JSON body instead of plain text, the condition silently stops matching: the response is treated as an ordinary error, the client retries three times, and the upload fails. Since the updated server now returns HTTP 200 for this case, this path is only exercised against older deployments — but it would be worth at minimum adding a comment that documents what server version this bridges, so a future maintainer knows when it can be removed.

Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/desktop/src-tauri/src/api.rs
Line: 342-348

Comment:
**Brittle backward-compat string match**

The check `error_body.contains("Muxing already in progress")` hard-codes the exact server error message. If that message is ever edited or the server returns a localized/structured JSON body instead of plain text, the condition silently stops matching: the response is treated as an ordinary error, the client retries three times, and the upload fails. Since the updated server now returns HTTP 200 for this case, this path is only exercised against older deployments — but it would be worth at minimum adding a comment that documents what server version this bridges, so a future maintainer knows when it can be removed.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +131 to +133
} else {
return c.json({ error: "Failed to claim recording complete" }, 500);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Unreachable-in-theory branch returns an opaque 500

This else arm fires only when the UPDATE returned 0 affected rows yet the follow-up SELECT found a row whose phase is neither idempotent (processing / generating_thumbnail / complete) nor absent. In normal operation that means the row transitioned from processingerror between the UPDATE and the SELECT (a tight race with the media-server webhook). Because error is claimable, the Rust client's next retry attempt would successfully claim and re-mux — but the 500 response here is indistinguishable from a genuine server failure. Adding a log with the actual existing.phase would make this edge case much easier to diagnose in production.

Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/web/app/api/upload/[...route]/recording-complete.ts
Line: 131-133

Comment:
**Unreachable-in-theory branch returns an opaque 500**

This `else` arm fires only when the `UPDATE` returned 0 affected rows yet the follow-up `SELECT` found a row whose phase is neither idempotent (`processing` / `generating_thumbnail` / `complete`) nor absent. In normal operation that means the row transitioned from `processing``error` between the UPDATE and the SELECT (a tight race with the media-server webhook). Because `error` is claimable, the Rust client's next retry attempt would successfully claim and re-mux — but the 500 response here is indistinguishable from a genuine server failure. Adding a log with the actual `existing.phase` would make this edge case much easier to diagnose in production.

How can I resolve this? If you propose a fix, please make it concise.

.text()
.await
.unwrap_or_else(|_| "<no response body>".to_string());
if status == reqwest::StatusCode::CONFLICT
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String-matching the error body feels a bit brittle here (any wording change would make desktop start erroring again). If 409 is only used to signal the idempotent "already processing" case, I’d treat all 409s as OK.

Suggested change
if status == reqwest::StatusCode::CONFLICT
if status == reqwest::StatusCode::CONFLICT {
return Ok(());
}

"processing",
"generating_thumbnail",
"complete",
] satisfies RecordingCompletePhase[];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small TS typing nit: satisfies validates the elements, but the inferred type here is still string[]. Adding as const keeps it a readonly tuple of literal values.

Suggested change
] satisfies RecordingCompletePhase[];
export const RECORDING_COMPLETE_UNCLAIMABLE_PHASES = [
"processing",
"generating_thumbnail",
"complete",
] as const satisfies readonly RecordingCompletePhase[];

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor:verified Contributor passed trust analysis. pr:verified PR passed security analysis.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant