[metrics] Emit concurrency_slot counters as aggregates by AhmedSoliman · Pull Request #4738 · restatedev/restate

AhmedSoliman · 2026-05-14T08:54:05Z

With VQueues enabled, the concurrency limit applies at the node level, so we remove the extra label (invoker_id).
This removes invoker_id from the concurrency_slot.acquired/released metrics to reduce cardinality and to stay forward compatible with vqueues.
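To illustrate the cardinality argument, here is a minimal sketch of a toy in-memory counter registry. It is not the `metrics` crate or Restate's recorder; `Registry`, `record_acquisitions`, and the invoker ids are hypothetical. Each distinct label set creates its own time series, so dropping the `invoker_id` label collapses N series into one while the total count stays the same.

```rust
use std::collections::BTreeMap;

/// Metric name taken from the docs quoted in this PR.
const ACQUIRED: &str = "restate.invoker.concurrency_slots.acquired";

/// A time series is identified by its metric name plus its sorted label pairs.
type SeriesKey = (String, Vec<(String, String)>);

/// Toy counter registry (hypothetical, for illustration only).
#[derive(Default)]
struct Registry {
    counters: BTreeMap<SeriesKey, u64>,
}

impl Registry {
    /// Increment the counter identified by `name` and `labels` by one.
    fn increment(&mut self, name: &str, labels: &[(&str, &str)]) {
        let mut labels: Vec<(String, String)> = labels
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect();
        labels.sort();
        *self.counters.entry((name.to_string(), labels)).or_insert(0) += 1;
    }

    /// Number of distinct time series recorded under `name`.
    fn series_count(&self, name: &str) -> usize {
        self.counters
            .keys()
            .filter(|key| key.0.as_str() == name)
            .count()
    }

    /// Sum of all series recorded under `name`.
    fn total(&self, name: &str) -> u64 {
        self.counters
            .iter()
            .filter(|(key, _)| key.0.as_str() == name)
            .map(|(_, value)| *value)
            .sum()
    }
}

/// Record one slot acquisition per invoker, with or without the `invoker_id` label.
fn record_acquisitions(invokers: u32, labelled: bool) -> Registry {
    let mut registry = Registry::default();
    for id in 0..invokers {
        let id = id.to_string();
        if labelled {
            registry.increment(ACQUIRED, &[("invoker_id", id.as_str())]);
        } else {
            registry.increment(ACQUIRED, &[]);
        }
    }
    registry
}
```

With four invokers, `record_acquisitions(4, true)` holds four series and `record_acquisitions(4, false)` holds one; both report a total of four acquisitions.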

Stack created with Sapling. Best reviewed with ReviewStack.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 00c49f1e56

chatgpt-codex-connector · 2026-05-14T09:18:48Z

+                                        &entry_id.to_invocation_id(qid.partition_key())
+                                        .expect("This version does not support yielding vqueues entries other than invocations"),


Avoid panicking on yielded state-mutation entries

SchedulerAction::Yield now unconditionally converts yield_action.key.entry_id() to an InvocationId with expect(...), but the scheduler can yield any running head, including EntryKind::StateMutation entries (see attempt_to_run handling both invocation and state mutation kinds). In that case to_invocation_id(...) returns None and this expect will panic the partition state machine, turning a recoverable scheduling condition into a crash.

AhmedSoliman · 2026-05-18

Not true in this version. State mutations will never be observed as running (we transition them to finished in the same write batch in which we start running them). Also, the invoker never sees them.
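The exchange above is about whether an `expect(...)` on the `Option` returned by `to_invocation_id(...)` is acceptable. A minimal sketch of the two styles follows, using simplified stand-in types: `EntryKind`, `EntryId`, and `InvocationId` here are hypothetical and do not mirror Restate's actual definitions, and `yield_target_strict` / `yield_target_lenient` are names invented for this example.

```rust
/// Simplified stand-in for the kinds of entries a vqueue can hold.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EntryKind {
    Invocation,
    StateMutation,
}

/// Simplified stand-in for an invocation id (hypothetical fields).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct InvocationId {
    partition_key: u64,
    id: u64,
}

/// Simplified stand-in for a vqueue entry id (hypothetical fields).
#[derive(Debug, Clone, Copy)]
struct EntryId {
    kind: EntryKind,
    id: u64,
}

impl EntryId {
    /// Returns `Some` only for invocation entries, `None` for anything else.
    fn to_invocation_id(&self, partition_key: u64) -> Option<InvocationId> {
        match self.kind {
            EntryKind::Invocation => Some(InvocationId {
                partition_key,
                id: self.id,
            }),
            EntryKind::StateMutation => None,
        }
    }
}

/// Invariant style, as in the diff: panics if a non-invocation entry is yielded.
/// This is only safe when the caller guarantees such entries never reach this path.
fn yield_target_strict(entry: EntryId, partition_key: u64) -> InvocationId {
    entry
        .to_invocation_id(partition_key)
        .expect("This version does not support yielding vqueues entries other than invocations")
}

/// Defensive style: reports the unexpected entry kind to the caller instead of panicking.
fn yield_target_lenient(entry: EntryId, partition_key: u64) -> Result<InvocationId, EntryKind> {
    entry.to_invocation_id(partition_key).ok_or(entry.kind)
}
```

The strict variant encodes the author's claim as an invariant (state mutations are never observed as running), while the lenient variant corresponds to what the automated review asked for. Which one is appropriate depends on whether that invariant really holds.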

github-actions · 2026-05-14T09:36:41Z

Test Results

- 5 files (-3), 5 suites (-3), 2m 5s ⏱️ (-2m 44s)
- 35 tests (-25): 35 ✅ (-25), 0 💤 (±0), 0 ❌ (±0)
- 56 runs (-211): 56 ✅ (-211), 0 💤 (±0), 0 ❌ (±0)

Results for commit a50ccee. ± Comparison against base commit 5ea7c39.

This pull request removes 60 and adds 35 tests. Note that renamed tests count towards both.

dev.restate.sdktesting.tests.CallOrdering ‑ ordering(boolean[], Client)[1]
dev.restate.sdktesting.tests.CallOrdering ‑ ordering(boolean[], Client)[2]
dev.restate.sdktesting.tests.CallOrdering ‑ ordering(boolean[], Client)[3]
dev.restate.sdktesting.tests.Cancellation ‑ cancelFromAdminAPI(BlockingOperation, Client, URI)[1]
dev.restate.sdktesting.tests.Cancellation ‑ cancelFromAdminAPI(BlockingOperation, Client, URI)[2]
dev.restate.sdktesting.tests.Cancellation ‑ cancelFromAdminAPI(BlockingOperation, Client, URI)[3]
dev.restate.sdktesting.tests.Cancellation ‑ cancelFromContext(BlockingOperation, Client)[1]
dev.restate.sdktesting.tests.Cancellation ‑ cancelFromContext(BlockingOperation, Client)[2]
dev.restate.sdktesting.tests.Cancellation ‑ cancelFromContext(BlockingOperation, Client)[3]
dev.restate.sdktesting.tests.Combinators ‑ awakeableOrTimeoutUsingAwaitAny(Client)
…

dev.restate.sdktesting.tests.AwakeableIngressEndpointTest ‑ completeWithFailure(Client)
dev.restate.sdktesting.tests.AwakeableIngressEndpointTest ‑ completeWithSuccess(Client)
dev.restate.sdktesting.tests.AwakeableLeaderTransferTest ‑ awakeableCompletionsAreNotLostDuringLeaderTransfer(Client, ContainerHandle)
dev.restate.sdktesting.tests.BackwardCompatibilityTest$NewVersion ‑ completeAwakeable(Client)
dev.restate.sdktesting.tests.BackwardCompatibilityTest$NewVersion ‑ completeRetryableOperation(Client)
dev.restate.sdktesting.tests.BackwardCompatibilityTest$NewVersion ‑ proxyCallShouldBeDone(Client)
dev.restate.sdktesting.tests.BackwardCompatibilityTest$NewVersion ‑ proxyOneWayCallShouldBeDone(Client)
dev.restate.sdktesting.tests.BackwardCompatibilityTest$OldVersion ‑ createAwakeable(Client)
dev.restate.sdktesting.tests.BackwardCompatibilityTest$OldVersion ‑ startOneWayProxyCall(Client)
dev.restate.sdktesting.tests.BackwardCompatibilityTest$OldVersion ‑ startProxyCall(Client)
…

♻️ This comment has been updated with latest results.

muhamadazmy

👍🏼

tillrohrmann

Thanks for future-proofing the invoker metrics @AhmedSoliman. I had one question about keeping the per-invoker concurrency limit now that the acquire and release metrics are aggregates. Apart from that, +1 for merging :-)

tillrohrmann · 2026-05-18T10:40:17Z


 These counters make it easy to derive:
 - Rate of slot acquisition and release
 - Available slots: `restate.invoker.concurrency_limit - (restate.invoker.concurrency_slots.acquired - restate.invoker.concurrency_slots.released)`


This probably no longer makes sense when vqueues are disabled (limits are reported per invoker id) while the acquired and released concurrency slot counters are exposed as aggregates.

Right now, users won't be able to calculate the available slots per invoker if vqueues are disabled.
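The formula quoted above, and the concern about it, can be sketched as follows. This is an illustration only: the function names and the invoker ids are hypothetical, and summing the per-invoker limits into a node-level figure is an assumption made for this example, not something the PR proposes.

```rust
use std::collections::HashMap;

/// available = limit - (acquired - released), following the quoted docs.
/// Saturating arithmetic guards against counters scraped at slightly different times.
fn available_slots(limit: u64, acquired: u64, released: u64) -> u64 {
    let in_use = acquired.saturating_sub(released);
    limit.saturating_sub(in_use)
}

/// Node-level view: per-invoker limits are summed and combined with the
/// aggregate acquired/released counters (assumption for this sketch).
fn node_available_slots(limits: &HashMap<String, u64>, acquired: u64, released: u64) -> u64 {
    let total_limit: u64 = limits.values().sum();
    available_slots(total_limit, acquired, released)
}
```

For example, with two invokers limited to 10 slots each, 14 aggregate acquisitions, and 3 releases, 11 slots are in use and 9 remain across the node. The aggregate counters do not say how those 11 are split between the two invokers, so a per-invoker figure cannot be recovered from them.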

tillrohrmann · 2026-05-18T10:42:56Z

+                    gauge!(INVOKER_CONCURRENCY_LIMIT, "invoker_id" => invoker_id)
+                        .set(available_slots.get() as f64);


I am wondering whether it makes sense to keep this metric if users can no longer calculate the available slots per invoker.

AhmedSoliman mentioned this pull request May 14, 2026

[Yield] Support invocations to yield on transient errors #4725

Open

AhmedSoliman force-pushed the pr4738 branch 2 times, most recently from a7b410f to 00c49f1 Compare May 14, 2026 09:10

AhmedSoliman marked this pull request as ready for review May 14, 2026 09:12

chatgpt-codex-connector Bot reviewed May 14, 2026

AhmedSoliman force-pushed the pr4738 branch 3 times, most recently from 4fa9c1b to f3d79db Compare May 14, 2026 10:24

AhmedSoliman mentioned this pull request May 14, 2026

[VQueues] Remove use of vobj_status on invoke path #4740

Draft

AhmedSoliman force-pushed the pr4738 branch from f3d79db to f7414a8 Compare May 15, 2026 07:28

AhmedSoliman mentioned this pull request May 15, 2026

Upgrade rocksdb to 0.11.1 and rust bindings to 0.48.0 #4743

Merged

AhmedSoliman force-pushed the pr4738 branch 4 times, most recently from 5f322cd to 686642c Compare May 18, 2026 08:06

AhmedSoliman requested review from muhamadazmy and tillrohrmann May 18, 2026 08:28

muhamadazmy approved these changes May 18, 2026

AhmedSoliman mentioned this pull request May 18, 2026

[VQueues] Increase successive-merges limit and remove wasteful partial merges #4747

Draft

tillrohrmann reviewed May 18, 2026

AhmedSoliman mentioned this pull request May 18, 2026

[Perf] Support fixed encoding for vqueue and other common types #4749

Draft

AhmedSoliman added 2 commits May 18, 2026 12:38

[Yield] Support invocations to yield on transient errors

96b03fe

Full implementation for yielding invocations from the invoker back to the vqueues scheduler. This unifies with the existing support for memory-budget-based invocation yielding.

AhmedSoliman force-pushed the pr4738 branch from 686642c to a50ccee Compare May 18, 2026 11:38

AhmedSoliman wants to merge 2 commits into main from pr4738