35 changes: 27 additions & 8 deletions crates/invoker-impl/src/quota.rs
@@ -18,6 +18,8 @@ use std::{

 use metrics::{Counter, counter, gauge};

+use restate_types::config::Configuration;
+
 use crate::{
     InvokerId,
     metric_definitions::{
@@ -51,16 +53,23 @@ pub(super) struct InvokerConcurrencyQuota {
 impl InvokerConcurrencyQuota {
     pub(super) fn new(invoker_id: impl Into<InvokerId>, quota: Option<NonZeroUsize>) -> Self {
         let invoker_id = invoker_id.into();
-        let invoker_id: Arc<str> = Arc::from(invoker_id.0.to_string());
-
+        let invoker_id = invoker_id.0.to_string();
         let inner = match quota {
             Some(available_slots) => {
-                gauge!(INVOKER_CONCURRENCY_LIMIT, "invoker_id" => invoker_id.clone())
-                    .set(available_slots.get() as f64);
+                if Configuration::pinned()
+                    .common
+                    .experimental
+                    .is_vqueues_enabled()
+                {
+                    // With vqueues, the concurrency is global and shared across all invokers
+                    gauge!(INVOKER_CONCURRENCY_LIMIT).set(available_slots.get() as f64);
+                } else {
+                    gauge!(INVOKER_CONCURRENCY_LIMIT, "invoker_id" => invoker_id)
+                        .set(available_slots.get() as f64);
Comment on lines +67 to +68
Contributor

Wondering whether this metric makes sense to keep if users can no longer calculate the available slots per invoker.

+                }

-                let acquired_counter = counter!(INVOKER_CONCURRENCY_SLOTS_ACQUIRED, "invoker_id" => invoker_id.clone());
-                let released_counter =
-                    counter!(INVOKER_CONCURRENCY_SLOTS_RELEASED, "invoker_id" => invoker_id);
+                let acquired_counter = counter!(INVOKER_CONCURRENCY_SLOTS_ACQUIRED);
+                let released_counter = counter!(INVOKER_CONCURRENCY_SLOTS_RELEASED);

                 InvokerConcurrencyQuotaInner::Limited {
                     slots: Arc::new(LimitedSlots {
@@ -71,7 +80,17 @@ impl InvokerConcurrencyQuota {
                 }
             }
             None => {
-                gauge!(INVOKER_CONCURRENCY_LIMIT, "invoker_id" => invoker_id).set(f64::INFINITY);
+                if Configuration::pinned()
+                    .common
+                    .experimental
+                    .is_vqueues_enabled()
+                {
+                    // With vqueues, the concurrency is global and shared across all invokers
+                    gauge!(INVOKER_CONCURRENCY_LIMIT).set(f64::INFINITY);
+                } else {
+                    gauge!(INVOKER_CONCURRENCY_LIMIT, "invoker_id" => invoker_id)
+                        .set(f64::INFINITY);
+                }

                 InvokerConcurrencyQuotaInner::Unlimited
             }
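
Both arms of the `match quota` above repeat the same vqueues check. As a minimal sketch of that decision in isolation, using only the standard library rather than the `metrics` macros, the gauge value and label set could be computed as shown here. The helper names `limit_gauge_value` and `limit_gauge_labels` are hypothetical and not part of this PR; the `vqueues_enabled` parameter stands in for `Configuration::pinned().common.experimental.is_vqueues_enabled()`.

```rust
use std::num::NonZeroUsize;

/// Hypothetical helper (not in the PR): the value the concurrency-limit
/// gauge reports. A configured quota reports its slot count; no quota
/// reports infinity, matching the two `match quota` arms in the diff.
fn limit_gauge_value(quota: Option<NonZeroUsize>) -> f64 {
    match quota {
        Some(slots) => slots.get() as f64,
        None => f64::INFINITY,
    }
}

/// Hypothetical helper (not in the PR): the labels attached to the gauge.
/// With vqueues enabled the limit is global, so no `invoker_id` label is
/// emitted; otherwise the gauge keeps its per-invoker label.
fn limit_gauge_labels(vqueues_enabled: bool, invoker_id: &str) -> Vec<(&'static str, String)> {
    if vqueues_enabled {
        Vec::new()
    } else {
        vec![("invoker_id", invoker_id.to_string())]
    }
}
```

Computing the value and the labels separately would keep the vqueues check in one place, at the cost of building the label set dynamically instead of through the macro's literal syntax.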
18 changes: 15 additions & 3 deletions release-notes/unreleased/invoker-metrics.md
@@ -6,9 +6,21 @@ The following metric has been dropped:

 ## Added metrics
 Two new counter metrics have been added as replacements:
-- `restate.invoker.concurrency_slots.acquired` (counter)
-- `restate.invoker.concurrency_slots.released` (counter)
+- `restate.invoker.concurrency_slots.acquired` (counter) - cumulative per node
+- `restate.invoker.concurrency_slots.released` (counter) - cumulative per node

 These counters make it easy to derive:
 - Rate of slot acquisition and release
-- Available slots: `restate.invoker.concurrency_limit - (restate.invoker.concurrency_slots.acquired - restate.invoker.concurrency_slots.released)`
+- Node-level available slots, grouping by the node label exposed by your Prometheus setup, for example `node_name` with the built-in exporter:
+```promql
+sum by (node_name) (restate_invoker_concurrency_limit)
+- (
+sum by (node_name) (restate_invoker_concurrency_slots_acquired_total)
+- sum by (node_name) (restate_invoker_concurrency_slots_released_total)
+)
+```
+If your Prometheus setup exposes a `node_id` label, use `sum by (node_id)` instead.
+This aggregation also removes any remaining `invoker_id` label from `restate.invoker.concurrency_limit` in configurations where it is still present.
+
+## Future breaking observability change
+In Restate v1.8.0, the `invoker_id` label will be removed from `restate.invoker.concurrency_limit`, so this metric will always be reported at node scope. Update dashboards, alerts, and recording rules to stop grouping or filtering `restate.invoker.concurrency_limit` by `invoker_id`; group by the node label instead.
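
The available-slots derivation in the release note can be checked with a small worked example. This is a sketch under stated assumptions: `SlotSample` and `available_by_node` are hypothetical names, and the sample values are invented for illustration rather than taken from a real deployment. It mirrors the `sum by (node_name)` aggregation by summing each series per node before subtracting.

```rust
use std::collections::BTreeMap;

/// One scraped series for a node (hypothetical type, for illustration only).
#[derive(Debug, Clone, Copy)]
struct SlotSample {
    /// restate.invoker.concurrency_limit; +inf when no quota is configured.
    limit: f64,
    /// restate.invoker.concurrency_slots.acquired (cumulative counter).
    acquired: u64,
    /// restate.invoker.concurrency_slots.released (cumulative counter).
    released: u64,
}

/// Sums limit, acquired, and released per node, then computes
/// available = limit - (acquired - released) for each node.
fn available_by_node(samples: &[(&str, SlotSample)]) -> BTreeMap<String, f64> {
    let mut totals: BTreeMap<String, (f64, u64, u64)> = BTreeMap::new();
    for (node, sample) in samples {
        let entry = totals.entry(node.to_string()).or_insert((0.0, 0, 0));
        entry.0 += sample.limit;
        entry.1 += sample.acquired;
        entry.2 += sample.released;
    }
    totals
        .into_iter()
        .map(|(node, (limit, acquired, released))| {
            // Slots currently held; saturating in case a scrape sees
            // `released` ahead of `acquired`.
            let in_use = acquired.saturating_sub(released);
            (node, limit - in_use as f64)
        })
        .collect()
}
```

For example, two series on one node with limits of 10 each, 12 acquisitions and 8 releases in total, leave 4 slots in use and 16 available. A node whose limit is infinite reports infinite availability.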