connection-pool: evict idle connections instead of entire authority pools#4728
connection-pool: evict idle connections instead of entire authority pools#4728muhamadazmy wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9459b52464
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let (size, evicted) = | ||
| pool.retain(|con| now.duration_since(con.last_used_at()) < idle_timeout); |
There was a problem hiding this comment.
Avoid evicting active streams as idle
When a response body stays in flight longer than idle_timeout, the connection's last_used_at was only updated when the request was started (Connection::request), but this predicate evicts solely by that timestamp. In that scenario the connection is moved out of the active pool while inflight() > 0 and then dropped as soon as the stream completes, even though it was never idle for the timeout; long-lived responses/streams will therefore force reconnects after each request. Consider retaining in-flight connections here or resetting the idle timestamp when the in-flight count reaches zero.
Useful? React with 👍 / 👎.
| /// | ||
| /// Default: 5 minutes | ||
| pub idle_pool_timeout: Option<NonZeroFriendlyDuration>, | ||
| pub idle_connection_timeout: Option<NonZeroFriendlyDuration>, |
There was a problem hiding this comment.
Add Since metadata to the new config key
This public config option replaces idle_pool_timeout, but its doc comment does not include the required /// Since vX.Y.Z marker from the root AGENTS.md config-option guideline. Without that marker, generated config docs/schema lose the version provenance for this newly exposed key; please add the Since line and include migration/deprecation documentation if the old key is intentionally removed.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Adding since version would be great.
tillrohrmann
left a comment
There was a problem hiding this comment.
Thanks a lot for creating this PR @muhamadazmy. The changes look good to me :-) I had a clarifying question about the necessity of the draining task-local list of the eviction task.
| /// How long a connection can be idle before it is evicted from the | ||
| /// pool. `None` disables eviction entirely. Defaults to 5 minutes. | ||
| pub(crate) idle_authority_timeout: Option<Duration>, | ||
| pub(crate) idle_connection_timeout: Option<Duration>, |
There was a problem hiding this comment.
Is it possible to set None value via toml in the configuration?
There was a problem hiding this comment.
Ah, right! While this is the pool config (not restate config) but the same comment applies.
Maybe use 0 instead of None as no idle timeout ?
There was a problem hiding this comment.
Ah true. Yeah 0 for disabling eviction makes sense to me.
| /// | ||
| /// Default: 5 minutes | ||
| pub idle_pool_timeout: Option<NonZeroFriendlyDuration>, | ||
| pub idle_connection_timeout: Option<NonZeroFriendlyDuration>, |
There was a problem hiding this comment.
Adding since version would be great.
| /// Evicted connections are not dropped immediately; they are moved into a | ||
| /// task-local drain list. This is a safety net for a race in | ||
| /// [`AuthorityPool::poll_ready`]: a caller can clone a `Connection` under the | ||
| /// pool's read lock and then acquire a stream permit on it *after* the lock | ||
| /// is released. If the eviction task has grabbed the write lock in between | ||
| /// and removed the connection, the late-arriving request still has a live | ||
| /// H2 handle to use. On each subsequent tick the drain list is filtered |
There was a problem hiding this comment.
What happens if the connection gets dropped from the draining task-local list but AuthorityPool is in ready state with the "dropped" connection stored there but not yet used for the request. Wouldn't we still have a valid h2 handle which we can use to create the request?
There was a problem hiding this comment.
A connection in a ready state already has inflight count incremented. This means that:
- Eviction task already see this new inflight count, so it retain the connection and doesn't move it to draining
- Eviction task races an authority pool
poll_readybut it doesn't see the inflight() > 0. And that is why we move the connection to draining.
There was a problem hiding this comment.
I was thinking about the state where we didn't acquire the permit yet but have cloned Connection. So it should be 2.
| // Drop drained connections whose in-flight streams have all completed. | ||
| draining.retain(|con| con.inflight() != 0); |
There was a problem hiding this comment.
Why is it important to keep the draining connections stored in this list here? I am wondering about the following case:
An idle connection gets picked by the AuthorityPool to be polled for availability. The eviction task moves this connection into draining because it was idle for too long. Then the eviction task decides to drop this connection from draining because there are no in flight connection attempts. Then we poll the AuthorityPool and get the dropped connection back as ready and send the request on it. What would happen?
If this is perfectly fine, is the draining task-local list necessary?
There was a problem hiding this comment.
An inflight stream doesn't hold a strong ref to the connection, which means if all strong refs (Arcs) to a connections are dropped, the stream will get interrupted and fail in response.
The draining list is a place to hold the Arcs of draining connections until we make sure there are no inflight streams left on that connections before we fully drop it. This is basically to avoid race with authority pool poll_ready function (in case evictions task moved a connection to draining while a poll_ready() is concurrently polling the connection for a permit).
Moving the connection to the draining list guarantees that newer requests will not poll this connection anymore while at the same time allows the ones that was racing with the eviction task to still be able to use it (and keep it open) until it's done.
There was a problem hiding this comment.
Ok, and the idea is that (idle_timeout / 4).max(Duration::from_secs(10)) is always larger than what it takes another thread to poll a cloned connection to be ready (via the AuthorityPool), right?
There was a problem hiding this comment.
I was thinking it was a fair assumption. But I totally understand your concern. If the authority pool is cloned then polled once and then wasn't polled again until the cloned inner connections have been evicted, we will get into the situation you are describing.
The solution i was trying to avoid was to hold an Arc to the connection inside the response stream, this way we can completely drop the eviction list and just wait for the connection to terminate on the last stream.
I was trying to avoid holding the Arc in memory longer than necessary but maybe that was premature optimisation.
There was a problem hiding this comment.
To be fair, I think in all practical terms this solution will work (as we should poll faster than every 10s).
How much longer are we going to hold the Arc to the connection if it is stored as part of the response stream? Conceptually it makes sense to me that we are keeping the connection open as long as the response stream is active/being used.
…ools Summary: Previously, the eviction task dropped an authority pool once it had been idle for longer than the configured timeout, taking all of its connections with it. This commit switches to a per-connection policy: connections that have been idle for longer than the timeout are evicted individually, and the authority pool entry is removed only once it has no remaining connections. Evicted connections are not drained immediately but only after the last stream that is holding an Arc to the underlying connection is dropped. Only then the connection is closed. This is to avoid racing with AuthorityPool::poll_ready. As a result, a connection may stay alive longer than the configured idle_connection_timeout if it still has in-flight work. The corresponding config knob is renamed from `idle_pool_timeout` to `idle_connection_timeout` to reflect the new semantics; the internal `idle_authority_timeout` field on `PoolConfig` is renamed to match.
|
Dear @tillrohrmann thank you so much for your leading and valuable inputs. I was trying to avoid creating a strong ref to the connection from our RecvStream, which complicated the eviction process unnecessary. I think it's cleaner now that the connection is closed only when the last stream holding the connection is dropped (and the connection has been evicted). This way even if the eviction task evicted the connection, any authority pool that is still holding this connection as a candidate should be able to use it to completion. |
|
Thanks for the clarification @muhamadazmy. Quick question on the change and the original idea of avoiding holding a strong ref to the connection from the |
tillrohrmann
left a comment
There was a problem hiding this comment.
Thanks for updating this PR @muhamadazmy. LGTM :-) I only had a question about your original motives to avoid keeping an arc of the connection in the RecvStream: What were you worried about in terms of memory occupation? Apart from that, +1 for merging :-).
connection-pool: evict idle connections instead of entire authority pools
Summary:
Previously, the eviction task dropped an authority pool once it had
been idle for longer than the configured timeout, taking all of its
connections with it. This commit switches to a per-connection policy:
connections that have been idle for longer than the timeout are
evicted individually, and the authority pool entry is removed only
once it has no remaining connections.
Evicted connections are not drained immediately but only
after the last stream that is holding an Arc to the underlying
connection is dropped. Only then the connection is closed.
This is to avoid racing with AuthorityPool::poll_ready.
As a result, a connection may stay alive longer than the configured
idle_connection_timeout if it still has in-flight work.
The corresponding config knob is renamed from
idle_pool_timeouttoidle_connection_timeoutto reflect the new semantics; the internalidle_authority_timeoutfield onPoolConfigis renamed to match.