feat(interceptor): Adds direct-pod routing for cold starts to reduce latency when kube-proxy rule propagation is slow. by AtharvaPakade · Pull Request #1585 · kedacore/http-add-on

AtharvaPakade · 2026-04-18T20:36:17Z

Key changes:

New KEDA_HTTP_DIRECT_POD_ROUTING environment variable

Adds DirectPodRoutingMode type with two values: disabled (default) and cold-start-only.
On cold start, the interceptor waits for a pod to become ready then rewrites the upstream URL directly to that pod's ip:port, bypassing the service ClusterIP and avoiding latency from kube-proxy rules that are slow to propagate after scale-from-zero.

ReadyEndpointsCache now tracks (ip, port) pairs

WaitForReady now returns a podHost (ip:port string) in addition to the isColdStart bool. The cache stores ready endpoint state as a map of serviceKey → {ip, ports-by-name}, backed by EndpointSlice updates, replacing the previous bool-only ready flag.
Port lookup is keyed on the named port from the InterceptorRoute's PortName field, so multi-port pods route to the correct container port.
Returns podHost="" when PortName has no match, leaving the upstream URL unchanged.
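The Go sketch below illustrates the (ip, port)-per-port-name bookkeeping. It is not the PR's code. `slicePort`, `sliceEndpoint`, `endpointSlice`, `collectState` and `podHostFor` are hypothetical stand-ins. The real cache works on `discoveryv1.EndpointSlice`, where readiness is a `*bool` under `Conditions`. This sketch also takes the first match instead of picking one randomly.

```go
package main

import (
	"net"
	"strconv"
)

// Hypothetical stand-ins for the discovery/v1 EndpointSlice fields used here.
type slicePort struct {
	Name string
	Port int32
}

type sliceEndpoint struct {
	Addresses []string
	Ready     bool // simplified: the real API uses Conditions.Ready (*bool)
}

type endpointSlice struct {
	Ports     []slicePort
	Endpoints []sliceEndpoint
}

// hostPort is one ready (ip, port) pair.
type hostPort struct {
	ip   string
	port int32
}

// serviceState is an immutable snapshot: port name -> ready (ip, port) pairs.
type serviceState struct {
	byPortName map[string][]hostPort
}

// collectState pairs each ready address with the ports of its own slice, so
// slices exposing different numbers for the same port name (e.g. mid rolling
// deploy) are handled, and drops (ip, port) pairs already seen in another slice.
func collectState(epSlices []endpointSlice) *serviceState {
	st := &serviceState{byPortName: map[string][]hostPort{}}
	seen := map[string]map[hostPort]bool{}
	for _, s := range epSlices {
		for _, ep := range s.Endpoints {
			if !ep.Ready {
				continue
			}
			for _, addr := range ep.Addresses {
				for _, p := range s.Ports {
					hp := hostPort{ip: addr, port: p.Port}
					if seen[p.Name] == nil {
						seen[p.Name] = map[hostPort]bool{}
					}
					if seen[p.Name][hp] {
						continue
					}
					seen[p.Name][hp] = true
					st.byPortName[p.Name] = append(st.byPortName[p.Name], hp)
				}
			}
		}
	}
	return st
}

// podHostFor returns "ip:port" for the first ready endpoint exposing portName,
// or "" when nothing matches (the caller then keeps the ClusterIP URL).
func (s *serviceState) podHostFor(portName string) string {
	eps := s.byPortName[portName]
	if len(eps) == 0 {
		return ""
	}
	return net.JoinHostPort(eps[0].ip, strconv.Itoa(int(eps[0].port)))
}
```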

Cold-start URL rewrite in EndpointResolver

When DirectPodOnColdStart=true, isColdStart=true, and podHost!="", the upstream URL's Host is replaced with the pod ip:port.
The rewrite is skipped when the request falls back to an alternate upstream — the fallback URL is not overwritten.
For HTTPS fallback upstreams, the TLS server name context is updated to the fallback hostname.

TLS SNI preservation

Routing middleware now stores the intended TLS server name in context before any downstream middleware can rewrite the upstream URL.
TransportPool.Get now takes a serverName argument and keys transports on (responseHeaderTimeout, serverName), applying tls.Config.ServerName per transport.
Upstream handler passes util.UpstreamServerNameFromContext to transportPool.Get.

Checklist

Commits are signed with Developer Certificate of Origin (DCO)
Changelog has been updated and is aligned with our changelog requirements
Any necessary documentation is added, such as:

Fixes #1473

snyk-io · 2026-04-18T20:36:25Z

✅ Snyk checks have passed. No issues have been found so far.

Status	Scan Engine	Critical	High	Medium	Low	Total (0)
✅	Open Source Security	0	0	0	0	0 issues


Copilot

Pull request overview

Adds an opt-in “direct-to-pod” routing mode during cold starts to avoid initial request failures caused by delayed kube-proxy/ClusterIP rule sync, and reworks the ready-endpoints cache to retain richer EndpointSlice-derived state (ready pod IPs + port name resolution).

Changes:

Introduce KEDA_HTTP_DIRECT_POD_ON_COLD_START and plumb it through the interceptor handler chain.
Rework ReadyEndpointsCache to store an immutable per-service snapshot (ready IPs + portName → port map) and add PickReadyEndpoint.
Add/update unit tests for port-name resolution and cold-start direct-to-pod URL rewriting.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.


File	Description
`pkg/k8s/ready_endpoints_cache.go`	Stores per-service snapshot state and adds `PickReadyEndpoint` selection logic.
`pkg/k8s/ready_endpoints_cache_test.go`	Expands tests to cover endpoint picking and port extraction/dedup behavior.
`interceptor/middleware/endpoint_resolver.go`	Rewrites upstream URL to pod IP:port on cold start when enabled.
`interceptor/middleware/endpoint_resolver_test.go`	Adds middleware tests validating direct-to-pod behavior and port-name selection.
`interceptor/proxy.go`	Wires the new config flag into the middleware config.
`interceptor/config/serving.go`	Adds the new env-var-backed serving configuration flag.

Comments suppressed due to low confidence (1)

pkg/k8s/ready_endpoints_cache.go:40

The notifyCh comment says it is closed "on any change", but Update now only calls broadcast() when readiness transitions (or when slices are deleted). Updating the comment to match the actual broadcast semantics would prevent future readers from assuming every EndpointSlice update will wake waiters.

	// Broadcast mechanism: the channel is closed on any change,
	// then replaced with a fresh one. Waiters select on the channel.
	mu       sync.Mutex
	notifyCh chan struct{}


linkvt · 2026-04-23T12:48:40Z

+	// TLSUpstreamDisabled disables TLS for upstream connections even when the proxy server
+	// itself has TLS enabled. When true, requests are forwarded to upstream pods over plain
+	// HTTP regardless of the proxy TLS setting.
+	TLSUpstreamDisabled bool `env:"KEDA_HTTP_PROXY_TLS_UPSTREAM_DISABLED" envDefault:"false"`


Continuing here from our chat in Slack:

As far as I understand, whether TLS is used is not relevant: we just change the target the traffic is sent to. When TLS is used, if I understand it correctly, we can set tlsConfig.ServerName to the service name while still sending our request to the Endpoint IP, see https://pkg.go.dev/crypto/tls#Config

// ServerName is used to verify the hostname on the returned
// certificates unless InsecureSkipVerify is given. It is also included
// in the client's handshake to support virtual hosting unless it is
// an IP address.
ServerName string

@linkvt
Correct, we can use ServerName (it will be the SNI) which can make the direct to pod IP TLS handshake successful.

I have a few implementations in mind. Can you help me figure out which one I should proceed with?

A. The base transport is built at startup, and the upstream handler uses it to seed a pool keyed by responseHeaderTimeout. Should we extend that same pool to key by (timeout, serverName) combinations? This increases pool cardinality, and since serverName-aware transports are only needed during cold start — is that trade-off worth it?

B. Rather than extending the existing pool, have a dedicated cold start transport pool keyed by (timeout, serverName). This keeps the hot path unchanged and scopes the cardinality growth to cold start scenarios only.

C. Use a temporary, non-pooled transport for cold start requests — created on demand, not reused. This leaves the existing pool entirely untouched, but introduces transport selection logic in the upstream handler to switch between pooled (normal) and ephemeral (cold start) transports.

Thanks for looking into this, I didn't consider that it would be more complicated 😅
So I think that means we need to reconsider the current TransportPool implementation. A sounds good IMO, we also don't need it just for the cold-start but for all connections I think?
Direct pod routing is useful for cold starts, where syncing the Service's ClusterIP might take some time, but it can also be used later for non-cold-start requests.

Pool size is something we could consider now or later in a follow-up PR, the pool will definitely be larger but it's just a few KBs anyway IIRC.

we also don't need it just for the cold-start but for all connections I think?

For this PR, IMO we should only optimize direct-to-pod for cold-start requests.
But once we have a separate PR that detects readiness based on probing, as I mentioned in this issue comment, we can consider this optimization for all requests.

Additionally, I will also drop the KEDA_HTTP_PROXY_TLS_UPSTREAM_DISABLED flag. I think we also need a PR that adds the ability to configure the upstream TLS config, because currently the same TLS config is reused for upstream connections. That would also let users set up mTLS with upstream pods.

Couldn't we just use/modify the current

isColdStart, err := er.readyCache.WaitForReady(waitCtx, serviceKey)

to also return an endpoint IP?
We would not do any extra work at all then and could use the endpoint IP in both cases?

Using WaitForReady makes sense only if we also want to do direct-to-pod for non-cold-start requests.
@linkvt Are you proposing that we do direct-to-pod for all requests in this PR itself?

Yes, I thought that would be easier: no special handling for cold starts, always take the endpoint IP from our readyCache. Add a flag to opt out (nobody usually opts in) that falls back to the non-endpoint upstream.
WaitForReady is, IIRC, called in both cold-start and non-cold-start scenarios, so wouldn't it be the ideal place to change?

also wdyt @Fedosin ?

Fedosin

Review

Design-Level Feedback

1. KEDA_HTTP_PROXY_TLS_UPSTREAM_DISABLED should be dropped (agree with @linkvt)

This flag globally disables TLS to upstream for all requests, not just cold-start direct-pod ones. That's a blunt instrument with security implications. The proper solution is to set tlsConfig.ServerName to the original service DNS name when dialing a pod IP directly. That way TLS works correctly without needing a kill-switch. I'd suggest either:

Handling TLS via ServerName in this PR, or
Deferring TLS+direct-pod to a follow-up and only enabling direct-pod for non-TLS setups here.

2. Scope: cold-start-only vs. all requests

@linkvt asked for my opinion. I think starting with cold-start-only is the right incremental approach: it has a clear problem statement (kube-proxy lag) and limits blast radius. However, the API surface should be designed with the eventual "all requests" path in mind. Consider whether the flag name KEDA_HTTP_DIRECT_POD_ON_COLD_START locks us in, or if a more general name (e.g. KEDA_HTTP_DIRECT_POD_ROUTING with mode values like cold-start-only / always / disabled) would be better.

That said, keeping the current flag and adding a separate one later is acceptable too.

3. WaitForReady could return endpoint info (agree with @linkvt)

Since WaitForReady already blocks until the service has ready endpoints, it's the natural point where we know a pod is ready. Having it also return a candidate endpoint would:

Eliminate the TOCTOU gap between "became ready" and PickReadyEndpoint
Simplify the endpoint resolver middleware (no separate PickReadyEndpoint call)
Enable future "always direct-pod" mode trivially

This doesn't need to happen in this PR, but would be a cleaner design for the next iteration.

Code-Level Issues

4. crypto/rand is overkill for load-balancing selection

idx, err := rand.Int(rand.Reader, big.NewInt(int64(len(candidates))))
if err != nil {
    return "", 0, false
}

Using crypto/rand + big.Int for a simple random index in a hot path adds unnecessary overhead (syscall + heap allocation per call). math/rand/v2 (already used in this repo) with its auto-seeded global generator is sufficient for load-balancing randomness. The crypto/rand error path also silently falls back to ClusterIP, which could mask a real system issue (exhausted entropy).
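Here is a sketch of the suggested `math/rand/v2` variant, which needs Go 1.22+. `endpoint` and this `pickReadyEndpoint` signature are assumptions modeled on the `return "", 0, false` shape above. `rand.IntN` draws from the auto-seeded global generator. There is no syscall, no `big.Int` allocation and no error path to fall back from.

```go
package main

import "math/rand/v2"

// endpoint is a hypothetical ready (ip, port) pair.
type endpoint struct {
	ip   string
	port int32
}

// pickReadyEndpoint picks a uniformly random candidate; ok is false only when
// there are no candidates.
func pickReadyEndpoint(candidates []endpoint) (ip string, port int32, ok bool) {
	if len(candidates) == 0 {
		return "", 0, false
	}
	ep := candidates[rand.IntN(len(candidates))]
	return ep.ip, ep.port, true
}
```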

5. The Update hot path has a subtle ordering concern

if v, ok := c.states.Load(serviceKey); ok {
    ptr := v.(*atomic.Pointer[serviceState])
    oldState := ptr.Load()
    ptr.Store(newState)
    ...
}

If two concurrent Update calls race on the same key, the non-atomic Load-then-Store means the second writer's state could silently overwrite the first's. In practice informer event handlers are serialized per-resource, so this is likely safe today, but it's fragile. A CompareAndSwap loop or brief per-key lock would be more robust and future-proof.

6. PickReadyEndpoint returns ok=false when portName="" and all ports are named

This means users who configure their InterceptorRoute with a numeric Port (resulting in PortName="") will never get direct-pod routing even when the feature is enabled. This limitation should be documented clearly so users know they must use named ports for direct-pod routing to activate.

7. schemeHTTPS / schemeHTTP constants placement

These constants are defined in routing.go but referenced from endpoint_resolver.go (same package, so it compiles). Consider placing shared constants in a dedicated file or at the package level to make the dependency explicit and avoid confusion when reading either file in isolation.

Minor Nits

The startup validation in main.go becomes unnecessary once TLSUpstreamDisabled is dropped. If you keep it, note that runtime.Goexit() after logging is consistent with the rest of main.go but a brief comment explaining why (deferred cleanup) would help future readers.
Missing changelog entry and documentation (per the unchecked checklist items).

What's Good

The serviceState immutable snapshot approach is well-designed: atomic pointer swap gives readers lock-free access without contention.
The (ip, port) pairing per-slice in collectServiceState correctly handles rolling deploys where different slices expose different port numbers for the same port name.
Deduplication logic prevents the same (ip, port) pair from appearing multiple times across slices.
Test coverage is thorough: multi-port, unnamed port, empty port name, heterogeneous ports, and the rolling-deploy scenario are all exercised.
Graceful fallback to ClusterIP when PickReadyEndpoint returns ok=false is safe and correct.

Summary

The core idea and cache refactoring are solid. Main blockers before merge:

Resolve the TLS approach (drop TLSUpstreamDisabled, use ServerName or scope to non-TLS only).
Switch crypto/rand to math/rand/v2.
Add changelog + document the named-port requirement.

- Adds KEDA_HTTP_DIRECT_POD_ROUTING (disabled|cold-start-only); when set, rewrites upstream URL to a ready pod's ip:port after scale-from-zero
- ReadyEndpointsCache now tracks (ip, port) pairs keyed by named port via EndpointSlice updates; WaitForReady returns a podHost alongside isColdStart
- Empty podHost (no portName match) leaves upstream URL unchanged, falling back to ClusterIP as before
- TransportPool keyed on (responseHeaderTimeout, serverName) with per-transport TLSClientConfig.ServerName for correct SNI per pod
- Routing middleware stores intended TLS hostname in context before any URL rewrite so SNI always points to the original service

Signed-off-by: Atharva Pakade <pakade310@gmail.com>

AtharvaPakade · 2026-05-04T05:56:51Z

Hi @linkvt, @Fedosin, thanks for having a look at the PR! I've pushed the changes based on your suggestions. Please let me know if there's anything else to address.

Fedosin

Review

The code looks good overall and I don't see any reason not to merge it except the IPv6 incompatibility in pickHost. All previous review feedback has been addressed.

Issues

1. IPv6 compatibility in pickHost (medium)

pickHost formats the host as:

return fmt.Sprintf("%s:%d", ep.ip, ep.port)

For IPv6 pod addresses (e.g. fd00::1), this would produce fd00::1:8080 instead of the correct [fd00::1]:8080. Please use net.JoinHostPort(ep.ip, strconv.Itoa(int(ep.port))) instead, which correctly handles both IPv4 and IPv6. Kubernetes dual-stack clusters assign IPv6 pod IPs, so this is a real (if uncommon) scenario.
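The suggested fix is shown below as a runnable sketch. `endpoint` mirrors the `ep.ip`/`ep.port` fields above, and the `int32` port type is an assumption.

```go
package main

import (
	"net"
	"strconv"
)

// endpoint is a hypothetical ready (ip, port) pair.
type endpoint struct {
	ip   string
	port int32
}

// pickHost formats ep as a URL host. net.JoinHostPort wraps IPv6 literals in
// brackets, which fmt.Sprintf("%s:%d", ...) does not.
func pickHost(ep endpoint) string {
	return net.JoinHostPort(ep.ip, strconv.Itoa(int(ep.port)))
}
```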

2. Truncated comment in context.go (nit)

// ContextWithUpstreamServerName stores the intended TLS server name (SNI) for the upstream.
// This must be set before any middleware rewrites the upstream URL (e.g. direct-to-pod),

The second line ends with a comma, suggesting a truncated sentence. Should be completed or end with a period.

3. Broadcast channel comment accuracy (nit)

The comment on the notifyCh field says "closed on any change", but swapState now only broadcasts on readiness transitions (oldState.hasReady() != newState.hasReady()). The comment should reflect this narrower contract.

linkvt

The initially trivial idea of "just connecting to endpoints instead of ClusterIPs" really turned out to be way trickier...
I tried to understand all the changes but haven't finished the whole PR yet, as some findings will again change things quite a bit IMO. I understand this is quite complex; let me know if I should give it a try as well, based on these changes, to show what I have in mind in case my comments are a bit cryptic.

linkvt · 2026-05-13T07:57:43Z

+	switch s.DirectPodRouting {
+	case DirectPodRoutingDisabled, DirectPodRoutingColdStartOnly:
+		// valid
+	default:
+		panic(fmt.Sprintf("invalid KEDA_HTTP_DIRECT_POD_ROUTING value %q: must be %q or %q",
+			s.DirectPodRouting, DirectPodRoutingDisabled, DirectPodRoutingColdStartOnly))
+	}


This could be implemented directly next to the type by implementing the encoding.TextUnmarshaler interface, so that MustParseServing stays clean:

func (m *DirectPodRoutingMode) UnmarshalText(text []byte) error {
	switch DirectPodRoutingMode(text) {
	case DirectPodRoutingDisabled, DirectPodRoutingColdStartOnly:
		*m = DirectPodRoutingMode(text)
		return nil
	default:
		return fmt.Errorf("invalid value %q: must be %q or %q",
			text, DirectPodRoutingDisabled, DirectPodRoutingColdStartOnly)
	}
}

See https://github.com/caarlos0/env/blob/a72d89a8930fc800372a6a338a1acf33e5cc3a56/example_test.go#L212-L218

linkvt · 2026-05-13T08:15:56Z

 	LogRequests bool `env:"KEDA_HTTP_LOG_REQUESTS" envDefault:"false"`
+	// DirectPodRouting controls when the interceptor routes directly to a pod IP
+	// instead of the ClusterIP service. Valid values: "disabled", "cold-start-only".
+	DirectPodRouting DirectPodRoutingMode `env:"KEDA_HTTP_DIRECT_POD_ROUTING" envDefault:"disabled"`


Couldn't this be a simple boolean? Internally, we later only use a boolean as well.
Is there a reason why we would use direct pod routing only on cold starts according to this config (I didn't check the full PR yet)?

@Fedosin suggested in the review that we should offer different modes (see section 2, "Scope: cold-start-only vs. all requests", of that review comment).

linkvt · 2026-05-13T08:24:51Z

+	// Note: direct-pod routing (when enabled on the interceptor) requires portName;
+	// routes using a numeric port will always be forwarded via the Service ClusterIP.


This seems like a major restriction, I don't see a reason for it yet to be honest:

I see these cases:

single port service, unnamed port

endpointslice contains only one port with name: ""

IR uses port number: resolve port name from Service, match by name in EndpointSlice

IR uses port name: not possible (service port is unnamed, nothing to reference)

single port service, named port

endpointslice contains only one port where name is equal to the name of the port in the service

IR uses port number: resolve port name from Service, match by name in EndpointSlice

IR uses port name: exact match by name

multi port service, unnamed ports

not possible, Kubernetes requires names when multiple ports are defined: https://kubernetes.io/docs/concepts/services-networking/service/#multi-port-services

multi port service, named ports

endpointslice contains multiple ports, each named after the corresponding service port

IR uses port number: resolve port name from Service, match by name in EndpointSlice

IR uses port name: exact match by name

In all cases, we can resolve to the correct EndpointSlice port without requiring portName in the IR.
The "IR uses port number" cases all work the same way: resolve the numeric port to its port name via the Service object.

The routing middleware (resolveUpstreamURL) already does that Service lookup to build the upstream URL, so it could pass the resolved port name down via context at the same time.

In the multi port service, named ports case, there is a possibility that the user specifies port names on the Service but does not provide portName at the time of configuring the InterceptorRoute. The note was targeted at this case to avoid misconfigurations.

Based on your suggestion, I feel this case can also be resolved by resolving the portName in the Routing middleware and passing it via context, by adding a separate resolvePortName method that does the reverse lookup (numeric port → port name via Service.Spec.Ports). In the EndpointResolver middleware, we can then refer to the resolved portName from context instead of ir.Spec.Target.PortName directly.
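The reverse lookup proposed above can be sketched as follows. `servicePort` is a minimal stand-in for `corev1.ServicePort`, and `resolvePortName` with this signature is hypothetical. Matching is on the Service port (`spec.ports[].port`), not the target port. The EndpointSlice lists the target port number under the same port name, so the name is the stable join key.

```go
package main

// servicePort is a hypothetical stand-in for corev1.ServicePort.
type servicePort struct {
	Name string
	Port int32
}

// resolvePortName returns the port name to look up in the EndpointSlice.
// An explicit portName wins; otherwise the numeric Service port is mapped to
// its name. A single unnamed port resolves to "", matching the unnamed
// EndpointSlice port.
func resolvePortName(ports []servicePort, portName string, port int32) (string, bool) {
	if portName != "" {
		return portName, true
	}
	for _, p := range ports {
		if p.Port == port {
			return p.Name, true
		}
	}
	return "", false
}
```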

linkvt · 2026-05-13T10:25:12Z

 //
-// NOTE: Transports are never evicted, we expect a low cardinality of timeouts
+// NOTE: Transports are never evicted; we expect a low cardinality of (timeout, serverName) combinations.
 type TransportPool struct {


I think all of these transport pool changes could be avoided if we used transport.DialTLSContext and resolved the ServerName while dialing.
This would be a smaller change than what we're doing right now and IMO cleaner, but a bit more complicated.
The base transport is currently created in proxy.go, outside of this whole middleware chain; I'm not sure if it should still live there after this change.

Agreed. We discussed this in a previous comment, and based on your recommendation I opted for this approach.

IMO we should have a separate PR/issue addressing the ability to configure DialTLSContext and DialContext, and moving the base transport outside the whole middleware chain.

linkvt · 2026-05-13T10:40:30Z

-	// "namespace/service" -> *atomic.Bool
-	ready sync.Map
+	// "namespace/service" -> *atomic.Pointer[serviceState]
+	states sync.Map


I read a bit more into this and noticed that sync.Map already offers Swap etc., so it seems we don't need atomic wrappers at all, see https://pkg.go.dev/sync#Map.Swap

With sync.Map.Swap, every update would call Swap on the map itself, which acquires a mutex internally even for existing keys (lock on every update).

The atomic wrapper is the entire reason we avoid that — the map value is a stable pointer that never changes, only what it points to does (lock only on first insertion).

linkvt · 2026-05-13T10:45:03Z


-	hasReady := slices.ContainsFunc(endpointSlices, hasAnyReadyEndpoint)
+	newState := collectServiceState(endpointSlices)
+


I don't understand the new code completely yet, and I don't see why this hot and cold path would be needed.
Can't we just swap and broadcast if readiness changes?

The hot/cold split exists precisely to keep the hot path lock-free — as mentioned in comment, if we just called sync.Map.Swap directly, we'd acquire a mutex on every endpoint update, defeating the entire point of the atomic wrapper.

The cold path pays the mutex cost once via LoadOrStore (first insertion). After that, every update stays on the hot path: states.Load + ptr.CompareAndSwap — no map write, no mutex.
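The hot/cold split described above is sketched below. `statesCache`, `store` and `load` are hypothetical names. The hot path here uses `atomic.Pointer.Swap`, whereas the comment describes `CompareAndSwap`. Either way, only the pointee changes and the map entry is never rewritten. `LoadOrStore` settles the race when two writers see a key for the first time.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// serviceState is a hypothetical, minimal snapshot type.
type serviceState struct{ ready bool }

type statesCache struct {
	// "namespace/service" -> *atomic.Pointer[serviceState]
	states sync.Map
}

// store installs newState for key and returns the state it replaced (nil if none).
func (c *statesCache) store(key string, newState *serviceState) *serviceState {
	// Hot path: the key already maps to a stable pointer; only the pointee changes.
	if v, ok := c.states.Load(key); ok {
		return v.(*atomic.Pointer[serviceState]).Swap(newState)
	}
	// Cold path: first sighting of key. If another writer inserted first,
	// LoadOrStore returns its pointer and we swap into that instead.
	fresh := new(atomic.Pointer[serviceState])
	fresh.Store(newState)
	v, loaded := c.states.LoadOrStore(key, fresh)
	if !loaded {
		return nil
	}
	return v.(*atomic.Pointer[serviceState]).Swap(newState)
}

// load returns the current snapshot for key, or nil if the key is unknown.
func (c *statesCache) load(key string) *serviceState {
	if v, ok := c.states.Load(key); ok {
		return v.(*atomic.Pointer[serviceState]).Load()
	}
	return nil
}
```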

AtharvaPakade requested a review from a team as a code owner April 18, 2026 20:36

Copilot AI review requested due to automatic review settings April 18, 2026 20:36

keda-automation requested a review from a team April 18, 2026 20:36

Copilot started reviewing on behalf of AtharvaPakade April 18, 2026 20:36

Copilot AI reviewed Apr 18, 2026

AtharvaPakade mentioned this pull request Apr 18, 2026

Evaluate sending requests to Endpoint IPs directly #1473

Open

AtharvaPakade force-pushed the main branch 2 times, most recently from 07a578c to fa6696a April 20, 2026 07:34

linkvt reviewed Apr 23, 2026

Fedosin reviewed Apr 29, 2026

AtharvaPakade changed the title ~~feat: add direct-pod routing on cold start with per-port-name resolution~~ Adds direct-pod routing for cold starts to reduce latency when kube-proxy rule propagation is slow. May 3, 2026

AtharvaPakade changed the title ~~Adds direct-pod routing for cold starts to reduce latency when kube-proxy rule propagation is slow.~~ feat(interceptor): Adds direct-pod routing for cold starts to reduce latency when kube-proxy rule propagation is slow. May 3, 2026

AtharvaPakade force-pushed the main branch from fa6696a to b3d05ca May 3, 2026 18:38

keda-automation requested a review from a team May 3, 2026 18:38

AtharvaPakade force-pushed the main branch from b3d05ca to 7605894 May 4, 2026 05:46

AtharvaPakade requested review from Fedosin and linkvt May 4, 2026 05:57

Fedosin reviewed May 12, 2026

linkvt requested changes May 13, 2026
