
fix: handle HTTPScaledObject not found gracefully in scaler #1488

Open
Fedosin wants to merge 1 commit into kedacore:main from Fedosin:is-not-found


@Fedosin (Contributor) commented Feb 23, 2026

This PR fixes the race condition by handling the IsNotFound case in GetMetrics gracefully: instead of returning an error, it returns metric value 0 (not active) and logs at debug level. Once the cache syncs (typically within a second), subsequent calls return real values. The gRPC stream stays alive, avoiding the reconnection storm.

Checklist

Fixes #1487

Signed-off-by: Mikhail Fedosin <mfedosin@redhat.com>
@Fedosin requested a review from a team as a code owner February 23, 2026 19:48
@snyk-io (Bot) commented Feb 23, 2026

Snyk checks have passed. No issues have been found so far.

| Scanner | Critical | High | Medium | Low | Total |
|---|---|---|---|---|---|
| Open Source Security | 0 | 0 | 0 | 0 | 0 issues |


@keda-automation requested a review from a team February 23, 2026 19:48
Comment thread scaler/handlers.go
```go
metricName := MetricName(namespacedName)

httpso := &httpv1alpha1.HTTPScaledObject{}
if err := e.reader.Get(ctx, types.NamespacedName{Namespace: sor.Namespace, Name: httpScaledObjectName}, httpso); err != nil {
```
A member commented:

Minor nitpicks:

  • we construct types.NamespacedName{Namespace: sor.Namespace, Name: httpScaledObjectName} here, although k8s.NamespacedNameFromNameAndNamespace(...) just above already returns a NamespacedName
  • the log message "not yet available in cache" is perhaps a bit too specific?

Copilot AI (Contributor) left a comment

Pull request overview

This PR addresses a race condition in the external scaler where controller-runtime’s cache can temporarily return NotFound for a newly created HTTPScaledObject, causing GetMetrics errors that terminate the StreamIsActive gRPC stream and trigger reconnection/error storms.

Changes:

  • Handle apierrors.IsNotFound from e.reader.Get() in GetMetrics by returning a metric value of 0 (inactive) and logging at a verbose/debug level.
  • Refactor metric name generation to happen before the Get() call so it can be reused in the NotFound early-return path.
  • Add an Unreleased changelog entry documenting the fix.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| scaler/handlers.go | Adds IsNotFound handling in GetMetrics to keep the gRPC stream alive during cache sync lag. |
| CHANGELOG.md | Documents the scaler fix in the Unreleased “Fixes” section. |


Comment thread scaler/handlers.go
```go
			},
		}, nil
	}
	lggr.Error(err, "unable to get HTTPScaledObject", "name", httpScaledObjectName, "namespace", sor.Namespace, "httpScaledObjectName", httpScaledObjectName)
```
Copilot AI commented Mar 3, 2026

The error log fields here are inconsistent and redundant: the key "name" is set to httpScaledObjectName, and "httpScaledObjectName" is also provided with the same value. Elsewhere in this file (e.g., GetMetricSpec) "name" refers to the ScaledObject name (sor.Name). Consider using a distinct key for the ScaledObject name (or set "name" to sor.Name) and keep only one field for the HTTPScaledObject name to avoid confusing logs.

Suggested change
```diff
-lggr.Error(err, "unable to get HTTPScaledObject", "name", httpScaledObjectName, "namespace", sor.Namespace, "httpScaledObjectName", httpScaledObjectName)
+lggr.Error(err, "unable to get HTTPScaledObject", "name", sor.Name, "namespace", sor.Namespace, "httpScaledObjectName", httpScaledObjectName)
```

Comment thread scaler/handlers.go
Comment on lines 209 to +220
```go
if err := e.reader.Get(ctx, types.NamespacedName{Namespace: sor.Namespace, Name: httpScaledObjectName}, httpso); err != nil {
	if apierrors.IsNotFound(err) {
		lggr.V(1).Info("HTTPScaledObject not yet available in cache, returning 0", "httpScaledObjectName", httpScaledObjectName, "namespace", sor.Namespace)
		return &externalscaler.GetMetricsResponse{
			MetricValues: []*externalscaler.MetricValue{
				{
					MetricName:  metricName,
					MetricValue: 0,
				},
			},
		}, nil
	}
```
Copilot AI commented Mar 3, 2026

The new NotFound handling path (returning a 0 metric value with nil error) changes behavior in a way that's easy to regress. Please add a unit test that exercises e.reader.Get returning a NotFound (e.g., fake client with no HTTPScaledObject) and asserts GetMetrics returns MetricValue=0 with no error (and ideally that StreamIsActive/IsActive remains healthy).

Comment thread scaler/handlers.go
Comment on lines +204 to +206
```go
// generated the metric name for HTTPScaledObject
namespacedName := k8s.NamespacedNameFromNameAndNamespace(httpScaledObjectName, sor.Namespace)
metricName := MetricName(namespacedName)
```
Copilot AI commented Mar 3, 2026

Nit: the comment reads "generated the metric name..." but this code is generating it now; consider changing to "generate the metric name..." for grammatical correctness (same phrasing appears in a few places in this file).

@wozniakjan (Member) left a comment

> This PR fixes the race condition by handling the IsNotFound case in GetMetrics gracefully: instead of returning an error, it returns metric value 0 (not active) and logs at debug level. Once the cache syncs (typically within a second), subsequent calls return real values. The gRPC stream stays alive, avoiding the reconnection storm.

Won't this risk scaling the app to zero prematurely, and somewhat at random?

Scenario:

  • an autoscaled app configured with minReplicas = 0
  • heavy traffic going through interceptor
  • scaler restarted, caches not yet synced
  • KEDA asks for the metric, the scaler can't find the HSO => returns 0 => KEDA happily scales to zero

@linkvt (Member) left a comment

Gave it another look after Jan's review request here and only found one other location where the IsNotFound check might be missing.

@wozniakjan, from what I understand, the scaler will not respond until the caches are synced:

http-add-on/scaler/main.go

Lines 125 to 137 in 4dd989f

```go
// Wait for cache to sync before starting the GRPC server
if !ctrlCache.WaitForCacheSync(ctx) {
	setupLog.Error(nil, "cache failed to sync")
	os.Exit(1)
}
eg.Go(func() error {
	setupLog.Info("starting the grpc server")
	if err := startGrpcServer(ctx, cfg, ctrl.Log, pinger, ctrlCache); !util.IsIgnoredErr(err) {
		setupLog.Error(err, "grpc server failed")
		return err
	}
```

I guess the fallback would then only make sense if the scaler can't access the HTTPSO/IR for some reason while the interceptor can and keeps serving traffic, or am I missing something?

Comment thread scaler/handlers.go
```go
}

httpso := &httpv1alpha1.HTTPScaledObject{}
if err := e.reader.Get(ctx, types.NamespacedName{Namespace: sor.Namespace, Name: httpScaledObjectName}, httpso); err != nil {
```
A member commented:

I'm not sure the fix below is enough to make the errors disappear: IIRC, GetMetricSpec is called before GetMetrics and would probably return errors as well.
Should we add the same check there too?


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

IsNotFound errors are not handled gracefully in the scaler

4 participants