feat: add HTTP trigger API for on-demand descheduling cycles #1857

Open
ergoz wants to merge 2 commits into kubernetes-sigs:master from ergoz:feat/api-handler

Conversation

@ergoz ergoz commented Apr 10, 2026

Description

Problem

The descheduler today supports two operational modes when it runs in deployment mode:

  1. One-shot (--descheduling-interval=0): runs a single eviction pass at startup and exits.
  2. Continuous (--descheduling-interval=5m): runs passes on a fixed time interval.

Neither mode gives operators any way to manually trigger an eviction cycle without restarting the process or waiting for the next scheduled tick. This creates friction in several real-world scenarios:

  • Post-incident remediation: a node drain or topology imbalance just occurred and you need eviction to run right now, not in 4 minutes when the timer fires.
  • CI/CD pipelines: a deployment job wants to explicitly rebalance pods after a rollout and needs to know when descheduling is complete before proceeding.
  • Debugging & dry-run validation: an SRE wants to trigger a dry-run pass interactively, inspect logs, and iterate — without restarting the descheduler pod.
  • Long interval + emergency: the cluster runs descheduler with a 30-minute interval to reduce churn, but an on-call engineer needs to act immediately.

There was no escape hatch. The only workaround was to delete the descheduler pod and let it restart, which is disruptive and causes a gap in leader election.


Solution

This PR introduces an optional HTTP trigger API that allows any authorized caller to synchronously initiate a descheduling cycle at any time via a single curl command.

The API is disabled by default and must be explicitly enabled with --enable-trigger-api, which preserves full backward compatibility.

Closes #1855


Changes

cmd/descheduler/app/options/options.go

  • Added EnableTriggerAPI bool field to DeschedulerServer to control the feature.
  • Added TriggerCh chan chan error — the communication channel between the HTTP handler and the main scheduling loop. Using chan chan error allows the handler to receive the cycle result and propagate it synchronously to the HTTP caller.
  • Registered the --enable-trigger-api CLI flag.

cmd/descheduler/app/server.go

  • When EnableTriggerAPI is set, a buffered chan chan error is created and POST /api/v1/descheduler/run is registered on the existing PathRecorderMux (alongside /metrics and /healthz), so it is served by the same HTTPS server — no new port, no new listener, no new TLS config.
  • The handler is synchronous: it blocks until the descheduling cycle finishes and returns the result as JSON. The caller gets a definitive success/failure signal.
  • Concurrency protection: the trigger channel has a buffer of 1. If a cycle is already queued or running, a subsequent request receives 429 Too Many Requests immediately instead of blocking indefinitely or spawning a second concurrent cycle.
  • Request cancellation is respected: if the HTTP client disconnects, the handler unblocks via r.Context().Done() and returns 504 Gateway Timeout.

pkg/descheduler/descheduler.go

  • Replaced wait.NonSlidingUntil with an explicit select-based event loop. The loop listens on three channels simultaneously:
    • ticker.C — fires at --descheduling-interval (existing behavior, unmodified)
    • rs.TriggerCh — receives a manual trigger; executes the cycle and writes the result back to the caller's response channel
    • ctx.Done() — graceful shutdown
  • When TriggerCh is nil (feature disabled), its case in the select is never ready, because receiving from a nil channel blocks forever, so the scheduler behaves exactly as before.
  • When --descheduling-interval=0 and --enable-trigger-api is set, the process now stays alive after the initial cycle and responds to trigger calls indefinitely — enabling a daemon-without-tick mode useful for purely on-demand operation.

API Reference

| Method | Path | Description |
|--------|------|-------------|
| POST | /api/v1/descheduler/run | Trigger a descheduling cycle synchronously |

Success (200)

{"message": "descheduling cycle completed successfully", "status": "ok"}

Cycle already running (429)

{"message": "descheduling cycle already in progress or pending", "status": "error"}

Internal error (500)

{"message": "failed to run descheduler loop: ...", "status": "error"}

Client disconnected (504)

{"message": "request cancelled or timed out", "status": "error"}

Usage Examples

Start descheduler with the trigger API enabled

descheduler \
  --policy-config-file=/etc/descheduler/policy.yaml \
  --descheduling-interval=30m \
  --enable-trigger-api

Trigger a cycle immediately from inside the cluster

kubectl exec -n kube-system deploy/descheduler -- \
  curl -sk -X POST https://localhost:10258/api/v1/descheduler/run

Trigger via port-forward from a local machine

kubectl port-forward -n kube-system deploy/descheduler 10258:10258 &
curl -sk -X POST https://localhost:10258/api/v1/descheduler/run

Use in a CI/CD step after a deployment

curl -sk --max-time 300 -X POST \
  https://descheduler.kube-system.svc.cluster.local:10258/api/v1/descheduler/run \
  | jq -e '.status == "ok"'

Daemon-without-tick mode (purely on-demand, no automatic interval)

descheduler \
  --policy-config-file=/etc/descheduler/policy.yaml \
  --descheduling-interval=0 \
  --enable-trigger-api
# The initial cycle runs at startup.
# The process stays alive and responds only to explicit POST /api/v1/descheduler/run calls.

Backward Compatibility

  • --enable-trigger-api defaults to false. Existing deployments are unaffected.
  • The main scheduling loop (select-based replacement of wait.NonSlidingUntil) is functionally equivalent for all existing flag combinations when the trigger API is disabled.
  • No new ports, certificates, or RBAC resources are introduced.

Checklist

Please ensure your pull request meets the following criteria before submitting
it for review. Reviewers will use these items to assess the quality and
completeness of your changes:

  • Code Readability: Is the code easy to understand, well-structured, and consistent with project conventions?
  • Naming Conventions: Are variable, function, and struct names descriptive and consistent?
  • Code Duplication: Is there any repeated code that should be refactored?
  • Function/Method Size: Are functions/methods short and focused on a single task?
  • Comments & Documentation: Are comments clear, useful, and not excessive? Were comments updated where necessary?
  • Error Handling: Are errors handled appropriately?
  • Testing: Are there sufficient unit/integration tests?
  • Performance: Are there any obvious performance issues or unnecessary computations?
  • Dependencies: Are new dependencies justified?
  • Logging & Monitoring: Is logging used appropriately (not too verbose, not too silent)?
  • Backward Compatibility: Does this change break any existing functionality or APIs?
  • Resource Management: Are resources (files, connections, memory) managed and released properly?
  • PR Description: Is the PR description clear, providing enough context and explaining the motivation for the change?
  • Documentation & Changelog: Are README and docs updated if necessary?

ergoz added 2 commits April 9, 2026 01:27
Signed-off-by: Sergey Morozov <sergey.morozov@flant.com>
Signed-off-by: Sergey Morozov <sergey.morozov@flant.com>

linux-foundation-easycla Bot commented Apr 10, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Apr 10, 2026
@k8s-ci-robot
Contributor

Welcome @ergoz!

It looks like this is your first PR to kubernetes-sigs/descheduler 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/descheduler has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 10, 2026
@k8s-ci-robot
Contributor

Hi @ergoz. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot requested a review from googs1025 April 10, 2026 11:45
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign ingvagabund for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested a review from JaneLiuL April 10, 2026 11:45
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Apr 10, 2026
@ergoz
Author

ergoz commented Apr 21, 2026

@googs1025 @JaneLiuL Hello! I hope you're all doing well. I wanted to gently follow up on the pull request I submitted a little while ago — I'd really appreciate it if someone could take a look when they have a free moment.

No rush at all, just wanted to make sure it didn't get lost in the shuffle! Happy to answer any questions or make changes based on your feedback.

Thanks so much in advance! 🙏

@ergoz
Author

ergoz commented May 4, 2026

@googs1025 @JaneLiuL Hello! A gentle reminder :) we really need this feature :)

@googs1025
Member

Will check this PR tomorrow.

@ergoz
Author

ergoz commented May 4, 2026

@googs1025 great news! Thank you very much!

@googs1025
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 5, 2026
for {
	select {
	case <-ctx.Done():
		return nil
Member

case <-ctx.Done():
	// Drain any pending trigger so the caller doesn't hang.
	select {
	case resultCh := <-rs.TriggerCh:
		resultCh <- fmt.Errorf("descheduler shutting down")
	default:
	}
	return nil

edge case: if a request is sitting in TriggerCh (buffer 1) when shutdown fires, this returns and the handler's <-resultCh blocks until the request context is torn down — which usually shows up as a 504 to the caller during a graceful shutdown. Maybe drain it on the way out? 🤔

Author

Oh, I missed this case :) I will fix it!

@googs1025
Member

Thanks for the detailed write-up

A few things I'd want to see before this lands, though:

  1. The endpoint mutates cluster state but doesn't go through any auth filter — it's mounted on the same mux as /metrics and /healthz, which are read-only by convention. Anyone with network reachability to :10258 can trigger an eviction pass. I think we need either an auth filter (SAR/bearer token, like other K8s components do) or at minimum a loopback-only bind so this is kubectl exec-only.
  2. No tests in the diff — the main loop rewrite (wait.NonSlidingUntil → custom select) really needs coverage for the four flag combinations (interval 0/+, trigger on/off).

Comment thread README.md

## Trigger API

The descheduler exposes an optional HTTP endpoint that allows any authorized caller to **manually trigger a descheduling cycle on demand** without waiting for the next scheduled interval or restarting the process.
Contributor

What sort of authorization is done? I.e., how does this code distinguish between authorized and unauthorized callers?

Author

The wording here is ambiguous; I meant that only those with access to the cluster and the port can run it. But as @googs1025 asked, I'll add authorization. :)

@ergoz
Author

ergoz commented May 6, 2026

  1. Auth: yes, you are right! I will add auth like in k8s. But should I add a separate port listener or mux to split it from /healthz and /metrics, or is auth enough?
  2. Tests: OK, will be added :)

@ingvagabund
Contributor

In the past we discussed the option of running the descheduler asynchronously based on, e.g., node condition changes: every time a node goes unready for some time, or when nodes scale down/up, by observing cluster objects via informers and custom indexers.

> real-world scenarios

Can you please elaborate more on how this would work from the user's perspective? A single endpoint for every user with a single role? Or is there a possibility of creating multiple roles, e.g. assigning users permissions for individual actions?

@k8s-ci-robot
Contributor

@ergoz: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|-----------|--------|---------|----------|---------------|
| pull-descheduler-verify-master | 65cfeeb | link | true | /test pull-descheduler-verify-master |
| pull-descheduler-test-e2e-k8s-master-1-36 | 65cfeeb | link | true | /test pull-descheduler-test-e2e-k8s-master-1-36 |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Proposal: Add http handler for manual descheduling process launch

5 participants