104 changes: 104 additions & 0 deletions README.md
@@ -81,6 +81,8 @@ kubectl create -f kubernetes/base/configmap.yaml
kubectl create -f kubernetes/deployment/deployment.yaml
```

> When running as a Deployment you can optionally enable the [Trigger API](#trigger-api) to manually trigger descheduling cycles on demand via an HTTP endpoint.

### Install Using Helm

Starting with release v0.18.0 there is an official helm chart that can be used to install the
@@ -1148,6 +1150,108 @@ To get best results from HA mode some additional configurations might require:
The metrics are served through https://localhost:10258/metrics by default.
The address and port can be changed by setting `--binding-address` and `--secure-port` flags.

## Trigger API

The descheduler exposes an optional HTTP endpoint that allows any authorized caller to **manually trigger a descheduling cycle on demand** without waiting for the next scheduled interval or restarting the process.
Contributor:

What sort of authorization is done? That is, how does this code distinguish between authorized and unauthorized callers?

Author:

The wording here is ambiguous; I meant that only those with access to the cluster and the port can call it. But as @googs1025 asked, I'll add authorization. :)


This is useful when:
- A node drain or topology imbalance just occurred and eviction must run immediately, not in N minutes when the timer fires.
- A CI/CD pipeline needs to rebalance pods after a rollout and must know the cycle has completed before proceeding.
- An on-call engineer needs an emergency eviction pass without touching the descheduler deployment.
- A cluster runs with a long `--descheduling-interval` to minimize churn, but requires an escape hatch for urgent situations.

### Enabling the Trigger API

Pass the `--enable-trigger-api` flag when starting the descheduler:

```bash
descheduler \
--policy-config-file=/etc/descheduler/policy.yaml \
--descheduling-interval=30m \
--enable-trigger-api
```

When using Helm, add the flag via `deschedulerCommandArguments`:

```yaml
deschedulerCommandArguments:
- "--enable-trigger-api"
```

The endpoint is registered on the existing HTTPS server (default port `10258`) — no additional ports, listeners, or TLS configuration is required.

### Endpoint Reference

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/v1/descheduler/run` | Synchronously trigger a full descheduling cycle |

The request **blocks until the cycle completes** and returns a JSON response with the result. At most one trigger can wait in the queue at a time; a request that arrives while another trigger is still queued is rejected with `429`.

#### Response codes

| Code | Meaning |
|------|---------|
| `200 OK` | Cycle completed successfully |
| `405 Method Not Allowed` | Non-POST request |
| `429 Too Many Requests` | A trigger is already queued or a cycle is in progress |
| `500 Internal Server Error` | Cycle failed; error detail in `message` field |
| `504 Gateway Timeout` | HTTP client disconnected before the cycle finished |

**Example success response:**
```json
{"message": "descheduling cycle completed successfully", "status": "ok"}
```

**Example rejection response (cycle already running):**
```json
{"message": "descheduling cycle already in progress or pending", "status": "error"}
```

### Usage Examples

**Trigger from inside the cluster (exec into the descheduler pod):**

```bash
kubectl exec -n kube-system deploy/descheduler -- \
curl -sk -X POST https://localhost:10258/api/v1/descheduler/run
```

**Trigger via port-forward from a local machine:**

```bash
kubectl port-forward -n kube-system deploy/descheduler 10258:10258 &
curl -sk -X POST https://localhost:10258/api/v1/descheduler/run
```

**Trigger with a timeout (useful in scripts):**

```bash
curl -sk --max-time 300 -X POST \
https://localhost:10258/api/v1/descheduler/run | jq .
```

**Use in a CI/CD step and assert success:**

```bash
curl -sk --max-time 300 -f -X POST \
https://localhost:10258/api/v1/descheduler/run \
| jq -e '.status == "ok"'
```

### On-Demand-Only Mode (no automatic interval)

Setting `--descheduling-interval=0` together with `--enable-trigger-api` runs one cycle at startup and then keeps the process alive, responding exclusively to manual trigger requests. This is useful for environments where eviction should only happen on explicit operator request:

```bash
descheduler \
--policy-config-file=/etc/descheduler/policy.yaml \
--descheduling-interval=0 \
--enable-trigger-api
```

> **Note:** The trigger API is disabled by default. Existing deployments without `--enable-trigger-api` are completely unaffected.

## Compatibility Matrix
The below compatibility matrix shows the k8s client package(client-go, apimachinery, etc) versions that descheduler
is compiled with. At this time descheduler does not have a hard dependency to a specific k8s release. However a
5 changes: 5 additions & 0 deletions cmd/descheduler/app/options/options.go
@@ -60,6 +60,10 @@ type DeschedulerServer struct {
	SecureServingInfo *apiserver.SecureServingInfo
	DisableMetrics bool
	EnableHTTP2 bool
	EnableTriggerAPI bool
	// TriggerCh is used to manually trigger a descheduling cycle via the HTTP API.
	// Each request sends a chan error; the loop sends the cycle result back on it.
	TriggerCh chan chan error
	// FeatureGates enabled by the user
	FeatureGates map[string]bool
	// DefaultFeatureGates for internal accessing so unit tests can enable/disable specific features
@@ -121,6 +125,7 @@ func (rs *DeschedulerServer) AddFlags(fs *pflag.FlagSet) {
	fs.Float64Var(&rs.Tracing.SampleRate, "otel-sample-rate", 1.0, "Sample rate to collect the Traces")
	fs.BoolVar(&rs.Tracing.FallbackToNoOpProviderOnError, "otel-fallback-no-op-on-error", false, "Fallback to NoOp Tracer in case of error")
	fs.BoolVar(&rs.EnableHTTP2, "enable-http2", false, "If http/2 should be enabled for the metrics and health check")
	fs.BoolVar(&rs.EnableTriggerAPI, "enable-trigger-api", rs.EnableTriggerAPI, "Enable the /api/v1/descheduler/run endpoint for manually triggering descheduling cycles via HTTP POST.")
	fs.Var(cliflag.NewMapStringBool(&rs.FeatureGates), "feature-gates", "A set of key=value pairs that describe feature gates for alpha/experimental features. "+
		"Options are:\n"+strings.Join(features.DefaultMutableFeatureGate.KnownFeatures(), "\n"))

58 changes: 58 additions & 0 deletions cmd/descheduler/app/server.go
@@ -19,7 +19,9 @@ package app

import (
	"context"
	"encoding/json"
	"io"
	"net/http"
	"os/signal"
	"syscall"
	"time"
@@ -98,6 +100,12 @@ func Run(rootCtx context.Context, rs *options.DeschedulerServer) error {

	healthz.InstallHandler(pathRecorderMux, healthz.NamedCheck("Descheduler", healthz.PingHealthz.Check))

	if rs.EnableTriggerAPI {
		rs.TriggerCh = make(chan chan error, 1)
		pathRecorderMux.Handle("/api/v1/descheduler/run", newTriggerHandler(rs.TriggerCh))
		klog.V(1).Info("Trigger API enabled at /api/v1/descheduler/run")
	}

	var stoppedCh <-chan struct{}
	var err error
	if rs.SecureServingInfo != nil {
@@ -137,3 +145,53 @@ func Run(rootCtx context.Context, rs *options.DeschedulerServer) error {

	return nil
}

func newTriggerHandler(triggerCh chan chan error) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			writeJSON(w, http.StatusMethodNotAllowed, map[string]string{
				"status":  "error",
				"message": "method not allowed, use POST",
			})
			return
		}

		resultCh := make(chan error, 1)
		select {
		case triggerCh <- resultCh:
			klog.V(2).Info("Descheduling cycle triggered via API")
		default:
			writeJSON(w, http.StatusTooManyRequests, map[string]string{
				"status":  "error",
				"message": "descheduling cycle already in progress or pending",
			})
			return
		}

		select {
		case err := <-resultCh:
			if err != nil {
				writeJSON(w, http.StatusInternalServerError, map[string]string{
					"status":  "error",
					"message": err.Error(),
				})
			} else {
				writeJSON(w, http.StatusOK, map[string]string{
					"status":  "ok",
					"message": "descheduling cycle completed successfully",
				})
			}
		case <-r.Context().Done():
			writeJSON(w, http.StatusGatewayTimeout, map[string]string{
				"status":  "error",
				"message": "request cancelled or timed out",
			})
		}
	})
}

func writeJSON(w http.ResponseWriter, statusCode int, data interface{}) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(statusCode)
	json.NewEncoder(w).Encode(data) //nolint:errcheck
}
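
The handler's admission logic relies on a non-blocking send into a channel with capacity 1. The following standalone sketch isolates that pattern to show which requests would be accepted and which rejected; `tryEnqueue` is a hypothetical helper written for this illustration and does not appear in the change.

```go
package main

import (
	"errors"
	"fmt"
)

// tryEnqueue (hypothetical) mimics the handler's select/default: it offers a
// reply channel to the loop without blocking and returns nil if the queue is full.
func tryEnqueue(triggerCh chan chan error) chan error {
	resultCh := make(chan error, 1)
	select {
	case triggerCh <- resultCh:
		return resultCh
	default:
		return nil
	}
}

func main() {
	triggerCh := make(chan chan error, 1)

	first := tryEnqueue(triggerCh)           // accepted: the buffer was empty
	second := tryEnqueue(triggerCh)          // rejected: the buffer still holds first
	fmt.Println(first != nil, second != nil) // prints "true false"

	// The loop dequeues the first trigger and starts a cycle; the buffer is free again.
	running := <-triggerCh
	third := tryEnqueue(triggerCh) // accepted: it waits behind the running cycle
	fmt.Println(third != nil)      // prints "true"

	// The loop reports the cycle result on the reply channel it dequeued.
	running <- errors.New("cycle failed")
	fmt.Println(<-first) // prints "cycle failed"
}
```

Under these assumptions, the rejection path fires only while a trigger is still sitting in the buffer. Once the loop has dequeued a trigger and is running its cycle, one further request can be queued behind it.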
43 changes: 32 additions & 11 deletions pkg/descheduler/descheduler.go
@@ -532,23 +532,44 @@ func RunDeschedulerStrategies(ctx context.Context, rs *options.DeschedulerServer
		return err
	}

	wait.NonSlidingUntil(func() {
		// A next context is created here intentionally to avoid nesting the spans via context.
		sCtx, sSpan := tracing.Tracer().Start(ctx, "NonSlidingUntil")
	executeCycle := func() error {
		sCtx, sSpan := tracing.Tracer().Start(ctx, "DeschedulingCycle")
		defer sSpan.End()

		if err := runLoop(sCtx); err != nil {
			sSpan.AddEvent("Descheduling loop failed", trace.WithAttributes(attribute.String("err", err.Error())))
			klog.Error(err)
			return
		}
		// If there was no interval specified, send a signal to the stopChannel to end the wait.Until loop after 1 iteration
		if rs.DeschedulingInterval.Seconds() == 0 {
			cancel()
			return err
		}
	}, rs.DeschedulingInterval, ctx.Done())
		return nil
	}

	return nil
	executeCycle() //nolint:errcheck

	if rs.DeschedulingInterval.Seconds() == 0 && rs.TriggerCh == nil {
		return nil
	}

	var tickerC <-chan time.Time
	if rs.DeschedulingInterval.Seconds() > 0 {
		ticker := time.NewTicker(rs.DeschedulingInterval)
		defer ticker.Stop()
		tickerC = ticker.C
	}

	for {
		select {
		case <-ctx.Done():
			return nil
Member:

    case <-ctx.Done():
        // Drain any pending trigger so the caller doesn't hang.
        select {
        case resultCh := <-rs.TriggerCh:
            resultCh <- fmt.Errorf("descheduler shutting down")
        default:
        }
        return nil

Edge case: if a request is sitting in `TriggerCh` (buffer 1) when shutdown fires, the loop returns and the handler's `<-resultCh` blocks until the request context is torn down, which usually shows up as a 504 to the caller during a graceful shutdown. Maybe drain it on the way out? 🤔

Author:

Oh, I missed this case :) I will fix it!

		case <-tickerC:
			executeCycle() //nolint:errcheck
		case resultCh := <-rs.TriggerCh:
			klog.V(1).Info("Descheduling cycle triggered via API")
			err := executeCycle()
			if resultCh != nil {
				resultCh <- err
			}
		}
	}
}

func GetPluginConfig(pluginName string, pluginConfigs []api.PluginConfig) (*api.PluginConfig, int) {