61 changes: 42 additions & 19 deletions README.md
@@ -297,7 +297,7 @@ Balance Plugins: These plugins process all pods, or groups of pods, and determin
| [RemovePodsViolatingNodeTaints](#removepodsviolatingnodetaints) |Deschedule|Evicts pods violating node taints|
| [RemovePodsViolatingTopologySpreadConstraint](#removepodsviolatingtopologyspreadconstraint) |Balance|Evicts pods violating TopologySpreadConstraints|
| [RemovePodsHavingTooManyRestarts](#removepodshavingtoomanyrestarts) |Deschedule|Evicts pods having too many restarts|
| [PodLifeTime](#podlifetime) |Deschedule|Evicts pods that have exceeded a specified age limit|
| [PodLifeTime](#podlifetime) |Deschedule|Evicts pods based on age, status transitions, conditions, states, exit codes, and owner kinds|
| [RemoveFailedPods](#removefailedpods) |Deschedule|Evicts pods with certain failed reasons and exit codes|


@@ -785,30 +785,52 @@ profiles:

### PodLifeTime

This strategy evicts pods that are older than `maxPodLifeTimeSeconds`.
This strategy evicts pods based on their age, status transitions, conditions, states, exit codes, and owner kinds. It supports both simple age-based eviction and fine-grained cleanup of pods matching specific transition criteria.

You can also specify `states` parameter to **only** evict pods matching the following conditions:
> The primary purpose for using states like `Succeeded` and `Failed` is releasing resources so that new pods can be rescheduled.
> I.e., the main motivation is not for cleaning pods, rather to release resources.
- [Pod Phase](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase) status of: `Running`, `Pending`, `Succeeded`, `Failed`, `Unknown`
- [Pod Reason](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions) reasons of: `NodeAffinity`, `NodeLost`, `Shutdown`, `UnexpectedAdmissionError`
- [Container State Waiting](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-state-waiting) condition of: `PodInitializing`, `ContainerCreating`, `ImagePullBackOff`, `CrashLoopBackOff`, `CreateContainerConfigError`, `ErrImagePull`, `ImagePullBackOff`, `CreateContainerError`, `InvalidImageName`
All non-empty filter categories are ANDed (a pod must satisfy every specified filter). Within each category, items are ORed (matching any one entry satisfies that filter). For `conditions`, a pod is eligible for eviction if **any** of the listed condition filters match — each filter is evaluated independently against the pod's `status.conditions[]` entries. Pods are processed from oldest to newest based on their creation time.

If a value for `states` or `podStatusPhases` is not specified,
Pods in any state (even `Running`) are considered for eviction.
See the [plugin README](pkg/framework/plugins/podlifetime/README.md) for detailed documentation and advanced use cases.

**Parameters:**

| Name | Type | Notes |
|--------------------------------|---------------------------------------------------|--------------------------|
| `maxPodLifeTimeSeconds` | int | |
| `states` | list(string) | Only supported in v0.25+ |
| `includingInitContainers` | bool | Only supported in v0.31+ |
| `includingEphemeralContainers` | bool | Only supported in v0.31+ |
| `namespaces` | (see [namespace filtering](#namespace-filtering)) | |
| `labelSelector` | (see [label filtering](#label-filtering)) | |
| Name | Type | Notes |
|------|------|-------|
| `conditions` | list(object) | Each with optional `type`, `status`, `reason`, `minTimeSinceLastTransitionSeconds` fields |
| `exitCodes` | list(int32) | Container terminated exit codes |
| `includingEphemeralContainers` | bool | Extend state filtering to ephemeral containers |
| `includingInitContainers` | bool | Extend state/exitCode filtering to init containers |
| `labelSelector` | (see [label filtering](#label-filtering)) | |
| `maxPodLifeTimeSeconds` | uint | Pods older than this many seconds are evicted |
| `namespaces` | (see [namespace filtering](#namespace-filtering)) | |
| `ownerKinds` | object | `include` or `exclude` list of owner reference kinds |
| `states` | list(string) | Pod phases, pod status reasons, container waiting/terminated reasons |

**Example:**
**Example (transition-based eviction):**

```yaml
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "PodLifeTime"
      args:
        states:
        - "Succeeded"
        conditions:
        - reason: "PodCompleted"
          status: "True"
          minTimeSinceLastTransitionSeconds: 14400
        ownerKinds:
          exclude:
          - "Job"
    plugins:
      deschedule:
        enabled:
          - "PodLifeTime"
```

**Example (age-based eviction):**

```yaml
apiVersion: "descheduler/v1alpha2"
@@ -829,6 +851,7 @@ profiles:
```

### RemoveFailedPods

This strategy evicts pods that are in the `Failed` status phase.
You can provide optional parameters to filter by failed pods' and containers' `reasons` and `exitCodes`. `exitCodes` apply to failed pods' containers with `terminated` state only. `reasons` and `exitCodes` can be expanded to include those of InitContainers as well by setting the optional parameter `includingInitContainers` to `true`.
You can specify an optional parameter `minPodLifetimeSeconds` to evict pods that are older than the specified number of seconds.
20 changes: 20 additions & 0 deletions examples/pod-life-time-transition.yml
@@ -0,0 +1,20 @@
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "PodLifeTime"
      args:
        states:
        - "Succeeded"
        conditions:
        - reason: "PodCompleted"
          status: "True"
          minTimeSinceLastTransitionSeconds: 14400 # 4 hours
        namespaces:
          include:
          - "default"
    plugins:
      deschedule:
        enabled:
          - "PodLifeTime"
22 changes: 22 additions & 0 deletions pkg/descheduler/pod/pods.go
@@ -292,6 +292,28 @@ func SortPodsBasedOnAge(pods []*v1.Pod) {
	})
}

// HasMatchingContainerWaitingState returns true if any container status has a
// waiting reason present in the given set.
func HasMatchingContainerWaitingState(statuses []v1.ContainerStatus, states sets.Set[string]) bool {
	for _, cs := range statuses {
		if cs.State.Waiting != nil && states.Has(cs.State.Waiting.Reason) {
			return true
		}
	}
	return false
}

// HasMatchingContainerTerminatedState returns true if any container status has
// a terminated reason present in the given set.
func HasMatchingContainerTerminatedState(statuses []v1.ContainerStatus, states sets.Set[string]) bool {
	for _, cs := range statuses {
		if cs.State.Terminated != nil && states.Has(cs.State.Terminated.Reason) {
			return true
		}
	}
	return false
}

func GroupByNodeName(pods []*v1.Pod) map[string][]*v1.Pod {
	m := make(map[string][]*v1.Pod)
	for i := 0; i < len(pods); i++ {
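
The two helpers added in this hunk could be exercised roughly as in the sketch below. To keep it runnable without Kubernetes dependencies, `containerStatus`, `containerState`, and `stringSet` are hypothetical stand-ins for `v1.ContainerStatus`, `v1.ContainerState`, and `sets.Set[string]`; they are not the real types, and the lower-case function names are local copies rather than the exported helpers.

```go
package main

import "fmt"

// Hypothetical stand-ins for the Kubernetes API types used in the diff.
type stateWaiting struct{ Reason string }
type stateTerminated struct{ Reason string }

type containerState struct {
	Waiting    *stateWaiting
	Terminated *stateTerminated
}

type containerStatus struct {
	Name  string
	State containerState
}

// stringSet is a hypothetical stand-in for sets.Set[string].
type stringSet map[string]struct{}

func newStringSet(items ...string) stringSet {
	s := make(stringSet, len(items))
	for _, it := range items {
		s[it] = struct{}{}
	}
	return s
}

// Has reports whether item is in the set.
func (s stringSet) Has(item string) bool {
	_, ok := s[item]
	return ok
}

// hasMatchingWaitingState mirrors HasMatchingContainerWaitingState.
func hasMatchingWaitingState(statuses []containerStatus, states stringSet) bool {
	for _, cs := range statuses {
		if cs.State.Waiting != nil && states.Has(cs.State.Waiting.Reason) {
			return true
		}
	}
	return false
}

// hasMatchingTerminatedState mirrors HasMatchingContainerTerminatedState.
func hasMatchingTerminatedState(statuses []containerStatus, states stringSet) bool {
	for _, cs := range statuses {
		if cs.State.Terminated != nil && states.Has(cs.State.Terminated.Reason) {
			return true
		}
	}
	return false
}

func main() {
	statuses := []containerStatus{
		{Name: "app", State: containerState{Waiting: &stateWaiting{Reason: "CrashLoopBackOff"}}},
		{Name: "sidecar", State: containerState{Terminated: &stateTerminated{Reason: "OOMKilled"}}},
	}
	states := newStringSet("CrashLoopBackOff", "ImagePullBackOff")

	fmt.Println(hasMatchingWaitingState(statuses, states))    // true: "app" is waiting in CrashLoopBackOff
	fmt.Println(hasMatchingTerminatedState(statuses, states)) // false: OOMKilled is not in the set
}
```

Both helpers return on the first match, so the cost is at most one pass over the container statuses.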
193 changes: 99 additions & 94 deletions pkg/framework/plugins/podlifetime/README.md
@@ -2,136 +2,116 @@

## What It Does

The PodLifeTime plugin evicts pods that have been running for too long. You can configure a maximum age threshold, and the plugin evicts pods older than that threshold. The oldest pods are evicted first.
The PodLifeTime plugin evicts pods based on their age, status phase, condition transitions, container states, exit codes, and owner kinds. It can be used for simple age-based eviction or for fine-grained cleanup of pods matching specific transition criteria.

## How It Works

The plugin examines all pods across your nodes and selects those that exceed the configured age threshold. You can further narrow down which pods are considered by specifying:
The plugin builds a filter chain from the configured criteria. All non-empty filter categories are ANDed together (a pod must satisfy every specified filter to be evicted). Within each filter category, items are ORed (matching any one entry satisfies that filter).

- Which namespaces to include or exclude
- Which labels pods must have
- Which states pods must be in (e.g., Running, Pending, CrashLoopBackOff)

Once pods are selected, they are sorted by age (oldest first) and evicted in that order. Eviction stops when limits are reached (per-node limits, total limits, or Pod Disruption Budget constraints).
Once pods are selected, they are sorted by their creation time with the oldest first, then evicted in order. Eviction stops when limits are reached (per-node limits, total limits, or Pod Disruption Budget constraints).

## Use Cases

- **Resource Leakage Mitigation**: Restart long-running pods that may have accumulated memory leaks, stale cache, or resource leaks
```yaml
args:
maxPodLifeTimeSeconds: 604800 # 7 days
states: [Running]
```

- **Ephemeral Workload Cleanup**: Remove long-running batch jobs, test pods, or temporary workloads that have exceeded their expected lifetime
- **Evict completed/succeeded pods that have been idle too long**:
```yaml
args:
maxPodLifeTimeSeconds: 7200 # 2 hours
states: [Succeeded, Failed]
states: [Succeeded]
conditions:
- reason: PodCompleted
status: "True"
minTimeSinceLastTransitionSeconds: 14400 # 4 hours
```

- **Node Hygiene**: Remove forgotten or stuck pods that are consuming resources but not making progress
- **Evict failed pods**, excluding Job-owned pods:
```yaml
args:
maxPodLifeTimeSeconds: 3600 # 1 hour
states: [CrashLoopBackOff, ImagePullBackOff, ErrImagePull]
states: [Failed]
exitCodes: [1]
ownerKinds:
exclude: [Job]
maxPodLifeTimeSeconds: 3600
includingInitContainers: true
```

- **Config/Secret Update Pickup**: Force pod restart to pick up updated ConfigMaps, Secrets, or environment variables
```yaml
args:
maxPodLifeTimeSeconds: 86400 # 1 day
states: [Running]
labelSelector:
matchLabels:
config-refresh: enabled
```

- **Security Rotation**: Periodically refresh pods to pick up new security tokens, certificates, or patched container images
```yaml
args:
maxPodLifeTimeSeconds: 259200 # 3 days
states: [Running]
namespaces:
exclude: [kube-system]
```

- **Dev/Test Environment Cleanup**: Automatically clean up old pods in development or staging namespaces
```yaml
args:
maxPodLifeTimeSeconds: 86400 # 1 day
namespaces:
include: [dev, staging, test]
```

- **Cluster Health Freshness**: Ensure pods periodically restart to maintain cluster health and verify workloads can recover from restarts
- **Resource Leakage Mitigation**: Restart long-running pods that may have accumulated memory leaks:
```yaml
args:
maxPodLifeTimeSeconds: 604800 # 7 days
states: [Running]
namespaces:
exclude: [kube-system, production]
```

- **Rebalancing Assistance**: Work alongside other descheduler strategies by removing old pods to allow better pod distribution
- **Clean up stuck pods in CrashLoopBackOff**:
```yaml
args:
maxPodLifeTimeSeconds: 1209600 # 14 days
states: [Running]
states: [CrashLoopBackOff, ImagePullBackOff]
```

- **Non-Critical Stateful Refresh**: Occasionally reset tolerable stateful workloads that can handle data loss or have external backup mechanisms
- **Evict pods owned only by specific kinds**:
```yaml
args:
maxPodLifeTimeSeconds: 2592000 # 30 days
labelSelector:
matchLabels:
stateful-tier: cache
states: [Succeeded, Failed]
ownerKinds:
include: [Job]
maxPodLifeTimeSeconds: 600
```

## Configuration

| Parameter | Description | Type | Required | Default |
|-----------|-------------|------|----------|---------|
| `maxPodLifeTimeSeconds` | Pods older than this many seconds are evicted | `uint` | Yes | - |
| `namespaces` | Limit eviction to specific namespaces (or exclude specific namespaces) | `Namespaces` | No | `nil` |
| `maxPodLifeTimeSeconds` | Pods older than this many seconds are evicted | `uint` | No* | `nil` |
| `states` | Filter pods by phase, pod status reason, or container waiting/terminated reason. A pod matches if any of its state values appear in this list | `[]string` | No | `nil` |
| `conditions` | Only evict pods with matching status conditions (see PodConditionFilter) | `[]PodConditionFilter` | No | `nil` |
| `exitCodes` | Only evict pods with matching container terminated exit codes | `[]int32` | No | `nil` |
| `ownerKinds` | Include or exclude pods by owner reference kind | `OwnerKinds` | No | `nil` |
| `namespaces` | Limit eviction to specific namespaces (include or exclude) | `Namespaces` | No | `nil` |
| `labelSelector` | Only evict pods matching these labels | `metav1.LabelSelector` | No | `nil` |
| `states` | Only evict pods in specific states (e.g., Running, CrashLoopBackOff) | `[]string` | No | `nil` |
| `includingInitContainers` | When checking states, also check init container states | `bool` | No | `false` |
| `includingEphemeralContainers` | When checking states, also check ephemeral container states | `bool` | No | `false` |

### Discovering states

Each pod is checked for the following locations to discover its relevant state:

1. **Pod Phase** - The overall pod lifecycle phase:
- `Running` - Pod is running on a node
- `Pending` - Pod has been accepted but containers are not yet running
- `Succeeded` - All containers terminated successfully
- `Failed` - All containers terminated, at least one failed
- `Unknown` - Pod state cannot be determined

2. **Pod Status Reason** - Why the pod is in its current state:
- `NodeAffinity` - Pod cannot be scheduled due to node affinity rules
- `NodeLost` - Node hosting the pod is lost
- `Shutdown` - Pod terminated due to node shutdown
- `UnexpectedAdmissionError` - Pod admission failed unexpectedly

3. **Container Waiting Reason** - Why containers are waiting to start:
- `PodInitializing` - Pod is still initializing
- `ContainerCreating` - Container is being created
- `ImagePullBackOff` - Image pull is failing and backing off
- `CrashLoopBackOff` - Container is crashing repeatedly
- `CreateContainerConfigError` - Container configuration is invalid
- `ErrImagePull` - Image cannot be pulled
- `CreateContainerError` - Container creation failed
- `InvalidImageName` - Image name is invalid

By default, only regular containers are checked. Enable `includingInitContainers` or `includingEphemeralContainers` to also check those container types.
| `includingInitContainers` | Extend state/exitCode filtering to init containers | `bool` | No | `false` |
| `includingEphemeralContainers` | Extend state filtering to ephemeral containers | `bool` | No | `false` |

*At least one filtering criterion must be specified (`maxPodLifeTimeSeconds`, `states`, `conditions`, or `exitCodes`).
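
A validation step enforcing the footnote above might look like the following sketch. The `args` and `conditionFilter` structs and the `validate` function are hypothetical simplifications written for illustration; the plugin's actual validation code and error messages are not shown in this diff.

```go
package main

import (
	"errors"
	"fmt"
)

// conditionFilter is a hypothetical placeholder for a PodConditionFilter entry.
type conditionFilter struct {
	Reason string
}

// args is a hypothetical, trimmed-down view of the plugin arguments.
type args struct {
	MaxPodLifeTimeSeconds *uint
	States                []string
	Conditions            []conditionFilter
	ExitCodes             []int32
}

// validate rejects a configuration that sets none of the four filtering criteria.
func validate(a args) error {
	if a.MaxPodLifeTimeSeconds == nil &&
		len(a.States) == 0 &&
		len(a.Conditions) == 0 &&
		len(a.ExitCodes) == 0 {
		return errors.New("at least one of maxPodLifeTimeSeconds, states, conditions, or exitCodes must be set")
	}
	return nil
}

func main() {
	fmt.Println(validate(args{}) != nil)                           // true: nothing configured
	fmt.Println(validate(args{States: []string{"Failed"}}) == nil) // true: one criterion is enough
}
```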

### States

The `states` field matches pods using an OR across these categories:

| Category | Examples |
|----------|----------|
| Pod phase | `Running`, `Pending`, `Succeeded`, `Failed`, `Unknown` |
| Pod status reason | `NodeAffinity`, `NodeLost`, `Shutdown`, `UnexpectedAdmissionError` |
| Container waiting reason | `CrashLoopBackOff`, `ImagePullBackOff`, `ErrImagePull`, `CreateContainerConfigError`, `CreateContainerError`, `InvalidImageName`, `PodInitializing`, `ContainerCreating` |
| Container terminated reason | `OOMKilled`, `Error`, `Completed`, `DeadlineExceeded`, `Evicted`, `ContainerCannotRun`, `StartError` |

When `includingInitContainers` is true, init container states are also checked. When `includingEphemeralContainers` is true, ephemeral container states are also checked.
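
As a rough sketch of the OR across these categories, the function below checks the pod phase, the pod status reason, and container waiting/terminated reasons against one set of configured states. The `pod` and `container` structs and `matchesStates` are hypothetical simplifications; ephemeral containers are omitted for brevity and would presumably follow the same pattern as init containers.

```go
package main

import "fmt"

// container is a hypothetical, flattened view of a container status.
type container struct {
	WaitingReason    string // empty if the container is not waiting
	TerminatedReason string // empty if the container is not terminated
}

// pod is a hypothetical, simplified pod status.
type pod struct {
	Phase          string
	Reason         string
	Containers     []container
	InitContainers []container
}

// has reports whether a non-empty value is in the configured set.
func has(set map[string]bool, v string) bool {
	return v != "" && set[v]
}

// containersMatch is true if any container has a listed waiting or terminated reason.
func containersMatch(cs []container, set map[string]bool) bool {
	for _, c := range cs {
		if has(set, c.WaitingReason) || has(set, c.TerminatedReason) {
			return true
		}
	}
	return false
}

// matchesStates ORs all state categories; init containers count only when requested.
func matchesStates(p pod, states []string, includingInit bool) bool {
	set := make(map[string]bool, len(states))
	for _, s := range states {
		set[s] = true
	}
	if has(set, p.Phase) || has(set, p.Reason) {
		return true
	}
	if containersMatch(p.Containers, set) {
		return true
	}
	return includingInit && containersMatch(p.InitContainers, set)
}

func main() {
	p := pod{
		Phase:          "Pending",
		Containers:     []container{{WaitingReason: "PodInitializing"}},
		InitContainers: []container{{WaitingReason: "ImagePullBackOff"}},
	}
	states := []string{"ImagePullBackOff", "CrashLoopBackOff"}

	fmt.Println(matchesStates(p, states, false)) // false: only the init container matches
	fmt.Println(matchesStates(p, states, true))  // true: init containers are included
}
```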

### PodConditionFilter

Each condition filter matches against `pod.status.conditions[]` entries. Within a single filter, all specified field-level checks must match (AND). Unset fields are not checked. Across the list, condition filters are ORed — a pod is eligible for eviction if **any** of the listed condition filters match.

| Field | Description |
|-------|-------------|
| `type` | Condition type (e.g., `Ready`, `Initialized`, `ContainersReady`) |
| `status` | Condition status (`True`, `False`, `Unknown`) |
| `reason` | Condition reason (e.g., `PodCompleted`) |
| `minTimeSinceLastTransitionSeconds` | Require the matching condition's `lastTransitionTime` to be at least this many seconds in the past |

At least one of these fields must be set per filter entry.

When `minTimeSinceLastTransitionSeconds` is set on a filter, a pod's condition must both match the type/status/reason fields AND have transitioned long enough ago. If the condition has no `lastTransitionTime`, it does not match.

### OwnerKinds

| Field | Description |
|-------|-------------|
| `include` | Only evict pods owned by these kinds |
| `exclude` | Do not evict pods owned by these kinds |

At most one of `include`/`exclude` may be set.
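
A minimal sketch of the include/exclude rule follows. It assumes that a pod counts as "owned by" a listed kind if any of its owner references has that kind; that interpretation, along with the `ownerKinds` struct and its `validate` and `allows` methods, is an assumption made for illustration and is not confirmed by this diff.

```go
package main

import (
	"errors"
	"fmt"
)

// ownerKinds is a hypothetical mirror of the OwnerKinds argument.
type ownerKinds struct {
	Include []string
	Exclude []string
}

// validate enforces that at most one of include/exclude is set.
func (o ownerKinds) validate() error {
	if len(o.Include) > 0 && len(o.Exclude) > 0 {
		return errors.New("at most one of include/exclude may be set")
	}
	return nil
}

// allows reports whether a pod with the given owner kinds may be evicted.
// Assumption: a single matching owner kind is enough to count as a match.
func (o ownerKinds) allows(podOwners []string) bool {
	ownedByAny := func(kinds []string) bool {
		for _, k := range kinds {
			for _, owner := range podOwners {
				if k == owner {
					return true
				}
			}
		}
		return false
	}
	switch {
	case len(o.Include) > 0:
		return ownedByAny(o.Include)
	case len(o.Exclude) > 0:
		return !ownedByAny(o.Exclude)
	default:
		return true // nothing configured: owner kind does not restrict eviction
	}
}

func main() {
	excludeJobs := ownerKinds{Exclude: []string{"Job"}}
	fmt.Println(excludeJobs.allows([]string{"Job"}))        // false
	fmt.Println(excludeJobs.allows([]string{"ReplicaSet"})) // true

	both := ownerKinds{Include: []string{"Job"}, Exclude: []string{"DaemonSet"}}
	fmt.Println(both.validate() != nil) // true: include and exclude are mutually exclusive
}
```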

## Example

### Age-based eviction with state filter

```yaml
apiVersion: descheduler/v1alpha2
kind: DeschedulerPolicy
@@ -145,11 +125,36 @@ profiles:
- name: PodLifeTime
args:
maxPodLifeTimeSeconds: 86400 # 1 day
states:
- Running
namespaces:
include:
- default
```

### Transition-based eviction for completed pods

```yaml
apiVersion: descheduler/v1alpha2
kind: DeschedulerPolicy
profiles:
- name: default
plugins:
deschedule:
enabled:
- name: PodLifeTime
pluginConfig:
- name: PodLifeTime
args:
states:
- Running
- Succeeded
conditions:
- reason: PodCompleted
status: "True"
minTimeSinceLastTransitionSeconds: 14400
namespaces:
include:
- default
```

This configuration evicts Running pods in the `default` namespace that are older than 1 day.
This configuration evicts Succeeded pods in the `default` namespace that have a `PodCompleted` condition with status `True` and whose last matching transition happened more than 4 hours ago.