6 changes: 3 additions & 3 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -1,8 +1,8 @@
blank_issues_enabled: false
contact_links:
- name: Installation & Troubleshooting Wiki
url: https://github.com/NVIDIA/k8s-dra-driver-gpu/wiki
about: Check the wiki for installation guides and common troubleshooting steps before opening an issue.
- name: DRA Driver for NVIDIA GPUs Documentation
url: https://github.com/kubernetes-sigs/dra-driver-nvidia-gpu/tree/main/docs
about: Check our docs for installation guides and common troubleshooting steps before opening an issue.
- name: Kubernetes DRA Documentation
url: https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/
about: Upstream Kubernetes Dynamic Resource Allocation documentation.
6 changes: 3 additions & 3 deletions .github/ISSUE_TEMPLATE/question.yml
@@ -7,9 +7,9 @@ body:
attributes:
value: |
Before asking a question, please check:
- [Project Wiki](https://github.com/NVIDIA/k8s-dra-driver-gpu/wiki) for installation and troubleshooting
- [Release Notes](https://github.com/NVIDIA/k8s-dra-driver-gpu/releases) for recent changes and known issues
- [Demo specs](https://github.com/NVIDIA/k8s-dra-driver-gpu/tree/main/demo/specs) for usage examples
- [Project Documentation](https://github.com/kubernetes-sigs/dra-driver-nvidia-gpu/tree/main/docs) for installation and troubleshooting
- [Release Notes](https://github.com/kubernetes-sigs/dra-driver-nvidia-gpu/releases) for recent changes and known issues
- [Demo specs](https://github.com/kubernetes-sigs/dra-driver-nvidia-gpu/tree/main/demo/specs) for usage examples
- [Kubernetes DRA documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/)

- type: textarea
2 changes: 1 addition & 1 deletion README.md
@@ -42,7 +42,7 @@ For exploration and demonstration purposes, see the "demo" section below, and al

## Installation

Configuration and installation instructions can for now be found [in our Wiki](https://github.com/kubernetes-sigs/dra-driver-nvidia-gpu/wiki/Installation).
Configuration and installation instructions are currently in the [/docs folder](docs/install.md).

## A (kind) demo

47 changes: 1 addition & 46 deletions docs/prerequisites.md
@@ -7,8 +7,7 @@ Cluster, software, and hardware requirements for the DRA Driver for NVIDIA GPUs.

| Requirement | Version / Notes |
|---|---|
| Kubernetes | v1.34.2 or later, with at least one node that has one or more NVIDIA GPUs. |
| `DynamicResourceAllocation` feature gate | Enabled by default in Kubernetes v1.34+. On v1.32 and v1.33, [enable it manually](#enable-dra-on-kubernetes-v132-and-v133). |
| Kubernetes | v1.34.2 or later, with at least one node that has one or more NVIDIA GPUs. DRA became generally available (GA) in Kubernetes v1.34; earlier versions required enabling the `DynamicResourceAllocation` feature gate. |
| Helm | v3.8 or later. |
| NVIDIA Driver | v565 or later for GPU allocation. v570.158.01 or later if using [ComputeDomains](#computedomains-additional-prerequisites). |
| CDI | Enabled in your container runtime. This is enabled by default in containerd 2.0+ and CRI-O v1.27+. The DRA Driver uses CDI to expose GPUs to containers. |
@@ -39,47 +38,3 @@ It can manage the following DRA Driver for NVIDIA GPUs prerequisites for you:
- GPU Feature Discovery (GFD), required for ComputeDomains.

If you choose to install the GPU Operator, follow the [DRA Driver for NVIDIA GPUs install guide](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/dra-intro-install.html) in the GPU Operator documentation. It covers installing the GPU Operator with the NVIDIA Kubernetes Device Plugin disabled and installing the DRA Driver for NVIDIA GPUs.

## Enable DRA on Kubernetes v1.32 and v1.33

On Kubernetes v1.34 and later, `DynamicResourceAllocation` is enabled by default and no additional configuration is required.

On Kubernetes v1.32 and v1.33, enable the following on each component:

| Component | Requirement |
|---|---|
| kube-apiserver | Enable the `DynamicResourceAllocation` feature gate and the `resource.k8s.io/v1beta1` API group (available on v1.32 and v1.33). On v1.33, also enable `resource.k8s.io/v1beta2`. |
| kube-controller-manager | Enable the `DynamicResourceAllocation` feature gate |
| kube-scheduler | Enable the `DynamicResourceAllocation` feature gate |
| kubelet | Enable the `DynamicResourceAllocation` feature gate |

How you apply these depends on your cluster setup. For managed Kubernetes distributions (EKS, GKE, AKS, and others), refer to your provider's documentation. Not all providers support enabling `DynamicResourceAllocation` on v1.32 or v1.33 clusters.

### Example: kubeadm

The following `kubeadm-init.yaml` enables DRA for a new cluster using [kubeadm](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/control-plane-flags/):

```yaml
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
apiServer:
  extraArgs:
    - name: "feature-gates"
      value: "DynamicResourceAllocation=true"
    - name: "runtime-config"
      # On v1.32, omit "resource.k8s.io/v1beta2=true"
      value: "resource.k8s.io/v1beta1=true,resource.k8s.io/v1beta2=true"
controllerManager:
  extraArgs:
    - name: "feature-gates"
      value: "DynamicResourceAllocation=true"
scheduler:
  extraArgs:
    - name: "feature-gates"
      value: "DynamicResourceAllocation=true"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  DynamicResourceAllocation: true
```
```
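
The per-version rules described in this section can also be summarized in a few lines of code. The following is a minimal Python sketch, not part of the driver or of any Kubernetes tooling: the function name `dra_settings` and the shape of its return value are hypothetical, and it encodes only the rules stated above (v1.32 needs the feature gate plus `resource.k8s.io/v1beta1`, v1.33 additionally needs `resource.k8s.io/v1beta2`, and v1.34 or later needs nothing extra).

```python
import re


def dra_settings(version: str) -> dict:
    """Return the extra DRA settings a cluster of the given version needs.

    Hypothetical helper for illustration; it mirrors the table in this
    section and does not query a real cluster.
    """
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    release = (int(match.group(1)), int(match.group(2)))

    if release >= (1, 34):
        # DynamicResourceAllocation is enabled by default; nothing to add.
        return {"feature_gates": [], "runtime_config": []}

    if release not in {(1, 32), (1, 33)}:
        # This section only describes v1.32 and v1.33.
        raise ValueError(f"version {version!r} is not covered by this section")

    runtime_config = ["resource.k8s.io/v1beta1=true"]
    if release == (1, 33):
        # v1.33 additionally serves the v1beta2 API group.
        runtime_config.append("resource.k8s.io/v1beta2=true")

    return {
        # The feature gate applies to kube-apiserver, kube-controller-manager,
        # kube-scheduler, and kubelet; runtime_config applies to kube-apiserver.
        "feature_gates": ["DynamicResourceAllocation=true"],
        "runtime_config": runtime_config,
    }
```

Calling `dra_settings("v1.33.1")` returns the feature gate together with both API groups, which corresponds to the values used in the kubeadm example above; calling it with `"v1.32.0"` omits the `v1beta2` entry.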