Install a separate driver instance for each test#201
Conversation
Signed-off-by: William Yao <william2000yao@gmail.com>
|
|
||
| // verifyWebhook waits until the validating webhook is serving for the given | ||
| // DeviceClass by creating a dry-run ResourceClaim until it succeeds. | ||
| func verifyWebhook(ctx context.Context, deviceClassName string) { |
There was a problem hiding this comment.
This code is mostly moved from e2e_setup_test.go at line 107: https://github.com/willie-yao/dra-example-driver/blob/main/test/e2e/e2e_setup_test.go#L107
Changes: now takes a deviceClassNAme parameter, the test claim's Name is suffixed with it due to parallel webhook installs and added DeviceClass to error messages
| driverPods, err := clientset.CoreV1().Pods(cfg.Namespace).List(ctx, metav1.ListOptions{ | ||
| LabelSelector: driverPodSelector, | ||
| }) | ||
| if err != nil { | ||
| fmt.Fprintf(GinkgoWriter, "Failed to list driver pods: %v\n", err) | ||
| return | ||
| } | ||
| for _, pod := range driverPods.Items { | ||
| for _, c := range pod.Spec.Containers { | ||
| stream, err := clientset.CoreV1().Pods(cfg.Namespace).GetLogs(pod.Name, &corev1.PodLogOptions{ | ||
| Container: c.Name, | ||
| TailLines: &tailLines, | ||
| }).Stream(ctx) | ||
| if err != nil { | ||
| fmt.Fprintf(GinkgoWriter, | ||
| "Driver pod %s, container %s: failed to get logs: %v\n", | ||
| pod.Name, c.Name, err) | ||
| continue | ||
| } | ||
| buf := new(bytes.Buffer) | ||
| _, _ = io.Copy(buf, stream) | ||
| stream.Close() | ||
| fmt.Fprintf(GinkgoWriter, | ||
| "Driver pod %s, container %s (last %d lines):\n%s\n", | ||
| pod.Name, c.Name, tailLines, buf.String()) | ||
| } | ||
| } |
There was a problem hiding this comment.
This is moved from e2e_setup_test.go on line 192: https://github.com/willie-yao/dra-example-driver/blob/main/test/e2e/e2e_setup_test.go#L192
|
/assign @nojnhuh |
|
|
||
| // runHelm executes a helm subcommand and returns its combined output. | ||
| func runHelm(ctx context.Context, args ...string) (string, error) { | ||
| cmd := exec.CommandContext(ctx, "helm", args...) |
There was a problem hiding this comment.
Can we use Helm's Go library instead of invoking the binary? To keep dependencies separate, we should turn the test directory into a separate module like CAPZ.
There was a problem hiding this comment.
Done in 784f6e9. CAPZ doesn't actually have an entirely different submodule but uses a .golangci.yml, but I think a sub-module would work better here anyways.
There was a problem hiding this comment.
Ah, CAPI does this but I forgot CAPZ doesn't.
Signed-off-by: William Yao <william2000yao@gmail.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: willie-yao The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Drop the partitionable-devices spec and helmUpgradeDriver helper from main: they assume a single global driver, which is incompatible with the per-test driver install model on this branch. The spec is reintroduced adapted to the new model in a follow-up commit.
The spec from kubernetes-sigs#150 reconfigured a single global driver via helmUpgradeDriver and ran Serial. With the per-test driver install model on this branch it can just install its own release with kubeletPlugin.gpuPartitions=4 via DriverConfig.ExtraValues and run in parallel with everything else.
Signed-off-by: William Yao <william2000yao@gmail.com>
nojnhuh
left a comment
There was a problem hiding this comment.
Sorry for the annoying comments on the workspace/lint periphery here. I promise I'm starting to look at the main point of this PR.
| golangci-lint run ./... | ||
| cd test && golangci-lint run --build-tags=e2e ./... |
There was a problem hiding this comment.
This hides lint failures in the test module if any issues are found in the main module. Can we combine into one call?
| golangci-lint run ./... | |
| cd test && golangci-lint run --build-tags=e2e ./... | |
| golangci-lint run ./... | |
| golangci-lint run --build-tags=e2e ./... ./test/... |
There was a problem hiding this comment.
Tried combining it with golangci-lint run --build-tags=e2e ./... ./test/... but it silently doesn't lint the test module. It logs pattern ./test/...: directory prefix test does not contain main module or its selected dependencies and exits 0 with 0 issues. I'm thinking to do it by just creating another make target for test
.PHONY: lint lint-root lint-test
lint:
@$(MAKE) -k lint-root lint-test
lint-root:
golangci-lint run ./...
lint-test:
cd test && golangci-lint run --build-tags=e2e ./...
There was a problem hiding this comment.
With a workspace set up, it seems to work for me: willie-yao/dra-example-driver@separate-driver-install...nojnhuh:dra-example-driver:workspace-lint
% make lint
golangci-lint run --build-tags=e2e ./... ./test/...
cmd/dra-example-webhook/main.go:183:1: Comment should end in a period (godot)
// function
^
test/e2e/e2e_setup_test.go:54:1: Comment should end in a period (godot)
// driverPodSelector finds kubelet plugin Pods within an installed driver's release namespace
^
2 issues:
* godot: 2
make: *** [Makefile:83: lint] Error 1There was a problem hiding this comment.
Ah that was completely my bad. I had v0.0.0 set before when testing something. Setting it to v0.2.1 fixed it! Thanks for checking this for me
| k8s.io/api v0.36.0 | ||
| k8s.io/apimachinery v0.36.0 | ||
| k8s.io/client-go v0.36.0 | ||
| sigs.k8s.io/dra-example-driver v0.0.0 |
There was a problem hiding this comment.
go mod tidy seems to want to put v0.2.1 here if this doesn't already exist. That should still ultimately resolve to the local version though and not that actual tagged version.
There was a problem hiding this comment.
I confirmed locally that with replace => ../ present, v0.2.1 or any other version resolves to the local checkout. Should I change this to v0.2.1?
Signed-off-by: William Yao <william2000yao@gmail.com>
e5f08a6 to
27b36c3
Compare
| // runUpgradeOrInstall emulates `helm upgrade --install`: if a release with | ||
| // the configured name already exists it is upgraded in place, otherwise a | ||
| // fresh install is performed. | ||
| func runUpgradeOrInstall(ctx context.Context, actionCfg *action.Configuration, install *action.Install, chrt *chart.Chart, values map[string]any) { |
There was a problem hiding this comment.
In practice, will this ever actually perform an upgrade of an existing Helm release? If we expect each test to have a separate instance of the driver, then I think we'd want to error if that driver already exists instead of upgrade and risk affecting a different test.
The script used helm upgrade --install before mostly to be copy-pasteable, but that was always expected to first be run on a fresh kind cluster without the driver installed where a plain helm install should have worked.
| actionCfg := newHelmActionConfig(cfg.Namespace, registryClient) | ||
|
|
||
| install := action.NewInstall(actionCfg) | ||
| install.ReleaseName = cfg.ReleaseName |
There was a problem hiding this comment.
Could we automatically generate a release name instead? That will make sure there aren't conflicts between tests, tests don't have to decide for themselves what the driver should be called, and we don't have to validate them up front.
| Expect(validation.IsDNS1123Label(cfg.ReleaseName)).To(BeEmpty(), | ||
| "DriverConfig.ReleaseName %q must be a valid DNS-1123 label", cfg.ReleaseName) | ||
| if cfg.Namespace == "" { | ||
| cfg.Namespace = cfg.ReleaseName |
There was a problem hiding this comment.
Like the release name, can we generate a new, separate namespace for each instance of the driver?
| // maxDriverNameLen is the longest driver name that fits within Linux's | ||
| // 108-byte UNIX_PATH_MAX after the kubelet appends its registrar socket | ||
| // prefix and per-pod UID suffix. | ||
| maxDriverNameLen = 28 |
There was a problem hiding this comment.
Can you expand on how this value was calculated?
| name string | ||
| fileName string | ||
| }{ | ||
| {name: "v1 ResourceClaim", fileName: "invalid_rc_v1.yaml"}, |
There was a problem hiding this comment.
Does it simplify things if we construct these ResourceClaims inline vs. reading them from a file?
| // with the per-test driver name and (when set) extended resource name. | ||
| func substituteDriverIdentifiers(raw string, drv DriverConfig) string { | ||
| GinkgoHelper() | ||
| out := strings.ReplaceAll(raw, defaultDeviceClassName, drv.DriverName) |
There was a problem hiding this comment.
Eventually, the manifests will be referencing other drivers in addition to gpu.example.com. Some manifests may even rely on multiple separate instances of the driver running different profiles. Not a blocker for this PR, but we should find a better way to handle dynamic driver names.
Each
Itblock now installs its own DRA driver release with a unique driver name, and uninstalls it on teardown.Fixes #171
Changes:
test/e2e/driver_test.gowithinstallDriver(ctx, DriverConfig)helper. Shells out tohelm(already required bysetup-e2e.sh); honorsHELM_CHART_PATHenv var.deployManifestaccepts aDriverConfigand substitutesgpu.example.com-> per-test driver name (andexample.com/gpu-> per-test extended resource name) in demo manifests before applying.helm --wait, polls explicitly forDeviceClass+ResourceSlices+ webhook readiness before returning from install.DeferCleanupLIFO: driver-log diagnostics -> helm uninstall -> namespace delete with foreground propagation + termination wait.setup-e2e.shno longer installs the driver. Cert-manager + kind cluster + image build stay (cluster-wide infra).gpu.example.comand runOrdered, Serialso their static testdata stays valid.