The e2e tests were recently converted from bash scripts to Go/Ginkgo. They work well but carry over some design choices from the bash era that make them harder to debug, extend, and maintain. This tracks the overall cleanup effort.
Key areas:
- Error messages: Several `Eventually` calls swallow errors, making failures hard to debug (e.g. pod-not-found shows up as a generic timeout after 120s)
- Test isolation: All 8 test manifests are deployed together in `BeforeSuite` with a global `observedGPUs` map tracking state across tests. Tests can't run independently and order matters (test7 is deliberately deployed early)
- Boilerplate: The sharing tests (3, 4, 5, 6) repeat nearly identical verification logic with minor variations
- Per-test driver config: The driver is installed once with fixed Helm values. No test can use different values, and this makes future work like upgrade testing harder
Planned Improvements: