
Ensure read write many first #339

Open
nikola-jokic wants to merge 11 commits into main from nikola-jokic/rwx-volume-by-default

@nikola-jokic
Collaborator

@nikola-jokic nikola-jokic commented Apr 22, 2026

Use ReadWriteMany (RWX) volumes by default while still allowing ReadWriteOnce (RWO) volumes for the hook.

The 0.8 release of the hook used `exec cp`. The exec API turned out to be slow because of the directory structure and the number of files that had to be copied to the workflow pod.

Since most major Kubernetes providers offer RWX volumes, the default hook implementation now assumes RWX volumes, while RWO can still be configured by setting a single environment variable.
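
A minimal sketch of a work-volume claim that defaults to RWX and switches to RWO only on request. `WorkVolumeClaim` is a hand-written subset of the Kubernetes `PersistentVolumeClaim` fields, and `buildWorkVolumeClaim` with its `useRwo` flag is hypothetical; the PR selects RWO through an environment variable whose name is not given here.

```typescript
// Hand-written subset of the Kubernetes PersistentVolumeClaim fields used here.
type AccessMode = 'ReadWriteMany' | 'ReadWriteOnce'

interface WorkVolumeClaim {
  apiVersion: 'v1'
  kind: 'PersistentVolumeClaim'
  metadata: { name: string }
  spec: {
    accessModes: AccessMode[]
    storageClassName?: string
    resources: { requests: { storage: string } }
  }
}

// Hypothetical helper: RWX unless the caller explicitly opts in to RWO.
function buildWorkVolumeClaim(
  name: string,
  storage: string,
  useRwo = false,
  storageClassName?: string
): WorkVolumeClaim {
  const accessMode: AccessMode = useRwo ? 'ReadWriteOnce' : 'ReadWriteMany'
  return {
    apiVersion: 'v1',
    kind: 'PersistentVolumeClaim',
    metadata: { name },
    spec: {
      accessModes: [accessMode],
      // When storageClassName is omitted, the cluster's default StorageClass (if any) applies.
      ...(storageClassName ? { storageClassName } : {}),
      resources: { requests: { storage } }
    }
  }
}
```

An RWO volume can be mounted read-write by a single node only, which is why the RWO path needs the job pods placed on the runner's node.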

Not relying on volumes was appealing, but both the performance cost and the added complexity were too high. The hook ended up acting as a kind of "filesystem driver" that copied assets and then checked whether they had been written to disk.

The default implementation of the hook should not be that slow. Therefore, the decision has been made to rely on volumes again.

Because the hook is intended as a reference implementation that can be used out of the box, users who prefer the `exec cp` approach can fork the repository and continue from 0.8.1. Everyone is welcome to maintain their own version of the hook.

Copilot AI review requested due to automatic review settings April 22, 2026 15:24
@nikola-jokic nikola-jokic requested a review from a team as a code owner April 22, 2026 15:24
Contributor

Copilot AI left a comment


Pull request overview

This PR updates the k8s runner hooks to use a shared PersistentVolumeClaim-backed work volume (instead of ad-hoc copy/exec flows) and switches container-step execution from standalone Pods to batch Jobs, aligning storage behavior across job/step containers.

Changes:

  • Introduce a single shared volume mount model (POD_VOLUME_NAME PVC + containerVolumes()), and update job/step execution to rely on it.
  • Replace container-step “pod per step” with a Kubernetes Job + helper to resolve the job’s pod name.
  • Update test setup to provision StorageClass/PV/PVC for e2e tests and clean them up.
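
Pods created by a Kubernetes Job carry controller-added labels, `job-name` and, on Kubernetes 1.27 and later, `batch.kubernetes.io/job-name`, so a helper that resolves a Job's pod name can filter on them. The sketch below assumes the pod list has already been fetched; `PodLike` and `resolveJobPodName` are hypothetical names and not the PR's helper, and timestamps are modelled as RFC 3339 UTC strings rather than client-library `Date` objects.

```typescript
// Hypothetical minimal pod shape; only the fields the lookup needs.
interface PodLike {
  metadata: { name: string; labels?: Record<string, string>; creationTimestamp?: string }
}

// Labels the Job controller sets on the pods it creates.
const JOB_NAME_LABELS = ['batch.kubernetes.io/job-name', 'job-name']

function resolveJobPodName(jobName: string, pods: PodLike[]): string | undefined {
  const owned = pods.filter(p =>
    JOB_NAME_LABELS.some(key => p.metadata.labels?.[key] === jobName)
  )
  // If several pods match, prefer the most recently created one.
  // RFC 3339 UTC timestamps of equal precision sort correctly as plain strings.
  const ts = (p: PodLike): string => p.metadata.creationTimestamp ?? ''
  owned.sort((a, b) => (ts(a) < ts(b) ? 1 : ts(a) > ts(b) ? -1 : 0))
  return owned[0]?.metadata.name
}
```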

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| packages/k8s/tests/test-setup.ts | Provisions StorageClass/PV/PVC for tests; updates initialization/cleanup sequencing around the new shared volume model. |
| packages/k8s/src/k8s/utils.ts | Adds containerVolumes() and writeEntryPointScript(); updates argument handling (fixArgs) for exec invocations. |
| packages/k8s/src/k8s/index.ts | Switches job pod volume to PVC, adds Job creation path for container steps, updates permissions, and introduces node pinning/affinity helpers. |
| packages/k8s/src/hooks/run-script-step.ts | Removes copy/merge temp-dir logic and runs via entrypoint script expected to be present on the shared volume. |
| packages/k8s/src/hooks/run-container-step.ts | Runs container steps as Jobs, streams logs, and optionally injects env via Secret. |
| packages/k8s/src/hooks/prepare-job.ts | Uses createPod, mounts via containerVolumes, and copies externals into the shared work volume path. |
Comments suppressed due to low confidence (1)

packages/k8s/tests/test-setup.ts:90

  • cleanupK8sResources deletes the PVC/PV before deleting the pods that may be using the claim. This can leave the PVC stuck terminating or cause the PV delete to fail (still bound). Delete the job/workflow pods first, then the PVC, then the PV, and finally the StorageClass.
  async cleanupK8sResources(): Promise<void> {
    await k8sApi
      .deleteNamespacedPersistentVolumeClaim({
        name: `${this.podName}-work`,
        namespace: 'default',
        gracePeriodSeconds: 0
      })
      .catch((e: k8s.ApiException<any>) => {
        if (e.code !== 404) {
          console.error(JSON.stringify(e))
        }
      })
    await k8sApi
      .deletePersistentVolume({ name: `${this.podName}-pv` })
      .catch((e: k8s.ApiException<any>) => {
        if (e.code !== 404) {
          console.error(JSON.stringify(e))
        }
      })
    await k8sApi
      .deleteNamespacedPod({
        name: this.podName,
        namespace: 'default',
        gracePeriodSeconds: 0
      })
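
The order the comment asks for can be sketched independently of the client library: each deletion is a callback, 404s are ignored, and the callbacks run pods first, then PVC, PV, and StorageClass. `CleanupSteps` and `cleanupInOrder` are hypothetical names; in `test-setup.ts` the callbacks would wrap the `@kubernetes/client-node` delete calls already shown above.

```typescript
// Hypothetical wrapper around one delete call (pod, PVC, PV, or StorageClass).
type Deleter = () => Promise<void>

interface CleanupSteps {
  deletePods: Deleter
  deletePvc: Deleter
  deletePv: Deleter
  deleteStorageClass: Deleter
}

function isNotFound(e: unknown): boolean {
  return typeof e === 'object' && e !== null && (e as { code?: number }).code === 404
}

// Run one deletion, treating "already gone" (404) as success.
async function ignoreNotFound(step: Deleter): Promise<void> {
  try {
    await step()
  } catch (e) {
    if (!isNotFound(e)) {
      console.error(JSON.stringify(e))
    }
  }
}

async function cleanupInOrder(steps: CleanupSteps): Promise<void> {
  // Pods first, so nothing still mounts the claim.
  await ignoreNotFound(steps.deletePods)
  // Then the claim, which releases its binding to the volume.
  await ignoreNotFound(steps.deletePvc)
  // Then the now-unbound volume, and finally its StorageClass.
  await ignoreNotFound(steps.deletePv)
  await ignoreNotFound(steps.deleteStorageClass)
}
```

Delete calls return before the objects are actually removed, so a real cleanup may also need to wait for the pods to terminate before the PVC and PV deletions complete.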


Comment thread packages/k8s/src/hooks/run-container-step.ts Outdated
Comment thread packages/k8s/src/k8s/utils.ts
Comment thread packages/k8s/src/k8s/index.ts Outdated
Comment on lines +147 to +160
job.metadata.name = getStepPodName()
job.metadata.labels = { [runnerInstanceLabel.key]: runnerInstanceLabel.value }
job.metadata.annotations = {}

job.spec = new k8s.V1JobSpec()
job.spec.ttlSecondsAfterFinished = 300
job.spec.backoffLimit = 0
job.spec.template = new k8s.V1PodTemplateSpec()

job.spec.template.spec = new k8s.V1PodSpec()
job.spec.template.metadata = new k8s.V1ObjectMeta()
job.spec.template.metadata.labels = {}
job.spec.template.metadata.annotations = {}
job.spec.template.spec.containers = [container]

Copilot AI Apr 22, 2026


createJob applies RunnerInstanceLabel only to the Job object, but the Pods created by the Job will not automatically inherit Job metadata labels. Since prunePods() selects pods by RunnerInstanceLabel, these step pods won't be pruned. Add the same label to job.spec.template.metadata.labels (and keep it when merging any extension metadata).
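
A minimal sketch of the suggested fix, using plain objects shaped like the `batch/v1` Job fields in the excerpt above (`ttlSecondsAfterFinished: 300`, `backoffLimit: 0`). `buildStepJob`, `matchesLabel`, and the label key in the usage are hypothetical; the runner label is spread last into the pod template so extension metadata cannot override it.

```typescript
interface Label {
  key: string
  value: string
}

interface ObjectMeta {
  name?: string
  labels?: Record<string, string>
  annotations?: Record<string, string>
}

interface StepJob {
  metadata: ObjectMeta
  spec: {
    ttlSecondsAfterFinished: number
    backoffLimit: number
    template: { metadata: ObjectMeta; spec: { containers: unknown[] } }
  }
}

function buildStepJob(
  name: string,
  runnerLabel: Label,
  container: unknown,
  extensionLabels: Record<string, string> = {}
): StepJob {
  const runnerLabels = { [runnerLabel.key]: runnerLabel.value }
  return {
    metadata: { name, labels: { ...runnerLabels }, annotations: {} },
    spec: {
      ttlSecondsAfterFinished: 300,
      backoffLimit: 0,
      template: {
        // The Job controller does not copy Job labels onto its pods, so the
        // runner label must also be on the pod template for pod pruning to see it.
        metadata: { labels: { ...extensionLabels, ...runnerLabels }, annotations: {} },
        spec: { containers: [container] }
      }
    }
  }
}

// Simple equality check, standing in for a label selector such as "key=value".
function matchesLabel(meta: ObjectMeta, label: Label): boolean {
  return meta.labels?.[label.key] === label.value
}
```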

Comment thread packages/k8s/src/hooks/run-script-step.ts Outdated
@nikola-jokic nikola-jokic force-pushed the nikola-jokic/rwx-volume-by-default branch from c47c78e to 5d337ff April 22, 2026 20:59
nikola-jokic and others added 7 commits April 23, 2026 18:30
- Change package.json bootstrap to use 'npm ci --prefix packages/hooklib'
- Change k8s-tests workflow to use 'npm ci' instead of 'npm install'
- All three packages (hooklib, k8s, docker) now use deterministic installs
- All three CI jobs (format-and-lint, docker-tests, k8s-tests) now use npm ci
…nity

- RWX volumes allow job pods to be scheduled on any cluster node
- RWO volumes require affinity to pin job pods to runner's node
- Remove ACTIONS_RUNNER_USE_KUBE_SCHEDULER from RWX migration steps
- Emphasize resource utilization benefits of RWX free scheduling

Co-authored-by: Sisyphus <sisyphus@ohmyopencode.com>
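
The RWX/RWO scheduling split described in this commit can be sketched as a function that returns no affinity for RWX and a required node affinity for RWO. `affinityForWorkVolume` is a hypothetical name, not the PR's helper, and the runner's node name is assumed to be known already (for example, exposed to the runner container through the downward API field `spec.nodeName`). Matching on the node object's `metadata.name` via `matchFields` avoids assuming that the `kubernetes.io/hostname` label equals the node name.

```typescript
type AccessMode = 'ReadWriteMany' | 'ReadWriteOnce'

interface NodeSelectorRequirement {
  key: string
  operator: 'In'
  values: string[]
}

// Subset of the Kubernetes pod Affinity type needed for hard node pinning.
interface RequiredNodeAffinity {
  nodeAffinity: {
    requiredDuringSchedulingIgnoredDuringExecution: {
      nodeSelectorTerms: { matchFields: NodeSelectorRequirement[] }[]
    }
  }
}

function affinityForWorkVolume(
  accessMode: AccessMode,
  runnerNodeName: string
): RequiredNodeAffinity | undefined {
  if (accessMode === 'ReadWriteMany') {
    // Any node can mount an RWX volume, so placement is left to the scheduler.
    return undefined
  }
  // RWO: pin the job pod to the node that already mounts the runner's volume.
  return {
    nodeAffinity: {
      requiredDuringSchedulingIgnoredDuringExecution: {
        nodeSelectorTerms: [
          { matchFields: [{ key: 'metadata.name', operator: 'In', values: [runnerNodeName] }] }
        ]
      }
    }
  }
}
```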
Change ACTIONS_RUNNER_USE_KUBE_SCHEDULER to ACTIONS_RUNNER_DISABLE_KUBE_SCHEDULER
to make affinity-based scheduling the first-class (default) implementation.

Breaking Change:
- OLD: Set ACTIONS_RUNNER_USE_KUBE_SCHEDULER=true to enable affinity (opt-in)
- NEW: Affinity is enabled by default, set ACTIONS_RUNNER_DISABLE_KUBE_SCHEDULER=true to disable (opt-out)

Code changes:
- utils.ts: Rename constant and invert useKubeScheduler() logic
- rwo-affinity-test.ts: Update tests to verify default affinity behavior
- ADR 0135: Update to reflect opt-out model
- README: Update guidance to reflect default behavior

Co-authored-by: Sisyphus <sisyphus@ohmyopencode.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
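
The opt-out model in this commit reduces to a check that is true unless the variable is set. A minimal sketch, assuming the flag is honoured only for the exact value `true`; `affinityEnabled` is a hypothetical name and is not the renamed `useKubeScheduler()` in utils.ts.

```typescript
const DISABLE_KUBE_SCHEDULER_ENV = 'ACTIONS_RUNNER_DISABLE_KUBE_SCHEDULER'

// Affinity stays on by default; only an explicit "true" turns it off.
function affinityEnabled(env: Record<string, string | undefined>): boolean {
  return env[DISABLE_KUBE_SCHEDULER_ENV] !== 'true'
}
```

Taking the environment as a parameter, rather than reading `process.env` directly, keeps the check easy to unit test.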
@nikola-jokic nikola-jokic force-pushed the nikola-jokic/rwx-volume-by-default branch from 68f9f04 to 68897b6 May 2, 2026 20:38
nikola-jokic and others added 4 commits May 6, 2026 19:18
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Labels

None yet

Projects

None yet

2 participants