
OSAC-854: add nightly vmaas snapshot build job #79377

Open
omer-vishlitzky wants to merge 2 commits into openshift:main from omer-vishlitzky:osac-854-snapshot-nightly

Conversation

@omer-vishlitzky
Contributor

@omer-vishlitzky omer-vishlitzky commented May 17, 2026

Summary

OSAC presubmit E2E jobs boot OpenShift clusters from pre-built snapshots using
cluster-tool, which brings cluster boot time down from ~2 hours to ~6 minutes.
Currently these snapshots are built manually. This PR automates that with a
nightly periodic job.

What this adds

  • osac-project-cluster-tool-snapshot step-registry ref: installs cluster-tool
    on the provisioned machine, snapshots the running OSAC cluster, and pushes the
    resulting OCI image to quay.io
  • osac-project-cluster-tool-snapshot-vmaas workflow: chains the full pipeline —
    acquire baremetal machine, provision OCP via assisted-installer, install OSAC via
    setup.sh, then snapshot and push
  • snapshot-vmaas periodic job: runs the workflow nightly at 2am UTC from
    osac-test-infra, keeping the snapshot current with the latest osac-installer

Flow

ofcir-acquire → assisted-ofcir-setup → assisted-common-pre → osac-project-installer → cluster-tool snapshot → push to quay.io

The pushed snapshot image is what all e2e-vmaas presubmit jobs pull via
CLUSTER_TOOL_FLAVOR_IMAGE to boot clusters in ~6 minutes.

Test plan

  • Rehearse via /test snapshot-vmaas
  • Verify snapshot image appears in quay.io
  • Verify existing boot step can pull and boot from the new snapshot

Summary

This PR updates OpenShift CI configuration (openshift/release) for the osac-project/osac-test-infra repository to add an automated nightly pipeline that builds and publishes VMAAS cluster snapshots used by OSAC E2E jobs.

What changed (practical effect)

  • New periodic job: snapshot-vmaas (ci-operator/config/osac-project/osac-test-infra/osac-project-osac-test-infra-main.yaml)

    • Runs nightly at 02:00 UTC (cron: "0 2 * * *")
    • capabilities: [intranet], cluster_profile: packet-assisted
    • Invokes workflow: osac-project-cluster-tool-snapshot-vmaas
    • Provides ASSISTED_CONFIG to the installer to provision a small assisted cluster (OLM operators: cnv,lvm; NUM_MASTERS=1; NUM_WORKERS=0; MASTER_MEMORY=65536; 2×200GB disks; MASTER_CPU=24; OPENSHIFT_VERSION=4.20)
  • New workflow: osac-project-cluster-tool-snapshot-vmaas (ci-operator/step-registry/osac-project/cluster-tool/snapshot-vmaas/osac-project-cluster-tool-snapshot-vmaas-workflow.yaml)

    • Orchestrates: ofcir-acquire → assisted-ofcir-setup → assisted-common-pre → osac-project-installer → osac-project-cluster-tool-snapshot → post gather/release steps
    • Uses cluster_profile: packet-assisted, CLUSTERTYPE: "assisted_large_el9", allow_best_effort_post_steps: true
    • Purpose: provision an assisted baremetal OCP cluster, run OSAC setup, snapshot the running cluster, and push the snapshot OCI image to a container registry
  • New step-ref and commands to perform snapshot and push:

    • Step YAML: ci-operator/step-registry/osac-project/cluster-tool/snapshot/osac-project-cluster-tool-snapshot-ref.yaml
      • Defines execution timing (grace_period, timeout), resource requests, Vault credential mounts, and env vars (CLUSTER_TOOL_COMMIT, SNAPSHOT_REGISTRY)
    • Commands script: ci-operator/step-registry/osac-project/cluster-tool/snapshot/osac-project-cluster-tool-snapshot-commands.sh
      • SSHs to the provisioned ci_machine, downloads cluster-tool at the specified commit, initializes it, discovers the running test-infra cluster, creates a snapshot with flavor "osac-vmaas", logs into the registry using Quay creds from Vault, and pushes the snapshot via cluster-tool
  • Ownership/metadata:

    • Added OWNERS files and workflow metadata mapping; approvers/reviewers set to the osac-cicd group for the new step/workflow files.

Impact / motivation

  • Produces nightly VMAAS cluster snapshot OCI images consumed by e2e-vmaas presubmit jobs (via CLUSTER_TOOL_FLAVOR_IMAGE). This speeds test cluster boot from ~2 hours to ~6 minutes and keeps snapshots aligned with the latest osac-installer changes.

Testing notes

  • Rehearse via Prow: /test snapshot-vmaas
  • Verify the snapshot OCI image appears in the configured registry (quay.io) and that existing boot steps can pull and boot from the pushed snapshot.

Robot feedback

  • openshift-ci-robot confirmed the PR references OSAC-854.
  • The robot warned the referenced JIRA task lacks the expected target version (expected "5.0.0").

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 17, 2026
@openshift-ci-robot
Contributor

openshift-ci-robot commented May 17, 2026

@omer-vishlitzky: This pull request references OSAC-854 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Contributor

coderabbitai Bot commented May 17, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 3745f6de-b55e-4f27-8dca-1af0b868617c

📥 Commits

Reviewing files that changed from the base of the PR and between 9f38a20 and e0abae1.

📒 Files selected for processing (1)
  • ci-operator/config/osac-project/osac-test-infra/osac-project-osac-test-infra-main.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • ci-operator/config/osac-project/osac-test-infra/osac-project-osac-test-infra-main.yaml

Walkthrough

Adds a new snapshot-vmaas CI job and step-registry that provisions an assisted baremetal cluster, runs cluster-tool to create a vMAAS snapshot, and pushes the resulting OCI image to a registry; includes workflow YAML, step implementation, metadata/OWNERS, and test registration.

Changes

Cluster Snapshot Workflow

  • Layer: Snapshot step implementation
    File(s): ci-operator/step-registry/osac-project/cluster-tool/snapshot/OWNERS, ci-operator/step-registry/osac-project/cluster-tool/snapshot/osac-project-cluster-tool-snapshot-ref.metadata.json, ci-operator/step-registry/osac-project/cluster-tool/snapshot/osac-project-cluster-tool-snapshot-ref.yaml, ci-operator/step-registry/osac-project/cluster-tool/snapshot/osac-project-cluster-tool-snapshot-commands.sh
    Summary: Step registry defines the snapshot step and script that reads Quay creds from Vault, downloads/initializes cluster-tool, finds a running test-infra cluster, creates an osac-vmaas snapshot, logs into the registry, and pushes the OCI snapshot.
  • Layer: Snapshot vmaas workflow
    File(s): ci-operator/step-registry/osac-project/cluster-tool/snapshot-vmaas/OWNERS, ci-operator/step-registry/osac-project/cluster-tool/snapshot-vmaas/osac-project-cluster-tool-snapshot-vmaas-workflow.metadata.json, ci-operator/step-registry/osac-project/cluster-tool/snapshot-vmaas/osac-project-cluster-tool-snapshot-vmaas-workflow.yaml
    Summary: Defines the vMAAS workflow orchestration (pre/test/post steps), sets cluster_profile: packet-assisted, CLUSTERTYPE: "assisted_large_el9", and documents snapshot behavior.
  • Layer: Test job registration
    File(s): ci-operator/config/osac-project/osac-test-infra/osac-project-osac-test-infra-main.yaml
    Summary: Adds snapshot-vmaas test job scheduled cron: 0 2 * * * with capabilities: [intranet], cluster_profile: packet-assisted, supplies ASSISTED_CONFIG (including OPENSHIFT_VERSION=4.20), and invokes the workflow.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

lgtm, rehearsals-ack

Suggested reviewers

  • danmanor
  • eranco74
🚥 Pre-merge checks | ✅ 12
✅ Passed checks (12 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'OSAC-854: add nightly vmaas snapshot build job' clearly and concisely describes the main change: adding a nightly job for VMAAS cluster snapshots, which is the primary objective of the changeset.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names | ✅ Passed | PR contains no Ginkgo test code (It(), Describe(), Context(), etc.). The check is not applicable - only CI configuration, shell scripts, and metadata files are added. No dynamic test names present.
Test Structure And Quality | ✅ Passed | Custom check targets Ginkgo test code. PR contains only CI/operator configuration (YAML, bash, metadata, OWNERS) with no Ginkgo tests. Check is not applicable.
Microshift Test Compatibility | ✅ Passed | No Ginkgo e2e tests or Go code are added in this PR. Changes consist only of CI configuration files, metadata, and a Bash script. The check is not applicable.
Single Node Openshift (Sno) Test Compatibility | ✅ Passed | This PR does not introduce any new Ginkgo e2e tests. It contains only CI/operator configuration and shell scripts, not e2e test code.
Topology-Aware Scheduling Compatibility | ✅ Passed | PR adds CI/CD job configs and step-registry definitions with no Kubernetes manifests, pod specs, or scheduling constraints. Check applies only to deployment manifests and operator code.
Ote Binary Stdout Contract | ✅ Passed | PR contains only CI configuration YAML and bash scripts. OTE contract check applies to Go test code only. No applicable code present.
Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | PR adds only CI/operator infrastructure (YAML configs, Bash scripts, OWNERS), no Ginkgo e2e tests. Check is not applicable.

@openshift-ci
Contributor

openshift-ci Bot commented May 17, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: omer-vishlitzky

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 17, 2026
@openshift-ci openshift-ci Bot requested review from akshaynadkarni and danmanor May 17, 2026 19:36
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/step-registry/osac-project/cluster-tool/snapshot/osac-project-cluster-tool-snapshot-commands.sh`:
- Around line 50-55: The podman login call currently passes the password with -p
which exposes QUAY_PASS in the process list; change it to use --password-stdin
and pipe the password into podman login instead of using -p. Locate the podman
login invocation (the line using podman login --root ... "$(echo ${REGISTRY} |
cut -d/ -f1)" -u "${QUAY_USER}" -p "${QUAY_PASS}") and replace the -p usage by
supplying QUAY_PASS via stdin (keep the --root, registry host extraction, and -u
"${QUAY_USER}" as-is) so the password is not visible in process arguments.
- Around line 35-37: Download of the cluster-tool binary should include SHA256
verification: download to a temp path (instead of writing directly to
/usr/local/bin/cluster-tool), ensure CLUSTER_TOOL_SHA256 environment variable is
present, compute the SHA256 of the downloaded file (e.g., via sha256sum or
shasum -a 256), compare it to CLUSTER_TOOL_SHA256 and exit non‑zero on mismatch,
only move the verified file into /usr/local/bin/cluster-tool and then run chmod
+x on that path; update the curl -> temp file step and references to COMMIT and
/usr/local/bin/cluster-tool accordingly so the file is never executed or
installed unless the checksum matches.
- Around line 15-25: The script currently enables xtrace around the ssh
invocation and passes QUAY_PASS as a positional argument (QUAY_PASS and the
timeout/ssh invocation block), which risks leaking the password; change this by
saving the current xtrace state (e.g., store "$(set +x; false || true)" or check
$- for xtrace), then disable xtrace before reading/using QUAY_PASS and before
the ssh command that includes "${QUAY_PASS}", and finally restore the original
xtrace state immediately after; apply the identical save/disable/restore pattern
around the podman login invocation (the podman login block that uses QUAY_PASS)
so credentials are never printed to CI logs.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 51bedd0b-c514-4e65-9a76-664080c5dcb6

📥 Commits

Reviewing files that changed from the base of the PR and between 16e4c03 and 227a9f9.

⛔ Files ignored due to path filters (1)
  • ci-operator/jobs/osac-project/osac-test-infra/osac-project-osac-test-infra-main-periodics.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (8)
  • ci-operator/config/osac-project/osac-test-infra/osac-project-osac-test-infra-main.yaml
  • ci-operator/step-registry/osac-project/cluster-tool/snapshot-vmaas/OWNERS
  • ci-operator/step-registry/osac-project/cluster-tool/snapshot-vmaas/osac-project-cluster-tool-snapshot-vmaas-workflow.metadata.json
  • ci-operator/step-registry/osac-project/cluster-tool/snapshot-vmaas/osac-project-cluster-tool-snapshot-vmaas-workflow.yaml
  • ci-operator/step-registry/osac-project/cluster-tool/snapshot/OWNERS
  • ci-operator/step-registry/osac-project/cluster-tool/snapshot/osac-project-cluster-tool-snapshot-commands.sh
  • ci-operator/step-registry/osac-project/cluster-tool/snapshot/osac-project-cluster-tool-snapshot-ref.metadata.json
  • ci-operator/step-registry/osac-project/cluster-tool/snapshot/osac-project-cluster-tool-snapshot-ref.yaml

Comment on lines +15 to +25
set +x
QUAY_PASS=$(cat /var/run/vault/osac-quay-creds/password)
set -x

timeout -s 9 90m ssh -F "${SHARED_DIR}/ssh_config" ci_machine bash -s \
    "${CLUSTER_TOOL_COMMIT}" \
    "${SNAPSHOT_REGISTRY}" \
    "${FLAVOR_NAME}" \
    "${QUAY_USER}" \
    "${QUAY_PASS}" \
    <<'REMOTE_EOF'

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

head -50 ci-operator/step-registry/osac-project/cluster-tool/snapshot/osac-project-cluster-tool-snapshot-commands.sh

Repository: openshift/release

Length of output: 1570


🏁 Script executed:

tail -30 ci-operator/step-registry/osac-project/cluster-tool/snapshot/osac-project-cluster-tool-snapshot-commands.sh

Repository: openshift/release

Length of output: 1221


🏁 Script executed:

# Let me verify the exact line numbers and check the context around the credential handling
wc -l ci-operator/step-registry/osac-project/cluster-tool/snapshot/osac-project-cluster-tool-snapshot-commands.sh

Repository: openshift/release

Length of output: 172


🏁 Script executed:

cat -n ci-operator/step-registry/osac-project/cluster-tool/snapshot/osac-project-cluster-tool-snapshot-commands.sh

Repository: openshift/release

Length of output: 2382


🏁 Script executed:

# Verify the fix syntax is correct by checking bash parameter expansion behavior
bash -c 'set -o nounset; was_xtrace=0; [[ "$-" == *x* ]] && was_xtrace=1; echo "was_xtrace=$was_xtrace"; set -x; echo "After set -x"; [[ "$-" == *x* ]] && was_xtrace=1; echo "was_xtrace=$was_xtrace"; (( was_xtrace )) && echo "Would re-enable xtrace"'

Repository: openshift/release

Length of output: 272


Prevent Quay password leakage through xtrace during SSH invocation.

Line 17 enables xtrace, and the ssh command on lines 19–24 passes QUAY_PASS as an argument. With xtrace enabled, bash expands and outputs the full command including the password to CI logs before execution. Save the xtrace state before disabling it for credential operations, then restore it afterward, rather than always re-enabling.

Suggested fix
- set +x
- QUAY_PASS=$(cat /var/run/vault/osac-quay-creds/password)
- set -x
+was_xtrace=0
+[[ "$-" == *x* ]] && was_xtrace=1
+set +x
+QUAY_PASS=$(cat /var/run/vault/osac-quay-creds/password)

 timeout -s 9 90m ssh -F "${SHARED_DIR}/ssh_config" ci_machine bash -s \
     "${CLUSTER_TOOL_COMMIT}" \
     "${SNAPSHOT_REGISTRY}" \
     "${FLAVOR_NAME}" \
     "${QUAY_USER}" \
     "${QUAY_PASS}" \
     <<'REMOTE_EOF'
@@
 REMOTE_EOF
+
+(( was_xtrace )) && set -x

Per coding guidelines, step registry command scripts should avoid accidental disclosure of passwords, tokens, API keys, and cluster URLs via xtrace. Apply the same pattern to the podman login invocation at lines 51–55.
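
The save/disable/restore pattern described in this comment can be sketched as a small wrapper. This is a minimal sketch and not the script from the PR: `with_xtrace_off` and `use_secret` are hypothetical names, and the secret file is a throwaway temp file standing in for the Vault mount.

```shell
#!/usr/bin/env bash
# Sketch only: with_xtrace_off and use_secret are hypothetical helper names.

# Run a function with xtrace disabled, then restore the caller's xtrace state.
# Pass only a function name and non-secret arguments: the call itself is
# still traced at the call site when xtrace is on.
with_xtrace_off() {
    local was_xtrace=0 rc=0
    [[ "$-" == *x* ]] && was_xtrace=1
    set +x
    "$@" || rc=$?
    if (( was_xtrace )); then set -x; fi
    return "${rc}"
}

# Stand-in for the credentialed work. The secret is read and used entirely
# inside the function, so it never appears in a traced command line.
use_secret() {
    local secret_file=$1 secret
    secret=$(cat "${secret_file}")
    # A real step would hand "${secret}" to the registry login here.
    echo "secret length: ${#secret}"
}

# Demo with a fake secret file (assumption: stands in for the Vault path).
demo_file=$(mktemp)
printf '%s' 'not-a-real-password' > "${demo_file}"
set -x
with_xtrace_off use_secret "${demo_file}"
set +x
rm -f "${demo_file}"
```

The wrapper restores xtrace only if it was on beforehand, so it behaves the same whether or not the calling step enabled tracing.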

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/osac-project/cluster-tool/snapshot/osac-project-cluster-tool-snapshot-commands.sh`
around lines 15 - 25, The script currently enables xtrace around the ssh
invocation and passes QUAY_PASS as a positional argument (QUAY_PASS and the
timeout/ssh invocation block), which risks leaking the password; change this by
saving the current xtrace state (e.g., store "$(set +x; false || true)" or check
$- for xtrace), then disable xtrace before reading/using QUAY_PASS and before
the ssh command that includes "${QUAY_PASS}", and finally restore the original
xtrace state immediately after; apply the identical save/disable/restore pattern
around the podman login invocation (the podman login block that uses QUAY_PASS)
so credentials are never printed to CI logs.

Comment on lines +35 to +37
curl -fsSL "https://raw.githubusercontent.com/omer-vishlitzky/cluster-tool/${COMMIT}/cluster-tool" \
    -o /usr/local/bin/cluster-tool
chmod +x /usr/local/bin/cluster-tool

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, find the file
find . -name "osac-project-cluster-tool-snapshot-commands.sh" -type f

Repository: openshift/release

Length of output: 171


🏁 Script executed:

#!/bin/bash
# Read the file to verify the exact content at lines 35-37
if [ -f "ci-operator/step-registry/osac-project/cluster-tool/snapshot/osac-project-cluster-tool-snapshot-commands.sh" ]; then
  echo "=== File content (full) ==="
  wc -l ci-operator/step-registry/osac-project/cluster-tool/snapshot/osac-project-cluster-tool-snapshot-commands.sh
  echo ""
  echo "=== Lines 30-45 for context ==="
  sed -n '30,45p' ci-operator/step-registry/osac-project/cluster-tool/snapshot/osac-project-cluster-tool-snapshot-commands.sh
fi

Repository: openshift/release

Length of output: 899


🏁 Script executed:

#!/bin/bash
# Check for any environment variable definitions or setup related to CLUSTER_TOOL_SHA256
# Search across the directory structure for related references
rg -i "cluster_tool_sha256|CLUSTER_TOOL_SHA256" --type sh

Repository: openshift/release

Length of output: 43


🏁 Script executed:

#!/bin/bash
# Search for the COMMIT variable definition and usage context
rg -i "COMMIT" ci-operator/step-registry/osac-project/cluster-tool/snapshot/ -A 2 -B 2

Repository: openshift/release

Length of output: 3880


🏁 Script executed:

#!/bin/bash
# Check the ref.yaml file to see what env variables are defined
cat ci-operator/step-registry/osac-project/cluster-tool/snapshot/osac-project-cluster-tool-snapshot-ref.yaml

Repository: openshift/release

Length of output: 976


🏁 Script executed:

#!/bin/bash
# Check if there are any related steps that might define CLUSTER_TOOL_SHA256 or similar patterns
rg "SHA256|sha256|checksum" ci-operator/step-registry/osac-project/cluster-tool/ --type sh

Repository: openshift/release

Length of output: 43


🏁 Script executed:

#!/bin/bash
# Look at the full context of the commands.sh file to understand the overall flow
cat -n ci-operator/step-registry/osac-project/cluster-tool/snapshot/osac-project-cluster-tool-snapshot-commands.sh

Repository: openshift/release

Length of output: 2382


Add integrity verification for downloaded cluster-tool binary.

Lines [35-37] download and execute a binary from a mutable GitHub ref (main by default) without checksum verification. This poses a supply-chain risk, particularly in a step with mounted credentials that could be compromised if the tool is hijacked.

Add SHA256 verification before execution:

Suggested hardening
 curl -fsSL "https://raw.githubusercontent.com/omer-vishlitzky/cluster-tool/${COMMIT}/cluster-tool" \
     -o /usr/local/bin/cluster-tool
+echo "${CLUSTER_TOOL_SHA256}  /usr/local/bin/cluster-tool" | sha256sum -c -
 chmod +x /usr/local/bin/cluster-tool

Note: Requires defining CLUSTER_TOOL_SHA256 as an environment variable in the step reference.
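
A slightly fuller variant of this hardening verifies the file in a temporary location and installs it only after the checksum matches. The sketch below assumes GNU coreutils (`sha256sum`, `install`); `install_verified` is a hypothetical helper name, and the demo uses a local temp file in place of the real download.

```shell
#!/usr/bin/env bash
# Sketch only: install_verified is a hypothetical helper name.

# Install a file to its destination only if its SHA-256 matches.
install_verified() {
    local src=$1 expected_sha256=$2 dest=$3
    # sha256sum -c reads "<hash>  <path>" lines and exits non-zero on mismatch.
    if ! echo "${expected_sha256}  ${src}" | sha256sum -c - >/dev/null; then
        echo "ERROR: checksum mismatch for ${src}" >&2
        return 1
    fi
    # Copy into place with the executable bit set in one step.
    install -m 0755 "${src}" "${dest}"
}

# Demo with a local file; in the step the source would be the curl download
# written to a temp path and the hash would come from CLUSTER_TOOL_SHA256.
demo_dir=$(mktemp -d)
printf 'echo hello\n' > "${demo_dir}/tool"
good_sha=$(sha256sum "${demo_dir}/tool" | cut -d' ' -f1)
install_verified "${demo_dir}/tool" "${good_sha}" "${demo_dir}/tool.installed"
rm -rf "${demo_dir}"
```

A checksum only helps if the ref is immutable, so this would presumably be paired with pinning CLUSTER_TOOL_COMMIT to a full commit SHA rather than a branch name.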

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/osac-project/cluster-tool/snapshot/osac-project-cluster-tool-snapshot-commands.sh`
around lines 35 - 37, Download of the cluster-tool binary should include SHA256
verification: download to a temp path (instead of writing directly to
/usr/local/bin/cluster-tool), ensure CLUSTER_TOOL_SHA256 environment variable is
present, compute the SHA256 of the downloaded file (e.g., via sha256sum or
shasum -a 256), compare it to CLUSTER_TOOL_SHA256 and exit non‑zero on mismatch,
only move the verified file into /usr/local/bin/cluster-tool and then run chmod
+x on that path; update the curl -> temp file step and references to COMMIT and
/usr/local/bin/cluster-tool accordingly so the file is never executed or
installed unless the checksum matches.

Comment on lines +50 to +55
echo "=== Authenticating to registry ==="
set +x
podman login --root /home/cluster-tool/containers/storage \
    "$(echo ${REGISTRY} | cut -d/ -f1)" \
    -u "${QUAY_USER}" -p "${QUAY_PASS}"
set -x

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

# Check if the file exists and examine lines around 50-55
if [ -f "ci-operator/step-registry/osac-project/cluster-tool/snapshot/osac-project-cluster-tool-snapshot-commands.sh" ]; then
  echo "=== File exists, showing lines 40-65 for context ==="
  sed -n '40,65p' "ci-operator/step-registry/osac-project/cluster-tool/snapshot/osac-project-cluster-tool-snapshot-commands.sh"
else
  echo "File not found, checking what exists in the directory"
  find . -path "*/osac-project/cluster-tool/snapshot/*-commands.sh" 2>/dev/null || echo "No matching files found"
fi

Repository: openshift/release

Length of output: 1022


🌐 Web query:

podman login --password-stdin option documentation

💡 Result:

The --password-stdin option for the podman login command provides a registry password via standard input (STDIN) rather than through command-line arguments or interactive prompts [1][2]. It is commonly used in automated scripts and CI/CD pipelines to avoid exposing passwords in process lists or shell history [1][3].

Usage examples:
1. Using a pipe: echo "$PASSWORD" | podman login -u <username> --password-stdin <registry_url> [1][3]
2. Using file redirection: podman login -u <username> --password-stdin <registry_url> < password.txt [1][3]

When this option is used, Podman reads the password from STDIN and uses it to authenticate with the specified registry [1][2]. If no registry is specified, Podman defaults to the first registry listed in the [registries.search] section of the registries.conf configuration file [1][2]. Credentials are typically stored in the default authentication file, located at ${XDG_RUNTIME_DIR}/containers/auth.json [1][2].

Citations:


🏁 Script executed:

# Check if printf is available and test the approach
which printf
echo "Testing printf syntax for password piping:"
QUAY_PASS="test_password"
# Verify the printf syntax doesn't add newline
printf '%s' "${QUAY_PASS}" | wc -c
echo "Password length:"
echo -n "${QUAY_PASS}" | wc -c

Repository: openshift/release

Length of output: 189


Use --password-stdin instead of passing password via -p flag.

The current approach on line 54 exposes the password in the process list, which can be accessed via process inspection. Use stdin-based authentication as per the coding guidelines requiring sensitive data to bypass process visibility.

Suggested fix
 set +x
-podman login --root /home/cluster-tool/containers/storage \
-    "$(echo ${REGISTRY} | cut -d/ -f1)" \
-    -u "${QUAY_USER}" -p "${QUAY_PASS}"
+printf '%s' "${QUAY_PASS}" | podman login --root /home/cluster-tool/containers/storage \
+    "$(echo "${REGISTRY}" | cut -d/ -f1)" \
+    -u "${QUAY_USER}" --password-stdin
 set -x
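
The registry host extraction in this login call can also be done with shell parameter expansion instead of `echo | cut`, which avoids a subshell pipeline and an unquoted variable. A minimal sketch, where `registry_host` is a hypothetical helper name and the image reference is the SNAPSHOT_REGISTRY default from the step:

```shell
#!/usr/bin/env bash
# Sketch only: registry_host is a hypothetical helper name.

# Print the registry host of an image reference such as
# "quay.io/namespace/repo:tag" by dropping everything from the first "/".
registry_host() {
    local image_ref=$1
    printf '%s\n' "${image_ref%%/*}"
}

registry_host "quay.io/rh-ee-ovishlit/cluster-flavors:osac-vmaas-pruned"
# prints: quay.io

# Assumed usage with stdin-based login (the --root option is omitted here):
#   printf '%s' "${QUAY_PASS}" |
#       podman login -u "${QUAY_USER}" --password-stdin "$(registry_host "${REGISTRY}")"
```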
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/osac-project/cluster-tool/snapshot/osac-project-cluster-tool-snapshot-commands.sh`
around lines 50 - 55, The podman login call currently passes the password with
-p which exposes QUAY_PASS in the process list; change it to use
--password-stdin and pipe the password into podman login instead of using -p.
Locate the podman login invocation (the line using podman login --root ...
"$(echo ${REGISTRY} | cut -d/ -f1)" -u "${QUAY_USER}" -p "${QUAY_PASS}") and
replace the -p usage by supplying QUAY_PASS via stdin (keep the --root, registry
host extraction, and -u "${QUAY_USER}" as-is) so the password is not visible in
process arguments.

@omer-vishlitzky omer-vishlitzky force-pushed the osac-854-snapshot-nightly branch from 0e0c663 to 9f38a20 Compare May 17, 2026 21:09
@omer-vishlitzky
Contributor Author

/pj-rehearse periodic-ci-osac-project-osac-test-infra-main-snapshot-vmaas

@openshift-merge-bot
Contributor

@omer-vishlitzky: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@omer-vishlitzky
Contributor Author

/pj-rehearse periodic-ci-osac-project-osac-test-infra-main-snapshot-vmaas

@openshift-merge-bot
Contributor

@omer-vishlitzky: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@omer-vishlitzky
Contributor Author

/pj-rehearse periodic-ci-osac-project-osac-test-infra-main-snapshot-vmaas

@openshift-merge-bot
Contributor

@omer-vishlitzky: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@omer-vishlitzky
Contributor Author

/retest

@omer-vishlitzky
Contributor Author

/pj-rehearse periodic-ci-osac-project-osac-test-infra-main-snapshot-vmaas

@openshift-merge-bot
Contributor

@omer-vishlitzky: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@omer-vishlitzky
Contributor Author

/pj-rehearse periodic-ci-osac-project-osac-test-infra-main-snapshot-vmaas

@openshift-merge-bot
Contributor

@omer-vishlitzky: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

echo "=== Installing cluster-tool ==="
curl -fsSL "https://raw.githubusercontent.com/omer-vishlitzky/cluster-tool/${COMMIT}/cluster-tool" \
    -o /usr/local/bin/cluster-tool
chmod +x /usr/local/bin/cluster-tool

Is there a plan to move cluster-tool to the osac-project org?

Contributor Author

once it stabilizes, yes.

  default: "main"
  documentation: cluster-tool git ref to download (branch, tag, or commit)
- name: SNAPSHOT_REGISTRY
  default: "quay.io/rh-ee-ovishlit/cluster-flavors:osac-vmaas-pruned"

Same question for the snapshot registry: should this move to an org-level quay namespace?

Contributor Author

We use ghcr for OSAC, not quay. I tried ghcr for this, but it was painfully slow for such huge images. We can create an org in quay.

echo "=== Discovering cluster ID ==="
CLUSTER_ID=$(virsh list --name | grep test-infra-cluster | sed 's/test-infra-cluster-//;s/-master-0//' | head -1)
[[ -z "${CLUSTER_ID}" ]] && echo "ERROR: No running test-infra cluster found" && exit 1
echo "Found cluster ID: ${CLUSTER_ID}"

nit: This cluster ID discovery depends on the assisted-installer VM naming convention (test-infra-cluster-*-master-0). Just curious, is this pattern stable? Can it change?

Contributor Author

It is a constant set by assisted-test-infra.
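
As a sketch of how the discovery could fail loudly if the naming convention ever changed, the helper below matches the whole `test-infra-cluster-<id>-master-<n>` pattern instead of a substring. The name `extract_cluster_id` is hypothetical and not part of this PR. It assumes VM names arrive one per line on stdin, as `virsh list --name` prints them.

```shell
# Hypothetical helper (not part of this PR): read VM names on stdin and
# print the cluster ID of the first name that matches
# test-infra-cluster-<id>-master-<n>. Returns 1 if nothing matches.
extract_cluster_id() {
  id=""
  while IFS= read -r name; do
    case "${name}" in
      test-infra-cluster-*-master-[0-9]*)
        if [ -z "${id}" ]; then
          id="${name#test-infra-cluster-}"  # strip the fixed prefix
          id="${id%-master-*}"              # strip the -master-<n> suffix
        fi
        ;;
    esac
  done
  [ -n "${id}" ] || return 1
  printf '%s\n' "${id}"
}
```

On the provisioned machine this might be used as `CLUSTER_ID=$(virsh list --name | extract_cluster_id) || { echo "ERROR: No running test-infra cluster found"; exit 1; }`.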

@omer-vishlitzky
Contributor Author

/pj-rehearse periodic-ci-osac-project-osac-test-infra-main-snapshot-vmaas

@openshift-merge-bot
Contributor

@omer-vishlitzky: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

Add cluster-tool snapshot step and workflow that provisions a
baremetal cluster via assisted-installer, installs OSAC, snapshots
the cluster, and pushes the OCI image to quay.io. Runs nightly
at 2am UTC.

Pass the same SNO cluster specs (CNV, LVM, 56GB RAM, 2x200GB disks,
24 vCPUs) that the existing periodic E2E jobs use. Without this,
assisted-installer would provision a cluster with wrong defaults.
@omer-vishlitzky force-pushed the osac-854-snapshot-nightly branch from 9f38a20 to e0abae1 on May 19, 2026 09:16
@openshift-merge-bot
Contributor

[REHEARSALNOTIFIER]
@omer-vishlitzky: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| periodic-ci-osac-project-osac-test-infra-main-snapshot-vmaas | N/A | periodic | Periodic changed |

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@omer-vishlitzky
Contributor Author

/pj-rehearse periodic-ci-osac-project-osac-test-infra-main-snapshot-vmaas

@openshift-merge-bot
Contributor

@omer-vishlitzky: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-merge-bot
Contributor

@omer-vishlitzky, pj-rehearse: unable to prepare a candidate for rehearsal; rehearsals will not be run. This could be due to a branch that needs to be rebased. ERROR:

couldn't checkout base SHA 33fc8896174769dbf725b71d50319e6096b9964b: error checking out "33fc8896174769dbf725b71d50319e6096b9964b": exit status 128 fatal: unable to read tree (33fc8896174769dbf725b71d50319e6096b9964b)

@omer-vishlitzky
Contributor Author

/unhold

@omer-vishlitzky
Contributor Author

/pj-rehearse ack

@openshift-merge-bot
Contributor

@omer-vishlitzky: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-merge-bot added the rehearsals-ack label (signifies that rehearsal jobs have been acknowledged) on May 19, 2026

@omer-vishlitzky
Contributor Author

/pj-rehearse periodic-ci-osac-project-osac-test-infra-main-snapshot-vmaas

@openshift-merge-bot
Contributor

@omer-vishlitzky: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@omer-vishlitzky
Contributor Author

/pj-rehearse periodic-ci-osac-project-osac-test-infra-main-snapshot-vmaas

@openshift-merge-bot
Contributor

@omer-vishlitzky: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci
Contributor

openshift-ci Bot commented May 19, 2026

@omer-vishlitzky: The following test failed. Say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/rehearse/periodic-ci-osac-project-osac-test-infra-main-snapshot-vmaas | e0abae1 | link | unknown | /pj-rehearse periodic-ci-osac-project-osac-test-infra-main-snapshot-vmaas |

@omer-vishlitzky
Contributor Author

/retest

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
rehearsals-ack: Signifies that rehearsal jobs have been acknowledged.

3 participants