
OCPBUGS-64775: use CAPZ to provision ssh rule #10162

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:main from patrickdillon:OCPBUGS-64775-azure-ssh-leak
Dec 18, 2025

Conversation

@patrickdillon
Contributor

A change to CAPZ[0] creates an SSH rule if one is not specified in the cluster spec. Prior to this commit, we had been creating the SSH rule with installer SDK hooks. The hooks are still partly necessary, to add the inbound NAT rules, because we are not yet using CAPZ to provision a public load balancer.

But we can use CAPZ to create just the rule, which stops CAPZ from creating the redundant SSH rule that we were leaking during bootstrap destroy.

This change also results in an SSH rule being created for private clusters, which is fine and is something we do on other providers.

0: kubernetes-sigs/cluster-api-provider-azure#5525
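
The defaulting behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the CAPZ or installer code: `SecurityRule`, `hasSSHRule`, `withDefaultSSH`, the rule names, and the priorities are all hypothetical stand-ins. The point it shows is that once the spec already declares an SSH rule, a controller that only adds a default when none is present has nothing left to add.

```go
package main

import "fmt"

// SecurityRule is a simplified, hypothetical stand-in for a network
// security group rule. It is not the CAPZ API type.
type SecurityRule struct {
	Name            string
	Protocol        string
	Direction       string
	Priority        int32
	DestinationPort string
}

// hasSSHRule reports whether any inbound rule already targets port 22.
func hasSSHRule(rules []SecurityRule) bool {
	for _, r := range rules {
		if r.Direction == "Inbound" && r.DestinationPort == "22" {
			return true
		}
	}
	return false
}

// withDefaultSSH models the defaulting described in the PR: a rule is
// appended only when the spec does not already contain one. The generated
// name and priority are assumptions chosen for illustration.
func withDefaultSSH(rules []SecurityRule) []SecurityRule {
	if hasSSHRule(rules) {
		return rules
	}
	return append(rules, SecurityRule{
		Name: "generated-ssh", Protocol: "Tcp", Direction: "Inbound",
		Priority: 2200, DestinationPort: "22",
	})
}

func main() {
	// The installer declares the rule itself, under a name it knows how to delete.
	spec := []SecurityRule{{
		Name: "installer-ssh", Protocol: "Tcp", Direction: "Inbound",
		Priority: 220, DestinationPort: "22",
	}}
	out := withDefaultSSH(spec)
	fmt.Println(len(out), out[0].Name)
}
```

Passing an empty spec to `withDefaultSSH` instead yields the generated rule, which corresponds to the situation that led to the leak.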

@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Dec 15, 2025
@openshift-ci-robot
Contributor

@patrickdillon: This pull request references Jira Issue OCPBUGS-64775, which is invalid:

  • expected the bug to target the "4.22.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot requested review from jhixson74 and sadasu December 15, 2025 14:42
@patrickdillon
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Dec 15, 2025
@openshift-ci-robot
Contributor

@patrickdillon: This pull request references Jira Issue OCPBUGS-64775, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @jinyunma


@openshift-ci openshift-ci Bot requested a review from jinyunma December 15, 2025 14:43
@patrickdillon
Contributor Author

The linter is failing on an unused variable; I will clean it up on the next pass.

@patrickdillon
Contributor Author

In the e2e azure job, we can see that the bootstrap bundle the CI step collects was gathered successfully: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_installer/10162/pull-ci-openshift-installer-main-e2e-azure-ovn/2000577150435987456/artifacts/e2e-azure-ovn/ipi-install-install/

This indicates that the SSH rule works.

@patrickdillon patrickdillon force-pushed the OCPBUGS-64775-azure-ssh-leak branch from 0adb754 to 7ce936d Compare December 15, 2025 20:05
@patrickdillon
Contributor Author

pushed linter fix

Member

@tthvo tthvo left a comment


/lgtm

IIUC, the problem arose when CAPZ checked the cluster spec, saw that the SSH rule was missing, and created a duplicate SSH rule under a name the installer did not know about and so could not destroy.

With this PR, we can ensure that only a single SSH rule is created by CAPZ, with a well-known name that the installer already has logic to destroy 👍 LGTM!
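
A small Go sketch of the leak described in this review comment, assuming a destroy step that removes rules strictly by a name it knows. The constant `wellKnownSSHRule`, the name `generated-ssh`, and the function `destroyBootstrapRules` are hypothetical and do not come from the installer source; the sketch only illustrates why a rule created under an unknown name survives bootstrap destroy.

```go
package main

import "fmt"

// wellKnownSSHRule is a hypothetical name standing in for the rule name
// that the bootstrap-destroy logic looks for.
const wellKnownSSHRule = "installer-ssh"

// destroyBootstrapRules models a destroy step that removes only the rule
// whose name it knows, and returns the rules that remain afterwards.
func destroyBootstrapRules(rules []string) []string {
	remaining := make([]string, 0, len(rules))
	for _, name := range rules {
		if name != wellKnownSSHRule {
			remaining = append(remaining, name)
		}
	}
	return remaining
}

func main() {
	// Before the fix: two SSH rules exist, one under a name the destroy step does not know.
	before := destroyBootstrapRules([]string{wellKnownSSHRule, "generated-ssh"})
	// After the fix: only the well-known rule exists.
	after := destroyBootstrapRules([]string{wellKnownSSHRule})
	fmt.Println("left behind before:", before)
	fmt.Println("left behind after:", after)
}
```

With a single rule under the known name, nothing is left behind, which matches the reviewer's reading of the change.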

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Dec 15, 2025
Contributor

@sadasu sadasu left a comment


/approve

@openshift-ci
Contributor

openshift-ci Bot commented Dec 16, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sadasu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 16, 2025
@tthvo
Member

tthvo commented Dec 16, 2025

/label acknowledge-critical-fixes-only

@openshift-ci openshift-ci Bot added the acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. label Dec 16, 2025
@tthvo
Member

tthvo commented Dec 16, 2025

/test e2e-azure-ovn

@patrickdillon
Contributor Author

/cherry-pick release-4.21

@openshift-cherrypick-robot

@patrickdillon: once the present PR merges, I will cherry-pick it on top of release-4.21 in a new PR and assign it to you.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@sadasu
Contributor

sadasu commented Dec 16, 2025

/test azure-ovn-marketplace-images

1 similar comment
@tthvo
Member

tthvo commented Dec 16, 2025

/test azure-ovn-marketplace-images

@openshift-ci
Contributor

openshift-ci Bot commented Dec 16, 2025

@patrickdillon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-azurestack | 7ce936d | link | false | /test e2e-azurestack |
| ci/prow/okd-scos-e2e-vsphere-ovn | 7ce936d | link | false | /test okd-scos-e2e-vsphere-ovn |
| ci/prow/e2e-azure-ovn-shared-vpc | 7ce936d | link | false | /test e2e-azure-ovn-shared-vpc |
| ci/prow/azure-ovn-marketplace-images | 7ce936d | link | false | /test azure-ovn-marketplace-images |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@tthvo
Member

tthvo commented Dec 16, 2025

/payload-job periodic-ci-openshift-openshift-tests-private-release-4.21-amd64-nightly-azure-ipi-marketplace-mini-perm-f7

@openshift-ci
Contributor

openshift-ci Bot commented Dec 16, 2025

@tthvo: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-openshift-tests-private-release-4.21-amd64-nightly-azure-ipi-marketplace-mini-perm-f7

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/2b0cfa60-dad7-11f0-872b-bf9fd0884fd5-0

@jinyunma
Contributor

pre-merge test:

  • For a public cluster, I can see that only one SSH rule is created in the network security group during installation, and it is deleted later when the bootstrap is destroyed.
  • For a private cluster installed into an existing vnet, the installer creates no SSH rule in the NSG, so there is no functional impact for this scenario, and installation succeeds.
  • For a private cluster not installed into an existing vnet, the installer provisions the vnet and the SSH rule is then created, but no inbound NAT rule is created. This might cause an issue when destroying the bootstrap (related code). Based on Slack discussion, this edge scenario rarely happens, so we can fix it in z-streams.

/verified by jima

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Dec 18, 2025
@openshift-ci-robot
Contributor

@jinyunma: This PR has been marked as verified by jima.


@openshift-merge-bot openshift-merge-bot Bot merged commit fdf08d7 into openshift:main Dec 18, 2025
17 of 21 checks passed
@openshift-ci-robot
Contributor

@patrickdillon: Jira Issue Verification Checks: Jira Issue OCPBUGS-64775
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-64775 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓


@openshift-cherrypick-robot

@patrickdillon: new pull request created: #10172


@openshift-merge-robot
Contributor

Fix included in accepted release 4.22.0-0.nightly-2025-12-18-234253


Labels

acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria


7 participants