test/e2e: make collectNodes log dump best-effort #6265

Merged
k8s-ci-robot merged 2 commits into kubernetes-sigs:main from mboersma:fix-collectnodes-best-effort on May 2, 2026
Contributor

@mboersma mboersma commented May 1, 2026

What type of PR is this?

/kind bug

What this PR does / why we need it:

The collectNodes helper in test/e2e/azure_clusterproxy.go runs from [AfterEach] to dump per-node logs and kubectl describe output for the workload cluster. It currently uses Expect(...).To(Succeed()) when listing nodes, which turns any transient inability to reach the workload cluster API server into a hard spec failure during teardown.

In practice the workload cluster's Azure load balancer / API server is sometimes briefly unreachable while the spec is being torn down. Recent runs of the pull-cluster-api-provider-azure-apiversion-upgrade presubmit (and the corresponding periodic) show otherwise-successful specs failing in [AfterEach] at azure_clusterproxy.go:193 with errors like:

dial tcp <ip>:6443: i/o timeout
failed to get server groups: ...

This change converts that single Expect(...) into a logged warning followed by an early return, matching the pattern already used a few lines above for streaming pod logs ("Failing to stream logs should not cause the test to fail").
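The log-and-return pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual diff: `nodeLister`, `fakeLister`, `runDemo`, and the warning text are hypothetical names invented here so the example runs without Ginkgo, Gomega, or a Kubernetes client, and the real helper in test/e2e/azure_clusterproxy.go lists nodes through the workload cluster client instead.

```go
package main

import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"io"
)

// nodeLister is a hypothetical stand-in for the workload cluster client.
type nodeLister interface {
	ListNodes(ctx context.Context) ([]string, error)
}

// fakeLister returns canned results so both code paths can be exercised.
type fakeLister struct {
	nodes []string
	err   error
}

func (f fakeLister) ListNodes(ctx context.Context) ([]string, error) {
	return f.nodes, f.err
}

// collectNodes is best-effort: if the node list cannot be fetched, it logs a
// warning and returns early instead of failing the caller. It returns the
// number of nodes it processed.
func collectNodes(ctx context.Context, l nodeLister, log io.Writer) int {
	nodes, err := l.ListNodes(ctx)
	if err != nil {
		// Failing to list nodes during teardown should not fail the spec.
		fmt.Fprintf(log, "warning: skipping node log collection: %v\n", err)
		return 0
	}
	for _, n := range nodes {
		// The real helper would dump logs and describe output here.
		fmt.Fprintf(log, "collecting logs for node %s\n", n)
	}
	return len(nodes)
}

// runDemo exercises the unreachable and the reachable case.
func runDemo() (failed int, failedLog string, ok int) {
	var buf bytes.Buffer
	unreachable := fakeLister{err: errors.New("dial tcp: i/o timeout")}
	failed = collectNodes(context.Background(), unreachable, &buf)
	failedLog = buf.String()

	reachable := fakeLister{nodes: []string{"node-a", "node-b"}}
	ok = collectNodes(context.Background(), reachable, io.Discard)
	return
}

func main() {
	failed, failedLog, ok := runDemo()
	fmt.Printf("unreachable: %d nodes, log=%q\n", failed, failedLog)
	fmt.Printf("reachable: %d nodes\n", ok)
}
```

The trade-off in this sketch is that a failed node listing only produces a warning line, so diagnostics for that run are missing, but the outcome of the spec is decided by the [It] block rather than by teardown.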

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

  • Surgical change: only collectNodes is touched. The rest of the cleanup helpers already log-and-continue or are protected by similar best-effort patterns.
  • Testgrid context (capz-pr-apiversion-upgrade-main and capz-periodic-apiversion-upgrade-main) showed the same [AfterEach] failure signature across multiple recent runs while the actual [It] block had passed.

Release note:

NONE

Additional documentation:


The collectNodes helper runs from [AfterEach] to dump per-node logs and
descriptions for the workload cluster. It currently uses
Expect(...).To(Succeed()) when listing nodes, which turns any transient
inability to reach the workload cluster API server into a hard spec
failure during teardown.

In practice the workload cluster's Azure load balancer / API server is
sometimes briefly unreachable while the spec is being torn down, which
has been causing otherwise-successful runs of the apiversion-upgrade job
to fail in [AfterEach] with i/o timeout against
*.cloudapp.azure.com:6443.

Match the pattern already used a few lines above for streaming pod logs:
log the error and continue instead of failing the spec.

Signed-off-by: Matt Boersma <Matt.Boersma@microsoft.com>
@k8s-ci-robot k8s-ci-robot added the release-note-none label (Denotes a PR that doesn't merit a release note.) May 1, 2026
@k8s-ci-robot k8s-ci-robot added the kind/bug label (Categorizes issue or PR as related to a bug.) May 1, 2026
@k8s-ci-robot k8s-ci-robot requested review from bryan-cox and nojnhuh May 1, 2026 17:11
@k8s-ci-robot k8s-ci-robot added the size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.) and cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.) labels May 1, 2026
@mboersma
Contributor Author

mboersma commented May 1, 2026

/test pull-cluster-api-provider-azure-apiversion-upgrade

@codecov

codecov Bot commented May 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 43.88%. Comparing base (9652dbc) to head (4a8e9f1).
⚠️ Report is 402 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6265      +/-   ##
==========================================
- Coverage   46.94%   43.88%   -3.07%     
==========================================
  Files         279      289      +10     
  Lines       29688    25351    -4337     
==========================================
- Hits        13937    11125    -2812     
+ Misses      14938    13448    -1490     
+ Partials      813      778      -35     

Contributor

@jackfrancis jackfrancis left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged.) May 1, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) May 1, 2026
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 27c448e222a395f4bf55bdbff9902315654748a1

@mboersma
Contributor Author

mboersma commented May 1, 2026

/retest

@mboersma
Contributor Author

mboersma commented May 1, 2026

/retest

@mboersma
Contributor Author

mboersma commented May 1, 2026

/cherry-pick release-1.23

@k8s-infra-cherrypick-robot

@mboersma: once the present PR merges, I will cherry-pick it on top of release-1.23 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.23

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mboersma
Contributor Author

mboersma commented May 1, 2026

/cherry-pick release-1.22

@k8s-infra-cherrypick-robot

@mboersma: once the present PR merges, I will cherry-pick it on top of release-1.22 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.22

@k8s-ci-robot k8s-ci-robot merged commit e845eba into kubernetes-sigs:main May 2, 2026
24 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.24 milestone May 2, 2026
@github-project-automation github-project-automation Bot moved this from Todo to Done in CAPZ Planning May 2, 2026
@k8s-infra-cherrypick-robot

@mboersma: new pull request created: #6266

In response to this:

/cherry-pick release-1.23

@k8s-infra-cherrypick-robot

@mboersma: new pull request created: #6267

In response to this:

/cherry-pick release-1.22

Labels

  • approved (Indicates a PR has been approved by an approver from all required OWNERS files.)
  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • kind/bug (Categorizes issue or PR as related to a bug.)
  • lgtm ("Looks good to me", indicates that a PR is ready to be merged.)
  • release-note-none (Denotes a PR that doesn't merit a release note.)
  • size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.)

Projects

Status: Done
