test/e2e: make collectPodLogs log dump best-effort #6315

Open: mboersma wants to merge 1 commit into kubernetes-sigs:main from mboersma:fix-collectpodlogs-best-effort

mboersma (Contributor) commented May 17, 2026

What type of PR is this?

/kind flake

What this PR does / why we need it:

Mirrors the fix from #6265 (commit 4a8e9f1) for the sibling collectPodLogs helper. collectPodLogs runs from [AfterEach] to dump pod descriptions and container logs for the workload cluster, and currently uses Expect(...).To(Succeed()) when listing pods. That turns any transient unreachability of the workload cluster API server into a hard spec failure during teardown.

After #6265, collectNodes returns gracefully on i/o timeout, but collectPodLogs is the very next call in the AfterEach log-collection path and hits the same transient *.cloudapp.azure.com:6443: i/o timeout against the same load balancer. The result is that otherwise-successful runs of the apiversion-upgrade-v1beta1 job fail in [AfterEach] at test/e2e/azure_clusterproxy.go:112.

Recent example: 5 of 6 consecutive failures on the cherry-pick PR #6314 show STEP: PASSED! followed by [FAILED] in [AfterEach] - .../azure_clusterproxy.go:112, with the same i/o timeout signature against the management cluster LB in canadacentral.

Match the pattern already used a few lines below for streaming container logs (and now used in collectNodes): log the error and continue instead of failing the spec.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

This should be cherry-picked to release-1.24 so it can unblock #6314.

/cherry-pick release-1.24

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests
  • cherry-pick candidate

Release note:

test/e2e: make collectPodLogs log dump best-effort

The collectPodLogs helper runs from [AfterEach] to dump pod descriptions
and container logs for the workload cluster. It currently uses
Expect(...).To(Succeed()) when listing pods, which turns any transient
inability to reach the workload cluster API server into a hard spec
failure during teardown.

This is the same bug pattern that was fixed for the sibling collectNodes
helper in 4a8e9f1 (kubernetes-sigs#6265). With collectNodes now returning gracefully
on i/o timeout, collectPodLogs is the next call in the AfterEach log
collection path and hits the same transient unreachability against
*.cloudapp.azure.com:6443, causing otherwise-successful runs of the
apiversion-upgrade job to fail in [AfterEach].

Match the pattern already used a few lines below for streaming container
logs (and now used in collectNodes): log the error and continue instead
of failing the spec.

Signed-off-by: Matt Boersma <Matt.Boersma@microsoft.com>

k8s-ci-robot added labels on May 17, 2026:

  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • kind/flake: Categorizes issue or PR as related to a flaky test.
  • size/XS: Denotes a PR that changes 0-9 lines, ignoring generated files.

k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign nojnhuh for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the "cncf-cla: yes" label (Indicates the PR's author has signed the CNCF CLA.) on May 17, 2026

codecov Bot commented May 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 43.84%. Comparing base (e8b9ce3) to head (9ab0c7a).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6315   +/-   ##
=======================================
  Coverage   43.84%   43.84%           
=======================================
  Files         289      289           
  Lines       25346    25346           
=======================================
  Hits        11114    11114           
  Misses      13458    13458           
  Partials      774      774           


Labels

  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • kind/flake: Categorizes issue or PR as related to a flaky test.
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • size/XS: Denotes a PR that changes 0-9 lines, ignoring generated files.

Projects

Status: Todo

2 participants