test/e2e: make collectNodes log dump best-effort #6265
The collectNodes helper runs from [AfterEach] to dump per-node logs and descriptions for the workload cluster. It currently uses Expect(...).To(Succeed()) when listing nodes, which turns any transient inability to reach the workload cluster API server into a hard spec failure during teardown.

In practice the workload cluster's Azure load balancer / API server is sometimes briefly unreachable while the spec is being torn down, which has been causing otherwise-successful runs of the apiversion-upgrade job to fail in [AfterEach] with i/o timeout against *.cloudapp.azure.com:6443.

Match the pattern already used a few lines above for streaming pod logs: log the error and continue instead of failing the spec.

Signed-off-by: Matt Boersma <Matt.Boersma@microsoft.com>

/test pull-cluster-api-provider-azure-apiversion-upgrade

Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main    #6265      +/-   ##
==========================================
- Coverage   46.94%   43.88%   -3.07%
==========================================
  Files         279      289      +10
  Lines       29688    25351    -4337
==========================================
- Hits        13937    11125    -2812
+ Misses      14938    13448    -1490
+ Partials      813      778      -35

[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jackfrancis

LGTM label has been added.
Git tree hash: 27c448e222a395f4bf55bdbff9902315654748a1

/retest

/retest

/cherry-pick release-1.23

@mboersma: once the present PR merges, I will cherry-pick it on top of the requested branch.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

/cherry-pick release-1.22

@mboersma: once the present PR merges, I will cherry-pick it on top of the requested branch.

@mboersma: new pull request created: #6266

@mboersma: new pull request created: #6267

What type of PR is this?
/kind bug
What this PR does / why we need it:
The collectNodes helper in test/e2e/azure_clusterproxy.go runs from [AfterEach] to dump per-node logs and kubectl describe output for the workload cluster. It currently uses Expect(...).To(Succeed()) when listing nodes, which turns any transient inability to reach the workload cluster API server into a hard spec failure during teardown.

In practice the workload cluster's Azure load balancer / API server is sometimes briefly unreachable while the spec is being torn down. Recent runs of the pull-cluster-api-provider-azure-apiversion-upgrade presubmit (and the corresponding periodic) show otherwise-successful specs failing in [AfterEach] at azure_clusterproxy.go:193 with errors like an i/o timeout against *.cloudapp.azure.com:6443.

This change converts that single Expect(...) into a logged warning + early return, matching the pattern already used a few lines above for streaming pod logs ("Failing to stream logs should not cause the test to fail").

Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Only collectNodes is touched. The rest of the cleanup helpers already log-and-continue or are protected by similar best-effort patterns.
The affected jobs (capz-pr-apiversion-upgrade-main and capz-periodic-apiversion-upgrade-main) showed the same [AfterEach] failure signature across multiple recent runs while the actual [It] block had passed.
Release note:
Additional documentation: