Register Windows nodes with uninitialized cloud taint #6293
Conversation
/cherry-pick release-1.24

@mboersma: once the present PR merges, I will cherry-pick it on top of
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

/cherry-pick release-1.23

@mboersma: once the present PR merges, I will cherry-pick it on top of
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main    #6293      +/-   ##
==========================================
- Coverage   43.84%   43.84%   -0.01%
==========================================
  Files         289      289
  Lines       25346    25346
==========================================
- Hits        11114    11112       -2
- Misses      13458    13460       +2
  Partials      774      774
/test pull-cluster-api-provider-azure-capi-e2e

/cc @Liunardy

@mboersma: GitHub didn't allow me to request PR reviews from the following users: Liunardy. Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

/test pull-cluster-api-provider-azure-conformance-ipv6-with-ci-artifacts

/test pull-cluster-api-provider-azure-conformance-with-ci-artifacts-dra
@mboersma: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
LGTM label has been added.
Git tree hash: 303eed7a40232e797cf30cfa135e751b05f37c7b

[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jackfrancis
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@mboersma: new pull request created: #6294

@mboersma: new pull request created: #6295
What type of PR is this?
/kind bug
What this PR does / why we need it:
Adds `register-with-taints: node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule` to the Windows worker `kubeletExtraArgs` in every CAPZ cluster template (and flavor source) that has `cloud-provider: external` for Windows.

Why

The `cloud-provider-azure-ccm-windows-capz` prow job has been failing because Windows nodes register without the `node.cloudprovider.kubernetes.io/uninitialized` taint. Without that taint, the `cloud-node-manager` daemonset on the workload cluster logs `Node has no cloud taint, skipping initialization`, the `providerID` never gets set on the Node, and `MachineDeployment` readiness never converges.

Linux nodes in the same run get the taint applied (the Linux CNM log shows `Initializing node with cloud provider` -> `Successfully initialized node with cloud provider`), so something about how `cloud-provider: external` gets translated through `kubeadm join` → `KubeletConfiguration` strategic-merge patch → Windows kubelet 1.37-alpha is no longer producing the taint on Windows. The root cause is somewhere in the kubeadm/kubelet/Windows stack, and there's an upstream issue to file with the per-node `kubeadm-flags.env` and `config.yaml`.

In the meantime, explicitly registering the taint via `kubeletExtraArgs.register-with-taints` makes the behavior independent of any cloud-provider flag translation. The cloud-node-manager will clear the taint after initializing the node, and the rest of the flow proceeds as before. Linux is unaffected (it already gets the same taint via its own path; this just makes it explicit on Windows).

Which issue(s) this PR fixes:
Refs the failing job: https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/cloud-provider-azure-ccm-windows-capz/2053505513055850496
See also kubernetes-sigs/image-builder#2009
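For orientation, the change described above might look roughly like the fragment below in a Windows worker bootstrap template. This is a sketch under assumptions, not a copy of the diff: it assumes the Windows workers are configured through a `KubeadmConfigTemplate` with map-style `kubeletExtraArgs`, and the `apiVersion`, the resource name `example-md-win`, and the surrounding fields are illustrative. Only the two `kubeletExtraArgs` entries come from the description.

```yaml
# Illustrative only: resource kind, apiVersion and name are assumptions.
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: example-md-win   # hypothetical name
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            # Already present in templates that use an external cloud provider.
            cloud-provider: external
            # Added by this PR: register the node with the taint explicitly so
            # cloud-node-manager initializes it and then clears the taint.
            register-with-taints: node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
```

Setting the taint through `register-with-taints` means the kubelet applies it at registration time regardless of how `cloud-provider: external` is translated further down the stack, which is the independence the description relies on.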
Special notes for your reviewer:
templates/flavors/windows/, templates/flavors/windows-apiserver-ilb/, templates/flavors/machinepool-windows/, templates/test/ci/prow-clusterclass-ci-default/) and ran `make generate-flavors`; the rest of the diff is regenerated output.
TODOs:
Release note: