What happened:
PR #10194 fixed standalone VM classification for providerID and ipConfigurationID, but getVMManagementTypeByNodeName still has the old unconditional short-circuit: it returns ManagedByVmssUniform before consulting any non-VMSS cache or lookup.
That means a standalone VM node can still be misclassified as VMSS uniform on nodeName-based paths when:
vmType=vmss
DisableAvailabilitySetNodes=true
EnableVmssFlexNodes=false
The remaining affected callers include at least:
GetPowerStatusByNodeName
GetProvisioningStateByNodeName
GetInstanceTypeByNodeName
GetZoneByNodeName
EnsureHostsInPool
GetNodeVMSetName
So after #10194, some standalone-VM flows are fixed, but nodeName-based flows can still take the VMSS handler incorrectly.
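The behavior described above can be modelled with a small, self-contained sketch. This is not the actual cloud-provider-azure source: the `config` struct, the `standalone` map, the function name `classifyByNodeNameOld`, and the `ManagedByAvSet` constant name are simplified stand-ins assumed for illustration. Only the flag names and `ManagedByVmssUniform` come from this report.

```go
package main

import "fmt"

// VMManagementType is a simplified stand-in for the provider's enum.
type VMManagementType string

const (
	ManagedByVmssUniform VMManagementType = "ManagedByVmssUniform"
	// ManagedByAvSet is an assumed name for the availability-set/standalone-VM handler.
	ManagedByAvSet VMManagementType = "ManagedByAvSet"
)

// config holds only the two flags relevant to this report (hypothetical struct).
type config struct {
	DisableAvailabilitySetNodes bool
	EnableVmssFlexNodes         bool
}

// classifyByNodeNameOld models the remaining short-circuit: with
// DisableAvailabilitySetNodes=true and EnableVmssFlexNodes=false it returns
// ManagedByVmssUniform before the standalone lookup is ever consulted.
func classifyByNodeNameOld(cfg config, nodeName string, standalone map[string]bool) VMManagementType {
	if cfg.DisableAvailabilitySetNodes && !cfg.EnableVmssFlexNodes {
		return ManagedByVmssUniform // short-circuit: no non-VMSS lookup happens
	}
	if standalone[nodeName] {
		return ManagedByAvSet
	}
	return ManagedByVmssUniform
}

func main() {
	cfg := config{DisableAvailabilitySetNodes: true, EnableVmssFlexNodes: false}
	standalone := map[string]bool{"standalone-vm-0": true} // hypothetical node name

	// The node is a known standalone VM, yet it is classified as VMSS uniform.
	fmt.Println(classifyByNodeNameOld(cfg, "standalone-vm-0", standalone))
}
```

In this model the misclassification disappears as soon as `DisableAvailabilitySetNodes` is false, which matches the bug being dormant on clusters that do not enable that flag.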
What you expected to happen:
Standalone VM nodes should not be classified as ManagedByVmssUniform just because DisableAvailabilitySetNodes=true.
NodeName-based classification should be consistent with the providerID/ipConfigurationID fixes from #10194, so standalone VMs route to the availability-set/standalone-VM handler instead of the VMSS-uniform handler.
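One possible shape for the expected behavior is sketched below, under stated assumptions. It is not the fix from #10194 and not a proposed patch: the `isStandaloneVM` callback, the `config` struct, and the constants `ManagedByAvSet` and `needsFurtherLookup` are hypothetical names introduced only to show the ordering, namely that a standalone-VM check runs before the `DisableAvailabilitySetNodes` short-circuit.

```go
package main

import "fmt"

// VMManagementType is a simplified stand-in for the provider's enum.
type VMManagementType string

const (
	ManagedByVmssUniform VMManagementType = "ManagedByVmssUniform"
	// ManagedByAvSet is an assumed name for the availability-set/standalone-VM handler.
	ManagedByAvSet VMManagementType = "ManagedByAvSet"
	// needsFurtherLookup is a hypothetical placeholder for the cache lookups elided here.
	needsFurtherLookup VMManagementType = "needsFurtherLookup"
)

// config holds only the two flags relevant to this report (hypothetical struct).
type config struct {
	DisableAvailabilitySetNodes bool
	EnableVmssFlexNodes         bool
}

// classifyByNodeName checks for a standalone VM first, so the short-circuit
// can no longer claim a standalone node for the VMSS-uniform handler.
func classifyByNodeName(cfg config, nodeName string, isStandaloneVM func(string) bool) VMManagementType {
	if isStandaloneVM(nodeName) {
		return ManagedByAvSet
	}
	if cfg.DisableAvailabilitySetNodes && !cfg.EnableVmssFlexNodes {
		return ManagedByVmssUniform
	}
	return needsFurtherLookup
}

func main() {
	cfg := config{DisableAvailabilitySetNodes: true, EnableVmssFlexNodes: false}
	// Hypothetical lookup: a single known standalone node.
	isStandalone := func(name string) bool { return name == "standalone-vm-0" }

	fmt.Println(classifyByNodeName(cfg, "standalone-vm-0", isStandalone))
	fmt.Println(classifyByNodeName(cfg, "vmss-node-000000", isStandalone))
}
```

How the standalone check is actually implemented (cache, API lookup, or something else) is left open here; the sketch only illustrates the ordering.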
How to reproduce it (as minimally and precisely as possible):
Configure Azure CCM with:
vmType=vmss
DisableAvailabilitySetNodes=true
EnableVmssFlexNodes=false
Have at least one real standalone VM node in the cluster (providerID format: /providers/Microsoft.Compute/virtualMachines/..., not a VMSS instance).
Observe that getVMManagementTypeByNodeName returns ManagedByVmssUniform before consulting any non-VMSS cache/lookup, so the request is routed down the VMSS-uniform path.
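When picking a node for the reproduction, it can help to confirm from the providerID that it really is a standalone VM. The helper below is an illustrative sketch, not a function from the provider: `looksLikeStandaloneVM` is a hypothetical name, and the subscription, resource group, and VM names are placeholders. It relies only on the resource ID shapes, where a VMSS uniform instance has a `virtualMachineScaleSets` segment before `virtualMachines`. VMSS Flex instances use the standalone-style ID, so this check is only meaningful with EnableVmssFlexNodes=false, as in this report.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeStandaloneVM reports whether a providerID has the standalone VM
// shape (.../providers/Microsoft.Compute/virtualMachines/<name>) rather than
// the VMSS instance shape (.../virtualMachineScaleSets/<set>/virtualMachines/<id>).
// The comparison is case-insensitive because Azure resource IDs are.
func looksLikeStandaloneVM(providerID string) bool {
	id := strings.ToLower(providerID)
	return strings.Contains(id, "/providers/microsoft.compute/virtualmachines/") &&
		!strings.Contains(id, "/virtualmachinescalesets/")
}

func main() {
	// Placeholder IDs for illustration only.
	standalone := "azure:///subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg/providers/Microsoft.Compute/virtualMachines/standalone-vm-0"
	vmssInstance := "azure:///subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg/providers/Microsoft.Compute/virtualMachineScaleSets/pool/virtualMachines/0"

	fmt.Println(looksLikeStandaloneVM(standalone))   // true
	fmt.Println(looksLikeStandaloneVM(vmssInstance)) // false
}
```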
Anything else we need to know?:
I validated live that providerID-based node lifecycle paths are active in CCM (GetNodeNameByProviderID from node_lifecycle_controller), and then checked the source to find the remaining unconditional nodeName short-circuit.
On the live clusters I used for comparison, DisableAvailabilitySetNodes was not enabled, so this remaining bug was dormant there. That is why I did not see a before/after behavior change from the master image on those clusters.
Environment:
Kubernetes version (use kubectl version):
Affected by source on current master; live comparison was done on AKS/HCP clusters running custom CCM images.
Cloud provider or hardware configuration:
Azure
vmType=vmss
standalone VM nodes present
bug requires DisableAvailabilitySetNodes=true and EnableVmssFlexNodes=false
OS (e.g: cat /etc/os-release):
Linux nodes
Kernel (e.g. uname -a):
N/A
Install tools:
CCM in managed AKS/HCP-style deployment
Network plugin and version (if this is a network-related bug):
Code pointers on current master:
pkg/provider/azure_vmss_cache.go: getVMManagementTypeByNodeName
pkg/provider/azure_vmss_cache.go: getVMManagementTypeByProviderID
pkg/provider/azure_vmss_cache.go: getVMManagementTypeByIPConfigurationID