CSI Node pods in crashloop after the datacenter rename in vcenter #3888

@ksanghavi

/kind bug

What happened:

When a datacenter is renamed in vCenter, the syncer container fails to discover and register nodes, logging errors such as datacenter '/vmspDC2' not found. This happens even though the ConfigMap identifies the datacenter by MoRef (for instance, Datacenter:datacenter-3), which should be resilient to name changes.

The syncer logs the following:

{"level":"error","time":"2026-02-04T23:41:12.178227753Z","caller":"node/manager.go:236","msg":"failed to discover node with nodeUUID 422b45e7-2ccb-ab2d-c7a1-67e519460bb3 with err: failed to fetch datacenters for vc lvn-dvm-10-161-45-1.dvm.lvn.broadcom.net with err: datacenter '/vmspDC2' not found","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*defaultManager).GetNodeVMAndUpdateCache\n\t/build/mts/release/bora-25127461/cayman_vsphere_csi_driver/vsphere_csi_driver/src/pkg/common/cns-lib/node/manager.go:236\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/syncer/cnsoperator/controller/csinodetopology.(*ReconcileCSINodeTopology).reconcileForVanilla\n\t/build/mts/release/bora-25127461/cayman_vsphere_csi_driver/vsphere_csi_driver/src/pkg/syncer/cnsoperator/controller/csinodetopology/csinodetopology_controller.go:275\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/syncer/cnsoperator/controller/csinodetopology.(*ReconcileCSINodeTopology).Reconcile\n\t/build/mts/release/bora-25127461/cayman_vsphere_csi_driver/vsphere_csi_driver/src/pkg/syncer/cnsoperator/controller/csinodetopology/csinodetopology_controller.go:224\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/build/mts/release/bora-25127461/cayman_vsphere_csi_driver/vsphere_csi_driver/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.1/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/build/mts/release/bora-25127461/cayman_vsphere_csi_driver/vsphere_csi_driver/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.1/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/build/mts/release/bora-25127461/cayman_vsphere_csi_driver/vsphere_csi_driver/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.1/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/build/mts/release/bora-25127461/cayman_vsphere_csi_driver/vsphere_csi_driver/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.1/pkg/internal/controller/controller.go:224"}
{"level":"error","time":"2026-02-04T23:41:12.178409663Z","caller":"csinodetopology/csinodetopology_controller.go:290","msg":"failed to retrieve nodeVM \"422b45e7-2ccb-ab2d-c7a1-67e519460bb3\" using the node manager. Error: failed to fetch datacenters for vc lvn-dvm-10-161-45-1.dvm.lvn.broadcom.net with err: datacenter '/vmspDC2' not found","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/syncer/cnsoperator/controller/csinodetopology.(*ReconcileCSINodeTopology).reconcileForVanilla\n\t/build/mts/release/bora-25127461/cayman_vsphere_csi_driver/vsphere_csi_driver/src/pkg/syncer/cnsoperator/controller/csinodetopology/csinodetopology_controller.go:290\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/syncer/cnsoperator/controller/csinodetopology.(*ReconcileCSINodeTopology).Reconcile\n\t/build/mts/release/bora-25127461/cayman_vsphere_csi_driver/vsphere_csi_driver/src/pkg/syncer/cnsoperator/controller/csinodetopology/csinodetopology_controller.go:224\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/build/mts/release/bora-25127461/cayman_vsphere_csi_driver/vsphere_csi_driver/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.1/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/build/mts/release/bora-25127461/cayman_vsphere_csi_driver/vsphere_csi_driver/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.1/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/build/mts/release/bora-25127461/cayman_vsphere_csi_driver/vsphere_csi_driver/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.1/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/build/mts/release/bora-25127461/cayman_vsphere_csi_driver/vsphere_csi_driver/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.1/pkg/internal/controller/controller.go:224"}

After a node rollout, the csi-node pod for the new node enters a crash loop. The node-driver-registrar logs the following:

I0204 23:43:45.351595       1 main.go:96] "Received GetInfo call" request="&InfoRequest{}"
I0204 23:43:45.496771       1 main.go:108] "Received NotifyRegistrationStatus call" status="&RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to retrieve topology information for Node: \"vmsp-0203-default-xl6v6-g7nh4-hsw85\". Error: \"failed to retrieve nodeVM \\\"422b45e7-2ccb-ab2d-c7a1-67e519460bb3\\\" using the node manager. Error: failed to fetch datacenters for vc lvn-dvm-10-161-45-1.dvm.lvn.broadcom.net with err: datacenter '/vmspDC2' not found\",}"
E0204 23:43:45.496861       1 main.go:110] "Registration process failed with error, restarting registration container" err="RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to retrieve topology information for Node: \"vmsp-0203-default-xl6v6-g7nh4-hsw85\". Error: \"failed to retrieve nodeVM \\\"422b45e7-2ccb-ab2d-c7a1-67e519460bb3\\\" using the node manager. Error: failed to fetch datacenters for vc lvn-dvm-10-161-45-1.dvm.lvn.broadcom.net with err: datacenter '/vmspDC2' not found\""

The ConfigMap contains the MoRef IDs:

apiVersion: v1
data:
  vsphere.conf: |
    # Global properties in this section will be used for all specified vCenters unless overridden in VirtualCenter section.
    global:
      port: ..
      thumbprint: ..
      # settings for using k8s secret
      secretName: vsphere-cloud-secret
      secretNamespace: kube-system
    # vcenter section
    vcenter:
      example.com:
        server: example.com
        datacenters:
          - Datacenter:datacenter-3
    # labels for regions and zones
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: vsphere-cpi
    meta.helm.sh/release-namespace: kube-system  
  labels:
    app: vsphere-cpi
    app.kubernetes.io/managed-by: Helm
    component: cloud-controller-manager
    helm.toolkit.fluxcd.io/name: vsphere-cpi
    helm.toolkit.fluxcd.io/namespace: vmsp-platform
    vsphere-cpi-infra: cloud-config
  name: vsphere-cloud-config
  namespace: kube-system
  
---

apiVersion: v1
data:
  example.com: <password>
  example.com <username>
kind: Secret
metadata:
  creationTimestamp: ".."
  name: vsphere-cloud-secret
  namespace: kube-system
  resourceVersion: ".."  
type: Opaque

How to reproduce it

  1. Use MoRef IDs (for example, Datacenter:datacenter-3) for the datacenters in the ConfigMap.
  2. Rename the datacenter in vCenter.
  3. Perform a node rollout.

What you expected to happen

When a Datacenter is renamed in vCenter and the ConfigMap uses MoRef format, the driver should continue to resolve the MoRef to the new path.

Labels: kind/bug, lifecycle/stale
