disk: get disk quantity and identify LingJun node from metadata by huww98 · Pull Request #1605 · kubernetes-sigs/alibaba-cloud-csi-driver

huww98 · 2026-01-02T18:10:18Z

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Currently, the InstanceType query logic is coupled to the volume count logic, primarily because we need both Node.status.volumesAttached (for the currently managed disks) and the node annotation (for DiskQuantity) from the same get-node response, and the status must be fetched in the middle of the volume count logic. This makes it hard to extract more information (e.g. nvmeSupport) from the fetched metadata besides DiskQuantity.

So I moved the DiskQuantity logic to cloud/metadata. It fits well there because we have 3 different places to get this info from.
To implement this, two major refactors were done in cloud/metadata:

- Support non-string values. In-package components now all return any for the value, but the public API stays type-safe. This allows us to add 2 new non-string metadata: DiskQuantity (int32) and MachineKind (enum: ECS/LingJun).
- Session support. Introduce a new m.WithSession(ctx) API to inject a context and allow retrying previously failed fetchers, so that fetching is retried whenever the NodeGetInfo CSI gRPC call is retried.
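To illustrate the first point, here is a minimal sketch of how untyped in-package values can sit behind type-safe public accessors. It is not the code from this PR: `store`, `typed`, `GetDiskQuantity`, and `GetMachineKind` are hypothetical names, and the numeric MachineKind values are only inferred from the logs posted in this PR (ECS=1, LingJun=2).

```go
package main

import (
	"errors"
	"fmt"
)

// MetadataKey identifies one piece of metadata (names here are illustrative).
type MetadataKey int

const (
	InstanceType MetadataKey = iota
	DiskQuantity
	machineKind
)

// MachineKind values are an assumption based on the logs (ECS=1, LingJun=2).
type MachineKind int

const (
	MachineKindECS MachineKind = iota + 1
	MachineKindLingJun
)

var ErrUnknownMetadataKey = errors.New("unknown metadata key")

// store stands in for the in-package layer, which returns untyped values.
type store map[MetadataKey]any

func (s store) getAny(key MetadataKey) (any, error) {
	v, ok := s[key]
	if !ok {
		return nil, ErrUnknownMetadataKey
	}
	return v, nil
}

// typed asserts the stored value to T and reports a mismatch as an error
// instead of panicking.
func typed[T any](s store, key MetadataKey) (T, error) {
	var zero T
	v, err := s.getAny(key)
	if err != nil {
		return zero, err
	}
	t, ok := v.(T)
	if !ok {
		return zero, fmt.Errorf("metadata key %d: unexpected type %T", key, v)
	}
	return t, nil
}

// Hypothetical public accessors: callers never see `any`.
func (s store) Get(key MetadataKey) (string, error)  { return typed[string](s, key) }
func (s store) GetDiskQuantity() (int32, error)      { return typed[int32](s, DiskQuantity) }
func (s store) GetMachineKind() (MachineKind, error) { return typed[MachineKind](s, machineKind) }

func main() {
	s := store{InstanceType: "ecs.g8y.xlarge", DiskQuantity: int32(16), machineKind: MachineKindECS}
	q, err := s.GetDiskQuantity()
	fmt.Println(q, err)
}
```

Asking for a key through the wrong accessor (for example `s.Get(DiskQuantity)`) yields an error rather than a panic, which is one way to keep the exported surface type-safe.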

Then, two new fetchers are added, for ECS DescribeInstanceTypes and EFLO DescribeNodeType. The K8s fetcher is extended to parse the annotation.

The disk driver is changed to use the added metadata fields.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

/hold
based on #1599, merge it first

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

huww98 · 2026-01-08T01:22:23Z

/unhold

huww98 · 2026-01-09T09:02:09Z

Log on ECS

I0108 18:29:33.229251    9131 metadata.go:248] "retrieved metadata" provider="IMDS" key="RegionID" value="cn-beijing"
I0108 18:29:33.242843    9131 metadata.go:248] "retrieved metadata" provider="IMDS" key="InstanceID" value="i-2zedlg2qt0av5phg18uq"
E0108 18:29:33.265988    9131 nodeserver.go:206] get vmoc failed: unknown metadata key
I0108 18:29:33.834886    9131 metadata.go:248] "retrieved metadata" method="/csi.v1.Node/NodeGetInfo" provider="IMDS" key="MachineKind" value=1
I0108 18:29:33.834908    9131 eflo.go:89] "skip EFLO metadata fetcher" method="/csi.v1.Node/NodeGetInfo" machineKind=1
I0108 18:29:33.834913    9131 metadata.go:248] "retrieved metadata" method="/csi.v1.Node/NodeGetInfo" provider="IMDS" key="InstanceType" value="ecs.g8y.xlarge"
I0108 18:29:33.872589    9131 metadata.go:248] "retrieved metadata" method="/csi.v1.Node/NodeGetInfo" provider="ECS_Instance_Type" key="DiskQuantity" value=16
I0108 18:29:33.973957    9131 metadata.go:248] "retrieved metadata" method="/csi.v1.Node/NodeGetInfo" provider="IMDS" key="ZoneID" value="cn-beijing-i"

Log on LingJun

I0109 08:55:19.682164 3555667 metadata.go:248] "retrieved metadata" provider="env" key="RegionID" value="cn-wulanchabu"
I0109 08:55:19.759666 3555667 metadata.go:248] "retrieved metadata" provider="lingjun" key="InstanceID" value="e01-cn-zqb46i0iv7y"
E0109 08:55:19.872533 3555667 nodeserver.go:206] get vmoc failed: unknown metadata key
I0109 08:55:21.202312 3555667 metadata.go:248] "retrieved metadata" method="/csi.v1.Node/NodeGetInfo" provider="lingjun" key="MachineKind" value=2
I0109 08:55:21.383911 3555667 metadata.go:248] "retrieved metadata" method="/csi.v1.Node/NodeGetInfo" provider="EFLO" key="DiskQuantity" value=0
I0109 08:55:21.424987 3555667 metadata.go:248] "retrieved metadata" method="/csi.v1.Node/NodeGetInfo" provider="lingjun" key="ZoneID" value="cn-wulanchabu-c"

When /etc/eflo_config/lingjun_config is not found (works now!)

I0109 08:58:47.358973 3609408 metadata.go:248] "retrieved metadata" provider="env" key="RegionID" value="cn-wulanchabu"
I0109 08:58:47.416114 3609408 metadata.go:248] "retrieved metadata" provider="IMDS" key="InstanceID" value="e01-cn-zqb46i0iv7y"
E0109 08:58:47.480204 3609408 nodeserver.go:206] get vmoc failed: unknown metadata key
I0109 08:59:29.345592 3609408 metadata.go:248] "retrieved metadata" method="/csi.v1.Node/NodeGetInfo" provider="Kubernetes" key="MachineKind" value=2
I0109 08:59:29.564357 3609408 metadata.go:248] "retrieved metadata" method="/csi.v1.Node/NodeGetInfo" provider="EFLO" key="DiskQuantity" value=0
I0109 08:59:29.596003 3609408 metadata.go:248] "retrieved metadata" method="/csi.v1.Node/NodeGetInfo" provider="IMDS" key="ZoneID" value="cn-wulanchabu-c"

k8s-ci-robot · 2026-01-29T13:39:47Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: huww98
Once this PR has been reviewed and has the lgtm label, please assign huww98, mowangdk for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS [huww98]

Need more approvers for rest parts.

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

huww98 · 2026-02-05T12:14:23Z

/hold
merge #1614 first

huww98 · 2026-02-06T09:41:38Z

/unhold

mowangdk · 2026-05-03T12:51:39Z

Please resolve the conflict

mowangdk

A Lingjun node type e2e test is still waiting to be merged. Need to merge this after that one.

mowangdk · 2026-05-04T13:51:30Z

+func (f *OpenAPIFetcher) FetchFor(ctx *mcontext, key MetadataKey) (middleware, error) {
 	switch key {
-	case InstanceID, ZoneID, InstanceType, AccountID:
+	case InstanceID, ZoneID, InstanceType:


Where is AccountID?

Moved to sts.go

mowangdk · 2026-05-04T13:54:22Z


-	instanceId, err := f.mPre.Get(InstanceID)
+	kind, err := f.mPre.GetAny(ctx, machineKind)
+	if err == nil && kind != MachineKindECS { // skip for non-ECS instances


If we support metadata for all Lingjun instances in the future, we’ll need a more specific type.

I'm not sure I understand this. But the LingJun OpenAPI is moved to eflo.go, and IMDS support is still in imds.go, which works, as you can see in #1605 (comment).
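For readers following this thread, the skip condition in the diff above can be sketched in isolation as follows. This is an illustration under assumptions, not the PR's code: `fetchIfKind` and `errSkipped` are hypothetical names. As in the diff, the fetcher is skipped only when the machine kind is known and does not match; if the kind cannot be determined, the fetcher is still tried.

```go
package main

import (
	"errors"
	"fmt"
)

// MachineKind values are assumed from the logs in this PR (ECS=1, LingJun=2).
type MachineKind int

const (
	MachineKindECS MachineKind = iota + 1
	MachineKindLingJun
)

// errSkipped is a hypothetical sentinel for "this fetcher does not apply".
var errSkipped = errors.New("fetcher skipped: machine kind mismatch")

// fetchIfKind runs fetch unless the machine kind is known and differs from want.
func fetchIfKind(lookup func() (MachineKind, error), want MachineKind, fetch func() (any, error)) (any, error) {
	kind, err := lookup()
	if err == nil && kind != want {
		// Known kind, wrong platform: avoid a pointless API call.
		return nil, errSkipped
	}
	// Kind matches, or is unknown: let the fetcher decide.
	return fetch()
}

func main() {
	onECS := func() (MachineKind, error) { return MachineKindECS, nil }
	eflo := func() (any, error) { return int32(0), nil }
	_, err := fetchIfKind(onECS, MachineKindLingJun, eflo)
	fmt.Println(err)
}
```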

mowangdk · 2026-05-04T13:56:18Z

 	return v, nil
 }

-func newImmutableProvider(provider MetadataProvider, name string) *immutableProvider {


Please add comments to these providers. I forgot why this one is needed, and I’m not sure what ‘immutable’ means here.

// immutable fetches metadata from next only once and caches the result.
// Print a log with name, key, and value when metadata is retrieved

Comment added
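The behaviour described by that comment can be sketched roughly as below. This is a simplified stand-in rather than the actual middleware: the `provider` interface, `providerFunc`, and the log format are assumptions, and this sketch caches only successful results (how the real code treats errors is not shown in this thread).

```go
package main

import (
	"fmt"
	"log"
	"sync"
)

type MetadataKey string

// provider is a hypothetical minimal interface for the wrapped source.
type provider interface {
	Get(key MetadataKey) (any, error)
}

// providerFunc adapts a plain function to the provider interface.
type providerFunc func(key MetadataKey) (any, error)

func (f providerFunc) Get(key MetadataKey) (any, error) { return f(key) }

// immutable asks next for each key at most once after a success and
// serves later requests from its cache.
type immutable struct {
	name   string
	next   provider
	mu     sync.Mutex
	values map[MetadataKey]any
}

func newImmutable(next provider, name string) *immutable {
	return &immutable{name: name, next: next, values: map[MetadataKey]any{}}
}

func (p *immutable) Get(key MetadataKey) (any, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if v, ok := p.values[key]; ok {
		return v, nil
	}
	v, err := p.next.Get(key)
	if err != nil {
		return nil, err // not cached in this sketch
	}
	p.values[key] = v
	// Logged once per key, when the value is first retrieved.
	log.Printf("retrieved metadata provider=%q key=%q value=%v", p.name, key, v)
	return v, nil
}

func main() {
	calls := 0
	next := providerFunc(func(key MetadataKey) (any, error) {
		calls++
		return "cn-beijing", nil
	})
	p := newImmutable(next, "IMDS")
	p.Get("RegionID")
	p.Get("RegionID")
	fmt.Println("calls to next:", calls)
}
```

Holding the mutex across the call to next serializes concurrent lookups, which keeps the sketch short; a real implementation may prefer finer-grained locking.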

mowangdk · 2026-05-04T13:59:50Z

 	}

+	unmanaged := 0
 	for _, disk := range attachedDisks {


Do we have any e2e tests for disk availability? Please add more tests here.

We have unit tests for this function. We also have the external-storage e2e, which asserts that the disk limit isn't too high: it tries to attach as many disks as the plugin reports and ensures that all of them can be attached and used by pods.
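As a rough illustration of what such a unit test exercises, the sketch below derives a volume limit from DiskQuantity and the disks already attached. The exact rule is an assumption pieced together from the `unmanaged` counter in the diff and the PR description; `managedDiskLimit`, its parameters, and the disk IDs are hypothetical.

```go
package main

import "fmt"

// managedDiskLimit returns how many disks the driver may manage on a node:
// the instance's DiskQuantity minus attached disks the driver does not manage.
// Assumption: unmanaged disks count against the same quota.
func managedDiskLimit(diskQuantity int32, attachedDisks []string, managed map[string]bool) int32 {
	unmanaged := int32(0)
	for _, disk := range attachedDisks {
		if !managed[disk] {
			unmanaged++
		}
	}
	limit := diskQuantity - unmanaged
	if limit < 0 {
		limit = 0 // never report a negative limit
	}
	return limit
}

func main() {
	attached := []string{"d-system", "d-data1", "d-data2"} // hypothetical IDs
	managed := map[string]bool{"d-data1": true, "d-data2": true}
	fmt.Println(managedDiskLimit(16, attached, managed))
}
```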

Introduce a new `m.WithSession(ctx)` API to inject a context and allow retry of previously failed fetchers. The errors are moved from lazyInit to the Metadata type, so all the errors can be replaced at once. A slot is reserved for each type of fetcher, assuming each type is used only once in the hierarchy. A *mcontext argument is passed along to every fetcher and middleware, carrying the ctx from the session and the logger extracted from that context. A new inMemory mode is introduced to minimize network requests. For example, if fetcher A failed but B succeeded, then in the new session the error from A is cleared, but we should still use the data from B because it is already present in memory.
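The session idea can be sketched as follows. This is a deliberately simplified model, not the PR's implementation: it has no *mcontext, no middleware, and no locking, and `NewMetadata`, `fetcher`, `flakyFetcher`, and `errTemporary` are hypothetical. It only shows the two properties described above: errors live in per-fetcher slots that a new session replaces all at once, while values already in memory are shared across sessions.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type MetadataKey string

var (
	errUnknownKey = errors.New("unknown metadata key")
	errTemporary  = errors.New("temporary API failure")
)

type fetcher func(ctx context.Context, key MetadataKey) (any, error)

type Metadata struct {
	ctx      context.Context
	fetchers []fetcher
	values   map[MetadataKey]any // shared by all sessions
	errs     []error             // one slot per fetcher, per session
}

func NewMetadata(fetchers ...fetcher) *Metadata {
	return &Metadata{
		ctx:      context.Background(),
		fetchers: fetchers,
		values:   map[MetadataKey]any{},
		errs:     make([]error, len(fetchers)),
	}
}

// WithSession keeps the in-memory values but starts with empty error slots,
// so fetchers that failed earlier get another chance.
func (m *Metadata) WithSession(ctx context.Context) *Metadata {
	return &Metadata{ctx: ctx, fetchers: m.fetchers, values: m.values,
		errs: make([]error, len(m.fetchers))}
}

func (m *Metadata) Get(key MetadataKey) (any, error) {
	if v, ok := m.values[key]; ok {
		return v, nil // already in memory: no network request
	}
	for i, f := range m.fetchers {
		if m.errs[i] != nil {
			continue // failed earlier in this session: do not retry
		}
		v, err := f(m.ctx, key)
		if errors.Is(err, errUnknownKey) {
			continue // fetcher does not serve this key
		}
		if err != nil {
			m.errs[i] = err
			continue
		}
		m.values[key] = v
		return v, nil
	}
	if err := errors.Join(m.errs...); err != nil {
		return nil, err
	}
	return nil, errUnknownKey
}

// flakyFetcher is a demo helper: it fails the first `failures` calls.
func flakyFetcher(failures int, value any) (fetcher, *int) {
	calls := 0
	return func(ctx context.Context, key MetadataKey) (any, error) {
		calls++
		if calls <= failures {
			return nil, errTemporary
		}
		return value, nil
	}, &calls
}

func main() {
	f, _ := flakyFetcher(1, int32(16))
	m := NewMetadata(f)
	_, err := m.Get("DiskQuantity")
	fmt.Println("first call:", err)
	v, err := m.WithSession(context.Background()).Get("DiskQuantity")
	fmt.Println("new session:", v, err)
}
```

In a CSI node plugin, the natural place to open a session would be at the start of each NodeGetInfo call, using the request context, so that each retry of the RPC also retries the failed fetchers.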

k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Jan 2, 2026

k8s-ci-robot requested review from iltyty and mowangdk January 2, 2026 18:10

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 2, 2026

huww98 force-pushed the metadata-eflo branch from b2dc505 to 66c95a5 Compare January 3, 2026 04:15

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 7, 2026

huww98 force-pushed the metadata-eflo branch from 66c95a5 to 1f825d4 Compare January 8, 2026 01:21

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 8, 2026

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 8, 2026

huww98 force-pushed the metadata-eflo branch from 1f825d4 to 3d467fd Compare January 9, 2026 04:00

huww98 force-pushed the metadata-eflo branch from 8d4c807 to aa9d968 Compare January 9, 2026 13:37

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 12, 2026

huww98 force-pushed the metadata-eflo branch from aa9d968 to 53ff04f Compare January 29, 2026 13:33

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 29, 2026

huww98 force-pushed the metadata-eflo branch from 53ff04f to 47a84eb Compare January 29, 2026 13:39

huww98 force-pushed the metadata-eflo branch from 47a84eb to c4109cc Compare January 29, 2026 14:09

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 5, 2026

huww98 force-pushed the metadata-eflo branch 2 times, most recently from 5d75c08 to ba18e54 Compare February 6, 2026 09:36

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 6, 2026

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 27, 2026

huww98 force-pushed the metadata-eflo branch from ba18e54 to b37847b Compare February 27, 2026 01:47

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 27, 2026

huww98 force-pushed the metadata-eflo branch from b37847b to 9b59e7d Compare March 13, 2026 10:18

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 3, 2026

mowangdk reviewed May 4, 2026


huww98 force-pushed the metadata-eflo branch from 9b59e7d to 418d756 Compare May 5, 2026 06:19

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 5, 2026

huww98 force-pushed the metadata-eflo branch 2 times, most recently from 8aa08f9 to 39d9fd0 Compare May 5, 2026 09:00

huww98 mentioned this pull request May 6, 2026

labeler: add centralized metadata labeler #1685

Open

huww98 added 10 commits May 15, 2026 17:05

metadata: change internal middleware to return any

837dced

Allow us to integrate non-string metadata.

metadata: add machineKind to identify LingJun nodes

79ca0d7

Returns LingJun if:
- /etc/eflo_config/lingjun_config exists
- Node has label alibabacloud.com/lingjun-worker

Returns ECS if InstanceType has "ecs." prefix.
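The rules in this commit message can be written down as a single function for illustration. In the PR itself they appear to be spread across separate fetchers (the logs show MachineKind coming from the lingjun, Kubernetes, and IMDS providers), so the combined `detectMachineKind` below, the order in which the rules are checked, and the `MachineKindUnknown` zero value are all assumptions.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// MachineKind values: ECS=1 and LingJun=2 match the logs; Unknown=0 is assumed.
type MachineKind int

const (
	MachineKindUnknown MachineKind = iota
	MachineKindECS
	MachineKindLingJun
)

const (
	lingjunConfigPath  = "/etc/eflo_config/lingjun_config"
	lingjunWorkerLabel = "alibabacloud.com/lingjun-worker"
)

// fileExists reports whether path can be stat'ed.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

// detectMachineKind applies the three rules in order. The exists callback is
// injected so the function can be tested without touching the file system.
func detectMachineKind(exists func(string) bool, nodeLabels map[string]string, instanceType string) MachineKind {
	if exists(lingjunConfigPath) {
		return MachineKindLingJun
	}
	// Only the presence of the label is checked; its value is ignored here.
	if _, ok := nodeLabels[lingjunWorkerLabel]; ok {
		return MachineKindLingJun
	}
	if strings.HasPrefix(instanceType, "ecs.") {
		return MachineKindECS
	}
	return MachineKindUnknown
}

func main() {
	kind := detectMachineKind(fileExists, nil, "ecs.g8y.xlarge")
	fmt.Println(kind)
}
```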

disk: use metadata to identify LingJun nodes

1f3c070

metadata: rename ECS to IMDS

7d81e35

IMDS stands for Instance Metadata Service. Now it can also be accessed from LingJun instances

metadata: add DiskQuantity

d701514

metadata: skip fetcher on unmatch machine kind

f9baf71

disk: get disk quantity from metadata

78df3c6

metadata: simplify lingjun tests

af3ad9a

Use a real json copied from LingJun instance.

metadata: compare returned instanceID / InstanceType

5be3a49

We should handle the case when server returned incorrect or multiple items.
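One possible shape for such a check is sketched below: rather than trusting the first element of the response, look for the item whose ID equals the one requested and fail if none matches. The `instanceTypeItem` struct and `pickInstanceType` are hypothetical and do not mirror the SDK's response types; whether the real code rejects a response with extra items or simply ignores them is not stated here.

```go
package main

import "fmt"

// instanceTypeItem is a hypothetical, trimmed-down response item.
type instanceTypeItem struct {
	InstanceTypeID string
	DiskQuantity   int32
}

// pickInstanceType returns the item matching want, ignoring unrelated items
// and failing when the server returned nothing usable.
func pickInstanceType(items []instanceTypeItem, want string) (instanceTypeItem, error) {
	for _, item := range items {
		if item.InstanceTypeID == want {
			return item, nil
		}
	}
	return instanceTypeItem{}, fmt.Errorf("instance type %q not found in %d returned item(s)", want, len(items))
}

func main() {
	items := []instanceTypeItem{
		{InstanceTypeID: "ecs.g7.large", DiskQuantity: 16},
		{InstanceTypeID: "ecs.g8y.xlarge", DiskQuantity: 16},
	}
	item, err := pickInstanceType(items, "ecs.g8y.xlarge")
	fmt.Println(item.InstanceTypeID, item.DiskQuantity, err)
}
```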

huww98 force-pushed the metadata-eflo branch from 39d9fd0 to 5be3a49 Compare May 15, 2026 09:08
