disk: get disk quantity and identify LingJun node from metadata#1605
disk: get disk quantity and identify LingJun node from metadata#1605huww98 wants to merge 10 commits into
Conversation
|
/unhold |
|
Log on ECS Log on LingJun When /etc/eflo_config/lingjun_config is not found (works now!) |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: huww98 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/hold |
5d75c08 to
ba18e54
Compare
|
/unhold |
ba18e54 to
b37847b
Compare
|
Please resolve the conflict |
mowangdk
left a comment
There was a problem hiding this comment.
A Lingjun node type e2e test is still waiting to be merged. Need to merge this after that one.
| func (f *OpenAPIFetcher) FetchFor(ctx *mcontext, key MetadataKey) (middleware, error) { | ||
| switch key { | ||
| case InstanceID, ZoneID, InstanceType, AccountID: | ||
| case InstanceID, ZoneID, InstanceType: |
|
|
||
| instanceId, err := f.mPre.Get(InstanceID) | ||
| kind, err := f.mPre.GetAny(ctx, machineKind) | ||
| if err == nil && kind != MachineKindECS { // skip for non-ECS instances |
There was a problem hiding this comment.
If we support metadata for all Lingjun instance in future, we’ll need a more specific type.
There was a problem hiding this comment.
I'm not sure I understand this. But LingJun OpenAPI is moved to eflo.go, IMDS support is still in imds.go which works as you can see in #1605 (comment)
| return v, nil | ||
| } | ||
|
|
||
| func newImmutableProvider(provider MetadataProvider, name string) *immutableProvider { |
There was a problem hiding this comment.
Please add comments to these providers. I forgot why this one is needed, and I’m not sure what ‘immutable’ means here.
There was a problem hiding this comment.
// immutable fetches metadata from next only once and caches the result.
// Print a log with name, key, and value when metadata is retrieved
Comment added
| } | ||
|
|
||
| unmanaged := 0 | ||
| for _, disk := range attachedDisks { |
There was a problem hiding this comment.
Do we have any e2e tests for disk availability? Please add more tests here.
There was a problem hiding this comment.
We have unit test for this function. And we have external-storage e2e that assert the disk limit isn't too high, it will try to attach as many disks as reported by plugin and ensure all disks can be attached and used by pods.
8aa08f9 to
39d9fd0
Compare
Allow us to integrate non-string metadata.
Returns LingJun if: - /etc/eflo_config/lingjun_config exists - Node has label alibabacloud.com/lingjun-worker Returns ECS if InstanceType has "ecs." prefix.
IMDS stands for Instance Metadata Service. Now it can also be accessed from LingJun instances
Introduce a new `m.WithSession(ctx)` API to inject a context and allow retry of previously failed fetchers. The errors are moved to the Metadata type from lazyInit, so all the errors can be replaced at once. A slot is reserved for each type of fetcher, assuming each type is used only once in the hierarchy. A *mcontext argument is passed along to every fetcher and middleware, with ctx from session and logger extracted from context. New inMemory mode is introduced to minimize network requests. For example, if we have fetcher A failed but B succeeded, then in the new session, the error from A is cleared, but we should still use data from B because it is already present in memory.
Use a real json copied from LingJun instance.
We should handle the case when server returned incorrect or multiple items.
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
Currently, the InstanceType query logic is bound to the volume count logic, primarily because we need both Node.status.volumesAttached (for currently managed disks) and node.annotation (for DiskQuantity) from the same get node response, and the
statusmust be fetched in the middle of the volume count logic. This makes it hard to extract more info (e.g. nvmeSupport) from the fetched metadata apart from DiskQuantity.So I moved
DiskQuantitylogic tocloud/metadata, it fits well because we have 3 different places to get this info.To implement this, two major refactor is done in
cloud/metadata:anyfor the value, but we still keep it type-safe for all public API. This allows us to add 2 new non-string metadata: DiskQuantity (int32) and MachineKind (enum: ECS/LingJun).m.WithSession(ctx)API to inject a context and allow retry of previously failed fetchers. So that we can keep retry as theNodeGetInfoCSI GRPC call retries.Then, two new fetchers for ECS
DescribeInstanceTypesand EFLODescribeNodeTypeis added. K8s fetcher is extended to parse the annotation.The disk driver is changed to use the added metadata fields.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
/hold
based on #1599, merge it first
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: