channels: move from protokube to a static pod #18328
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
b1dd5ca to 427e3b3
ed8903e to 56c10c5
72dcd77 to 1bd57d7
This is ready for review. It is based on what we discussed during office hours, so there should be no concerns about the direction.
```go
if os.Getenv("S3_ENDPOINT") != "" {
	// Forward each S3 setting from the current environment via add().
	for _, name := range []string{"S3_ENDPOINT", "S3_REGION", "S3_ACCESS_KEY_ID", "S3_SECRET_ACCESS_KEY"} {
		add(name, os.Getenv(name))
	}
}
```
This would expose secrets to anything with "get pods" access in kube-system. Could we replace this with a volume mount and a .env file?
Good call, but this is already the pattern in a few other places. For now, I've updated it in 0b33d94 to follow the same approach used there. Once this merges, we can move all of these to separate files.
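A minimal sketch of the reviewer's suggestion, in Go. It assumes a hypothetical renderS3EnvFile helper that turns the same four S3 variables into unquoted KEY=value lines (so values must not contain newlines); nodeup would write the result to a root-only host file that the pod mounts, and how the container loads that file is left open.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// s3EnvNames are the S3 settings forwarded by the snippet under review.
var s3EnvNames = []string{"S3_ENDPOINT", "S3_REGION", "S3_ACCESS_KEY_ID", "S3_SECRET_ACCESS_KEY"}

// renderS3EnvFile (hypothetical) renders the S3 settings as KEY=value lines
// for a mounted file instead of pod env vars. It returns "" when S3_ENDPOINT
// is unset, mirroring the original condition. Values are written unquoted.
func renderS3EnvFile(getenv func(string) string) string {
	if getenv("S3_ENDPOINT") == "" {
		return ""
	}
	var b strings.Builder
	for _, name := range s3EnvNames {
		fmt.Fprintf(&b, "%s=%s\n", name, getenv(name))
	}
	return b.String()
}

func main() {
	// In nodeup this content would be written to a root-only host file
	// (e.g. os.WriteFile(path, data, 0o600)) and mounted read-only into the pod.
	fmt.Print(renderS3EnvFile(os.Getenv))
}
```

Keeping the values in a file means they no longer appear in the pod spec, so "get pods" access alone does not reveal them.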
When --interval is set, channels re-applies the channel at the given interval until SIGTERM, logging per-iteration errors instead of exiting. The default of 0 preserves the existing one-shot behavior for current callers (protokube, CI). This enables running channels as a long-lived workload (e.g. a static pod) without an external loop.
Relocates the control-plane node labeler from protokube to a new channels/pkg/nodelabeler package and renames it to BootstrapControlPlaneNodeLabels. Protokube still drives the call for now via the new import path. This is preparation for running channels as a static pod that owns both addon application and the labels addons target. The labeler's tainter.go scratch types are removed; the new package inlines only the patch struct it needs.
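A sketch of what an inlined label patch can look like, assuming a merge-style JSON patch that touches only metadata.labels. The nodePatch types and buildLabelPatch are hypothetical, and the label in main is only an example; the real mandatory set lives in channels/pkg/nodelabeler. With client-go, such a body would typically be sent with clientset.CoreV1().Nodes().Patch(ctx, nodeName, types.StrategicMergePatchType, data, metav1.PatchOptions{}).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nodePatch (hypothetical) models just enough of a Node to express a
// label-only patch; labels not listed in the map are left untouched.
type nodePatch struct {
	Metadata nodePatchMetadata `json:"metadata"`
}

type nodePatchMetadata struct {
	Labels map[string]string `json:"labels"`
}

// buildLabelPatch returns the JSON patch body that sets the given labels.
func buildLabelPatch(labels map[string]string) ([]byte, error) {
	return json.Marshal(nodePatch{Metadata: nodePatchMetadata{Labels: labels}})
}

func main() {
	// Example label only, not the actual kops mandatory set.
	patch, err := buildLabelPatch(map[string]string{
		"node-role.kubernetes.io/control-plane": "",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
}
```

Patching only the labels keeps the request small and avoids overwriting fields that other controllers manage on the Node.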
apply channel now takes one or more channel URLs and applies them sequentially on each invocation. With --interval, the loop iterates over the URLs on each tick, mirroring protokube's old syncOnce ordering. Per-channel errors are collected via multierr, so one bad channel does not stop the rest. Single-URL callers continue to work unchanged.
Adds --node-name: when set, each iteration patches the named node with the mandatory control-plane labels via channels/pkg/nodelabeler. An empty --node-name skips labeling, which is the right default for one-shot CLI use from a developer's laptop. The kops-channels static pod supplies --node-name via the downward API.
Together, these let a single channels process own both addon application and control-plane labeling for the entire channel set, replacing protokube's per-channel subprocess fan-out and its separate labeler step.
Adds the ko-kops-channels-export Makefile target set (build, export, version-dist, dev-upload, push) cloning the kops-controller pattern, and wires kops-channels-push into cloudbuild.yaml so the staging push step pushes the new image alongside the others. Needed so channels can run as a static pod under kubelet instead of as a host binary invoked by protokube.
Adds a ChannelsBuilder that emits /etc/kubernetes/manifests/kops-channels.manifest. The pod runs one container per channel URL on a 60s interval; the bootstrap-channel container additionally patches the local node with control-plane labels via --bootstrap-node-labels and the downward API. The pod is system-node-critical because it owns the labels addons target for scheduling, and uses hostNetwork so VFS can reach the cloud metadata service before CNI is up. At this commit the static pod and protokube both apply channels in parallel; that is safe because apply is idempotent via manifest-hash annotations. The protokube side is removed in the next commit.
Now that the kops-channels static pod owns both responsibilities, drop the protokube-side reconciliation: the channels exec wrapper, the --channels and --node-name flags, the labeler call, and the host-side install of /opt/kops/bin/channels in the nodeup builder. The KubeBoot struct sheds Channels and NodeName; the sync loop is now an idle keep-alive for the gossip goroutines and will be removed alongside the legacy gossip code path.
The first apply fails while a control-plane node's apiserver is still starting, so retry every 5s until it succeeds instead of waiting a full interval, which would delay cluster bootstrap. Also cache the kube client and reuse it across iterations.
@rifelpet, is there anything else left to clarify here?
Extracts addon channel reconciliation from protokube into a new kops-channels static pod, rendered by nodeup on control-plane nodes. The pod runs the existing channels binary in a long-lived loop, reconciles every cluster channel on a 60s interval, and patches the local node with the mandatory control-plane labels.
To make this easier to review, the cleanup is kept for a separate PR.
TODO (follow-ups):