preinstall: generalize minimum-value semantics for managed sysctls #13224
yankay wants to merge 1 commit into
[APPROVALNOTIFIER] This PR is APPROVED.

This pull-request has been approved by: yankay

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
Pull request overview
This PR generalizes “floor” (minimum-value) semantics for sysctl management in the preinstall role, making the default floor set configurable and allowing users to define their own floor-managed sysctls without overwriting higher, pre-tuned node values.
Changes:
- Move the default `sysctl_minimum_values` map into `roles/kubespray_defaults/defaults/main/main.yml` and expand it with common tunables.
- Add `additional_sysctl_min` (mapping) and merge it over the defaults to compute the effective floor-managed sysctl values.
- Update the preinstall sysctl tasks to read current values and apply only when current < desired; document the behavior in the large-deployments docs.
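The merge step in the second bullet can be pictured with Ansible's `combine` filter. This is a minimal sketch, assuming a shallow merge in which keys from `additional_sysctl_min` win over same-named defaults; the task name and the `effective_sysctl_min` fact name are hypothetical and not necessarily what the PR uses.

```yaml
# Sketch only: effective_sysctl_min is a hypothetical fact name.
# combine() performs a shallow merge; the right-hand map wins on duplicate keys.
- name: Compute effective sysctl floors
  ansible.builtin.set_fact:
    effective_sysctl_min: "{{ sysctl_minimum_values | combine(additional_sysctl_min | default({})) }}"

# Worked example of the merge:
#   sysctl_minimum_values: {vm.max_map_count: 262144, net.core.somaxconn: 32768}
#   additional_sysctl_min: {vm.max_map_count: 524288}
#   effective_sysctl_min:  {vm.max_map_count: 524288, net.core.somaxconn: 32768}
```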
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `roles/kubespray_defaults/defaults/main/main.yml` | Introduces configurable default sysctl floors and the `additional_sysctl_min` user override map. |
| `roles/kubernetes/preinstall/tasks/0080-system-configurations.yml` | Computes the effective floor map and applies sysctls only when they are below the configured minimum. |
| `docs/operations/large-deployments.md` | Documents floor semantics, defaults, and how to override/disable floors. |
```yaml
sysctl_minimum_values:
  fs.inotify.max_user_instances: 8192
```

```yaml
loop: "{{ sysctl_minimum_values | dict2items }}"
failed_when: false  # tolerate keys not present on this kernel/distro
```
Good catch. Fixed by adding `check_mode: false` to the read task (it is read-only, so it is safe to run under `--check`) and a defensive `default()` on `item.rc` / `item.stdout`, so the apply task is a no-op if anything is ever skipped.
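The read-then-apply pattern discussed in this thread can be sketched as two tasks. This is a hedged illustration, not the PR's actual tasks: the `sysctl_current` register name is hypothetical, and the loop inlines the merged floor map so the sketch stands on its own. `check_mode: false` lets the read run under `--check`, and the `default()` guards make the apply step a no-op for any item without a usable `rc` or `stdout`.

```yaml
- name: Read current values of floor-managed sysctls
  ansible.builtin.command: "sysctl -n {{ item.key }}"
  loop: "{{ sysctl_minimum_values | combine(additional_sysctl_min | default({})) | dict2items }}"
  register: sysctl_current   # hypothetical register name
  changed_when: false        # reading never changes the node
  failed_when: false         # tolerate keys not present on this kernel/distro
  check_mode: false          # read-only, so safe to run under --check

- name: Raise sysctls that are below their floor
  ansible.posix.sysctl:
    name: "{{ item.item.key }}"
    value: "{{ item.item.value }}"
    state: present
    reload: true
  loop: "{{ sysctl_current.results }}"
  loop_control:
    label: "{{ item.item.key }}"
  when:
    - item.rc | default(1) == 0                                     # key exists and was readable
    - (item.stdout | default('0') | int) < (item.item.value | int)  # apply only when current < floor
```

A node that gets raised ends up exactly at the floor, while a node already above it is never touched, which is what keeps pre-tuned images intact.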
Force-pushed from 73a79b8 to e414ef9.
Force-pushed from e414ef9 to bbce9ef.
Move the hardcoded sysctl floor map out of `0080-system-configurations.yml` into a default variable, and introduce `additional_sysctl_min` so users can declare their own sysctl floors.

Floor semantics only apply a managed value when the node's current value is lower; pre-tuned images (Bottlerocket, EKS AMIs, OpenShift RHCOS, hardened distros) and newer kernels that already set higher values are left alone.

Default floors are aligned with prior art from the OpenShift Node Tuning Operator (TuneD `=>` floor syntax in the `openshift` / `openshift-node` profiles) and kops.

None of the floor keys are part of the kubelet `protect-kernel-defaults` validation list, so the floor mechanism cannot interfere with kubelet startup. Keys not present on the running kernel/distro are skipped without failing.

`additional_sysctl` (exact-value semantics) is unchanged for backward compatibility.

Signed-off-by: Kay Yan <kay.yan@daocloud.io>
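For illustration, an inventory might declare floors through `additional_sysctl_min` as below. This is a hedged sketch: the file path is only an example, and the keys and values are illustrative rather than recommendations. Setting a default key to `0` disables its floor, because any current value is `>= 0` and the apply condition never fires.

```yaml
# Example inventory file (path is illustrative): inventory/mycluster/group_vars/all/all.yml
additional_sysctl_min:
  vm.max_map_count: 524288              # raise the default floor of 262144
  net.ipv4.tcp_max_syn_backlog: 8192    # illustrative extra key managed with floor semantics
  fs.inotify.max_user_instances: 0      # 0 disables this default floor (current value is always >= 0)
```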
Force-pushed from bbce9ef to 5ab89cf.
/cc @tico88612 Hi, could you take a look when you have time? CI is green now (markdownlint fixed). Thanks!
What type of PR is this?
/kind feature
What this PR does / why we need it:
Generalize the minimum-value (floor) sysctl pattern that was introduced for a single key in #13075:
- Move the `sysctl_minimum_values` map out of `roles/kubernetes/preinstall/tasks/0080-system-configurations.yml` into a default variable in `roles/kubespray_defaults/defaults/main/main.yml` so it can be customized.
- Introduce `additional_sysctl_min` so users can declare their own sysctl floors (parallel to `additional_sysctl`, but with floor semantics).

Floor semantics: only apply a managed value when the node's current value is lower than the desired one. If the node already uses a higher value, leave it alone. This avoids silently regressing pre-tuned images (Bottlerocket, EKS AMIs, OpenShift RHCOS, hardened distros) and newer kernels (e.g. Linux 5.11+, where `fs.inotify.max_user_watches` is computed dynamically from memory and may exceed common hardcoded values).

The default `sysctl_minimum_values` is aligned with prior art from the OpenShift Node Tuning Operator (TuneD's `=>` floor syntax in the `openshift` and `openshift-node` profiles, e.g. `kernel.pid_max=>4194304`, `fs.aio-max-nr=>1048576`) and kops `nodeup/pkg/model/sysctls.go`.

| Key | Default floor | Notes |
|---|---|---|
| `fs.inotify.max_user_instances` | 8192 | |
| `fs.inotify.max_user_watches` | 65536 | |
| `kernel.pid_max` | 4194304 | OpenShift TuneD (`=>` floor) |
| `fs.aio-max-nr` | 1048576 | OpenShift TuneD (`=>` floor) |
| `vm.max_map_count` | 262144 | |
| `net.netfilter.nf_conntrack_max` | 1048576 | |
| `net.core.somaxconn` | 32768 | |
| `net.core.netdev_max_backlog` | 16384 | |
| `fs.file-max` | 2097152 | |
| `net.ipv4.neigh.default.gc_thresh{1,2,3}` | 8192 / 32768 / 65536 | |
| `net.ipv6.neigh.default.gc_thresh{1,2,3}` | 8192 / 32768 / 65536 | |

Which issue(s) this PR fixes:
Fixes #13209
Special notes for your reviewer:
- None of the floor keys are part of the kubelet `protect-kernel-defaults` validation list (`vm.overcommit_memory`, `vm.panic_on_oom`, `kernel.panic`, `kernel.panic_on_oops`, `kernel.keys.root_max{keys,bytes}`), so the floor mechanism cannot interfere with kubelet startup.
- Keys not present on the running kernel/distro (e.g. `net.netfilter.nf_conntrack_max` when `nf_conntrack` is not loaded) are skipped without failing, thanks to `failed_when: false` + `when: item.rc == 0`.
- `additional_sysctl` keeps its exact-value semantics and is unchanged for backward compatibility.
- A default floor can be disabled by setting it to `0` in `additional_sysctl_min` (the current value is virtually always `>= 0`, so the apply condition will never trigger).
- The behavior is documented in `docs/operations/large-deployments.md` (the most relevant existing page) instead of adding a new doc.

Other prior art surveyed:
- `pkg/cli/cmds/profile_linux.go` - read-then-decide pattern for CIS validation.
- `KernelParamDefaultsController` - default kernel parameters baked into the OS.
- … `fs.inotify.max_user_instances=8192` by default.

Does this PR introduce a user-facing change?: