
preinstall: generalize minimum-value semantics for managed sysctls#13224

Open
yankay wants to merge 1 commit into
kubernetes-sigs:masterfrom
yankay:feat/sysctl-minimum-values


@yankay
Member

@yankay yankay commented May 1, 2026

What type of PR is this?

/kind feature

What this PR does / why we need it:

Generalize the minimum-value (floor) sysctl pattern that was introduced for a single key in #13075:

  • Move the hardcoded sysctl_minimum_values map out of roles/kubernetes/preinstall/tasks/0080-system-configurations.yml into a default variable in roles/kubespray_defaults/defaults/main/main.yml so it can be customized.
  • Introduce additional_sysctl_min so users can declare their own sysctl floors (parallel to additional_sysctl, but with floor semantics).
  • Expand the default floor set with widely-used tunables (pid_max, nf_conntrack_max, neigh gc thresholds, inotify limits, somaxconn, ...).
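As a rough illustration of how the effective floor map could be derived, the Python sketch below merges user-declared floors over the defaults, with user entries winning on conflict. It only models the behavior described above: the PR itself does this in Ansible/Jinja2, and the function name effective_floors and the example override values are hypothetical.

```python
# Hypothetical model of "merge additional_sysctl_min over sysctl_minimum_values".
# The real PR does this in Ansible/Jinja2; this only mirrors the intended result.

# A subset of the default floors from this PR's sysctl_minimum_values.
SYSCTL_MINIMUM_VALUES = {
    "fs.inotify.max_user_instances": 8192,
    "kernel.pid_max": 4194304,
    "net.core.somaxconn": 32768,
}


def effective_floors(defaults: dict, user_min: dict) -> dict:
    """Shallow merge: keys in user_min override defaults, new keys are added."""
    merged = dict(defaults)  # copy so the defaults are not mutated
    merged.update(user_min)
    return merged


# Example user input (hypothetical values).
additional_sysctl_min = {
    "net.core.somaxconn": 65535,           # raise an existing default floor
    "net.ipv4.tcp_max_syn_backlog": 8192,  # declare an extra floor
}

floors = effective_floors(SYSCTL_MINIMUM_VALUES, additional_sysctl_min)
print(floors["net.core.somaxconn"])  # 65535
```

A shallow merge is sufficient here because the map is flat (sysctl key to integer).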

Floor semantics: only apply a managed value when the node's current value is lower than the desired one. If the node already uses a higher value, leave it alone. This avoids silently regressing pre-tuned images (Bottlerocket, EKS AMIs, OpenShift RHCOS, hardened distros) and newer kernels (e.g. Linux 5.11+ where fs.inotify.max_user_watches is computed dynamically from memory and may exceed common hardcoded values).
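The floor rule reduces to a single comparison. The Python sketch below (function names are hypothetical) shows that applying a floor is equivalent to taking the larger of the current and desired values, so a node that is already tuned higher is never lowered.

```python
def needs_update(current: int, desired: int) -> bool:
    """Floor semantics: write only when the node is strictly below the floor."""
    return current < desired


def value_after_floor(current: int, desired: int) -> int:
    """Value the node ends up with; equivalent to max(current, desired)."""
    return desired if needs_update(current, desired) else current


# A node at a low value is raised to the floor.
print(value_after_floor(128, 8192))       # 8192
# A node already above the floor (e.g. a memory-scaled
# fs.inotify.max_user_watches on a newer kernel) is left alone.
print(value_after_floor(1048576, 65536))  # 1048576
```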

The default sysctl_minimum_values is aligned with prior art from the OpenShift Node Tuning Operator (TuneD's => floor syntax in the openshift and openshift-node profiles, e.g. kernel.pid_max=>4194304, fs.aio-max-nr=>1048576) and kops nodeup/pkg/model/sysctls.go.

| Sysctl key | Default floor | Source |
| --- | --- | --- |
| fs.inotify.max_user_instances | 8192 | OpenShift, kops, #13075 |
| fs.inotify.max_user_watches | 65536 | OpenShift |
| kernel.pid_max | 4194304 | OpenShift (=> floor) |
| fs.aio-max-nr | 1048576 | OpenShift (=> floor) |
| vm.max_map_count | 262144 | OpenShift |
| net.netfilter.nf_conntrack_max | 1048576 | OpenShift |
| net.core.somaxconn | 32768 | kops |
| net.core.netdev_max_backlog | 16384 | kops |
| fs.file-max | 2097152 | kops |
| net.ipv4.neigh.default.gc_thresh{1,2,3} | 8192 / 32768 / 65536 | OpenShift |
| net.ipv6.neigh.default.gc_thresh{1,2,3} | 8192 / 32768 / 65536 | OpenShift |
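The brace rows in the table are shorthand for three separate keys each. The Python sketch below is a hypothetical helper, not part of the PR: it expands the table rows into a flat key-to-floor map and checks that none of the resulting keys appear in the kubelet protect-kernel-defaults list quoted in this PR's reviewer notes.

```python
import re

# Rows transcribed from the default-floor table: (key pattern, floor values).
TABLE_ROWS = [
    ("fs.inotify.max_user_instances", [8192]),
    ("fs.inotify.max_user_watches", [65536]),
    ("kernel.pid_max", [4194304]),
    ("fs.aio-max-nr", [1048576]),
    ("vm.max_map_count", [262144]),
    ("net.netfilter.nf_conntrack_max", [1048576]),
    ("net.core.somaxconn", [32768]),
    ("net.core.netdev_max_backlog", [16384]),
    ("fs.file-max", [2097152]),
    ("net.ipv4.neigh.default.gc_thresh{1,2,3}", [8192, 32768, 65536]),
    ("net.ipv6.neigh.default.gc_thresh{1,2,3}", [8192, 32768, 65536]),
]

# Keys kubelet validates with protect-kernel-defaults, as listed in the PR notes.
KUBELET_PROTECTED = {
    "vm.overcommit_memory", "vm.panic_on_oom", "kernel.panic",
    "kernel.panic_on_oops", "kernel.keys.root_maxkeys", "kernel.keys.root_maxbytes",
}


def expand_brace_key(key: str) -> list:
    """Expand one {a,b,c} group, e.g. 'x.gc_thresh{1,2,3}' -> three keys."""
    m = re.fullmatch(r"([^{}]*)\{([^{}]*)\}([^{}]*)", key)
    if m is None:
        return [key]  # no brace group: a single literal key
    prefix, alternatives, suffix = m.groups()
    return [prefix + alt + suffix for alt in alternatives.split(",")]


def build_floor_map(rows) -> dict:
    """Flatten table rows into {sysctl key: floor}, pairing keys with values."""
    floors = {}
    for pattern, values in rows:
        keys = expand_brace_key(pattern)
        if len(keys) != len(values):
            raise ValueError(f"{pattern}: {len(keys)} keys but {len(values)} values")
        floors.update(zip(keys, values))
    return floors


floors = build_floor_map(TABLE_ROWS)
assert KUBELET_PROTECTED.isdisjoint(floors), "floor set overlaps kubelet-protected keys"
print(len(floors))  # 15
```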

Which issue(s) this PR fixes:

Fixes #13209

Special notes for your reviewer:

  • None of the floor keys are part of the kubelet protect-kernel-defaults validation list (vm.overcommit_memory, vm.panic_on_oom, kernel.panic, kernel.panic_on_oops, kernel.keys.root_max{keys,bytes}), so the floor mechanism cannot interfere with kubelet startup.
  • Keys not present on the running kernel/distro (e.g. net.netfilter.nf_conntrack_max when nf_conntrack is not loaded) are skipped instead of failing the run, via failed_when: false on the read task combined with when: item.rc == 0 on the apply task.
  • additional_sysctl keeps its exact-value semantics and is unchanged for backward compatibility.
  • Users can disable a default floor by setting it to 0 in additional_sysctl_min: the current value of these keys is virtually always >= 0, so the current < desired check never triggers.
  • Documentation appended to docs/operations/large-deployments.md (the most relevant existing page) instead of adding a new doc.
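To make the skip and disable rules from the notes above concrete, the Python sketch below reads current values and plans which floors to raise. Assumptions: it reads /proc/sys directly (mapping dots to path separators) instead of running whatever command the PR's read task uses, and the demo runs against a temporary fake procfs tree so it works anywhere. A missing file plays the role of a non-zero item.rc, and a floor of 0 never triggers. The function names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional


def proc_path(key: str, root: Path) -> Path:
    """Map a sysctl key to its procfs file, e.g. kernel.pid_max -> <root>/kernel/pid_max."""
    return root.joinpath(*key.split("."))


def read_current(key: str, root: Path) -> Optional[int]:
    """Current integer value, or None when the key does not exist on this node."""
    try:
        return int(proc_path(key, root).read_text().strip())
    except FileNotFoundError:
        return None


def plan_floors(floors: Dict[str, int], root: Path = Path("/proc/sys")) -> Dict[str, int]:
    """Return only the keys that must be raised to reach their floor."""
    plan: Dict[str, int] = {}
    for key, desired in floors.items():
        current = read_current(key, root)
        if current is None:
            continue  # key absent (e.g. nf_conntrack not loaded): skip, do not fail
        if current < desired:  # a floor of 0 never satisfies this for non-negative values
            plan[key] = desired
    return plan


# Demo against a throwaway directory standing in for /proc/sys.
with tempfile.TemporaryDirectory() as tmp:
    fake_root = Path(tmp)
    node_values = {
        "fs.inotify.max_user_instances": 128,
        "kernel.pid_max": 4194304,
        "net.core.somaxconn": 4096,
    }
    for key, value in node_values.items():
        path = proc_path(key, fake_root)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(f"{value}\n")

    floors = {
        "fs.inotify.max_user_instances": 8192,      # below floor: raised
        "kernel.pid_max": 4194304,                  # already at floor: untouched
        "net.core.somaxconn": 0,                    # floor disabled via 0
        "net.netfilter.nf_conntrack_max": 1048576,  # not present: skipped
    }
    plan = plan_floors(floors, fake_root)

print(plan)  # {'fs.inotify.max_user_instances': 8192}
```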

Other prior art surveyed:

Does this PR introduce a user-facing change?:

Add `additional_sysctl_min` for declaring sysctl values with floor semantics: the value is only applied when the node's current value is lower than the desired one. The default `sysctl_minimum_values` is expanded to cover common container-density and networking tunables (kernel.pid_max, nf_conntrack_max, neigh gc thresholds, inotify limits, etc.), aligned with prior art from OpenShift Node Tuning Operator and kops. Existing `additional_sysctl` (exact-value semantics) is unchanged.

Copilot AI review requested due to automatic review settings May 1, 2026 08:09
@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels May 1, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: yankay

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels May 1, 2026

Copilot AI left a comment


Pull request overview

This PR generalizes “floor” (minimum-value) semantics for sysctl management in the preinstall role, making the default floor set configurable and allowing users to define their own floor-managed sysctls without overwriting higher, pre-tuned node values.

Changes:

  • Move the default sysctl_minimum_values map into roles/kubespray_defaults/defaults/main/main.yml and expand it with common tunables.
  • Add additional_sysctl_min (mapping) and merge it over defaults to compute effective floor-managed sysctl values.
  • Update preinstall sysctl tasks to read current values and only apply when current < desired; document behavior in large-deployments docs.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| roles/kubespray_defaults/defaults/main/main.yml | Introduces configurable default sysctl floors and the additional_sysctl_min user override map. |
| roles/kubernetes/preinstall/tasks/0080-system-configurations.yml | Computes the effective floor map and applies sysctls only when below the configured minimum. |
| docs/operations/large-deployments.md | Documents floor semantics, defaults, and how to override or disable floors. |


sysctl_minimum_values:
  fs.inotify.max_user_instances: 8192
loop: "{{ sysctl_minimum_values | dict2items }}"
failed_when: false # tolerate keys not present on this kernel/distro
Member Author


Good catch — fixed by adding check_mode: false to the read task (it is read-only, safe to run under --check) and defensive default() on item.rc / item.stdout so the apply task is a no-op if anything is ever skipped.
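The defensive-default idea can be modeled in a few lines. In the Python sketch below, each registered loop result is a plain dict; a result from a skipped task carries no rc or stdout, so defaulting them turns the apply decision into a no-op. The function name should_apply and the specific defaults (a missing rc treated as 1, a missing stdout as empty) are assumptions for illustration; the exact default() values used in the PR are not shown here.

```python
from typing import Any, Dict


def should_apply(result: Dict[str, Any], desired: int) -> bool:
    """Apply a floor only for a successful read whose value is below the floor."""
    rc = result.get("rc", 1)                          # assumed default: missing rc = failure
    stdout = str(result.get("stdout", "")).strip()    # assumed default: missing stdout = ""
    if rc != 0 or not stdout:
        return False  # skipped or failed read: do nothing
    return int(stdout) < desired


results = [
    ({"rc": 0, "stdout": "128"}, 8192),       # below floor
    ({"rc": 0, "stdout": "1048576"}, 65536),  # already above floor
    ({"rc": 1, "stdout": ""}, 1048576),       # non-zero rc: key missing (assumed)
    ({"skipped": True}, 8192),                # skipped task: no rc/stdout at all
]
decisions = [should_apply(r, d) for r, d in results]
print(decisions)  # [True, False, False, False]
```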

@yankay yankay force-pushed the feat/sysctl-minimum-values branch from 73a79b8 to e414ef9 Compare May 1, 2026 08:14
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 1, 2026
@yankay yankay force-pushed the feat/sysctl-minimum-values branch from e414ef9 to bbce9ef Compare May 1, 2026 08:18
Move the hardcoded sysctl floor map out of 0080-system-configurations.yml
into a default variable, and introduce additional_sysctl_min so users can
declare their own sysctl floors. Floor semantics only apply a managed
value when the node's current value is lower; pre-tuned images (Bottlerocket,
EKS AMIs, OpenShift RHCOS, hardened distros) and newer kernels that
already set higher values are left alone.

Default floors are aligned with prior art from the OpenShift Node Tuning
Operator (TuneD '=>' floor syntax in the openshift / openshift-node
profiles) and kops. None of the floor keys are part of the kubelet
protect-kernel-defaults validation list, so the floor mechanism cannot
interfere with kubelet startup. Keys not present on the running
kernel/distro are skipped without failing.

additional_sysctl (exact-value semantics) is unchanged for backward
compatibility.

Signed-off-by: Kay Yan <kay.yan@daocloud.io>
@yankay yankay force-pushed the feat/sysctl-minimum-values branch from bbce9ef to 5ab89cf Compare May 7, 2026 02:48
@yankay
Member Author

yankay commented May 7, 2026

/cc @tico88612

Hi, could you take a look when you have time? CI is green now (markdownlint fixed). Thanks!

@k8s-ci-robot k8s-ci-robot requested a review from tico88612 May 7, 2026 06:09

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.


Development

Successfully merging this pull request may close these issues.

Allow minimum-value semantics for managed sysctls

3 participants