18 changes: 18 additions & 0 deletions docs/operations/large-deployments.md
@@ -50,3 +50,21 @@ For a large scaled deployments, consider the following configuration changes:
 
 For example, when deploying 200 nodes, you may want to run ansible with
 ``--forks=50``, ``--timeout=600`` and define the ``retry_stagger: 60``.
+
+Sysctl tuning with floor semantics
+----------------------------------
+
+`sysctl_minimum_values` defines floors that are only applied when the
+node's current value is lower, so pre-tuned images and newer kernels are
+left alone. Use `additional_sysctl_min` to add or override floors (set a
+key to `0` to disable a default); use `additional_sysctl` for
+exact-value semantics.
+
+```yaml
+additional_sysctl_min:
+  net.core.somaxconn: 65535
+  kernel.pid_max: 0
+```
+
+See `roles/kubespray_defaults/defaults/main/main.yml` for the default
+floor set.
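
The merge-then-compare behaviour described in this docs change can be modelled in a few lines of Python. This is only an illustrative sketch, not code from the PR: `effective_floors` and `keys_to_raise` are hypothetical names, the defaults are a two-key subset of `sysctl_minimum_values`, and the node's current values are invented sample data.

```python
# Illustrative model of the floor semantics; not part of the PR.
# Two of the PR's default floors (subset of sysctl_minimum_values).
DEFAULT_FLOORS = {"net.core.somaxconn": 32768, "kernel.pid_max": 4194304}

# The docs example: raise one floor, disable another by setting it to 0.
USER_FLOORS = {"net.core.somaxconn": 65535, "kernel.pid_max": 0}


def effective_floors(defaults: dict[str, int], overrides: dict[str, int]) -> dict[str, int]:
    """Merge floors so that user values win, like a shallow dict combine."""
    return {**defaults, **overrides}


def keys_to_raise(current: dict[str, int], floors: dict[str, int]) -> dict[str, int]:
    """Return only the keys whose current value is below the floor."""
    return {k: floor for k, floor in floors.items() if k in current and current[k] < floor}


# Hypothetical node state, invented for this example.
node = {"net.core.somaxconn": 4096, "kernel.pid_max": 32768}
floors = effective_floors(DEFAULT_FLOORS, USER_FLOORS)
plan = keys_to_raise(node, floors)
print(plan)  # {'net.core.somaxconn': 65535}
```

In this model a floor of `0` can never exceed a non-negative current value, which is why setting a key to `0` behaves as "disabled" rather than needing a separate removal mechanism.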
20 changes: 12 additions & 8 deletions roles/kubernetes/preinstall/tasks/0080-system-configurations.yml
@@ -125,25 +125,29 @@
     - { name: vm.panic_on_oom, value: 0 }
   when: kubelet_protect_kernel_defaults | bool
 
-- name: Read current sysctl values
+- name: Compute effective sysctl minimum values (defaults + user overrides)
+  set_fact:
+    _sysctl_minimum_values_effective: "{{ sysctl_minimum_values | combine(additional_sysctl_min) }}"
+
+- name: Read current sysctl values for floor-managed keys
   command: sysctl -n {{ item.key }}
   register: sysctl_settings
   changed_when: false
-  vars:
-    # For integer sysctls only
-    sysctl_minimum_values:
-      fs.inotify.max_user_instances: 8192
-  loop: "{{ sysctl_minimum_values | dict2items }}"
+  failed_when: false # tolerate keys not present on this kernel/distro

Comment from the PR author (Member): Good catch; fixed by adding `check_mode: false` to the read task (it is read-only, so it is safe to run under `--check`) and a defensive `default()` on `item.rc` / `item.stdout`, so the apply task is a no-op if anything is ever skipped.

+  check_mode: false # read-only; must run under --check so later loop has rc/stdout
+  loop: "{{ _sysctl_minimum_values_effective | dict2items }}"
 
-- name: Increase sysctl value if lower than minimum
+- name: Increase sysctl value if lower than configured minimum
   ansible.posix.sysctl:
     sysctl_file: "{{ sysctl_file_path }}"
     name: "{{ item.item.key }}"
     value: "{{ item.item.value }}"
     state: present
     reload: true
     ignoreerrors: "{{ sysctl_ignore_unknown_keys }}"
-  when: item.stdout | int < item.item.value
+  when:
+    - item.rc | default(1) == 0
+    - (item.stdout | default('0')) | int < item.item.value | int
   loop: "{{ sysctl_settings.results }}"
 
 - name: Check dummy module
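
To make the two `when` guards on the apply task easier to reason about, here is a rough Python model of how they evaluate against the registered results. It is a sketch under assumptions, not the PR's code: `jinja_int` only approximates Jinja2's `int` filter (which falls back to a default of 0 on unparsable input), `should_raise` is a hypothetical helper, and the `results` list is invented sample data shaped like `sysctl_settings.results`.

```python
# Illustrative model of the apply task's `when` guards; not part of the PR.
def jinja_int(value, default: int = 0) -> int:
    """Approximate Jinja2's `int` filter: fall back to `default` on bad input."""
    try:
        return int(value)
    except (TypeError, ValueError):
        try:
            return int(float(value))
        except (TypeError, ValueError, OverflowError):
            return default


def should_raise(result: dict) -> bool:
    """Mirror `rc | default(1) == 0` and `stdout | default('0') | int < value | int`."""
    if result.get("rc", 1) != 0:
        return False  # key could not be read, or the read task was skipped
    current = jinja_int(result.get("stdout", "0"))
    return current < jinja_int(result["item"]["value"])


# Invented sample data shaped like sysctl_settings.results.
results = [
    {"item": {"key": "fs.file-max", "value": 2097152}, "rc": 0, "stdout": "1048576"},
    {"item": {"key": "vm.max_map_count", "value": 262144}, "rc": 0, "stdout": "1048576"},
    {"item": {"key": "net.netfilter.nf_conntrack_max", "value": 1048576}, "rc": 1, "stdout": ""},
    {"item": {"key": "kernel.pid_max", "value": 4194304}},  # skipped: no rc/stdout
]
print([r["item"]["key"] for r in results if should_raise(r)])  # ['fs.file-max']
```

Only the first entry is raised: the second is already above its floor, the third failed to read, and the fourth has no `rc` at all, so the `default(1)` guard turns it into a no-op. Note that in this model a multi-valued sysctl (for example, a space-separated triple) would parse as 0, so the comparison is only meaningful for integer-valued keys.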
34 changes: 33 additions & 1 deletion roles/kubespray_defaults/defaults/main/main.yml
@@ -612,10 +612,42 @@ kubelet_protect_kernel_defaults: true
 
 # Set additional sysctl variables to modify Linux kernel variables, for example:
 # additional_sysctl:
-# - { name: kernel.pid_max, value: 131072 }
+# - { name: vm.swappiness, value: 10 }
 #
+# `additional_sysctl` uses exact-value semantics. For floor semantics
+# (only apply when the node's current value is lower), use
+# `additional_sysctl_min` below.
 additional_sysctl: []
+
+# Sysctl floors: only applied when the node's current value is lower than
+# the desired one, so pre-tuned images and newer kernels are not regressed.
+# Defaults are aligned with OpenShift Node Tuning Operator and kops, and
+# are outside the kubelet protect-kernel-defaults validation list.
+sysctl_minimum_values:
+  fs.inotify.max_user_instances: 8192
+  fs.inotify.max_user_watches: 65536
+  kernel.pid_max: 4194304
+  fs.aio-max-nr: 1048576
+  vm.max_map_count: 262144
+  net.netfilter.nf_conntrack_max: 1048576
+  net.core.somaxconn: 32768
+  net.core.netdev_max_backlog: 16384
+  fs.file-max: 2097152
+  net.ipv4.neigh.default.gc_thresh1: 8192
+  net.ipv4.neigh.default.gc_thresh2: 32768
+  net.ipv4.neigh.default.gc_thresh3: 65536
+  net.ipv6.neigh.default.gc_thresh1: 8192
+  net.ipv6.neigh.default.gc_thresh2: 32768
+  net.ipv6.neigh.default.gc_thresh3: 65536
+
+# User-defined sysctl floors, merged on top of `sysctl_minimum_values`
+# (user values take precedence). Set a key to 0 to disable a default floor.
+# Example:
+# additional_sysctl_min:
+#   net.core.somaxconn: 65535
+#   kernel.pid_max: 0
+additional_sysctl_min: {}
 
 ## List of key=value pairs that describe feature gates for
 ## the k8s cluster.
 kube_feature_gates: []
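
The difference between the exact-value semantics of `additional_sysctl` and the floor semantics of `additional_sysctl_min` can be shown with a tiny Python comparison. This is a hedged sketch only: `resolve_exact` and `resolve_floor` are hypothetical helpers, and the "pre-tuned" current value is invented for illustration.

```python
# Illustrative comparison of exact-value vs floor semantics; not part of the PR.
def resolve_exact(current: int, value: int) -> int:
    """Exact-value semantics: the configured value always wins."""
    return value


def resolve_floor(current: int, floor: int) -> int:
    """Floor semantics: only raise the value, never lower it."""
    return max(current, floor)


pre_tuned = 1048576  # hypothetical vm.max_map_count already set by the image
desired = 262144     # the PR's default floor for vm.max_map_count

print(resolve_exact(pre_tuned, desired))  # 262144 (the pre-tuned value is lowered)
print(resolve_floor(pre_tuned, desired))  # 1048576 (the pre-tuned value is kept)
```

This is the regression the floor defaults are meant to avoid: applying the same number with exact-value semantics would lower a setting that the image or kernel had already tuned higher.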