9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,15 @@
[v0.0.1-beta.2](https://github.com/scality/disk-management-agent/releases/tag/v0.0.1-beta.2)
(PR[#4933](https://github.com/scality/metalk8s/pull/4933))

### Bug Fixes

- Fix a bug where the salt mine fails silently during upgrades due to a corrupted mine cache.
(PR[#4934](https://github.com/scality/metalk8s/pull/4934))

- Fix a bug where the salt mine fails and prints many warnings when dex is disabled.
(PR[#4934](https://github.com/scality/metalk8s/pull/4934))


## Release 133.0.4

## Release 133.0.3
4 changes: 4 additions & 0 deletions pillar/metalk8s/roles/ca.sls
@@ -15,9 +15,11 @@ mine_functions:
- mine_function: hashutil.base64_encodefile
- /etc/kubernetes/pki/sa.pub

{%- if pillar.addons.dex.enabled %}
dex_ca_b64:
- mine_function: hashutil.base64_encodefile
- /etc/metalk8s/pki/dex/ca.crt
{%- endif %}

ingress_ca_b64:
- mine_function: hashutil.base64_encodefile
@@ -70,13 +72,15 @@ x509_signing_policies:
- keyUsage: critical digitalSignature, keyEncipherment
- extendedKeyUsage: serverAuth
- authorityKeyIdentifier: keyid
{%- if pillar.addons.dex.enabled %}
dex_server_policy:
- minions: '*'
- signing_private_key: /etc/metalk8s/pki/dex/ca.key
- signing_cert: /etc/metalk8s/pki/dex/ca.crt
- keyUsage: critical digitalSignature, keyEncipherment
- extendedKeyUsage: serverAuth
- authorityKeyIdentifier: keyid
{%- endif %}
backup_server_policy:
- minions: '*'
- signing_private_key: /etc/metalk8s/pki/backup-server/ca.key
15 changes: 15 additions & 0 deletions scripts/upgrade.sh.in
@@ -92,6 +92,20 @@ upgrade_bootstrap () {
metalk8s.salt.master.installed saltenv="$SALTENV"
}

flush_and_refresh_mine() {
# After upgrading salt-master, the mine cache files on the master may have
# been corrupted by a non-atomic write interrupted mid-SIGTERM (kubelet
# kills the container when the manifest changes). mine.update alone cannot
# fix this: it reads the corrupt file first, fails, and silently discards
# the new data. mine.flush deletes the file (no read needed), after which
# mine.update writes a clean cache from scratch.
# Run on all minions so the master has a clean, up-to-date mine cache for
# the entire cluster before any upgrade state queries it.
SALT_MASTER_CALL=("${EXEC_CONTAINER_COMMAND[@]}" "$(get_salt_container)")
"${SALT_MASTER_CALL[@]}" salt '*' mine.flush

`run_quiet` disables `set -e`, so if `mine.flush` fails, execution continues to `mine.update`. Since the whole point is to delete the corrupt cache first, a failed flush means `mine.update` hits the same corrupt file, which is exactly the bug this function exists to fix. Check the return code of `mine.flush` before proceeding.
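
The failure mode described in this comment can be reproduced in isolation. The sketch below assumes, as the comment states, that the function body runs with errexit turned off. `fake_flush` and `fake_update` are hypothetical stand-ins for the two Salt calls, not real commands, and the forced failure of `fake_flush` is an illustration rather than observed behavior.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the real Salt calls (NOT the Salt CLI).
fake_flush()  { return 1; }            # simulates a failed mine.flush
fake_update() { echo "update ran"; }   # simulates mine.update

# Without a guard: the failed flush is ignored and the update still runs.
unguarded() {
    fake_flush
    fake_update
}

# With the suggested guard: the function stops as soon as the flush fails.
guarded() {
    fake_flush || return 1
    fake_update
}

# Assumption: errexit is off while the function runs, as the review states.
set +e

unguarded_out=$(unguarded); unguarded_rc=$?
guarded_out=$(guarded);     guarded_rc=$?

echo "unguarded: rc=${unguarded_rc} out='${unguarded_out}'"
echo "guarded:   rc=${guarded_rc} out='${guarded_out}'"
```

In the unguarded variant the function's exit status comes from the last command, so the failed flush is masked entirely. The guarded variant reports the failure to the caller and never reaches the update step.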

```suggestion
"${SALT_MASTER_CALL[@]}" salt '*' mine.flush || return 1
```

"${SALT_MASTER_CALL[@]}" salt '*' mine.update
}

launch_pre_upgrade () {
SALT_MASTER_CALL=("${EXEC_CONTAINER_COMMAND[@]}" "$(get_salt_container)")
"${SALT_MASTER_CALL[@]}" salt-run saltutil.sync_all \
@@ -209,6 +223,7 @@ run "Performing Pre-Upgrade checks" precheck_upgrade
"$BASE_DIR"/backup.sh --no-replication

run "Upgrading bootstrap" upgrade_bootstrap
run "Refreshing Salt mine on nodes" flush_and_refresh_mine
run "Setting cluster version to $DESTINATION_VERSION" patch_kubesystem_namespace
run "Launching the pre-upgrade" launch_pre_upgrade
run "Upgrading etcd cluster" upgrade_etcd