feat(controller): add deletions_skipped_by_policy metric #6386
Upsert-only and create-only act as safety nets but silently swallow
deletions and (under create-only) updates: a reconcile that held back
changes is indistinguishable from a true no-op today.
* Add `external_dns_controller_deletions_skipped_by_policy{owned}`,
reset-on-idle, with owned={true,false,unknown}.
* Partition by ownership before deduplication — RemoveDuplicates keys
on (DNS name, type, set identifier) only and would otherwise let a
foreign twin evict the owned record from the count.
* Emit per-record debug logs for skipped deletions and (under
create-only) skipped updates; skipped updates have no metric by
design.
* Distinguish a true no-op log from one where policy held changes
back, so drift under create-only (no metric fires there when only
updates are suppressed) is not silently masked.
[APPROVALNOTIFIER] This PR is NOT APPROVED.
Welcome @jumasa!
Hi @jumasa. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. Regular contributors should join the org to skip this step.
This PR is AI-slop-heavy and does what is already supported; it most likely should not be part of the codebase. The PR doesn't acknowledge that `verified_records` exists or explain why a derived PromQL expression is insufficient, i.e. there is a gap in justification. The existing metrics already provide a way to detect the same drift:
Aka
What does it do?
Adds observability for DNS changes that `--policy=upsert-only` / `--policy=create-only` hold back:
* `external_dns_controller_deletions_skipped_by_policy{owned}`, reset-on-idle, with `owned` ∈ {true, false, unknown}.
* … `record`, `type`, `targets`, `owned`, `policy`.
* `No DNS changes applied; N deletion[s] held back by policy` replaces `All records are already up to date` in that case.
* … `no_op_runs_total`, debug-log volume on large clusters, and an rfc2136 AXFR-disabled caveat.

Motivation
Operators running either policy today have no observable signal when it suppresses a deletion or update: reconcile logs look healthy, `no_op_runs_total` ticks, and the drift (empty source, broken AXFR, misconfigured CRD) only surfaces once the stale record causes trouble. This PR exposes that signal so it can be dashboarded and alerted on.
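The summary-log change listed under "What does it do?" can be sketched in Go as follows. The two message strings are the ones quoted in the PR description. The `noOpMessage` and `plural` helpers, their parameters, and the wording used for held-back updates are hypothetical; the description only quotes the deletion message.

```go
package main

import "fmt"

// plural is a hypothetical helper: "deletion" for 1, "deletions" otherwise.
func plural(n int, noun string) string {
	if n == 1 {
		return noun
	}
	return noun + "s"
}

// noOpMessage (hypothetical) picks the summary line for a reconcile that
// applied no DNS changes, keeping a true no-op distinct from one where
// policy held changes back.
func noOpMessage(deletionsHeldBack, updatesHeldBack int) string {
	if deletionsHeldBack == 0 && updatesHeldBack == 0 {
		return "All records are already up to date"
	}
	msg := "No DNS changes applied"
	if deletionsHeldBack > 0 {
		msg += fmt.Sprintf("; %d %s held back by policy", deletionsHeldBack, plural(deletionsHeldBack, "deletion"))
	}
	if updatesHeldBack > 0 {
		// Hypothetical wording: the PR only quotes the deletion message.
		msg += fmt.Sprintf("; %d %s held back by policy", updatesHeldBack, plural(updatesHeldBack, "update"))
	}
	return msg
}

func main() {
	fmt.Println(noOpMessage(0, 0)) // All records are already up to date
	fmt.Println(noOpMessage(1, 0)) // No DNS changes applied; 1 deletion held back by policy
	fmt.Println(noOpMessage(0, 3)) // No DNS changes applied; 3 updates held back by policy
}
```

Under create-only, a run where only updates are suppressed fires no metric. The summary line in the third call still differs from the "up to date" case, so that drift is not masked in the logs.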