[processor/k8sattributes] feat: add reverse mapping between podUID and podIdentifier to delete stale entries#48398

Open
wehzzz wants to merge 3 commits into open-telemetry:main from wehzzz:fix-k8sattributes-stale-container-ids

Conversation

@wehzzz wehzzz commented May 15, 2026

Description

We observed unbounded growth in podTableSize when running the k8sattributes processor in an OTel eBPF profiler-based collector configured to associate pods by container.id. This was observed on both v0.150.0 and v0.151.0.

The issue happens on clusters with container restarts. A pod can stay alive while one of its containers restarts and receives a new container.id. The cache then adds the new container.id association, but the previous container.id association is not queued for deletion because there is no pod delete event. Since deleteLoop only processes entries that have already been added to the delete queue, the stale association remains in the cache.
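
To make the failure mode concrete, here is a minimal, self-contained sketch of a cache keyed by container.id that is only cleaned up on pod delete events. The types and function names (`naiveCache`, `addOrUpdate`, `simulateRestarts`) are hypothetical and are not the processor's actual code; the sketch only models the behavior described above.

```go
package main

import "fmt"

// pod is a stand-in for a cached pod record (hypothetical, not the processor's type).
type pod struct {
	uid          string
	containerIDs []string
}

// naiveCache models an association table keyed by container.id that is only
// cleaned up on pod delete events.
type naiveCache struct {
	table map[string]*pod
}

// addOrUpdate stores an entry for every container ID the pod currently reports.
// IDs the pod reported earlier but no longer reports are left in place.
func (c *naiveCache) addOrUpdate(p *pod) {
	for _, id := range p.containerIDs {
		c.table[id] = p
	}
}

// simulateRestarts adds one pod, then replays the given number of container
// restarts as pod updates, each carrying a fresh container ID. It returns the
// resulting table size.
func simulateRestarts(restarts int) int {
	c := &naiveCache{table: map[string]*pod{}}
	for i := 0; i <= restarts; i++ {
		c.addOrUpdate(&pod{
			uid:          "pod-a",
			containerIDs: []string{fmt.Sprintf("cid-%d", i)},
		})
	}
	return len(c.table)
}

func main() {
	// One live pod with one container, yet the table holds one entry per
	// container ID ever seen.
	fmt.Println("table size after 5 restarts:", simulateRestarts(5))
}
```

In this model the table grows by one entry per restart even though the number of pods and containers stays constant, which matches the shape of the growth reported in the tables below.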

This PR fixes the missing cleanup path by keeping a reverse index from podUID to known PodIdentifiers. With that index:

  • On pod update, identifiers that disappeared from the current pod state are scheduled for deletion after the existing grace period.
  • If an identifier becomes current again before the grace period expires, the pending deletion is cancelled.
  • On pod delete, all identifiers known for that pod UID are cleaned up, including historical container.id associations that are no longer present in the final pod object.
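
The reverse-index idea above can be sketched as follows. This is a simplified illustration under assumptions, not the code in this PR: `indexedCache`, `update`, `forget`, and `removePod` are hypothetical names, identifiers are plain strings, and the grace-period scheduling is left to the caller.

```go
package main

import (
	"fmt"
	"sort"
)

// indexedCache is a hypothetical model of the association table plus a
// reverse index from pod UID to every identifier recorded for that pod.
type indexedCache struct {
	table map[string]string              // identifier -> pod UID
	byUID map[string]map[string]struct{} // pod UID -> identifiers seen for that pod
}

func newIndexedCache() *indexedCache {
	return &indexedCache{
		table: map[string]string{},
		byUID: map[string]map[string]struct{}{},
	}
}

// update records the pod's current identifiers and returns, sorted, the
// identifiers known for this UID that are absent from the current state.
// The caller is expected to schedule those for deletion after the grace period.
func (c *indexedCache) update(uid string, current []string) []string {
	known := c.byUID[uid]
	if known == nil {
		known = map[string]struct{}{}
		c.byUID[uid] = known
	}
	cur := make(map[string]struct{}, len(current))
	for _, id := range current {
		cur[id] = struct{}{}
		c.table[id] = uid
		known[id] = struct{}{}
	}
	var stale []string
	for id := range known {
		if _, ok := cur[id]; !ok {
			stale = append(stale, id)
		}
	}
	sort.Strings(stale)
	return stale
}

// forget drops a single identifier once its grace period has expired.
func (c *indexedCache) forget(uid, id string) {
	if c.table[id] == uid {
		delete(c.table, id)
	}
	delete(c.byUID[uid], id)
}

// removePod drops every identifier ever recorded for the UID, including ones
// missing from the final pod object, and returns how many entries were removed.
func (c *indexedCache) removePod(uid string) int {
	removed := 0
	for id := range c.byUID[uid] {
		if c.table[id] == uid {
			delete(c.table, id)
			removed++
		}
	}
	delete(c.byUID, uid)
	return removed
}

func main() {
	c := newIndexedCache()
	c.update("pod-a", []string{"cid-1"})
	fmt.Println("stale:", c.update("pod-a", []string{"cid-2"}))
	fmt.Println("removed on delete:", c.removePod("pod-a"))
}
```

Keeping the index keyed by pod UID means the delete path no longer depends on what the final pod object happens to contain.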

Before fix

| Signal | 01/05 | 04/05 |
| --- | ---: | ---: |
| Pods on node | 50 | 49 |
| Containers on node | 161 | 160 |
| otelcol_otelsvc_k8s_pod_added | 147 | 441 |
| otelcol_otelsvc_k8s_pod_deleted | N/A | 294 |
| otelcol_otelsvc_k8s_pod_updated | 13,053 | 182,379 |
| otelcol_otelsvc_k8s_pod_table_size | 1,050 | 7,051 |

After fix

| Signal | ~30 min uptime | ~79 min uptime | ~6h25 uptime |
| --- | ---: | ---: | ---: |
| Pods on node | 40 | 40 | 40 |
| Containers on node | 135 | 135 | 135 |
| Sum of restartCount across currently running containers | 376 | 416 | 653 |
| otelcol_otelsvc_k8s_pod_added | 120 | 144 | 165 |
| otelcol_otelsvc_k8s_pod_deleted | N/A | 24 | 45 |
| otelcol_otelsvc_k8s_pod_updated | 801 | 2,706 | 11,496 |
| otelcol_otelsvc_k8s_pod_table_size | 432 | 432 | 432 |

Testing

  • Added unit tests to cover various deletion scenarios.
  • Tested the patched version against real-world workloads.

@github-actions github-actions Bot added the first-time contributor PRs made by new contributors label May 15, 2026
@github-actions
Contributor

Welcome, contributor! Thank you for your contribution to opentelemetry-collector-contrib.

Important reminders:

  • Read our Contributing Guidelines.
  • Sign the CLA if you haven't already.
  • First-time contributors should have at most one PR not marked as draft until their first PR is merged.
  • If your change isn't one of our priority components, reviews may take more time.
  • Give reviewers at least a few days before pinging them for feedback.
  • If you need help or struggle to move your PR forward, raise the topic on #otel-collector-dev or a Collector SIG meeting.

@wehzzz wehzzz changed the title feat: add reverse mapping between podUID and podIdentifier to delete stale entries [processor/k8sattributes] feat: add reverse mapping between podUID and podIdentifier to delete stale entries May 15, 2026
@wehzzz wehzzz marked this pull request as ready for review May 15, 2026 09:50
@github-actions github-actions Bot added the processor/k8sattributes k8s Attributes processor label May 15, 2026
@github-actions github-actions Bot requested a review from odubajDT May 15, 2026 09:50

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8887d3b3bb

podTableSize := len(c.Pods)
c.m.Unlock()

c.appendDeleteRequests(staleIdentifiers, true)

[P2] Cancel stale pod identifier deletes when restored

When an update temporarily drops an identifier, this queues a delayed delete, but a later update that restores the same identifier for the same pod UID never removes that pending request. deleteLoopProcessing only checks that the cached pod still has the same UID before deleting, so after the grace period it will remove an active association (for example a container.id or pod IP that disappeared in one informer update and reappeared before cleanup), breaking lookups until another update repopulates it.

Author

This should be addressed by the latest commits. I’m happy to adjust the implementation if you see a better approach.

I chose to track pending stale deletes in a map and store pointers to the queued delete requests, so a restored identifier can cancel its pending deletion with an average-case O(1) map lookup, instead of scanning the delete queue on every pod update.

The trade-off is a small amount of additional memory for stale identifiers while their delete requests are pending. That is bounded by the number of stale identifiers observed during the delete grace period. Once deleteLoopProcessing reaches those entries, cancelled requests are skipped and removed from the pending map.
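
A rough sketch of that cancellation scheme, under stated assumptions: the names below (`deleteRequest`, `scheduleStale`, `restore`, `process`) are hypothetical, identifiers are plain strings, and time is passed in explicitly so the example is deterministic. It is meant to illustrate the pointer-in-map approach, not to mirror the PR's code.

```go
package main

import (
	"fmt"
	"time"
)

// deleteRequest is a hypothetical queued deletion for one stale identifier.
type deleteRequest struct {
	id, uid   string
	ts        time.Time
	cancelled bool
}

type cache struct {
	table   map[string]string         // identifier -> pod UID
	queue   []*deleteRequest          // time-ordered delete queue
	pending map[string]*deleteRequest // identifier -> its queued request
}

func newCache() *cache {
	return &cache{table: map[string]string{}, pending: map[string]*deleteRequest{}}
}

// scheduleStale queues a delayed delete unless a live request already exists.
func (c *cache) scheduleStale(uid, id string, now time.Time) {
	if r, ok := c.pending[id]; ok && !r.cancelled {
		return
	}
	r := &deleteRequest{id: id, uid: uid, ts: now}
	c.queue = append(c.queue, r)
	c.pending[id] = r
}

// restore cancels a pending delete with a single map lookup; the queue is
// not scanned.
func (c *cache) restore(id string) {
	if r, ok := c.pending[id]; ok {
		r.cancelled = true
	}
}

// process handles requests older than the grace period. Cancelled requests
// are skipped, and every handled request is dropped from the pending map.
func (c *cache) process(now time.Time, grace time.Duration) int {
	removed, i := 0, 0
	for ; i < len(c.queue); i++ {
		r := c.queue[i]
		if now.Sub(r.ts) < grace {
			break // the remaining requests are newer
		}
		if c.pending[r.id] == r {
			delete(c.pending, r.id)
		}
		if r.cancelled {
			continue
		}
		if c.table[r.id] == r.uid {
			delete(c.table, r.id)
			removed++
		}
	}
	c.queue = c.queue[i:]
	return removed
}

// demo schedules two stale identifiers, restores one, and runs the loop once.
func demo() (removed int, restoredKept bool, pendingLeft int) {
	t0 := time.Unix(0, 0)
	c := newCache()
	c.table["cid-1"] = "pod-a"
	c.table["cid-2"] = "pod-a"
	c.scheduleStale("pod-a", "cid-1", t0)
	c.scheduleStale("pod-a", "cid-2", t0)
	c.restore("cid-2") // cid-2 became current again before the grace period ended
	removed = c.process(t0.Add(2*time.Minute), time.Minute)
	_, restoredKept = c.table["cid-2"]
	return removed, restoredKept, len(c.pending)
}

func main() {
	fmt.Println(demo())
}
```

Because the map holds pointers to the same requests that sit in the queue, flipping the flag on restore is enough; the queue entry is discarded lazily when the loop reaches it.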

@ChrsMark
Member

Thanks @wehzzz for the PR. However, I'm not sure what this PR is trying to fix and why

go c.deleteLoop(time.Second*30, defaultPodDeleteGracePeriod)

is not enough. Could you provide more details here?

Also, could you say which version you are hitting this issue on? BTW, #47357 recently tried to fix something similar, so could you make sure you are running a version that includes that fix?

@wehzzz
Author

wehzzz commented May 18, 2026

Thanks for the question, @ChrsMark. The issue this PR is trying to fix is slightly different from #47357.

deleteLoop only processes entries that have already been added to the delete queue. In the case I am addressing, from my understanding, a pod can stay alive while one of its containers restarts and gets a new container.id. The k8sattributes cache then adds the new container.id association, but the previous container.id association is not queued for deletion because there is no pod delete event. Later, the pod delete object only contains the current container status, so historical container IDs can also be missed.

So the problem is not that the 30s loop/grace period is too slow; it is the unbounded growth described in the PR description. The missing piece is detecting identifiers that have disappeared from the pod state and explicitly enqueueing them for deletion.

This PR adds a reverse index from podUID to known PodIdentifiers so that:

  • on pod update, identifiers that disappeared from the current pod state are scheduled for deletion after the existing grace period.
  • if the identifier becomes current again before the grace period expires, the pending deletion is cancelled.
  • on pod delete, all identifiers known for that pod UID are cleaned up, including historical container IDs that are no longer present in the final pod object.

Regarding #47357: I believe that fix addresses map bucket retention after entries have been deleted from the cache. This PR addresses a case where stale container.id entries were never queued for deletion in the first place, so #47357 does not fully cover it.

We observed this behavior on both 0.150.0 and 0.151.0.

Member

@ChrsMark ChrsMark left a comment

Thanks for the additional details. Could you update the PR description so that it clearly reflects the intention of this change?

I have left a question to clarify. Other code-owners of the component should also verify this.

I also suggest adding a changelog entry for this change as a performance enhancement, even if it's not strictly speaking "user facing".


identifiers := c.getIdentifiersFromAssoc(newPod)
if newPod.PodUID != "" {
staleIdentifiers = c.getStalePodIdentifiersForDeletionLocked(newPod.PodUID, identifiers)
Member

Could we just remove the stale identifiers directly instead of maintaining a second delete queue for this? From my perspective, for any given Pod at any given time, only the active identifiers should be maintained. I'm not sure if I'm missing anything here. @dmitryax please chime in.
