Initialize scrape target labels the same way Prometheus does#5018
Initialize scrape target labels the same way Prometheus does#5018gyanranjanpanda wants to merge 2 commits into
Conversation
|
|
852d073 to
7da2fef
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 852d0735ef
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
7da2fef to
8a0bb89
Compare
|
@swiatekm could u review this |
8a0bb89 to
2c8e0f2
Compare
| // scheme, scrape_interval, scrape_timeout) so that target hashes are consistent | ||
| // with what Prometheus computes. | ||
| func (m *Discoverer) processTargetGroups(jobName string, groups []*targetgroup.Group, intoTargets []*Item, cfg *promconfig.ScrapeConfig) { | ||
| lb := labels.NewBuilder(labels.EmptyLabels()) |
There was a problem hiding this comment.
i believe we had this in the past, but it became a major performance issue, have you run any performance benchmarks to ensure we don't have a regression?
|
also, im not sure this does enough to confirm Mikolaj's concern here:
|
|
(sorry accidentally pressed the close pr button) |
|
Thank you for the review @jaronoff97! Benchmark ResultsI ran main (baseline)This PRNo regression. Throughput, latency, allocations, and memory are all within normal run-to-run variance. The alloc count is identical (1344 allocs/op) and heap usage is essentially unchanged (~72 KB/op). Testing & VerificationTo address the concern about regressions:
The performance concern from history is likely the cost of calling into Prometheus internals for every target. In this implementation, |
2c8e0f2 to
9072532
Compare
E2E Test Results 33 files 256 suites 2h 13m 13s ⏱️ Results for commit 2737e47. ♻️ This comment has been updated with latest results. |
77a208c to
2737e47
Compare
|
@swiatekm @jaronoff97 Here are the comprehensive benchmark results you requested, including all target processing benchmarks with large target counts (1K → 800K). BenchmarkProcessTargets (full pipeline: SD → labels → hashing → allocation)Platform:
BenchmarkProcessTargetsWithRelabelConfig (with keep/drop relabel rules)
Key Takeaways
CI NoteThe 2 failing e2e tests ( |
| // populateDiscoveredLabels replicates the label initialization logic from Prometheus's | ||
| // PopulateDiscoveredLabels in scrape/target.go. It sets base labels from target and group | ||
| // labels and scrape configuration, before relabeling. | ||
| // We replicate this instead of importing the scrape package due to dependency conflicts |
There was a problem hiding this comment.
What dependency conflicts? I checked out this branch, replaced this implementation with the import from prometheus, and it compiled fine.
There was a problem hiding this comment.
Good catch — you're right, there are no dependency conflicts. I've removed the local replication and now import scrape.PopulateDiscoveredLabels directly from github.com/prometheus/prometheus/scrape. The only additional change needed was adding go.opentelemetry.io/contrib/instrumentation/net/http/httptrace/otelhttptrace to go.sum (a transitive dependency of the scrape package).
All tests pass. Thanks for verifying this!
|
Those performance numbers are acceptable to me, if they're the price for more correctness. |
…lemetry#4074) Signed-off-by: Gyan Ranjan Panda <sanupanda141@gmail.com>
2737e47 to
ca439aa
Compare
Description
Fixes #4074.
Problem
The target allocator takes target labels unmodified from service discovery. Prometheus adds scrape config defaults (
job,__metrics_path__,__scheme__,__scrape_interval__,__scrape_timeout__) viaPopulateDiscoveredLabelsbefore hashing. This mismatch causes hash collisions for targets with the same address but different scrape configurations.PR #4066 was a temporary fix that manually included the job name in the hash. This PR is the proper fix — replicating Prometheus's
PopulateDiscoveredLabelslogic so target labels are initialized identically before hashing.Changes
discovery.go: AddedpopulateDiscoveredLabels()that replicates Prometheus's label initialization.processTargetGroupsnow uses this to set scrape config defaults (job, metrics_path, scheme, etc.) on target labels before creating Items.target.go: SimplifiedHash()to uselabels.Hash()directly (since labels now include job name). RemovedLabelsHashWithJobName()— no longer needed. UpdatedHashFromBuilder()signature (removedjobNameparam).Tests: Updated all hash tests, added
TestPopulateDiscoveredLabelswith 4 cases, updated server HTML testdata snapshots.Testing
cmd/otel-allocatorunit tests pass (go test ./... -count=1)References
PopulateDiscoveredLabels: https://github.com/prometheus/prometheus/blob/main/scrape/target.go