
perf: switch 100kLRPS benchmarks to realistic entropy data source #2985

Open: cijothomas wants to merge 1 commit into open-telemetry:main from cijothomas:perf/realistic-loadgen-entropy

@cijothomas
Member

@cijothomas cijothomas commented May 15, 2026

Summary

Switches the rate-limited (100kLRPS) continuous benchmark tests from semantic_conventions + pre_generated to static + fresh + use_trace_context: true.

Problem

The semantic_conventions data source with pre_generated strategy replays identical payloads every batch, resulting in unrealistically high compression ratios. OTAP egress was as low as 6-9 bytes/log, and OTLP ~27 bytes/log — making network utilization benchmarks misleading.
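The effect described above can be reproduced in miniature. The sketch below is only an illustration under stated assumptions: it uses Python's `zlib` rather than whatever compression the pipeline actually applies, and `TEMPLATE` is a hypothetical payload, not one taken from either data source. It compares replaying one identical record against records that each carry 24 random bytes (standing in for a unique trace_id plus span_id).

```python
import os
import zlib


def compressed_bytes_per_record(records):
    """Compress the concatenated records and return compressed bytes per record."""
    raw = b"".join(records)
    return len(zlib.compress(raw)) / len(records)


# Hypothetical log payload; not taken from the benchmark's data sources.
TEMPLATE = b'{"severity":"INFO","body":"GET /api/v1/items returned 200 in 12ms"}'

# Replay-style input: the same payload repeated for every record.
replayed = [TEMPLATE] * 1000

# Fresh-style input: each record also carries 24 random bytes (hex-encoded),
# standing in for a unique trace_id (16 bytes) plus span_id (8 bytes).
fresh = [TEMPLATE + os.urandom(24).hex().encode() for _ in range(1000)]

replayed_bpr = compressed_bytes_per_record(replayed)
fresh_bpr = compressed_bytes_per_record(fresh)
print(f"replayed: {replayed_bpr:.2f} bytes/record")
print(f"fresh:    {fresh_bpr:.2f} bytes/record")
```

The replayed input collapses to well under a byte per record, while the random identifiers set a floor on how small each fresh record can get.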

Solution

The static data source with fresh generation provides:

  • 50 diverse log body templates (~150 bytes each) with realistic content
  • Unique trace_id/span_id per record via use_trace_context: true
  • Fresh timestamps and varied attribute values per batch

The uncompressed protobuf size per log record is approximately 250 bytes. See #2987 for a planned metric to expose this directly.
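As a rough picture of what fresh generation with trace context means, here is a minimal sketch. It is not the load generator's implementation: `BODY_TEMPLATES`, `fresh_log_record`, and `fresh_batch` are hypothetical names, and the three templates stand in for the 50 that the static source provides. The only fixed facts it relies on are the OpenTelemetry identifier sizes (16-byte trace ID, 8-byte span ID).

```python
import os
import random
import time

# Hypothetical templates; the static source is described as having 50 (~150 bytes each).
BODY_TEMPLATES = [
    "user session established after successful token refresh",
    "request completed; upstream latency within expected bounds",
    "cache miss on product catalog lookup, falling back to database",
]


def fresh_log_record(rng):
    """Build one log record with a fresh timestamp and unique trace context."""
    return {
        "time_unix_nano": time.time_ns(),
        "body": rng.choice(BODY_TEMPLATES),
        "trace_id": os.urandom(16).hex(),  # 16 random bytes -> 32 hex chars
        "span_id": os.urandom(8).hex(),    # 8 random bytes -> 16 hex chars
    }


def fresh_batch(size, seed=0):
    """Generate a new batch on every call instead of replaying a stored one."""
    rng = random.Random(seed)
    return [fresh_log_record(rng) for _ in range(size)]


batch = fresh_batch(100)
print(batch[0])
```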

Before/After comparison (CI dedicated machine)

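| Test | Run | CPU% | Egress bytes/log | TX MB/s | Drop% |
|---|---|---|---|---|---|
| OTLP-ATTR-OTLP | Before | 65.3 | 27.4 | 2.61 | 0% |
| OTLP-ATTR-OTLP | After | 65.1 | 49.9 | 4.52 | 0% |
| OTLP-ATTR-OTAP | Before | 64.5 | 8.9 | 0.81 | 5% |
| OTLP-ATTR-OTAP | After | 64.4 | 37.3 | 3.38 | 0% |
| OTAP-ATTR-OTAP | Before | 49.4 | 9.2 | 0.79 | 5.3% |
| OTAP-ATTR-OTAP | After | 51.3 | 37.4 | 3.39 | 0% |
| OTAP-ATTR-OTLP | Before | 64.9 | 28.9 | 2.48 | 5.3% |
| OTAP-ATTR-OTLP | After | 64.9 | 57.9 | 5.24 | 0% |
| OTLP-BATCH-OTLP | Before | 64.6 | 27.7 | 2.51 | 5% |
| OTLP-BATCH-OTLP | After | 64.9 | 49.9 | 4.52 | 0% |
| OTAP-BATCH-OTAP | Before | 37.9 | 6.4 | 0.58 | 0.05% |
| OTAP-BATCH-OTAP | After | 38.2 | 34.9 | 3.16 | 0.05% |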

Key observations:

  • CPU is unchanged — the engine handles the higher-entropy data at the same cost
  • Compression ratios are now realistic: ~5:1 for OTLP (250→50), ~7:1 for OTAP (250→37)
  • OTAP columnar compression advantage over OTLP becomes measurable: 35-37 vs 50-58 bytes/log = ~30% less wire usage, which was invisible with the old low-entropy data
  • Saturation tests are not changed (they need pre_generated to avoid loadgen bottleneck)
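The ratios and the wire-usage saving quoted above can be rechecked from the "After" rows of the table. This sketch assumes the last segment of each test name is the egress protocol, which is consistent with the numbers but is not stated explicitly, and it takes the approximate 250-byte uncompressed size at face value.

```python
UNCOMPRESSED_BYTES_PER_LOG = 250  # approximate figure from the description

# "After" egress bytes/log, grouped by the last segment of the test name
# (assumed here to be the egress protocol).
otlp_egress = {"OTLP-ATTR-OTLP": 49.9, "OTAP-ATTR-OTLP": 57.9, "OTLP-BATCH-OTLP": 49.9}
otap_egress = {"OTLP-ATTR-OTAP": 37.3, "OTAP-ATTR-OTAP": 37.4, "OTAP-BATCH-OTAP": 34.9}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


otlp_mean = mean(otlp_egress.values())
otap_mean = mean(otap_egress.values())

otlp_ratio = UNCOMPRESSED_BYTES_PER_LOG / otlp_mean
otap_ratio = UNCOMPRESSED_BYTES_PER_LOG / otap_mean
otap_saving = 1 - otap_mean / otlp_mean

print(f"OTLP compression ratio: {otlp_ratio:.1f}:1")
print(f"OTAP compression ratio: {otap_ratio:.1f}:1")
print(f"OTAP wire saving vs OTLP: {otap_saving:.0%}")
```

Averaged this way, the figures come out to about 4.8:1 for OTLP, 6.8:1 for OTAP, and a saving of roughly 30%, in line with the rounded values in the observations.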

Relates to #2540

@cijothomas cijothomas requested a review from a team as a code owner May 15, 2026 22:26
@codecov

codecov Bot commented May 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.91%. Comparing base (672d665) to head (5c545b1).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2985      +/-   ##
==========================================
- Coverage   85.92%   85.91%   -0.01%     
==========================================
  Files         725      725              
  Lines      275605   275605              
==========================================
- Hits       236811   236797      -14     
- Misses      38270    38284      +14     
  Partials      524      524              
| Components | Coverage Δ |
|---|---|
| otap-dataflow | 87.04% <ø> (-0.01%) ⬇️ |
| query_abstraction | 80.61% <ø> (ø) |
| query_engine | 89.57% <ø> (ø) |
| otel-arrow-go | 52.45% <ø> (ø) |
| quiver | 92.25% <ø> (ø) |

@cijothomas cijothomas added the pipelineperf Trigger for the pipeline performance job on PRs label May 15, 2026
@cijothomas cijothomas closed this May 15, 2026
@cijothomas cijothomas reopened this May 15, 2026
@cijothomas cijothomas force-pushed the perf/realistic-loadgen-entropy branch from b170962 to 5c545b1 Compare May 15, 2026 23:07
Contributor

@JakeDern JakeDern left a comment

Just curious - How does the static data compare to semantic registry data? For the comparison dashboard we're using fresh + semantic convention as the data generation strategy, but if this has better characteristics then maybe we'll consider swapping out there too.

@cijothomas
Member Author

> Just curious - How does the static data compare to semantic registry data? For the comparison dashboard we're using fresh + semantic convention as the data generation strategy, but if this has better characteristics then maybe we'll consider swapping out there too.

Yes. We'd need to switch there too. The semantic convention registry does not have much variation. I'll dive deeper into it next week. (We also need to find an easy way to tell whether we are getting too-easy-to-compress/unrealistic input to begin with.)

@JakeDern
Contributor

> > Just curious - How does the static data compare to semantic registry data? For the comparison dashboard we're using fresh + semantic convention as the data generation strategy, but if this has better characteristics then maybe we'll consider swapping out there too.
>
> Yes. We'd need to switch there too. The semantic convention registry does not have much variation. I'll dive deeper into it next week. (We also need to find an easy way to tell whether we are getting too-easy-to-compress/unrealistic input to begin with.)

Sounds good, I can make that change and compare rates! We have baseline measurements for those suites already with fresh generation + the semantic registry source, so we'll be able to see how the static source compares.
