fix(testutil): replace fixed sleep in WaitForSpanFlush with span polling#463
fix(testutil): replace fixed sleep in WaitForSpanFlush with span polling#463mohanakatari119-bit wants to merge 6 commits into
Conversation
Signed-off-by: mohanakatari119-bit <mohana.katari119@gmail.com>
There was a problem hiding this comment.
Pull request overview
This PR improves test reliability by replacing a fixed sleep-based “span flush” helper with a polling-based helper that waits until the in-process collector has received a minimum number of spans (or times out), and updates existing tests to use it.
Changes:
- Replace
WaitForSpanFlush(fixed 200ms sleep) withWaitForSpans(t, collector, minSpans)that polls for spans with a timeout. - Add
(*TestFixture).WaitForSpans(minSpans int)convenience wrapper. - Migrate integration/e2e tests to wait for an explicit minimum span count and add unit tests for the new helper.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| test/testutil/readiness.go | Replaces fixed sleep span flush helper with polling-based WaitForSpans. |
| test/testutil/readiness_test.go | Adds unit tests covering immediate-return and delayed-arrival polling behavior. |
| test/testutil/fixture.go | Adds a fixture method wrapper for WaitForSpans. |
| test/integration/http_server_test.go | Uses fixture span polling instead of fixed sleep. |
| test/integration/grpc_shutdown_test.go | Uses fixture span polling instead of fixed sleep. |
| test/integration/grpc_server_test.go | Uses fixture span polling instead of fixed sleep. |
| test/e2e/grpc_test.go | Uses fixture span polling with a higher minimum span count. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
NameHaibinZhang
left a comment
There was a problem hiding this comment.
Thanks for the contribution! The polling approach is a clear improvement over the fixed sleep — it's both faster and more reliable. A few comments below:
Critical: Race condition in polling loop
WaitForSpans calls AllSpans(c.GetTraces()). GetTraces() releases the mutex before returning, but AllSpans then iterates the underlying trace data without holding the lock. Meanwhile, the collector's HTTP handler goroutine may concurrently append new spans via MoveAndAppendTo.
While this is a pre-existing pattern in the codebase (e.g., RequireTraceCount, RequireSingleSpan), the old code read traces only once after a 200ms sleep — by which time spans had typically all arrived. The new tight polling loop (every 25ms) reads repeatedly while spans are still arriving, significantly widening the race window.
Since this PR's goal is to improve test reliability, introducing a data race risk works against that goal. A simple fix would be to add a SpanCount() method on Collector that counts spans under the lock:
func (c *Collector) SpanCount() int {
c.mu.Lock()
defer c.mu.Unlock()
return countSpans(c.traces)
}Then WaitForSpans only checks the count rather than iterating unlocked trace data.
Suggestion: Missing timeout path test
The tests cover "spans already present" and "spans arrive after delay", but there's no test for the timeout/failure path. Consider adding one (possibly as a follow-up if it requires making the timeout configurable).
Suggestion: Test couples to Collector internals
TestWaitForSpans_pollsUntilSpansArrive directly manipulates c.mu and c.traces. This creates coupling to internal structure. Consider using the HTTP interface to simulate span arrival for a more realistic test. (Acceptable for a unit test though — just a suggestion for robustness.)
Nit: Magic number in e2e test
f.WaitForSpans(4) — consider adding a comment explaining where 4 comes from, e.g. // 2 traces (unary + stream) × 2 spans (client + server).
Add SpanCount() method on Collector that counts spans under the mutex, replacing the unlocked AllSpans(GetTraces()) calls in the polling loop that widened the race window during repeated 25ms polls. Also clarify the magic span count in the e2e test.
Extract pollForSpans helper with a configurable timeout so the failure path can be tested without a 3-second wait. Add TestWaitForSpans_timesOut to cover the timeout branch. Rewrite the existing polling test to use the collector's HTTP interface via sendSpan, removing direct access to c.mu and c.traces.
1cd0cbd to
f5a529a
Compare
txabman42
left a comment
There was a problem hiding this comment.
LGTM! TY for this improvement ❤️
Rename for naming consistency with defaultSpanPollInterval, as suggested by txabman42 in code review.
The
WaitForSpanFlushhelper used a hardcoded 200ms sleep to let spans arrive at the in-process collector before test assertions ran. On a loaded CI runner 200ms is not enough, which causes spurious failures where the collector holds zero spans at assertion time. On fast machines it is just dead time.WaitForSpanFlushis replaced byWaitForSpans(t, c, minSpans)which pollscollector.GetTraces()every 25ms and returns as soon as the expected number of spans are present, with a 3s deadline. A convenience method(f *TestFixture) WaitForSpans(minSpans int)is added so tests that already hold a fixture do not need to reach into the collector directly.All four callers are migrated:
grpc_server_test.go,grpc_shutdown_test.go,http_server_test.goeach wait for 1 span;e2e/grpc_test.gowaits for 4 spans (2 traces x 2 spans per trace, matching its existingRequireTraceCount(2)andRequireSpansPerTrace(2)assertions).Two unit tests are added in
readiness_test.go: one confirms the function returns immediately when spans are already present, and one confirms it successfully polls until a span arrives after a short delay.Tests passing
