Skip to content

fix(tests): prevent race condition in postgres driver tests#3673

Draft
fcecin wants to merge 2 commits into
masterfrom
fix/msg-test-wait-for-partition
Draft

fix(tests): prevent race condition in postgres driver tests#3673
fcecin wants to merge 2 commits into
masterfrom
fix/msg-test-wait-for-partition

Conversation

@fcecin
Copy link
Copy Markdown
Contributor

@fcecin fcecin commented Dec 16, 2025

Description

Messaging tests (such as tests/waku_archive/test_driver_postgres_query.nim) can temporarily fail if a new partition in the DB is not created in time. This is unlikely to happen when running the test suite, but it can happen (the insert can jump ahead of the partition creation by a few milliseconds).

Changes

  • Add PostgresDriver.waitForPartition() which waits for a partition to be available for a given timeout period
  • Call waitForPartition in the archive test suite and wait for a partition to be created before starting each messaging test

Issue

Maintenance Y2026H1 #3686

* add PostgresDriver.waitForPartition
* wait for a partition for 5s in archive driver tests setup
@github-actions
Copy link
Copy Markdown

This PR may contain changes to database schema of one of the drivers.

If you are introducing any changes to the schema, make sure the upgrade from the latest release to this change passes without any errors/issues.

Please make sure the label release-notes is added to make sure upgrade instructions properly highlight this change.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Dec 16, 2025

You can find the image built from this PR at

quay.io/wakuorg/nwaku-pr:3673

Built from 86e57cd

Copy link
Copy Markdown
Collaborator

@Ivansete-status Ivansete-status left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks so much for it! 🙌
Just added some nitpicks.
Cheers.

Comment thread tests/waku_archive/test_driver_postgres_query.nim

proc waitForPartition*(
self: PostgresDriver, timeout = chronos.seconds(5)
): Future[ArchiveDriverResult[void]] {.async.} =
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we mention that this is meant to avoid flaky testing only?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just a generic timed wrapper for containsAnyPartition so I think we would have to put the same warning there as well. I'm not sure we should be afraid of people using timed-out queries to "a partition exists" outside of tests or as part of helping an actual test to test something useful.

var elapsed = chronos.milliseconds(0)

while elapsed < timeout:
if self.containsAnyPartition():
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should confirm that there is a valid partition for "now".
Each partition contains data for o'clock hours unix time.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if we step out of "there is a partition" as a basic check we may step into second-guessing what the partition manager is doing w.r.t. time calc.

Maybe that's the actual solution to this. I should probably get more into what the partition manager actually does. Now I'm not sure I want to merge this fix as it is :-)

I think this PR will linger here a bit since this testing race condition is so difficult to reproduce. It's the opposite of urgent. Certainly not triggering in the CI machines, which is slower than our own machines (where this is already very difficult to trigger). So this is worth waiting for doing actually right.

@fcecin fcecin marked this pull request as draft January 8, 2026 10:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants