fix(tests): prevent race condition in postgres driver tests #3673
fcecin wants to merge 2 commits into
* add PostgresDriver.waitForPartition
* wait for a partition for 5s in archive driver tests setup
This PR may contain changes to database schema of one of the drivers. If you are introducing any changes to the schema, make sure the upgrade from the latest release to this change passes without any errors/issues. Please make sure the label
You can find the image built from this PR at Built from 86e57cd
Ivansete-status left a comment
LGTM! Thanks so much for it! 🙌
Just added some nitpicks.
Cheers.
proc waitForPartition*(
    self: PostgresDriver, timeout = chronos.seconds(5)
): Future[ArchiveDriverResult[void]] {.async.} =
Shall we mention that this is meant to avoid flaky testing only?
It's just a generic timed wrapper for containsAnyPartition, so I think we would have to put the same warning there as well. I'm not sure we should be afraid of people using a timed "a partition exists" query outside of tests, or as part of helping an actual test check something useful.
  var elapsed = chronos.milliseconds(0)

  while elapsed < timeout:
    if self.containsAnyPartition():
I wonder if we should confirm that there is a valid partition for "now".
Each partition contains data for a range of whole (o'clock) hours in Unix time.
Yes, if we step out of "there is a partition" as a basic check we may step into second-guessing what the partition manager is doing w.r.t. time calc.
Maybe that's the actual solution to this. I should probably get more into what the partition manager actually does. Now I'm not sure I want to merge this fix as it is :-)
I think this PR will linger here a bit, since this testing race condition is so difficult to reproduce. It's the opposite of urgent. It is certainly not triggering on the CI machines, which are slower than our own machines (where it is already very difficult to trigger). So it is worth waiting to do this right.
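The stricter check raised in this thread, "is there a partition that covers now", could be sketched roughly as follows. This is only an illustration under assumptions: `PartitionRange`, `hourStart`, and `coversTime` are hypothetical names, and the one-hour, top-of-the-hour alignment is taken from the review comment above rather than from the partition manager's code, which would need to be checked first.

```nim
import std/times

const SecondsPerHour = 3600'i64

# Hypothetical representation of a partition: a half-open range
# [startSec, endSec) in Unix seconds.
type PartitionRange = tuple[startSec, endSec: int64]

proc hourStart(unixSec: int64): int64 =
  ## Rounds a non-negative Unix timestamp down to the top of its hour.
  unixSec - (unixSec mod SecondsPerHour)

proc coversTime(partitions: openArray[PartitionRange], unixSec: int64): bool =
  ## True if any partition range contains the given timestamp.
  ## `result` defaults to false when no range matches.
  for p in partitions:
    if p.startSec <= unixSec and unixSec < p.endSec:
      return true

when isMainModule:
  let now = getTime().toUnix()
  # Assumed shape of the "current" partition: the hour that contains `now`.
  let current: PartitionRange = (hourStart(now), hourStart(now) + SecondsPerHour)
  doAssert coversTime([current], now)
  doAssert not coversTime([current], current.endSec) # end is exclusive
```

A check like this ties the test setup to the partition manager's time calculation, which is the second-guessing concern mentioned above.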
Description
Messaging tests (such as tests/waku_archive/test_driver_postgres_query.nim) can fail intermittently if a new partition in the DB is not created in time. This is unlikely when running the test suite, but it can happen: the insert can get ahead of the partition creation by a few milliseconds.
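One way to close this window is to poll for a partition, with a timeout, before the test inserts anything. The following is a minimal, self-contained sketch of that pattern, not the code in this PR: it uses Nim's standard `asyncdispatch` instead of chronos so it runs without extra packages, `FakeDriver` is a hypothetical stand-in for `PostgresDriver`, and it returns a plain `bool` rather than `ArchiveDriverResult[void]`.

```nim
import std/asyncdispatch

# Hypothetical stand-in for the real driver: the partition "appears"
# only after a fixed number of failed checks.
type FakeDriver = ref object
  pollsUntilReady: int

proc containsAnyPartition(d: FakeDriver): bool =
  result = d.pollsUntilReady <= 0
  if not result:
    dec d.pollsUntilReady

proc waitForPartition(
    d: FakeDriver, timeoutMs = 5000, pollMs = 10
): Future[bool] {.async.} =
  ## Polls until a partition exists or the time budget is used up.
  var elapsedMs = 0
  while elapsedMs < timeoutMs:
    if d.containsAnyPartition():
      return true
    await sleepAsync(pollMs)
    elapsedMs += pollMs
  return false

when isMainModule:
  # Partition shows up after three polls: the wait succeeds.
  doAssert waitFor(FakeDriver(pollsUntilReady: 3).waitForPartition(timeoutMs = 1000))
  # Partition never shows up within the budget: the wait gives up.
  doAssert not waitFor(FakeDriver(pollsUntilReady: 1000).waitForPartition(timeoutMs = 50))
```

Counting elapsed time by adding the poll interval slightly underestimates the real wait, since the check itself takes time. For a test-setup guard that is usually acceptable; comparing against a monotonic deadline would be the stricter alternative.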
Changes
* Add PostgresDriver.waitForPartition(), which waits up to a given timeout for a partition to be available
* Use waitForPartition in the archive test suite to wait for a partition to be created before starting each messaging test
Issue
Maintenance Y2026H1 #3686