fix: resolve MTR data inconsistency caused by binlog rotation by dnovitski · Pull Request #1 · dnovitski/gh-ost

dnovitski · 2026-04-29T00:43:48Z

Summary

Fixes intermittent data inconsistency in the multithreaded replication (MTR) coordinator introduced in github/gh-ost#1454.

Root Cause

MySQL's logical clock (last_committed, sequence_number) is per-binlog-file. When max_binlog_size triggers a binlog rotation, sequence_number resets to 1. However, the coordinator's lowWaterMark (lwm) was never reset: it retained the old file's high value (e.g., 65553). After rotation, every WaitForTransaction(lastCommitted) check passed immediately because lwm >= lastCommitted was trivially true, so transactions from the new binlog file executed out of order.

Example of the bug

Before rotation: lwm = 65553
After rotation:  sequence numbers restart at 1, 2, 3, ...
Transaction with lastCommitted=5 → check: 65553 >= 5 → TRUE (should wait!)

This caused dependent transactions to execute concurrently, resulting in wrong final values (e.g., k=5046 instead of k=5047).
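The failing comparison can be reproduced in isolation. The sketch below is illustrative only: canRun is a hypothetical helper that stands in for the check inside WaitForTransaction, not the coordinator's actual code.

```go
package main

import "fmt"

// canRun is a hypothetical stand-in for the dependency check described above:
// a transaction may start once the low water mark has reached its
// lastCommitted value.
func canRun(lowWaterMark, lastCommitted int64) bool {
	return lowWaterMark >= lastCommitted
}

func main() {
	// Value left over from the previous binlog file.
	staleLWM := int64(65553)

	// In the new file, sequence numbers restart at 1. A transaction with
	// lastCommitted=5 depends on sequence_number 5 of the new file, which
	// has not been applied yet, but the stale value lets it through.
	fmt.Println(canRun(staleLWM, 5)) // true

	// With a low water mark that tracks the new file, the same transaction waits.
	freshLWM := int64(3)
	fmt.Println(canRun(freshLWM, 5)) // false
}
```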

Bugs Fixed

Bug 1: Binlog rotation state reset (THE ROOT CAUSE)

Problem: lowWaterMark never reset on binlog rotation → stale lwm allows out-of-order execution
Fix: Initialize lwm to -1 (sentinel for "uninitialized"). On RotateEvent, drain all busy workers, then reset lwm=-1 and clear completedJobs/waitingJobs maps. The drain creates a barrier only at binlog file boundaries (acceptable overhead).
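A minimal sketch of the drain-then-reset step, under stated assumptions: the coordinator struct here is a stripped-down stand-in, busyWorkers (a sync.WaitGroup) and onRotate are hypothetical names, and the way the real wait path treats the -1 sentinel is not shown.

```go
package main

import (
	"fmt"
	"sync"
)

// coordinator is a hypothetical, reduced version of the real type.
type coordinator struct {
	mu            sync.Mutex
	lowWaterMark  int64
	completedJobs map[int64]struct{}
	waitingJobs   map[int64][]chan struct{}
	busyWorkers   sync.WaitGroup // assumption: counts in-flight jobs
}

func newCoordinator() *coordinator {
	return &coordinator{
		lowWaterMark:  -1, // sentinel: no sequence number seen in this file yet
		completedJobs: make(map[int64]struct{}),
		waitingJobs:   make(map[int64][]chan struct{}),
	}
}

// onRotate drains in-flight work, then discards all per-file state.
func (c *coordinator) onRotate() {
	c.busyWorkers.Wait() // barrier: nothing from the old file is still running

	c.mu.Lock()
	defer c.mu.Unlock()
	c.lowWaterMark = -1
	c.completedJobs = make(map[int64]struct{})
	c.waitingJobs = make(map[int64][]chan struct{})
}

func main() {
	c := newCoordinator()
	c.lowWaterMark = 65553 // state accumulated from the old binlog file
	c.completedJobs[65555] = struct{}{}

	// Simulate one job that is still in flight when the RotateEvent arrives.
	c.busyWorkers.Add(1)
	go func() {
		defer c.busyWorkers.Done()
	}()

	c.onRotate()
	fmt.Println(c.lowWaterMark, len(c.completedJobs)) // -1 0
}
```

Draining before the reset matters: resetting while an old-file job is still running would let that job record an old sequence number into the fresh state.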

Bug 2: Silent error swallowing in DML apply

Problem: applyDMLEvents() errors were logged but silently discarded; MarkTransactionCompleted was called regardless, corrupting dependency tracking
Fix: Retry InnoDB deadlocks (error 1213) and lock wait timeouts (error 1205) with jittered exponential backoff, up to 100 retries, mirroring the behavior of MySQL's slave_transaction_retries. Propagate fatal (non-retryable) errors via a broadcast channel (failedCh).
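The retry policy could look roughly like the following. This is a sketch rather than the PR's code: sqlError is a hypothetical stand-in for a driver error that carries a MySQL error number, and retryWithBackoff, its parameters, and the "full jitter" strategy are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// sqlError is a hypothetical stand-in for a driver error with a MySQL error number.
type sqlError struct{ Number uint16 }

func (e *sqlError) Error() string { return fmt.Sprintf("mysql error %d", e.Number) }

// isRetryable reports whether err is a deadlock (1213) or lock wait timeout (1205).
func isRetryable(err error) bool {
	var se *sqlError
	if !errors.As(err, &se) {
		return false
	}
	return se.Number == 1213 || se.Number == 1205
}

// retryWithBackoff calls apply until it succeeds, fails with a non-retryable
// error, or has been retried maxRetries times. The sleep between attempts is
// drawn uniformly from [0, delay], and delay doubles up to maxDelay.
func retryWithBackoff(maxRetries int, base, maxDelay time.Duration, apply func() error) error {
	delay := base
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		err = apply()
		if err == nil || !isRetryable(err) {
			return err // success, or a fatal error for the caller to propagate
		}
		if attempt == maxRetries {
			break
		}
		time.Sleep(time.Duration(rand.Int63n(int64(delay) + 1)))
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
	return fmt.Errorf("giving up after %d retries: %w", maxRetries, err)
}

func main() {
	calls := 0
	err := retryWithBackoff(100, time.Millisecond, 50*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return &sqlError{Number: 1213} // two simulated deadlocks, then success
		}
		return nil
	})
	fmt.Println(calls, err) // 3 <nil>
}
```

Only when this function returns nil would the caller mark the transaction as completed; any other result is a candidate for the failedCh broadcast.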

Bug 3: Wait channel deadlock on error paths

Problem: WaitForTransaction used unbuffered channels. If a waiter exited early via failedCh, the subsequent MarkTransactionCompleted send would block forever.
Fix: Use buffered channels (capacity 1) so the send never blocks.
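The effect of the buffer can be seen in a small standalone example. waitFor is a hypothetical helper, not the coordinator's method; it only shows a waiter leaving early via failedCh and the later completion signal still going through.

```go
package main

import "fmt"

// waitFor blocks until the dependency is signalled (true) or a fatal error
// is broadcast by closing failedCh (false).
func waitFor(done <-chan struct{}, failedCh <-chan struct{}) bool {
	select {
	case <-done:
		return true
	case <-failedCh:
		return false
	}
}

func main() {
	done := make(chan struct{}, 1) // capacity 1: one send never needs a receiver
	failedCh := make(chan struct{})

	// A fatal error is broadcast, so the waiter gives up and stops listening.
	close(failedCh)
	fmt.Println(waitFor(done, failedCh)) // false

	// The completing side still signals afterwards. With an unbuffered
	// channel this send would block forever, because nobody receives.
	done <- struct{}{}
	fmt.Println("send completed")
}
```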

Bug 4: Data race on lwm read in RotateEvent handler

Problem: if c.lowWaterMark >= 0 was read without holding c.mu, racing with concurrent MarkTransactionCompleted calls.
Fix: Guard the read with c.mu.Lock()/c.mu.Unlock().
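A sketch of the guarded read, assuming a reduced coordinator with only the mutex and the low water mark. The method names advance and initialized are hypothetical; advance stands in for the write performed by MarkTransactionCompleted.

```go
package main

import (
	"fmt"
	"sync"
)

// coordinator is a hypothetical, reduced version of the real type.
type coordinator struct {
	mu           sync.Mutex
	lowWaterMark int64
}

// advance stands in for the write side (MarkTransactionCompleted in the PR).
func (c *coordinator) advance(seq int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if seq > c.lowWaterMark {
		c.lowWaterMark = seq
	}
}

// initialized performs the "lowWaterMark >= 0" read under the same mutex,
// so it cannot race with concurrent advance calls.
func (c *coordinator) initialized() bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.lowWaterMark >= 0
}

func main() {
	c := &coordinator{lowWaterMark: -1}

	var wg sync.WaitGroup
	for seq := int64(1); seq <= 100; seq++ {
		wg.Add(1)
		go func(s int64) {
			defer wg.Done()
			c.advance(s)
		}(seq)
	}

	_ = c.initialized() // safe even while writers are running
	wg.Wait()
	fmt.Println(c.initialized()) // true
}
```

Running a program like this with go run -race is one way to confirm that the unguarded variant is reported as a data race while the guarded one is not.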

Verification

20+ consistency test iterations passed at rate=1200 trx/s, 4 workers, 90s sysbench load (was ~60% failure rate before fix)
Independent verification by second agent: 48-minute stress test, 10 binlog rotations, 0 data mismatches
go build ./... ✅, go vet ./... ✅

Performance: MTR vs Baseline

Benchmarked with 200K rows, 1000 trx/s sysbench write load for 90 seconds:

Configuration	Total Time	DML Events/s	Row-copy starvation
Baseline (no MTR)	166s	1,463/s	~150s (0 rows copied during load!)
MTR, 4 workers	135s	2,049/s	~120s
MTR, 4 workers, batch=50	150s	2,156/s	~135s

Key finding: MTR with 4 workers provides a ~19% improvement in total migration time (166s down to 135s). The fundamental bottleneck is executeWriteFuncs, which calls ProcessEventsUntilDrained() before each row-copy chunk: under high write load, the event queue fills continuously and row-copy gets starved regardless of worker count. MTR helps by draining the queue faster with parallel workers.

Files Changed

go/logic/coordinator.go — 189 insertions, 42 deletions

Known Limitation

buildDMLEventQuery in applier.go mutates dmlEvent.DML for unique-key UPDATE operations (sets to DeleteDML then InsertDML, never restores). This is a pre-existing bug that does not affect sysbench workloads (PK-only) but could cause issues with unique-key modifications. Not addressed in this PR.

Adds parallel DML event processing via a coordinator that manages worker goroutines using MySQL's LOGICAL_CLOCK dependency tracking.

Key fixes for data inconsistency:
- Reset lowWaterMark on binlog rotation (sequence numbers are per-file)
- Drain all workers before resetting coordinator state
- Retry InnoDB deadlocks with jittered exponential backoff
- Propagate fatal errors via broadcast channel
- Use buffered wait channels to prevent deadlocks on error paths
- Guard all lowWaterMark reads with mutex
- Remove dead commented-out legacy EventsStreamer code
- Add deterministic rotation regression tests

Co-authored-by: meiji163 <meiji163@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

dnovitski mentioned this pull request Apr 29, 2026

Multithreaded replication WIP github/gh-ost#1454

Draft

2 tasks

dnovitski force-pushed the meiji163/parallel-repl branch 5 times, most recently from fc8e83e to ad143d3, April 29, 2026 01:27

dnovitski force-pushed the meiji163/parallel-repl branch from ad143d3 to 4b099b0, April 29, 2026 01:37

This was referenced Apr 29, 2026

Multithreaded replication, parallel row-copy with DML merge, frontier filter, and heartbeat lag throttle #2

Open

Multithreaded replication, parallel row-copy with DML merge, frontier filter, and heartbeat lag throttle github/gh-ost#1665

Open

dnovitski wants to merge 1 commit into master from meiji163/parallel-repl
