Skip to content

fix: resolve MTR data inconsistency caused by binlog rotation#1

Open
dnovitski wants to merge 1 commit into
masterfrom
meiji163/parallel-repl
Open

fix: resolve MTR data inconsistency caused by binlog rotation#1
dnovitski wants to merge 1 commit into
masterfrom
meiji163/parallel-repl

Conversation

@dnovitski
Copy link
Copy Markdown
Owner

Summary

Fixes intermittent data inconsistency in the multithreaded replication (MTR) coordinator introduced in github/gh-ost#1454.

Root Cause

MySQL's logical clock (last_committed, sequence_number) is per-binlog-file. When max_binlog_size triggers a binlog rotation, sequence_number resets to 1. However, the coordinator's lowWaterMark (lwm) was never reset — it retained the old file's high value (e.g., 65553). After rotation, all WaitForTransaction(lastCommitted) checks passed immediately (lwm >= lastCommitted trivially true), causing transactions from the new binlog file to execute out of order.

Example of the bug

Before rotation: lwm = 65553
After rotation:  sequence numbers restart at 1, 2, 3, ...
Transaction with lastCommitted=5 → check: 65553 >= 5 → TRUE (should wait!)

This caused dependent transactions to execute concurrently, resulting in wrong final values (e.g., k=5046 instead of k=5047).

Bugs Fixed

Bug 1: Binlog rotation state reset (THE ROOT CAUSE)

  • Problem: lowWaterMark never reset on binlog rotation → stale lwm allows out-of-order execution
  • Fix: Initialize lwm to -1 (sentinel for "uninitialized"). On RotateEvent, drain all busy workers, then reset lwm=-1 and clear completedJobs/waitingJobs maps. The drain creates a barrier only at binlog file boundaries (acceptable overhead).

Bug 2: Silent error swallowing in DML apply

  • Problem: applyDMLEvents() errors were logged but silently discarded; MarkTransactionCompleted was called regardless, corrupting dependency tracking
  • Fix: Retry InnoDB deadlocks (error 1213) and lock wait timeouts (error 1205) with jittered exponential backoff (up to 100 retries, matching MySQL's slave_transaction_retries). Propagate fatal (non-retryable) errors via a broadcast channel (failedCh).

Bug 3: Wait channel deadlock on error paths

  • Problem: WaitForTransaction used unbuffered channels. If a waiter exited early via failedCh, the subsequent MarkTransactionCompleted send would block forever.
  • Fix: Use buffered channels (capacity 1) so the send never blocks.

Bug 4: Data race on lwm read in RotateEvent handler

  • Problem: if c.lowWaterMark >= 0 was read without holding c.mu, racing with concurrent MarkTransactionCompleted calls.
  • Fix: Guard the read with c.mu.Lock()/c.mu.Unlock().

Verification

  • 20+ consistency test iterations passed at rate=1200 trx/s, 4 workers, 90s sysbench load (was ~60% failure rate before fix)
  • Independent verification by second agent: 48-minute stress test, 10 binlog rotations, 0 data mismatches
  • go build ./... ✅, go vet ./...

Performance: MTR vs Baseline

Benchmarked with 200K rows, 1000 trx/s sysbench write load for 90 seconds:

Configuration Total Time DML Events/s Row-copy starvation
Baseline (no MTR) 166s 1,463/s ~150s (0 rows copied during load!)
MTR, 4 workers 135s 2,049/s ~120s
MTR, 4 workers, batch=50 150s 2,156/s ~135s

Key finding: MTR provides ~19% improvement in total migration time. The fundamental bottleneck is executeWriteFuncs which calls ProcessEventsUntilDrained() before each row-copy chunk — under high write load, the event queue fills continuously and row-copy gets starved regardless of worker count. MTR helps by draining the queue faster with parallel workers.

Files Changed

  • go/logic/coordinator.go — 189 insertions, 42 deletions

Known Limitation

buildDMLEventQuery in applier.go mutates dmlEvent.DML for unique-key UPDATE operations (sets to DeleteDML then InsertDML, never restores). This is a pre-existing bug that does not affect sysbench workloads (PK-only) but could cause issues with unique-key modifications. Not addressed in this PR.

@dnovitski dnovitski force-pushed the meiji163/parallel-repl branch 5 times, most recently from fc8e83e to ad143d3 Compare April 29, 2026 01:27
Adds parallel DML event processing via a coordinator that manages
worker goroutines using MySQL's LOGICAL_CLOCK dependency tracking.

Key fixes for data inconsistency:
- Reset lowWaterMark on binlog rotation (sequence numbers are per-file)
- Drain all workers before resetting coordinator state
- Retry InnoDB deadlocks with jittered exponential backoff
- Propagate fatal errors via broadcast channel
- Use buffered wait channels to prevent deadlocks on error paths
- Guard all lowWaterMark reads with mutex
- Remove dead commented-out legacy EventsStreamer code
- Add deterministic rotation regression tests

Co-authored-by: meiji163 <meiji163@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant