Multithreaded replication, parallel row-copy with DML merge, frontier filter, and heartbeat lag throttle #2
Open
dnovitski wants to merge 3 commits into
8b0acb3 to 9ab008d
…binlog sentinel (github#1637)
* Prevent permanent worker deadlock when cutover times out waiting for binlog sentinel
Buffer allEventsUpToLockProcessed to MaxRetries() so the applier's send always completes immediately even after waitForEventsUpToLock has timed out and exited.
---------
Co-authored-by: meiji163 <meiji163@github.com>
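The fix in the commit message above rests on a general Go property: a send on a buffered channel with free capacity completes immediately, even when no goroutine is receiving any more. The following is a minimal sketch of that property only. The names `trySend` and `maxRetries` are hypothetical and are not taken from the gh-ost source; the non-blocking `select` is used here just to observe the behavior without hanging the program.

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether the send could
// complete immediately. A plain `ch <- v` would block forever in the cases
// where this returns false and nobody ever receives.
func trySend(ch chan<- string, v string) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	const maxRetries = 3 // hypothetical retry count, for illustration only

	unbuffered := make(chan string)
	buffered := make(chan string, maxRetries)

	// Simulate the situation after the waiter has timed out and exited:
	// no goroutine is receiving on either channel.
	fmt.Println("unbuffered send completes:", trySend(unbuffered, "processed"))
	for attempt := 1; attempt <= maxRetries; attempt++ {
		fmt.Printf("buffered send %d completes: %v\n", attempt, trySend(buffered, "processed"))
	}
}
```

Sizing the buffer to the maximum number of attempts means every attempt can deposit its notification without a receiver, which is the behavior the commit message describes.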
dd5dfd9 to 8a5b648
8a5b648 to 3110d30
…ttle (#2)
Performance optimizations for gh-ost that significantly speed up row-copy under high write load while keeping binlog lag bounded:
- Parallel row-copy with dedicated connection pool and time-bounded drain
- DML event merging within batches (INSERT/DELETE cancellation, UPDATE folding)
- Frontier filter to skip DML events beyond copy frontier
- Heartbeat lag throttle (--copy-max-lag-millis) for row-copy pacing
- Adaptive drain budget and auto-tuning chunk size
- Runtime-changeable --copy-concurrency and --copy-max-lag-millis
- Fix multithreaded replication data inconsistency
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
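To illustrate the idea of merging DML events within a batch, here is a minimal sketch, not the PR's implementation. It assumes a hypothetical `Event` type keyed by a single integer unique key that UPDATEs never change. It also makes a conservative simplification of its own: an INSERT followed by a DELETE collapses to a single DELETE rather than dropping both events, so the final row state is enforced regardless of what is already in the target table.

```go
package main

import "fmt"

type Op int

const (
	Insert Op = iota
	Update
	Delete
)

// Event is a hypothetical, simplified DML event. Value is the row image
// after the event and is empty for Delete.
type Event struct {
	Op    Op
	Key   int
	Value string
}

// mergeBatch folds a batch so that each key appears at most once.
// The last event per key wins, except that Insert followed by Update stays
// an Insert carrying the updated value. Output order follows each key's
// first appearance; events on different keys are assumed independent.
func mergeBatch(events []Event) []Event {
	index := make(map[int]int) // key -> position in merged
	merged := make([]Event, 0, len(events))
	for _, ev := range events {
		pos, seen := index[ev.Key]
		if !seen {
			index[ev.Key] = len(merged)
			merged = append(merged, ev)
			continue
		}
		if merged[pos].Op == Insert && ev.Op == Update {
			ev.Op = Insert // the row is still new within this batch
		}
		merged[pos] = ev
	}
	return merged
}

func main() {
	batch := []Event{
		{Insert, 1, "a"}, {Update, 1, "b"},
		{Update, 2, "x"}, {Update, 2, "y"},
		{Insert, 3, "c"}, {Delete, 3, ""},
	}
	merged := mergeBatch(batch)
	fmt.Printf("%d events merged into %d\n", len(batch), len(merged)) // 6 events merged into 3
}
```

Fewer statements per batch means fewer round trips to the applier connection, which is where the savings under high write load would come from.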
3110d30 to ca7577a
Performance Optimizations for gh-ost
This PR adds several performance optimizations to gh-ost that significantly speed up row-copy under high write load while keeping binlog lag bounded.
Features
1. Parallel Row-Copy (inspired by feat concurrent chunk data #1398)
- --copy-concurrency=N: parallel row-copy workers (default 1)
2. DML Event Merging (inspired by feat binlog apply optimization #1378)
- --skip-dml-merge
3. Frontier Filter (inspired by feat binlog apply optimization #1378)
- --copy-concurrency=1 (single-copy): with parallel copy, multiple chunks are in flight simultaneously, so the frontier position is not a reliable boundary; in-flight chunks may not have committed yet, making it unsafe to skip events beyond the frontier
- SELECT queries may not yet see the data from skipped events, causing silent data loss
- --skip-dml-frontier-filter
4. Heartbeat Lag Throttle
- --copy-max-lag-millis (default 60000) prevents unbounded binlog lag growth during parallel row-copy
- 0 to disable (maximum copy speed, unbounded lag)
- --max-lag-millis
Runtime-Changeable Flags
- copy-concurrency=<N>: change parallel copy workers at runtime (range 1-32)
- copy-max-lag-millis=<N>: change heartbeat lag threshold at runtime (0 = disable)
Bug Fixes
- buildDMLEventQuery DML mutation: UPDATE operations on unique-key tables no longer corrupt the shared DMLEvent object
- copyRowsQueue channel combined with HeartbeatLag sentinel value (before first heartbeat) caused copy to never get execution turns. Fixed with buffered channel and sentinel filtering
Benchmark Results (4-thread sysbench, 100K rows, 15-min runs)
Key takeaways:
HeartbeatLag Analysis
Without the throttle, binlog lag grows without bound because the bounded drain (50ms budget) gives row-copy more turns at the expense of DML processing. The lag throttle resolves this.
New CLI Flags
- --copy-max-lag-millis
- --skip-dml-merge
- --skip-dml-frontier-filter
Testing
- merge-dml-events
- parallel-rowcopy-lag-throttle