debezium/dbz#1530 Support connector generation and epoch incrementing#269
Conversation
Signed-off-by: Thomas Thornton <thomaswilliamthornton@gmail.com>
Signed-off-by: Thomas Thornton <thomaswilliamthornton@gmail.com>
|
@jpechane this is ready for review |
|
@twthorn Thanks for the PR. I wonder if i is something that should be implemented as connector config or a connector-specific signal, WDYT? |
|
@jpechane thanks for the review. I think that a config fits better here as its inherently persistent. Consider a busy shard -80 (epoch: 1) and idle shard 80- (epoch: 3). We trigger a new connector generation with a signal. the busy shard -80 would soon update its state (epoch: 2). If there has been no activity for 80- in several days or a week the signal record may fall out of retention of its torage medium (e.g., kafka). Then when a record is finally received for the idle shard 80- the epoch won't be incremented correctly (epoch: 3 remains, should be 4). By making the signal medium retention infinite, users could get around this, but that's a point of failure where the user may misconfigure. Since it's permanent in source control and a history, I think it makes sense as a config. |
|
@jpechane let me know your thoughts on this, thanks! |
Signed-off-by: Thomas Thornton <thomaswilliamthornton@gmail.com>
|
@Naros addressed all comments. Can you take another look when you get the chance? Thanks! |
|
@Naros What do you think of config vs signal? I have not heard back from Jiri, but I believe config is the appropriate fit here. |
|
Hi @twthorn, I can see the advantages and disadvantages of both, but I'm not overly familiar with the internal workings of Vitess and its shard system to say either way. From a general point of view, the signal approach paired with a With the configuration-style approach, everything is driven from a single place, with no moving parts. It does not require database write permissions, nor does it become a concern in the low activity scenario you mentioned. I think the biggest downside of the configuration approach is that it triggers a connector restart, whereas the benefit of the signal might be that it avoids downtime or restarts. So suffice it to say, with my limited understanding, I can see how the configuration approach looks better. But I would appreciate it if you could evaluate my comment above with signal + heartbeats and explain how that might look or work, or why it doesn't, for my own edification. |
|
@twthorn I am fine with merging the current implementation. Still I'd like to see your opnion about approach proposed in debezium/dbz#1602 |
|
Thanks for the feedback @Naros and @jpechane.
@Naros Signal + heartbeats could work. However, we do not require users to enable heartbeats in order to run Debezium Vitess Connector. In order to enable heartbeats, users need one of the newest Vitess versions to get the new feature, and it must be enabled on both Vitess & Debezium sides. Thus, I don't think it's reasonable to make correctness require heartbeats as most users do not have it enabled.
@jpechane I replied in that issue thread, hope the context helps. Overall, I think config makes the most sense with the current requirements and usage patterns. However, I can see two ways of such a config approach working Connector generation (this PR)Decouple incrementing epochs from the logic that requires them to be incremented. It allows operators to increment them when they see fit. The flow is:
Cons: users must change two configs (some manual error risk) Automated epoch handling specific to binlog chunking #268Store a specific property in state: chunking_enabled. Used to track if our rank ever falls backward (requires epoch bump to maintain correct ordering)
Cons: it is tightly coupled, if another change requires epochs to be bumped, it must be added to state as well Overall I prefer the decoupled approach which is what this PR adds. Let me know if you all think one makes more sense. I will also add some more details of why we need this in the PR description |
|
@twthorn Thanks for he thorough explanantion. I think this is good to go as is and we can think about future updates when needed. Good job! |
|
@twthorn Please make sure not only the option but the process is documented as well, thanks! |
Description
Add the connector generation config, used to manually update the epoch of a connector's shards.
We need this to allow users to migrate to transaction chunking in this PR #268
This can also be useful for maintenance in general in case the ordering semantics are ever changed, generation is incremented to trigger an epoch bump.
Why Do We Need Manual Epoch Increments
The only known use case is in #268. It is necessary to preserve metadata correctness when enabling that feature.
As taken from Debezium docs:
So what chunking does is change how the transaction rank field is computed.
Let
ibe transactionikbe the rank computed for transactionkpkA - col1: v1denote setting col1 to the valuev1Without transaction chunking: Since an entire transaction is received at once, we can compute the rank from the GTID of the transaction (the last event received in the transaction)
t1 -> r1, pkA - col1: v1, total_order = 3
t2 -> r2, pkA - col1: v2, total_order = 2
t3 -> r3, pkA - col1: v3, total_order = 1
With transaction chunking: Instead we receive chunks of a transaction. Only the last chunk will contain the GTID of the transaction. Thus the only GTID available to compute rank is the one of the preceding transaction (the only possible alternative would be to buffer until we receive the last chunk but that defeats the purpose of chunking). Thus the ranks are computed as:
t1 -> r0, pkA - col1: v1, total_order = 3
t2 -> r1, pkA - col1: v2, total_order = 2
t3 -> r2, pkA - col1: v3, total_order = 1
So in the case of enabling transaction chunking there is one of two cases:
Case 1: Sent messages had offsets committed
t1 -> r1, pkA - col1: v1, total_order = 3
t2 -> r2, pkA - col1: v2, total_order = 2 (offsets committed)
<restart, enable chunking>
t3 -> r2, pkA - col1: v3, total_order = 1
Problem: by the previous algorithm, t2 r2 with order 2 wins setting col1, stale value finalized as v2
Case 2: Sent message did not have offset comitted:
t1 -> r1, pkA - col1: v1, total_order = 3
t2 -> r2, pkA - col1: v2, total_order = 2 (offsets not committed)
<restart, enable chunking>
t2 -> r1, pkA - col1: v2, total_order = 2
t3 -> r2, pkA - col1: v3, total_order = 1
Problem: by the previous algorithm, two stale values are incorrectly chosen:
Overall, the downstream system cannot correctly use metadata to determine which change is the latest for pk1. At a minimum values will be tied (which may be broken with offset check assuming no concurrent kafka re-partition). But with particular transaction ordering values (decreasing rather than increasing like offsets), it may even wrongfully set to old values (stale and incorrect indefinitely until a new update comes along for the pkey)
Fixes debezium/dbz#1530