feat: support input changelog for primary key writes #318

Merged
JingsongLi merged 4 commits into apache:main from QuakeWang:feat/input-changelog
May 20, 2026

Conversation

@QuakeWang
Contributor

@QuakeWang QuakeWang commented May 14, 2026

Purpose

Linked issue: close #255

This PR supports changelog-producer=input for primary-key table writes in the scoped Rust write path.

The implementation double-writes primary-key input rows into changelog files while keeping normal table data files deduplicated, then commits changelog files through separate changelog manifest metadata.
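The double-write described above can be illustrated with a minimal sketch. It is not the PR's code: `Row`, `RowKind`, and `split_input` are hypothetical names, and the sketch assumes a deduplicate-style merge that keeps the row with the highest sequence number per key.

```rust
// Hypothetical types for illustration only; not the paimon crate's API.
#[derive(Debug, Clone, PartialEq)]
enum RowKind {
    Insert,
    Delete,
}

#[derive(Debug, Clone, PartialEq)]
struct Row {
    key: i64,
    sequence: u64,
    kind: RowKind,
    value: String,
}

/// Returns `(data_rows, changelog_rows)`.
/// The changelog keeps every sorted input row; the data side keeps only
/// the latest row per key (assumed deduplicate behavior).
fn split_input(mut rows: Vec<Row>) -> (Vec<Row>, Vec<Row>) {
    // Sort by primary key, then by sequence number.
    rows.sort_by_key(|r| (r.key, r.sequence));
    let changelog = rows.clone();

    let mut data: Vec<Row> = Vec::new();
    for row in rows {
        // A later row with the same key replaces the earlier one.
        if data.last().map_or(false, |last| last.key == row.key) {
            data.pop();
        }
        data.push(row);
    }
    (data, changelog)
}

fn main() {
    let rows = vec![
        Row { key: 1, sequence: 2, kind: RowKind::Insert, value: "b".into() },
        Row { key: 2, sequence: 3, kind: RowKind::Delete, value: "c".into() },
        Row { key: 1, sequence: 1, kind: RowKind::Insert, value: "a".into() },
    ];
    let (data, changelog) = split_input(rows);
    println!("data rows: {}, changelog rows: {}", data.len(), changelog.len());
}
```

In this sketch the changelog side sees all three input rows, while the data side sees one row per key.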

Brief change log

  • Add typed parsing for changelog-producer and changelog file options in CoreOptions.
  • Add PreparedFiles and propagate changelog files separately from normal data files.
  • Teach KeyValueFileWriter to write input changelog files from the full sorted input rows, while normal data files still use merge-engine selected rows.
  • Compute changelog DataFileMeta from changelog rows, including row count, key stats, sequence range, and retract row count.
  • Add CommitMessage::new_changelog_files and commit changelog entries into a separate changelog manifest list.
  • Populate snapshot changelogManifestList and changelogRecordCount.
  • Let overwrite ignore changelog files; dynamic partition overwrite derives touched partitions from new data entries only.
  • Preserve existing normal data manifest, index manifest, and table record count behavior.
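As a rough illustration of the first item, typed parsing of `changelog-producer` might look like the sketch below. The `ChangelogProducer` name appears in the review thread, but the variants other than `Input`, the `FromStr` implementation, the error type, and the `changelog_producer` helper are assumptions made for this example, not the actual `CoreOptions` code.

```rust
use std::collections::HashMap;
use std::str::FromStr;

// Assumed shape of the enum; only `ChangelogProducer::Input` is confirmed by the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum ChangelogProducer {
    #[default]
    None,
    Input,
    FullCompaction,
    Lookup,
}

impl FromStr for ChangelogProducer {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "none" => Ok(Self::None),
            "input" => Ok(Self::Input),
            "full-compaction" => Ok(Self::FullCompaction),
            "lookup" => Ok(Self::Lookup),
            other => Err(format!("unsupported changelog-producer: {other}")),
        }
    }
}

/// Hypothetical helper: read the option from a string map,
/// falling back to the default (`none`) when it is absent.
fn changelog_producer(options: &HashMap<String, String>) -> Result<ChangelogProducer, String> {
    match options.get("changelog-producer") {
        Some(value) => value.parse(),
        None => Ok(ChangelogProducer::default()),
    }
}

fn main() {
    let mut options = HashMap::new();
    options.insert("changelog-producer".to_string(), "input".to_string());
    println!("{:?}", changelog_producer(&options));
}
```

Parsing into an enum once lets the write path branch on a typed value instead of comparing strings at each call site.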

Tests

  • cargo fmt --all -- --check
  • cargo clippy -p paimon --all-targets -- -D warnings
  • cargo test -p paimon changelog
  • cargo test -p paimon

API and Format

Documentation

QuakeWang added 3 commits May 14, 2026 13:14
Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com>
Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com>
Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com>

# Conflicts:
#	crates/paimon/src/table/kv_file_writer.rs
@QuakeWang
Contributor Author

@JingsongLi PTAL, Thanks

Comment thread crates/paimon/src/table/table_commit.rs Outdated
msg.bucket,
self.total_buckets,
file.clone(),
2,
Contributor

Why 2? Maybe just 0?

Comment thread crates/paimon/src/table/table_write.rs Outdated
})
}

fn validate_changelog_write_options(
Contributor

Do not validate these options.

  • changelog-producer=none: this is the default, so do not validate it.
  • changelog-producer=lookup and full-compaction: the changelog is produced by compaction, possibly in the background.
  • POSTPONE_BUCKET: ditto.

Remove this whole method.
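
The reasoning in this comment can be sketched as a small classification: only `input` produces changelog on the write path, so the writer has nothing to reject for the other modes. The enum variants and the `changelog_source` function below are hypothetical and only restate the comment; they are not code from the PR.

```rust
// Assumed enum shape; only `ChangelogProducer::Input` is confirmed by the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ChangelogProducer {
    None,
    Input,
    FullCompaction,
    Lookup,
}

/// Hypothetical description of where changelog files come from for each mode.
#[derive(Debug, PartialEq, Eq)]
enum ChangelogSource {
    Nothing,
    WritePath,
    Compaction,
}

fn changelog_source(producer: ChangelogProducer) -> ChangelogSource {
    match producer {
        // Default mode: no changelog files are produced.
        ChangelogProducer::None => ChangelogSource::Nothing,
        // Input mode: the writer emits changelog files alongside data files.
        ChangelogProducer::Input => ChangelogSource::WritePath,
        // Per the review comment, these are produced by compaction, maybe in the background.
        ChangelogProducer::FullCompaction | ChangelogProducer::Lookup => {
            ChangelogSource::Compaction
        }
    }
}

fn main() {
    for producer in [
        ChangelogProducer::None,
        ChangelogProducer::Input,
        ChangelogProducer::FullCompaction,
        ChangelogProducer::Lookup,
    ] {
        println!("{:?} -> {:?}", producer, changelog_source(producer));
    }
}
```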

Comment thread crates/paimon/src/table/table_write.rs Outdated
/// Close all writers and collect CommitMessages for use with TableCommit.
/// Writers are cleared after this call, allowing the TableWrite to be reused.
pub async fn prepare_commit(&mut self) -> Result<Vec<CommitMessage>> {
if self.is_overwrite && self.changelog_producer == ChangelogProducer::Input {
Contributor

Also ignore this.

Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com>
@QuakeWang
Contributor Author

@JingsongLi Thanks for the review. I’ve addressed the comments in the latest commit.

Contributor

@JingsongLi JingsongLi left a comment

+1

@JingsongLi JingsongLi merged commit 13e370a into apache:main May 20, 2026
8 checks passed
Development

Successfully merging this pull request may close these issues.

Support changelog-producer is input
