[SPARK-56897][SQL] Reduce per-value allocations in DELTA_BYTE_ARRAY Parquet decoder by iemejia · Pull Request #55924 · apache/spark

iemejia · 2026-05-17T00:41:51Z

What changes were proposed in this pull request?

Reduces per-value heap allocations in VectorizedDeltaByteArrayReader (the Parquet DELTA_BYTE_ARRAY vectorized decoder) by replacing ByteBuffer-based state tracking with a reusable byte[] buffer.

Key changes:

Replace ByteBuffer previous with byte[] prevBuf + int prevLen: The DELTA_BYTE_ARRAY encoding exploits prefix sharing between consecutive values. The old code materialized each decoded value as a ByteBuffer obtained from arrayData.getByteBuffer() (which allocates ~48 bytes per value via ByteBuffer.wrap()). The new approach uses a reusable byte[] buffer where the prefix bytes from the previous iteration are already in place at the start -- only the suffix portion needs to be written.
Add getSuffixInto() to VectorizedDeltaLengthByteArrayReader: New method that reads suffix bytes directly into a caller-supplied byte[] via InputStream.read(), avoiding the ByteBuffer allocation that getBytes() / in.slice() performs. Also adds getSuffixLength() for cases where only the length is needed.
Rewrite skipBinary() to use prevBuf directly: The old implementation was particularly wasteful -- it maintained two WritableColumnVector instances (binaryValVector and tempBinaryValVector), reset one per skipped value, and swapped them after each iteration. The new implementation simply assembles into prevBuf without touching any column vectors.
Remove tempBinaryValVector field: No longer needed after the skipBinary rewrite.

Why are the changes needed?

The DELTA_BYTE_ARRAY decoder allocated 2 ByteBuffer objects per decoded value (one from in.slice() for the suffix, one from arrayData.getByteBuffer() for the previous-value reference). For a typical 4096-value page, this produced ~8K short-lived objects (~384KB) that stress the young-generation GC.

The skipBinary path was even worse: it reset and rebuilt column vectors per skipped value, involving reserve(), Arrays.copyOf(), and putArray() overhead just to maintain the previous-value state.

After this change, the hot path performs zero per-value heap allocations (the prevBuf array is reused across all values and only grows if a value exceeds its current capacity).

Normalized benchmark improvements (using Variant reads as cross-hardware baseline):

readBinary: 1.1-1.3x faster across overlap shapes
skipBinary: 1.5-1.9x faster (largest gains from eliminating column vector reset/swap)
readBinary(len) single-value: ~1.2x faster

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests: ParquetDeltaByteArrayEncodingSuite (14 tests), ParquetDeltaLengthByteArrayEncodingSuite, ParquetEncodingSuite -- all pass.
Benchmark: VectorizedDeltaReaderBenchmark Group C (DELTA_BYTE_ARRAY) with four overlap shapes (no overlap, half overlap, full overlap, len=64).

Was this patch authored or co-authored using generative AI tooling?

Generated-by: OpenCode with Claude Opus (claude-opus-4.6)

…arquet decoder

[SPARK-56897][SQL] Reduce per-value allocations in DELTA_BYTE_ARRAY P…

da5fe35

…arquet decoder

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SPARK-56897][SQL] Reduce per-value allocations in DELTA_BYTE_ARRAY Parquet decoder#55924

[SPARK-56897][SQL] Reduce per-value allocations in DELTA_BYTE_ARRAY Parquet decoder#55924
iemejia wants to merge 1 commit into
apache:masterfrom
iemejia:SPARK-delta-byte-array

iemejia commented May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

iemejia commented May 17, 2026

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant