Skip to content

HIVE-29598: Fix vectorized outer join wrong results due to stale scratch column values#6486

Merged
soumyakanti3578 merged 10 commits into
apache:masterfrom
ryukobayashi:HIVE-29598
May 21, 2026
Merged

HIVE-29598: Fix vectorized outer join wrong results due to stale scratch column values#6486
soumyakanti3578 merged 10 commits into
apache:masterfrom
ryukobayashi:HIVE-29598

Conversation

@ryukobayashi
Copy link
Copy Markdown
Contributor

@ryukobayashi ryukobayashi commented May 15, 2026

What changes were proposed in this pull request?

In vectorized outer join, generateOuterNulls() and generateOuterNullsRepeatedAll() set isNull[i] = true on scratch columns but leave vector[i] untouched. When hive.vectorized.reuse.scratch.columns=true (the default), a scratch column slot
freed after an expression evaluation (e.g. CastStringToLong) can be reused for the outer join's null-marking column. After reset() clears isNull[], the expression overwrites vector[i] with a fresh value (e.g. 2025). Later, generateOuterNulls()
sets isNull[i] = true without clearing vector[i], leaving a stale non-zero value.

Downstream operators such as ColOrCol read vector[i] directly to distinguish "false" (== 0) from "null" (!= 0). The stale value causes null rows to be misinterpreted as "true", producing wrong OR/AND/CASE WHEN results.

The fix adds clearVectorValue(), called whenever isNull[i] is set to true in the outer join null-marking paths, zeroing vector[i] for all supported column vector types (LongColumnVector, DoubleColumnVector, BytesColumnVector,
TimestampColumnVector, IntervalDayTimeColumnVector).

Why are the changes needed?

Without the fix, vectorized outer joins silently return wrong results when scratch column reuse is enabled (the default). The bug is non-obvious because it only triggers when a specific combination of conditions is met: a type-casting expression allocates a scratch column that is later reused for the outer join's null-marking column, and the join result is consumed by a boolean operator that reads the raw vector value for null discrimination. Users have no indication that results are wrong; workarounds require disabling vectorization entirely (hive.vectorization.enabled=false) or disabling scratch column reuse (hive.vectorized.reuse.scratch.columns=false), both of which carry a significant performance cost.

Does this PR introduce any user-facing change?

No

How was this patch tested?

I added qtest.

Copy link
Copy Markdown
Contributor

@soumyakanti3578 soumyakanti3578 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vectorization code can be tricky and brittle. To ensure there are no unintended consequences of this change, could you please add some tests? A minimal reproducer as a qtest is essential, and unit tests for the vector clearing operation across different ColumnVector types would also be valuable.

@konstantinb
Copy link
Copy Markdown
Contributor

@ryukobayashi could you please take a look at my suggestions to your approach: ryukobayashi#1 ?

The PR is a bit better reproducing the actual problem and incorporates multiple suggestions from @soumyakanti3578

HIVE-29598 follow-up: virtualize clearValue + replace qtest
@ryukobayashi
Copy link
Copy Markdown
Contributor Author

@konstantinb Thanks, I accept your proposal because it makes sense.

@konstantinb
Copy link
Copy Markdown
Contributor

@ryukobayashi one more tweak: ryukobayashi#2 which addresses SQ feedback and trims excessive comments/hive config flags the test uses.

The main module still seems to have at least three code paths unfixed, but those are, apparently, impossible to hit with current CBO rewrites. I am unsure if we should leave those as they are now or fix preemptively

HIVE-29598: SQ feedback + comments' cleanup
@ryukobayashi
Copy link
Copy Markdown
Contributor Author

@konstantinb Merged. If the corrections are minor, I think it's okay to do them just to be safe.

Copy link
Copy Markdown
Contributor

@konstantinb konstantinb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added suggestions to address SonarQube feedback

ryukobayashi and others added 2 commits May 21, 2026 15:55
…VectorMapJoinOuterGenerateResultOperator.java

Co-authored-by: konstantinb <konstantinb@users.noreply.github.com>
…VectorMapJoinOuterGenerateResultOperator.java

Co-authored-by: konstantinb <konstantinb@users.noreply.github.com>
…rameterized tests; add generateOuterNullsRepeatedAll parameterized test; verify neighbor slot unchanged in TestBytesColumnVector
@sonarqubecloud
Copy link
Copy Markdown

Copy link
Copy Markdown
Contributor

@soumyakanti3578 soumyakanti3578 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@soumyakanti3578 soumyakanti3578 merged commit 9e316c3 into apache:master May 21, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants