Skip to content

Implement AnyRee#9959

Open
Rich-T-kid wants to merge 7 commits into
apache:mainfrom
Rich-T-kid:rich-T-kid/Ree-AnyArray
Open

Implement AnyRee#9959
Rich-T-kid wants to merge 7 commits into
apache:mainfrom
Rich-T-kid:rich-T-kid/Ree-AnyArray

Conversation

@Rich-T-kid
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

closes #9909.

Rationale for this change

makes the API simpler to work with & less code duplication

What changes are included in this PR?

Replace the per-key-type RunEndEncoded match arms in length/bit_length (arrow-string) and date_part (arrow-arith) with a single dispatch through the new AsArray::as_any_ree_opt/as_any_ree returning &dyn AnyRunEndArray, mirroring the existing dictionary handling. This removes the
now-unused ree_map! macro, leaving one trait-object code path for all Int16/Int32/Int64 run-end types.

Are these changes tested?

yes

Are there any user-facing changes?

no

@github-actions github-actions Bot added the arrow Changes to the arrow crate label May 11, 2026
Comment thread arrow-array/src/array/run_array.rs Outdated
Comment thread arrow-array/src/array/run_array.rs
Comment thread arrow-array/src/array/run_array.rs
@Rich-T-kid Rich-T-kid force-pushed the rich-T-kid/Ree-AnyArray branch from f398606 to d626ef8 Compare May 11, 2026 15:04
@Rich-T-kid Rich-T-kid force-pushed the rich-T-kid/Ree-AnyArray branch from d626ef8 to b7e7b73 Compare May 11, 2026 15:04
Comment thread arrow-arith/src/temporal.rs Outdated
Comment thread arrow-string/src/length.rs Outdated
Comment thread arrow-string/src/length.rs Outdated
Comment thread arrow-array/src/array/run_array.rs Outdated
Comment thread arrow-array/src/array/run_array.rs
@Rich-T-kid Rich-T-kid force-pushed the rich-T-kid/Ree-AnyArray branch from c352224 to f601fc0 Compare May 12, 2026 13:36
Comment thread arrow-array/src/array/run_array.rs Outdated
Comment thread arrow-array/src/array/run_array.rs Outdated
Comment thread arrow-array/src/array/run_array.rs Outdated
@Rich-T-kid Rich-T-kid force-pushed the rich-T-kid/Ree-AnyArray branch from bbe594c to 931003b Compare May 15, 2026 04:52
Copy link
Copy Markdown
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this is dragging out so long, came back with a fresh set of eyes. Just wanna ensure we're getting this correct from the start 😅

Comment thread arrow-array/src/array/run_array.rs Outdated
Comment on lines +797 to +806
/// Returns the run ends of this array as a primitive array.
fn run_ends(&self) -> ArrayRef;

/// Returns the values of this array.
fn values(&self) -> &Arc<dyn Array>;

/// Returns a new run-end encoded array with the given values, preserving the
/// existing run ends.
fn with_values(&self, values: ArrayRef) -> ArrayRef;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think of it, how do these APIs handle when the original RunArray is sliced? I think they return unsliced original data, and especially for run_ends it erases the logical slicing that may have been applied?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im a little confused by what you mean. Do you have an example of this case?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if theres a case like this:

    #[test]
    fn test123() {
        let run = Int32Array::from(vec![3, 6, 9]);
        let values = Int32Array::from(vec![1, 2, 3]);
        // [1, 1, 1, 2, 2, 2, 3, 3, 3]
        let array = RunArray::try_new(&run, &values).unwrap();

        // [2, 2, 2]
        // But still references same underlying arrays as above
        // Essentially:
        // {1, 1, 1, [2, 2, 2], 3, 3, 3}
        let array_sliced = array.slice(3, 3);

        let new_values = Int32Array::from(vec![7, 8, 9]);
        // [8, 8, 8]
        let new_array_sliced = array_sliced.with_values(Arc::new(new_values));
    }

When we slice the array we still hold on to the original values array; so array_sliced.values() would return [1, 2, 3] even though array_sliced only has value of 2 repeating. This isn't too big an issue, as I think at most its just a performance thing. For example:

let values = date_part(array.values(), part)?;
let new_array = array.with_values(values);

If array is sliced, then we could be doing extra work since we're calculating date_part for the values which aren't in the slice.

But for run_ends() method its harder to consider since we don't have a use case for it currently; is it a potential footgun that calling run_ends() on a sliced run array ignores any slicing 🤔

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at the example you provided, shouldn't let array_sliced = array.slice(3, 3); ideally be represented as run_ends:[3],values:[2] instead of
{run_ends: [3, 6, 9], values: PrimitiveArray<Int32> [ 1, 2, 3, ]}
which is currently what array.slice() produces. due to REE.slice() copying the values arrays.

I think the best approach for now would be to remove the run_ends() function and add it in later if there is a need.
Im not sure how we would avoid the extra work for with_values() without updating the .slice() method. But this would require looking at all the CallSites for slice first to make sure this doesn't cause breaking changes. In theory this shouldn't causes regressions though it should be an implementation detail.

Comment thread arrow-array/src/array/run_array.rs Outdated
@Rich-T-kid Rich-T-kid requested review from Jefffrey and alamb May 24, 2026 20:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

AnyRunArray trait

3 participants