Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Optimize IsNotNullExpr #11586

Merged
merged 6 commits into from
Jul 24, 2024
Merged

perf: Optimize IsNotNullExpr #11586

merged 6 commits into from
Jul 24, 2024

Conversation

andygrove
Copy link
Member

Which issue does this PR close?

N/A

Rationale for this change

is_not_null: column is all nulls
                        time:   [51.571 ns 51.892 ns 52.260 ns]
                        change: [-65.802% -65.423% -65.068%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

is_not_null: column is never null
                        time:   [98.535 ns 98.892 ns 99.258 ns]
                        change: [-34.450% -34.256% -34.098%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

is_not_null: column is mix of values and nulls
                        time:   [54.555 ns 54.634 ns 54.714 ns]
                        change: [-65.402% -65.352% -65.300%] (p = 0.00 < 0.05)
                        Performance has improved.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the physical-expr Physical Expressions label Jul 21, 2024
@andygrove andygrove marked this pull request as ready for review July 21, 2024 19:22
@andygrove andygrove requested review from Dandandan and comphead July 22, 2024 13:40
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @andygrove -- I think it would be good to add basic test coverage for Union but otherwise this looks great to me

};
compute::not(&is_null).map_err(Into::into)
} else {
compute::is_not_null(array.as_ref()).map_err(Into::into)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goes faster because it calls a single kernel (compute::is_not_null) rather than 2 (is_null and not)?

Could we add some basic tests for union? Perhaps following the model in #11321 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goes faster because it calls a single kernel (compute::is_not_null) rather than 2 (is_null and not)?

Yes, exactly. It avoids creating an interim vector that is then discarded.

Could we add some basic tests for union? Perhaps following the model in #11321 ?

We do already have at least one test for IS NOT NULL for union, that was added in #11321.

There is no functional change for union in this PR. The code in compute_is_not_null for union is copied from the compute_is_null method, and adds a call to not, so it is doing the same thing as before but the flow changed a little.

Union is the only case that this PR does not optimize for, because I didn't want to mess with the temporary workaround that is in place.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb I pushed a change to simplify the PR and remove the duplicated union code. Let me know if that makes things clearer (or not).

@@ -117,6 +117,21 @@ pub(crate) fn compute_is_null(array: ArrayRef) -> Result<BooleanArray> {
}
}

/// workaround <https://github.com/apache/arrow-rs/issues/6017>,
/// this can be replaced with a direct call to `arrow::compute::is_not_null` once it's fixed.
pub(crate) fn compute_is_not_null(array: ArrayRef) -> Result<BooleanArray> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code looks good to me -- I think it also ends up supporting UnionArray in IS NOT NULL

cc @samuelcolvin who added this code in #11321

/// this can be replaced with a direct call to `arrow::compute::is_not_null` once it's fixed.
pub(crate) fn compute_is_not_null(array: ArrayRef) -> Result<BooleanArray> {
if array.as_any().is::<UnionArray>() {
compute::not(&compute_is_null(array)?).map_err(Into::into)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not optimize the union handling - this still calls is_null and not

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is lgtm, thanks @andygrove
For UNION all we can do a followup PR

Copy link
Contributor

@samuelcolvin samuelcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great to me -- thanks everyone!

@alamb alamb merged commit 20b298e into apache:main Jul 24, 2024
26 checks passed
@andygrove andygrove deleted the is-null branch July 24, 2024 22:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants