Improve CommonSubexprEliminate identifier management (10% faster planning) #10473

Merged
merged 10 commits into from
Jun 24, 2024

peter-toth
Contributor

@peter-toth peter-toth commented May 12, 2024

Which issue does this PR close?

Closes #10426.

Rationale for this change

Now that #10832, #10939 and #10835 have landed, this PR adds 3 optimizations to CSE:

  1. Currently CommonSubexprEliminate uses string identifiers to encode expression trees (e.g. the expression col("a") + 1 is encoded as "{a + Int32(1)|{Int32(1)}|{a}}"), which can cause performance problems because identifiers are copied and concatenated many times during CSE.
    This PR changes the implementation of identifiers to:

    struct Identifier<'n> {
        hash: u64,
        expr: &'n Expr,
    }

    This struct contains:

    • A pre-calculated hash of the expression tree, which is calculated efficiently during the bottom-up phase of the first, visiting traversal, without recalculating the hashes of subtrees.
    • A reference to the expression tree to avoid issues due to hash collisions.
  2. Moves the is_volatile() check of expressions out of the first, visiting traversal. Because is_volatile() is a whole-tree check implemented with Expr::exists(), checking the whole expression tree once is enough and more efficient than checking each of its subtrees again and again during the traversal.

  3. Modifies expr_to_identifier() and to_arrays() to return a boolean flag indicating whether the second, rewriting traversal needs to be executed or can be skipped.
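
To illustrate the second point, here is a minimal sketch of why a whole-tree volatility check is cheaper when it runs once per expression rather than at every node of a traversal. The toy `Expr` enum, the `Random` variant, the `visited` counter and `check_at_every_node()` are assumptions made for this example; they are not DataFusion's types or the code in this PR.

```rust
// Toy expression tree; DataFusion's real `Expr` has many more variants.
#[derive(Debug)]
enum Expr {
    Column(&'static str),
    Literal(i64),
    Random, // hypothetical stand-in for a volatile function call
    Add(Box<Expr>, Box<Expr>),
}

impl Expr {
    fn children(&self) -> Vec<&Expr> {
        match self {
            Expr::Add(l, r) => vec![l.as_ref(), r.as_ref()],
            _ => vec![],
        }
    }

    /// Whole-tree check: true if any node is volatile.
    /// `visited` counts how many nodes the check touches.
    fn is_volatile(&self, visited: &mut usize) -> bool {
        *visited += 1;
        matches!(self, Expr::Random)
            || self.children().into_iter().any(|c| c.is_volatile(visited))
    }
}

/// Repeats the whole-tree check at every node of a traversal.
fn check_at_every_node(expr: &Expr, visited: &mut usize) {
    let _ = expr.is_volatile(visited);
    for child in expr.children() {
        check_at_every_node(child, visited);
    }
}

fn add(l: Expr, r: Expr) -> Expr {
    Expr::Add(Box::new(l), Box::new(r))
}

/// ((a + 1) + 2) + 3: seven nodes, none of them volatile.
fn sample() -> Expr {
    add(
        add(add(Expr::Column("a"), Expr::Literal(1)), Expr::Literal(2)),
        Expr::Literal(3),
    )
}

fn main() {
    let expr = sample();
    let (mut once, mut repeated) = (0, 0);
    let volatile = expr.is_volatile(&mut once);
    check_at_every_node(&expr, &mut repeated);
    println!("volatile={volatile} once={once} repeated={repeated}");

    let mut n = 0;
    let with_random = add(Expr::Column("a"), Expr::Random);
    println!("a + random() volatile: {}", with_random.is_volatile(&mut n));
}
```

On this seven-node tree the single check touches each node once, while the per-node variant touches a node once for every ancestor it has (plus itself), so the gap grows with the depth of the expression.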

What changes are included in this PR?

  • New Identifier implementation.
  • New Expr::hash_node() method that builds the hash of a node's direct content (without its children), so that the hash of an expression tree can be calculated efficiently.
  • Modified expr_to_identifier() and to_arrays() methods to return a boolean flag in addition to the IdArray.
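
The first two items can be sketched together as follows. This is a simplified model under stated assumptions, not the PR's implementation: the toy `Expr` enum, the `visit()`, `count_subexprs()` and `occurrences()` helpers and the use of `DefaultHasher` are all made up for the example. Only the shape of `Identifier` (a `u64` hash plus a reference) and the idea of a `hash_node()` that skips children come from the description above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

#[derive(Debug, PartialEq, Eq)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// Hash only this node's own content, not its children.
    fn hash_node<H: Hasher>(&self, state: &mut H) {
        std::mem::discriminant(self).hash(state);
        match self {
            Expr::Column(name) => name.hash(state),
            Expr::Literal(v) => v.hash(state),
            Expr::Add(..) => {}
        }
    }
}

/// Precomputed hash plus a reference to the expression; `Copy`, so no clones.
#[derive(Clone, Copy, Debug)]
struct Identifier<'n> {
    hash: u64,
    expr: &'n Expr,
}

impl Hash for Identifier<'_> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write_u64(self.hash); // reuse the precomputed value
    }
}

impl PartialEq for Identifier<'_> {
    fn eq(&self, other: &Self) -> bool {
        self.expr == other.expr // full comparison guards against hash collisions
    }
}
impl Eq for Identifier<'_> {}

/// Post-order walk: each node is hashed once, combining its own content
/// with the already computed hashes of its children.
fn visit<'n>(expr: &'n Expr, counts: &mut HashMap<Identifier<'n>, usize>) -> u64 {
    let mut hasher = DefaultHasher::new();
    expr.hash_node(&mut hasher);
    if let Expr::Add(l, r) = expr {
        hasher.write_u64(visit(l, counts));
        hasher.write_u64(visit(r, counts));
    }
    let hash = hasher.finish();
    *counts.entry(Identifier { hash, expr }).or_insert(0) += 1;
    hash
}

fn count_subexprs(expr: &Expr) -> HashMap<Identifier<'_>, usize> {
    let mut counts = HashMap::new();
    visit(expr, &mut counts);
    counts
}

/// How often `probe` occurs in the visited tree.
fn occurrences(counts: &HashMap<Identifier<'_>, usize>, probe: &Expr) -> usize {
    counts
        .iter()
        .find(|(id, _)| id.expr == probe)
        .map(|(_, c)| *c)
        .unwrap_or(0)
}

fn add(l: Expr, r: Expr) -> Expr {
    Expr::Add(Box::new(l), Box::new(r))
}

fn a_plus_1() -> Expr {
    add(Expr::Column("a".to_string()), Expr::Literal(1))
}

fn main() {
    let expr = add(a_plus_1(), a_plus_1()); // (a + 1) + (a + 1)
    let counts = count_subexprs(&expr);
    println!("a + 1 occurs {} times", occurrences(&counts, &a_plus_1()));
}
```

Because the map key hashes a single `u64` and only falls back to comparing the referenced expressions when hashes match, no identifier string is ever built, copied or concatenated.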

Are these changes tested?

Yes, with existing UTs.

Are there any user-facing changes?

No.

@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels May 12, 2024
@peter-toth peter-toth force-pushed the better-cse-identifier branch 4 times, most recently from 878cc04 to 4b0608c Compare May 14, 2024 11:04
Contributor

@alamb alamb left a comment

This is looking very exciting @peter-toth

@@ -204,6 +214,24 @@ pub trait TreeNode: Sized {
apply_impl(self, &mut f)
}

fn apply_ref<'n, F: FnMut(&'n Self) -> Result<TreeNodeRecursion>>(
Contributor

This API would be helpful in other areas to avoid cloning -- I am very much in favor of adding it
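
As a rough illustration of why a lifetime-carrying visitor helps avoid cloning, the sketch below uses a toy `Node` type with a simplified `apply_ref()` that omits the `Result<TreeNodeRecursion>` return value shown in the signature above. The `Node` enum and the `leaves()` helper are assumptions for this example and are not part of the TreeNode API.

```rust
#[derive(Debug, PartialEq)]
enum Node {
    Leaf(&'static str),
    Pair(Box<Node>, Box<Node>),
}

impl Node {
    /// Pre-order visit whose callback receives references tied to the
    /// tree's own lifetime `'n`, so the callback is allowed to store them.
    fn apply_ref<'n, F: FnMut(&'n Node)>(&'n self, f: &mut F) {
        f(self);
        if let Node::Pair(l, r) = self {
            l.apply_ref(f);
            r.apply_ref(f);
        }
    }
}

/// Collects references to all leaves without cloning any node.
fn leaves(root: &Node) -> Vec<&Node> {
    let mut out = Vec::new();
    root.apply_ref(&mut |n| {
        if matches!(n, Node::Leaf(_)) {
            out.push(n); // `n` lives as long as `root`, so it can escape the closure
        }
    });
    out
}

fn sample() -> Node {
    Node::Pair(
        Box::new(Node::Leaf("a")),
        Box::new(Node::Pair(
            Box::new(Node::Leaf("b")),
            Box::new(Node::Leaf("c")),
        )),
    )
}

fn main() {
    let tree = sample();
    println!("{:?}", leaves(&tree));
}
```

With a callback typed as `FnMut(&Self)` the reference is only valid for the duration of each call, so a collector like `leaves()` would have to clone the nodes instead of borrowing them.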

datafusion/optimizer/src/common_subexpr_eliminate.rs Outdated Show resolved Hide resolved
@alamb
Contributor

alamb commented May 15, 2024

I think @peter-toth plans to break this PR up into smaller ones, so marking it as a draft to make it clear it isn't waiting on more feedback. If I am mistaken, please let me know

@alamb alamb marked this pull request as draft May 15, 2024 19:10
@peter-toth
Contributor Author

I think @peter-toth plans to break this PR up into smaller ones, so marking it as a draft to make it clear it isn't waiting on more feedback. If I am mistaken, please let me know

Yes, here is the first part that adds the new TreeNode APIs: #10543

@peter-toth peter-toth force-pushed the better-cse-identifier branch from 9a9f4fe to fc71133 Compare June 21, 2024 08:44
@peter-toth
Contributor Author

peter-toth commented Jun 21, 2024

@alamb, can you please help me with that MSRV failure? I don't know what could be the source of the failure and if I try running cargo msrv verify locally it fails with:

% cargo msrv verify
Fetching index
Unable to find key 'package.rust-version' (or 'package.metadata.msrv') in '/Users/ptoth/git/apache/arrow-datafusion/Cargo.toml'

assert_eq!(window_exprs.len(), arrays_per_window.len());
let num_window_exprs = window_exprs.len();
let rewritten_window_exprs = self.rewrite_expr(
window_exprs.clone(),
Contributor Author

@peter-toth peter-toth Jun 21, 2024

This extra clone might look bad at first, but it is needed because Identifiers hold references to the original expressions, so we have to keep the original expressions intact.

  • This is not worse than before because previously the Identifiers were stringified expression trees (so they were effectively clones of the original expressions).
  • This can be better than before because we introduce the found_common flag above, which allows skipping the clone here when it makes no sense to trigger the second, rewriting traversal.
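
A minimal sketch of that trade-off, under simplifying assumptions: the toy `Expr` enum, `find_common()`, `eliminate()`, `rewrite()` and the `__common_expr(...)` placeholder name are all hypothetical, and the derived `Hash` stands in for the hash-based Identifier. It only shows the control flow: borrow during the first pass, and clone solely when the flag says a rewrite is needed.

```rust
use std::borrow::Cow;
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
}

/// Counts every non-leaf subexpression by reference (leaves are skipped
/// here as a simplification).
fn visit<'n>(e: &'n Expr, counts: &mut HashMap<&'n Expr, usize>) {
    if let Expr::Add(l, r) = e {
        visit(l, counts);
        visit(r, counts);
        *counts.entry(e).or_insert(0) += 1;
    }
}

/// First, visiting pass: returns the counts and a `found_common` flag.
fn find_common(exprs: &[Expr]) -> (HashMap<&Expr, usize>, bool) {
    let mut counts = HashMap::new();
    exprs.iter().for_each(|e| visit(e, &mut counts));
    let found_common = counts.values().any(|&c| c > 1);
    (counts, found_common)
}

/// Second, rewriting pass: replaces repeated subexpressions by a placeholder column.
fn rewrite(e: Expr, counts: &HashMap<&Expr, usize>) -> Expr {
    if counts.get(&e).copied().unwrap_or(0) > 1 {
        return Expr::Column(format!("__common_expr({e:?})"));
    }
    match e {
        Expr::Add(l, r) => Expr::Add(
            Box::new(rewrite(*l, counts)),
            Box::new(rewrite(*r, counts)),
        ),
        other => other,
    }
}

fn eliminate(exprs: &[Expr]) -> Cow<'_, [Expr]> {
    let (counts, found_common) = find_common(exprs);
    if !found_common {
        return Cow::Borrowed(exprs); // no clone and no second traversal
    }
    // Must clone: `counts` still borrows the original expressions.
    let rewritten: Vec<Expr> = exprs
        .to_vec()
        .into_iter()
        .map(|e| rewrite(e, &counts))
        .collect();
    Cow::Owned(rewritten)
}

fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

fn add(l: Expr, r: Expr) -> Expr {
    Expr::Add(Box::new(l), Box::new(r))
}

fn main() {
    let distinct = vec![add(col("a"), Expr::Literal(1)), add(col("b"), Expr::Literal(2))];
    let shared = vec![
        add(add(col("a"), Expr::Literal(1)), Expr::Literal(3)),
        add(add(col("a"), Expr::Literal(1)), Expr::Literal(4)),
    ];
    println!("{:?}", eliminate(&distinct));
    println!("{:?}", eliminate(&shared));
}
```

Plans without common subexpressions take the borrowed path and pay for neither the clone nor the rewrite, which matches the point that the clone only happens when a CSE rewrite is actually performed.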

Contributor

Suggested change
- window_exprs.clone(),
+ // Must clone as Identifiers use references to original expressions
+ // so we have to keep the original expressions intact.
+ window_exprs.clone(),

in terms of the cost of the extra clone, I think the performance results speak for themselves

if group_found_common || aggr_found_common {
// rewrite both group exprs and aggr_expr
let rewritten = self.rewrite_expr(
vec![group_expr.clone(), aggr_expr.clone()],
Contributor Author

Extra clone, but the reason is the same as https://github.com/apache/datafusion/pull/10473/files#r1648985272

Contributor

Given this only happens when we are actually doing a CSE rewrite (and not on all plans) I think it is fine

if aggr_found_common {
let mut common_exprs = CommonExprs::new();
let mut rewritten_exprs = self.rewrite_exprs_list(
vec![new_aggr_expr.clone()],
Contributor Author

Extra clone, but the reason is the same as https://github.com/apache/datafusion/pull/10473/files#r1648985272

Contributor

maybe we can add a comment

Contributor Author

Added in ee61224.


if found_common {
let rewritten = self.rewrite_expr(
vec![expr.clone()],
Contributor Author

Extra clone, but the reason is the same as https://github.com/apache/datafusion/pull/10473/files#r1648985272


self.id_array[down_index].0 = self.up_index;
if !self.expr_mask.ignores(expr) {
self.id_array[down_index].1.clone_from(&expr_id);
let count = self.expr_stats.entry(expr_id.clone()).or_insert(0);
Contributor Author

@peter-toth peter-toth Jun 21, 2024

We no longer need to clone Identifiers as they are Copy.

@peter-toth
Contributor Author

peter-toth commented Jun 21, 2024

This PR is more or less ready for review, tests are passing except for the MSRV.
I focused only on the 3 performance improvements and deliberately kept the code as close to the original as possible to ease review. I plan to open a follow-up PR to tackle remaining TODOs and refactor CSE code a bit for better readability.

Local benchmarks show good performance improvements, but @alamb please confirm them with your standard setup.

% critcmp main better-cse-identifier
group                                         better-cse-identifier                   main
-----                                         ---------------------                   ----
logical_aggregate_with_join                   1.00    529.3±8.70µs        ? ?/sec     1.00    528.4±6.17µs        ? ?/sec
logical_plan_tpcds_all                        1.00     71.3±2.74ms        ? ?/sec     1.06    75.4±16.47ms        ? ?/sec
logical_plan_tpch_all                         1.00      7.3±0.19ms        ? ?/sec     1.09      7.9±2.34ms        ? ?/sec
logical_select_all_from_1000                  1.02     15.9±2.00ms        ? ?/sec     1.00     15.6±1.27ms        ? ?/sec
logical_select_one_from_700                   1.00    399.9±3.73µs        ? ?/sec     1.02    408.3±4.29µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00   422.7±92.70µs        ? ?/sec     1.00   423.7±88.41µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00    380.5±5.11µs        ? ?/sec     1.01    385.3±4.22µs        ? ?/sec
physical_plan_tpcds_all                       1.00   536.8±11.44ms        ? ?/sec     1.09   586.2±20.88ms        ? ?/sec
physical_plan_tpch_all                        1.01    38.5±12.88ms        ? ?/sec     1.00     38.1±0.59ms        ? ?/sec
physical_plan_tpch_q1                         1.00  1108.3±12.50µs        ? ?/sec     1.43  1585.0±12.90µs        ? ?/sec
physical_plan_tpch_q10                        1.00  1598.6±219.13µs        ? ?/sec    1.15  1842.3±280.13µs        ? ?/sec
physical_plan_tpch_q11                        1.00  1339.7±14.58µs        ? ?/sec     1.15  1540.2±247.46µs        ? ?/sec
physical_plan_tpch_q12                        1.00  1117.3±63.22µs        ? ?/sec     1.12  1252.1±17.87µs        ? ?/sec
physical_plan_tpch_q13                        1.00   769.6±19.54µs        ? ?/sec     1.12   864.0±12.96µs        ? ?/sec
physical_plan_tpch_q14                        1.00  930.6±216.63µs        ? ?/sec     1.16  1081.1±125.78µs        ? ?/sec
physical_plan_tpch_q16                        1.00  1331.4±38.36µs        ? ?/sec     1.15  1534.8±261.21µs        ? ?/sec
physical_plan_tpch_q17                        1.00  1209.8±29.88µs        ? ?/sec     1.09  1322.0±17.01µs        ? ?/sec
physical_plan_tpch_q18                        1.00  1532.7±240.18µs        ? ?/sec    1.04  1593.0±20.68µs        ? ?/sec
physical_plan_tpch_q19                        1.00      2.5±0.08ms        ? ?/sec     1.06      2.7±0.04ms        ? ?/sec
physical_plan_tpch_q2                         1.00      3.2±0.70ms        ? ?/sec     1.06      3.4±0.72ms        ? ?/sec
physical_plan_tpch_q20                        1.00  1642.3±16.61µs        ? ?/sec     1.12  1845.0±302.12µs        ? ?/sec
physical_plan_tpch_q21                        1.00      2.6±0.65ms        ? ?/sec     1.07      2.7±0.64ms        ? ?/sec
physical_plan_tpch_q22                        1.00  1196.6±101.79µs        ? ?/sec    1.08  1292.8±15.01µs        ? ?/sec
physical_plan_tpch_q3                         1.00  1105.8±28.61µs        ? ?/sec     1.23  1358.9±236.19µs        ? ?/sec
physical_plan_tpch_q4                         1.00   852.5±13.28µs        ? ?/sec     1.12  957.4±161.05µs        ? ?/sec
physical_plan_tpch_q5                         1.00  1685.6±23.24µs        ? ?/sec     1.10  1857.1±21.46µs        ? ?/sec
physical_plan_tpch_q6                         1.00  642.0±188.39µs        ? ?/sec     1.01   647.4±12.04µs        ? ?/sec
physical_plan_tpch_q7                         1.00      2.2±0.45ms        ? ?/sec     1.08      2.4±0.54ms        ? ?/sec
physical_plan_tpch_q8                         1.00      2.7±0.05ms        ? ?/sec     1.08      2.9±0.06ms        ? ?/sec
physical_plan_tpch_q9                         1.00  1964.3±55.93µs        ? ?/sec     1.08      2.1±0.04ms        ? ?/sec

@peter-toth peter-toth marked this pull request as ready for review June 21, 2024 15:39
@alamb
Contributor

alamb commented Jun 21, 2024

I am starting the benchmark run now -- I'll report back here and give this PR a look as soon as possible (but maybe not until tomorrow)

Thank you so much @peter-toth

@alamb
Contributor

alamb commented Jun 21, 2024

My results appear consistent with yours @peter-toth -- looks like maybe the high column count case is slightly worse -- I can maybe profile it to see if I can find any reason that might be

++ critcmp main better-cse-identifier
group                                         better-cse-identifier                  main
-----                                         ---------------------                  ----
logical_aggregate_with_join                   1.01  1023.5±41.82µs        ? ?/sec    1.00  1008.6±10.43µs        ? ?/sec
logical_plan_tpcds_all                        1.00    152.4±1.06ms        ? ?/sec    1.00    152.2±1.00ms        ? ?/sec
logical_plan_tpch_all                         1.00     16.9±0.19ms        ? ?/sec    1.01     17.1±0.26ms        ? ?/sec
logical_select_all_from_1000                  1.02     18.9±0.12ms        ? ?/sec    1.00     18.6±1.74ms        ? ?/sec
logical_select_one_from_700                   1.01    829.8±9.09µs        ? ?/sec    1.00   823.9±10.68µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00   774.1±10.87µs        ? ?/sec    1.00   770.5±27.71µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00   765.0±32.06µs        ? ?/sec    1.00    763.9±9.85µs        ? ?/sec
physical_plan_tpcds_all                       1.00  1073.0±12.72ms        ? ?/sec    1.08   1159.6±5.63ms        ? ?/sec
physical_plan_tpch_all                        1.00     71.7±0.67ms        ? ?/sec    1.10     79.0±0.88ms        ? ?/sec
physical_plan_tpch_q1                         1.00      2.5±0.04ms        ? ?/sec    1.36      3.5±0.02ms        ? ?/sec
physical_plan_tpch_q10                        1.00      3.6±0.03ms        ? ?/sec    1.12      4.0±0.03ms        ? ?/sec
physical_plan_tpch_q11                        1.00      3.1±0.03ms        ? ?/sec    1.09      3.4±0.03ms        ? ?/sec
physical_plan_tpch_q12                        1.00      2.4±0.01ms        ? ?/sec    1.12      2.7±0.02ms        ? ?/sec
physical_plan_tpch_q13                        1.00  1805.2±95.39µs        ? ?/sec    1.11      2.0±0.01ms        ? ?/sec
physical_plan_tpch_q14                        1.00      2.0±0.03ms        ? ?/sec    1.18      2.4±0.02ms        ? ?/sec
physical_plan_tpch_q16                        1.00      3.0±0.03ms        ? ?/sec    1.11      3.3±0.03ms        ? ?/sec
physical_plan_tpch_q17                        1.00      2.8±0.02ms        ? ?/sec    1.09      3.0±0.03ms        ? ?/sec
physical_plan_tpch_q18                        1.00      3.2±0.03ms        ? ?/sec    1.09      3.5±0.02ms        ? ?/sec
physical_plan_tpch_q19                        1.00      4.9±0.06ms        ? ?/sec    1.05      5.2±0.05ms        ? ?/sec
physical_plan_tpch_q2                         1.00      6.3±0.04ms        ? ?/sec    1.05      6.6±0.05ms        ? ?/sec
physical_plan_tpch_q20                        1.00      3.7±0.04ms        ? ?/sec    1.07      3.9±0.04ms        ? ?/sec
physical_plan_tpch_q21                        1.00      5.1±0.06ms        ? ?/sec    1.06      5.4±0.03ms        ? ?/sec
physical_plan_tpch_q22                        1.00      2.7±0.02ms        ? ?/sec    1.10      3.0±0.03ms        ? ?/sec
physical_plan_tpch_q3                         1.00      2.6±0.03ms        ? ?/sec    1.16      2.9±0.02ms        ? ?/sec
physical_plan_tpch_q4                         1.00  1956.8±10.71µs        ? ?/sec    1.09      2.1±0.02ms        ? ?/sec
physical_plan_tpch_q5                         1.00      3.7±0.04ms        ? ?/sec    1.09      4.1±0.02ms        ? ?/sec
physical_plan_tpch_q6                         1.00   1316.0±9.67µs        ? ?/sec    1.10  1452.2±16.94µs        ? ?/sec
physical_plan_tpch_q7                         1.00      4.7±0.03ms        ? ?/sec    1.04      4.8±0.03ms        ? ?/sec
physical_plan_tpch_q8                         1.00      5.8±0.04ms        ? ?/sec    1.07      6.2±0.04ms        ? ?/sec
physical_plan_tpch_q9                         1.00      4.4±0.03ms        ? ?/sec    1.07      4.7±0.03ms        ? ?/sec
physical_select_all_from_1000                 1.04     46.8±0.32ms        ? ?/sec    1.00     45.1±0.20ms        ? ?/sec
physical_select_one_from_700                  1.03      3.5±0.02ms        ? ?/sec    1.00      3.4±0.02ms        ? ?/sec

@github-actions github-actions bot added core Core DataFusion crate substrait labels Jun 22, 2024
@peter-toth peter-toth force-pushed the better-cse-identifier branch from b23529a to a0913aa Compare June 22, 2024 09:14
@peter-toth
Contributor Author

@alamb, can you please help me with that MSRV failure? I don't know what could be the source of the failure and if I try running cargo msrv verify locally it fails with:

% cargo msrv verify
Fetching index
Unable to find key 'package.rust-version' (or 'package.metadata.msrv') in '/Users/ptoth/git/apache/arrow-datafusion/Cargo.toml'

I've bumped MSRV to 1.76 in ccc92b9. If we want to avoid that, let me know and I will try to find out what is not available in 1.75.

@peter-toth peter-toth force-pushed the better-cse-identifier branch from d69098d to 1a6f8e7 Compare June 23, 2024 08:12
@alamb
Contributor

alamb commented Jun 23, 2024

I ran the benchmarks again:

Looks to me like this PR makes planning 10% faster

physical_plan_tpcds_all                       1.00   1074.8±5.94ms        ? ?/sec    1.10   1181.1±6.54ms        ? ?/sec
physical_plan_tpch_all                        1.00     72.2±0.77ms        ? ?/sec    1.11     79.8±0.67ms        ? ?/sec
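
For reference, the factors in these two rows can be reproduced from the mean times. The sketch assumes critcmp's usual convention that each time is divided by the fastest time in its row; the `relative()` helper is made up for this example.

```rust
/// Relative factors for one row: each mean time divided by the fastest one.
/// Order: [better-cse-identifier, main].
fn relative(times_ms: [f64; 2]) -> [f64; 2] {
    let fastest = times_ms[0].min(times_ms[1]);
    [times_ms[0] / fastest, times_ms[1] / fastest]
}

fn main() {
    let tpcds = relative([1074.8, 1181.1]);
    let tpch = relative([72.2, 79.8]);
    println!("physical_plan_tpcds_all: {:.2} vs {:.2}", tpcds[0], tpcds[1]);
    println!("physical_plan_tpch_all:  {:.2} vs {:.2}", tpch[0], tpch[1]);
}
```

A factor of 1.10 means main takes about 10% longer than the branch on that benchmark; equivalently, the branch needs roughly 9% less time than main.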

🚀

++ critcmp main better-cse-identifier
group                                         better-cse-identifier                  main
-----                                         ---------------------                  ----
logical_aggregate_with_join                   1.00  1038.1±44.37µs        ? ?/sec    1.00  1040.8±49.37µs        ? ?/sec
logical_plan_tpcds_all                        1.00    153.0±1.07ms        ? ?/sec    1.01    154.3±0.98ms        ? ?/sec
logical_plan_tpch_all                         1.00     17.1±0.20ms        ? ?/sec    1.01     17.2±0.20ms        ? ?/sec
logical_select_all_from_1000                  1.00     18.9±0.09ms        ? ?/sec    1.01     19.0±0.11ms        ? ?/sec
logical_select_one_from_700                   1.01   840.6±10.15µs        ? ?/sec    1.00    836.0±9.66µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00   789.5±33.87µs        ? ?/sec    1.00   791.5±15.53µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00   771.3±21.54µs        ? ?/sec    1.02   785.4±21.89µs        ? ?/sec
physical_plan_tpcds_all                       1.00   1074.8±5.94ms        ? ?/sec    1.10   1181.1±6.54ms        ? ?/sec
physical_plan_tpch_all                        1.00     72.2±0.77ms        ? ?/sec    1.11     79.8±0.67ms        ? ?/sec
physical_plan_tpch_q1                         1.00      2.6±0.02ms        ? ?/sec    1.38      3.5±0.03ms        ? ?/sec
physical_plan_tpch_q10                        1.00      3.6±0.03ms        ? ?/sec    1.14      4.1±0.06ms        ? ?/sec
physical_plan_tpch_q11                        1.00      3.1±0.03ms        ? ?/sec    1.11      3.4±0.04ms        ? ?/sec
physical_plan_tpch_q12                        1.00      2.4±0.02ms        ? ?/sec    1.13      2.8±0.02ms        ? ?/sec
physical_plan_tpch_q13                        1.00  1814.3±19.22µs        ? ?/sec    1.12      2.0±0.02ms        ? ?/sec
physical_plan_tpch_q14                        1.00      2.0±0.02ms        ? ?/sec    1.18      2.4±0.02ms        ? ?/sec
physical_plan_tpch_q16                        1.00      3.0±0.03ms        ? ?/sec    1.12      3.4±0.03ms        ? ?/sec
physical_plan_tpch_q17                        1.00      2.8±0.03ms        ? ?/sec    1.10      3.1±0.03ms        ? ?/sec
physical_plan_tpch_q18                        1.00      3.3±0.03ms        ? ?/sec    1.11      3.6±0.04ms        ? ?/sec
physical_plan_tpch_q19                        1.00      4.8±0.05ms        ? ?/sec    1.07      5.2±0.05ms        ? ?/sec
physical_plan_tpch_q2                         1.00      6.4±0.15ms        ? ?/sec    1.05      6.7±0.05ms        ? ?/sec
physical_plan_tpch_q20                        1.00      3.6±0.03ms        ? ?/sec    1.08      3.9±0.03ms        ? ?/sec
physical_plan_tpch_q21                        1.00      5.1±0.07ms        ? ?/sec    1.08      5.4±0.04ms        ? ?/sec
physical_plan_tpch_q22                        1.00      2.7±0.03ms        ? ?/sec    1.10      3.0±0.02ms        ? ?/sec
physical_plan_tpch_q3                         1.00      2.6±0.02ms        ? ?/sec    1.16      3.0±0.02ms        ? ?/sec
physical_plan_tpch_q4                         1.00  1974.3±23.01µs        ? ?/sec    1.09      2.1±0.03ms        ? ?/sec
physical_plan_tpch_q5                         1.00      3.7±0.04ms        ? ?/sec    1.10      4.1±0.06ms        ? ?/sec
physical_plan_tpch_q6                         1.00  1321.5±35.02µs        ? ?/sec    1.11  1472.4±20.45µs        ? ?/sec
physical_plan_tpch_q7                         1.00      4.7±0.04ms        ? ?/sec    1.05      4.9±0.05ms        ? ?/sec
physical_plan_tpch_q8                         1.00      5.9±0.08ms        ? ?/sec    1.09      6.4±0.06ms        ? ?/sec
physical_plan_tpch_q9                         1.00      4.4±0.05ms        ? ?/sec    1.08      4.8±0.04ms        ? ?/sec
physical_select_all_from_1000                 1.00     46.7±0.40ms        ? ?/sec    1.02     47.7±0.29ms        ? ?/sec
physical_select_one_from_700                  1.00      3.5±0.02ms        ? ?/sec    1.01      3.5±0.03ms        ? ?/sec

@alamb alamb changed the title Better CSE identifier Improve CommonSubexprEliminate identifier management (10% faster planning) Jun 23, 2024
Contributor

@alamb alamb left a comment

Wow @peter-toth - this is pretty amazing. Both in terms of code as well as ability to stick to it and keep it moving through many different iterations.

I had some minor comment suggestions, but really nothing of substance to add. I think once this PR's conflicts are resolved we can merge it.

cc @waynexia

@@ -26,7 +26,7 @@ homepage = { workspace = true }
repository = { workspace = true }
license = { workspace = true }
authors = { workspace = true }
-rust-version = "1.75"
+rust-version = "1.76"
Contributor

By my reading of the MSRV policy

datafusion/README.md

Lines 100 to 103 in d8bcff5

DataFusion's Minimum Required Stable Rust Version (MSRV) policy is to support
each stable Rust version for 6 months after it is
[released](https://github.com/rust-lang/rust/blob/master/RELEASES.md). This
generally translates to support for the most recent 3 to 4 stable Rust versions.

1.75 was released 2023-12-28 meaning we need to keep the MSRV at 1.75 until 2024-06-28 (6 days from now)

However, since we aren't going to make a release until around July 11 #11077 this is probably ok

datafusion/expr/src/expr.rs Outdated Show resolved Hide resolved
datafusion/expr/src/expr.rs Outdated Show resolved Hide resolved
datafusion/optimizer/src/common_subexpr_eliminate.rs Outdated Show resolved Hide resolved
.map(LogicalPlan::Projection)
.map(Transformed::yes)
} else {
// TODO: How exactly can the name or the schema change in this case?
Contributor

Is this something you plan to do in this PR? Or is it for follow up work (I can file a new ticket if it is follow on work)

Contributor Author

@peter-toth peter-toth Jun 23, 2024

Yes. I didn't want to remove this part in this PR, but I think we can get rid of it in a follow-up one. If you file a ticket then please ping me or assign it to me. I will be offline from tomorrow for about a week, but when I'm back I'm happy to fix this.

@@ -524,41 +667,24 @@ impl CommonSubexprEliminate {
/// ```
///
/// where, it is referred once by each `WindowAggr` (total of 2) in the plan.
struct ConsecutiveWindowExprs {
Contributor

That makes sense -- I put it all in a struct initially to try and encapsulate the logic -- I think your changes look good to me

@@ -507,7 +642,7 @@ impl CommonSubexprEliminate {
/// ```
///
/// Returns:
-/// * `window_exprs`: `[a, b, c, d]`
+/// * `window_exprs`: `[[a, b, c], [d]]`
Contributor

👍

# Conflicts:
#	datafusion/optimizer/src/common_subexpr_eliminate.rs
@alamb
Contributor

alamb commented Jun 24, 2024

🚀 thanks again @peter-toth

@alamb alamb merged commit ede5598 into apache:main Jun 24, 2024
25 checks passed
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
…anning) (apache#10473)

* implement hash based CSE identifier

* move `is_volatile()` check out of visitor

* fix comments

* add transformed asserts when `rewrite_expr` is called

* update MSRV to 1.76

* cleanup

* better comments regarding volatile and short circuiting expressions

* address review comments
Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules substrait
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make CommonSubexprEliminate faster by stop copying so many strings
3 participants