Stop copying `Expr`s and `LogicalPlan`s so much during Common Subexpression Elimination
#9873
Comments
I am on vacation this week (well, partial vacation, as it is school break week), so I likely won't have time to work on #10067 until next week. If you have time to do so, that would be great. Otherwise I will pick it up the following week. CSE is one of the passes that showed up in the planning benchmarks when I last profiled them, so I think this particular change will have a significant impact on planning performance.
(Thank you @peter-toth 🙏, it is so much fun working with you)
Ok, got it. I'm happy to look into it, but I'm not sure I can do it before Monday. Enjoy your partial vacation! 😄
I've started working on this, but it will surely take some time...
The issue title was changed from "Stop copying `Expr`s so much during Common Subexpression Elimination" to "Stop copying `Expr`s and `LogicalPlan`s so much during Common Subexpression Elimination".
@alamb, just a quick update: I'm still working on this and trying out different ideas. I'll try to open a PR this week or next.
Hey, I am interested in helping with this. Maybe @peter-toth and I can divide our efforts here? Let me know what you've worked on so far, and I can figure out how to help. One thing I see in particular that's not directly cloning the … Assuming that the new zero-copy implementation will continue using the …
Actually, it looks like I've started a branch for my work. Here is a basic commit that simply swaps the existing code over to using the …
I've just opened PR #10396. I wanted to continue @alamb's #10067, but then I ran into a few different issues with the rule, so I think we should address those first. Two notes:
I am happy to do this. Thank you @peter-toth
UPDATE: It looks like …
I believe CSE may predate the `Hash` impl for `Expr`.
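If CSE predates the `Hash` impl for `Expr`, one possible direction is to key occurrence counts on borrowed subexpressions instead of on `String` identifiers. The sketch below is a minimal illustration under that assumption: it uses a toy `Expr` enum (hypothetical, not DataFusion's type) that derives `Hash` and `Eq`, so a `HashMap<&Expr, usize>` can count repeated subtrees without cloning any `Expr` or building any strings. The helper names `count_subexprs` and `common_subexpr_count` are illustrative only.

```rust
use std::collections::HashMap;

// Toy expression tree (hypothetical, not DataFusion's `Expr`). Deriving
// `Hash` and `Eq` lets borrowed subtrees serve directly as map keys.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// Count every non-leaf subexpression. The map holds `&Expr` keys borrowed
// from `expr`, so no `Expr` is cloned and no identifier `String` is built.
fn count_subexprs<'a>(expr: &'a Expr, counts: &mut HashMap<&'a Expr, usize>) {
    match expr {
        // Leaves are not worth extracting, so they are not counted.
        Expr::Column(_) | Expr::Literal(_) => {}
        Expr::Add(l, r) | Expr::Mul(l, r) => {
            *counts.entry(expr).or_insert(0) += 1;
            count_subexprs(l, counts);
            count_subexprs(r, counts);
        }
    }
}

// Number of distinct non-leaf subexpressions that occur more than once.
fn common_subexpr_count(expr: &Expr) -> usize {
    let mut counts = HashMap::new();
    count_subexprs(expr, &mut counts);
    counts.values().filter(|&&n| n > 1).count()
}
```

For `(a + b) * (a + b)`, the map ends up with two keys (the `Mul` node once, the `Add` node twice), so `common_subexpr_count` reports one common subexpression. One caveat: hashing or comparing a subtree from scratch costs time proportional to its size, so doing it at every node can be quadratic for deep trees; a real implementation might compute and cache hashes bottom-up.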
I forgot to mention on this thread that my detailed answer is on the other one: #10426 (comment)
@alamb, I think you shouldn't wait for my #10473, as there is still some preliminary work I need to finish (#10543) and I'm still not sure about a few details of #10473.
Thanks @peter-toth |
Is your feature request related to a problem or challenge?
The common subexpression elimination pass copies many `Expr`s around. You can see the performance impact this has in the screenshot from #9637 (comment).

While we will fix the plan-copying problem in #9637, I think there is more work to be done in the common subexpression code itself, which copies a significant number of `Expr`s and `String`s around.
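To make the cost of string identifiers concrete, here is a toy sketch of an identifier scheme in which each node's identifier embeds its children's identifiers and every identifier is kept for the whole pass. The scheme, the `#`/`(…+…)` format, and the helpers `identifiers`, `chain`, and `total_id_bytes` are all hypothetical; the real `Identifier` format in common_subexpr_eliminate.rs may differ. The point is only that, for a left-deep chain of additions, the total identifier size grows quadratically with the number of nodes.

```rust
// Toy expression tree (hypothetical, not DataFusion's `Expr`).
#[derive(Debug)]
enum Expr {
    Column(String),
    Add(Box<Expr>, Box<Expr>),
}

// Hypothetical identifier scheme: a node's id embeds its children's ids,
// and every id is kept for the whole pass (here: pushed onto `out`).
fn identifiers(expr: &Expr, out: &mut Vec<String>) -> String {
    let id = match expr {
        Expr::Column(name) => format!("#{name}"),
        Expr::Add(l, r) => {
            let l_id = identifiers(l, out);
            let r_id = identifiers(r, out);
            format!("({l_id}+{r_id})")
        }
    };
    // Storing the id costs one more copy of the string.
    out.push(id.clone());
    id
}

// Left-deep chain `c0 + c1 + ... + cn`.
fn chain(n: usize) -> Expr {
    let mut e = Expr::Column("c0".to_string());
    for i in 1..=n {
        e = Expr::Add(Box::new(e), Box::new(Expr::Column(format!("c{i}"))));
    }
    e
}

// Total bytes held in identifier strings for one pass over `expr`.
fn total_id_bytes(expr: &Expr) -> usize {
    let mut ids = Vec::new();
    identifiers(expr, &mut ids);
    ids.iter().map(String::len).sum()
}
```

With `chain(1)` (3 nodes) the identifiers total 15 bytes; with `chain(9)` (19 nodes) they total 327 bytes, because every new root identifier repeats all of the ones below it.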
Describe the solution you'd like
Figure out how to avoid `clone`ing `Expr`s in https://github.com/apache/arrow-datafusion/blob/main/datafusion/optimizer/src/common_subexpr_eliminate.rs … `Expr`s themselves … `Identifier` …
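One way to avoid per-occurrence clones when rewriting is to mutate the tree in place through `&mut Expr`, so untouched subtrees are never copied, and to clone each common subexpression only once (for example, to build the extra projection). The sketch below is a minimal illustration of that idea using a hypothetical `Expr` enum and a hypothetical `replace_in_place` helper; the alias `cse_1` is made up and is not DataFusion's naming scheme.

```rust
// Toy expression tree (hypothetical, not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Expr {
    Column(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// Replace every occurrence of `target` in `expr` with a reference to the
// column `alias`, mutating in place so untouched subtrees are never copied.
// Returns how many occurrences were replaced.
fn replace_in_place(expr: &mut Expr, target: &Expr, alias: &str) -> usize {
    if *expr == *target {
        *expr = Expr::Column(alias.to_string());
        return 1;
    }
    match expr {
        Expr::Column(_) => 0,
        // Recurse into both children; the borrows are disjoint fields.
        Expr::Add(l, r) | Expr::Mul(l, r) => {
            replace_in_place(l, target, alias) + replace_in_place(r, target, alias)
        }
    }
}
```

Rewriting `(a + b) * (a + b)` with `target = a + b` replaces two occurrences and yields `cse_1 * cse_1`. The only copy of `a + b` is the one the caller keeps as `target`; the rewrite itself allocates just the alias strings.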
We should see a significant improvement in the `sql_planner` benchmarks.
Describe alternatives you've considered
No response
Additional context
I noticed this while reviewing #9871