Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

minor: Refactor binary expr serde to reduce code duplication #1053

Merged
merged 5 commits into from
Nov 5, 2024

Conversation

andygrove
Copy link
Member

Which issue does this PR close?

N/A

Rationale for this change

We currently have 20 duplicate structs defined in protobuf for binary expressions, such as:

message Equal {
  Expr left = 1;
  Expr right = 2;
}

message NotEqual {
  Expr left = 1;
  Expr right = 2;
}

message EqualNullSafe {
  Expr left = 1;
  Expr right = 2;
}

As a result, we have a lot of duplicate code in QueryPlanSerde for serializing binary expressions that cannot easily be refactored because each code block references a different (but identical) generated class.

What changes are included in this PR?

  • Use a single definition of BinaryExpr
  • Refactor QueryPlanSerde to remove a lot of duplicate boilerplate code

How are these changes tested?

Existing tests

Copy link
Contributor

@mbutrovich mbutrovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a beautiful refactor!

@andygrove
Copy link
Member Author

I will follow up with similar PRs for UnaryExpr and MathExpr

Copy link
Contributor

@parthchandra parthchandra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@@ -2388,7 +2388,7 @@ mod tests {
};

let expr = spark_expression::Expr {
expr_struct: Some(Eq(Box::new(spark_expression::Equal {
expr_struct: Some(Eq(Box::new(spark_expression::BinaryExpr {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this is the only place where the expression needed to be updated explicitly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Comment on lines +1090 to +1093
ExprOuterClass.Expr
.newBuilder()
.setEq(builder)
.build()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel these lines can be a part of createBinaryExpr so that we do not have to map()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block of code is specifically for creating BinaryExpr eq = 9 (note the call to setEq that uses the result of createBinaryExpr). Each of these 20 similar blocks of code is for setting a specific top-level expr_struct. It is only the contained BinaryExpr struct that is common to each one. Does that make sense?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm .. perhaps I could refactor farther so that the map receives an ExprOuterClass.Expr.Builder and then only has to call setEq in the map block. That would be more concise. I will try that as a follow on PR after #1056 is merged

@andygrove
Copy link
Member Author

Thanks for the reviews @kazuyukitanimura @parthchandra @parthchandra

@andygrove andygrove merged commit 69760a3 into apache:main Nov 5, 2024
74 checks passed
@andygrove andygrove deleted the refactor-binary-expr branch November 5, 2024 15:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants