chore: Refactor Arrow Array and Schema allocation in ColumnReader and MetadataColumnReader #1047

Merged: 2 commits merged into apache:main from refactor_array_schema_scan on Nov 2, 2024

viirya (Member) commented Oct 31, 2024

Which issue does this PR close?

Closes #1048.

Rationale for this change

What changes are included in this PR?

How are these changes tested?

viirya changed the title from "chore: Refactor Arrow Array and Schema allocation in ColumnReader" to "chore: Refactor Arrow Array and Schema allocation in ColumnReader and MetadataColumnReader" on Oct 31, 2024
@@ -52,7 +52,6 @@ const STR_CLASS_NAME: &str = "java/lang/String";
/// Parquet read context maintained across multiple JNI calls.
struct Context {
    pub column_reader: ColumnReader,
    pub arrays: Option<(Arc<FFI_ArrowArray>, Arc<FFI_ArrowSchema>)>,
Review comment (Member, Author):

This refactoring simplifies the context and the logic. We don't need to keep the array and schema pointers on the producer side.

Review comment (Member, Author):

It is also easier to reason about the array/schema release logic after this refactoring.
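
To illustrate the two comments above, here is a minimal sketch, not the code from this PR, of a producer that fills a struct allocated by the consumer instead of caching it in its own context. It assumes the consumer owns the allocation and calls the release callback when it is done. `FfiArray`, `release_array`, `export_batch`, `next_value`, and `demo` are hypothetical stand-ins; the real `FFI_ArrowArray` and `FFI_ArrowSchema` types are deliberately not used so the example stays self-contained.

```rust
// Hypothetical stand-in for an Arrow C Data Interface struct
// (NOT the real FFI_ArrowArray from the arrow crate).
struct FfiArray {
    values: Vec<i32>,
    // Release callback installed by the producer.
    // `None` means the struct is empty or has already been released.
    release: Option<fn(&mut FfiArray)>,
}

impl FfiArray {
    // The consumer allocates an empty struct for the producer to fill.
    fn empty() -> Self {
        FfiArray { values: Vec::new(), release: None }
    }

    fn is_released(&self) -> bool {
        self.release.is_none()
    }
}

// Producer-provided release: drops the buffers and marks the struct released.
fn release_array(array: &mut FfiArray) {
    array.values = Vec::new();
    array.release = None;
}

// Producer-side context: only reader state, no array or schema handles.
struct Context {
    next_value: i32,
}

impl Context {
    // Export a batch by writing into the struct the consumer supplied.
    fn export_batch(&mut self, out: &mut FfiArray, len: usize) {
        out.values = (0..len as i32).map(|i| self.next_value + i).collect();
        out.release = Some(release_array);
        self.next_value += len as i32;
    }
}

// Consumer-side usage: allocate, ask the producer to fill, read, release.
fn demo() -> Vec<i32> {
    let mut ctx = Context { next_value: 0 };
    let mut array = FfiArray::empty();
    ctx.export_batch(&mut array, 3);
    let seen = array.values.clone();
    if let Some(release) = array.release {
        release(&mut array);
    }
    seen
}
```

With this shape the producer's `Context` has nothing to release, so the lifetime of the exported data depends only on the consumer invoking `release`.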

kazuyukitanimura (Contributor) left a comment:

Looks good

cometVector, dictionary, importer.getProvider(), useDecimal128, false, isUuid);

currentVector =
new CometDictionaryVector(cometVector, dictionary, importer.getProvider(), useDecimal128);
Review comment (Contributor):

Unrelated, but we should not need to call the CometDictionaryVector constructor twice.

native/core/src/parquet/mod.rs (outdated, resolved)
codecov-commenter commented:

Codecov Report

Attention: Patch coverage is 25.71429% with 26 lines in your changes missing coverage. Please review.

Project coverage is 34.32%. Comparing base (3df9d5c) to head (7d7f388).
Report is 7 commits behind head on main.

Files with missing lines                                 Patch %   Lines
...in/java/org/apache/comet/parquet/ColumnReader.java    0.00%     26 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1047      +/-   ##
============================================
+ Coverage     34.28%   34.32%   +0.03%     
- Complexity      881      887       +6     
============================================
  Files           112      113       +1     
  Lines         43478    43588     +110     
  Branches       9648     9647       -1     
============================================
+ Hits          14908    14963      +55     
- Misses        25481    25552      +71     
+ Partials       3089     3073      -16     


@viirya viirya merged commit ac4223c into apache:main Nov 2, 2024
74 checks passed
viirya (Member, Author) commented Nov 2, 2024

Thanks @kazuyukitanimura

@viirya viirya deleted the refactor_array_schema_scan branch November 2, 2024 07:12
Successfully merging this pull request may close these issues.

Refactor Arrow Array and Schema allocation in ColumnReader and MetadataColumnReader
3 participants