Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(parquet): Add support for Page Indexes #223

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

zeroshade
Copy link
Member

Rationale for this change

Adds support for writing and retrieving Page Indexes from parquet files

What changes are included in this PR?

Added new writer properties to enable writing PageIndexes either as default or for specific columns.
Added RowGroupPageIndexReader, ColumnIndexReader, OffsetIndexReader to access page index information.

Are these changes tested?

Yes, extensive unit tests have been added including round trip tests and encryption/decryption tests.

Are there any user-facing changes?

no. Just new functions available.

closes #33

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Sorry, I couldn't review this carefully... I took a look at all changes but I don't have enough knowledge of the Parquet specification and Go. And this change is too large for me... I hope that others can review this.)

@@ -63,23 +63,23 @@ type ColumnMetadata struct {
}

func (c *ColumnMetadata) findStrVal(key string) (string, bool) {
idx := c.Data.FindKey(CatalogNameKey)
idx := c.Data.FindKey(key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unrelated fix, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, it was a metadata issue I came across that turned out to be small enough that it made sense to just incorporate

@zeroshade zeroshade requested a review from pitrou December 23, 2024 22:19
@zeroshade
Copy link
Member Author

Thanks for taking a look at all @kou!

@wgtmac and @pitrou I specifically tagged you both for review here given your Parquet experience hoping that one or both of you would be able to take a look at this from that perspective. Much of this is based on the way that Parquet C++ handles Page Indexes

@zeroshade
Copy link
Member Author

I'd like to get this in before I kick off the next release in the next week or so. Would anyone be free to do a review soon?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Go][Parquet] Does this implementation support page indexes?
2 participants