[Go][Parquet] Does this implementation support page indexes? #33

Open
jhump opened this issue Jun 14, 2024 · 2 comments · May be fixed by #223

Comments

@jhump

jhump commented Jun 14, 2024

I would like to create page indexes (per this doc) for a column when writing a parquet file, and then use that index to seek to a particular row. I have a case where the file is sorted by a particular ID, and the queries often want to start with a particular ID and then read all rows thereafter. Without an index, I can find the right row group using statistics, but then have to scan through all values in the column in the row group to find the ID and determine the right row.
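To make the intended lookup concrete, here is a minimal sketch of the seek step, assuming the per-page min/max values and first-row positions from a page index have already been read into memory. The `pageEntry` type and the `firstPageAtOrAfter` function are hypothetical names for illustration only; they are not part of the parquet package.

```go
package main

import (
	"fmt"
	"sort"
)

// pageEntry is a hypothetical, simplified view of one data page: the value
// bounds a column index would record, plus the first row position an offset
// index would record (relative to the start of the row group).
type pageEntry struct {
	MinID         int64
	MaxID         int64
	FirstRowIndex int64
}

// firstPageAtOrAfter returns the index of the first page that can contain a
// row with ID >= id, or len(pages) if no page qualifies. It assumes the
// column is sorted ascending, so MaxID is non-decreasing across pages.
func firstPageAtOrAfter(pages []pageEntry, id int64) int {
	return sort.Search(len(pages), func(i int) bool {
		return pages[i].MaxID >= id
	})
}

func main() {
	// Made-up page bounds for a single row group.
	pages := []pageEntry{
		{MinID: 1, MaxID: 100, FirstRowIndex: 0},
		{MinID: 101, MaxID: 250, FirstRowIndex: 1000},
		{MinID: 251, MaxID: 400, FirstRowIndex: 2000},
	}

	i := firstPageAtOrAfter(pages, 180)
	if i == len(pages) {
		fmt.Println("no page in this row group contains the ID")
		return
	}
	// Only this page needs to be scanned to find the exact starting row.
	fmt.Printf("start at page %d, row %d\n", i, pages[i].FirstRowIndex)
}
```

With an index like this, only one page has to be scanned to locate the starting row, instead of every value in the row group.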

I've gone through all of the code and API in the parquet package and sub-packages, and all I can find are fields in the Thrift-generated code for this and accessors in the column chunk metadata that return the file offset where the index is stored. But there seems to be no API to actually read the index and use it. And there is no configuration, on the write side, for whether to create an index or not.

When will this be supported? How active is development on the Go runtime?

@zeroshade
Member

Hey @jhump, you're correct that page indexes are not yet supported in the Go implementation. It's been on my to-do list for a while but I haven't found the bandwidth. I don't have a timeline for it, but I would also happily review a PR if you wanted to contribute it yourself!

Development on the Go implementation has been mostly maintenance lately, but I fully intend to get back to it and add page indexes, bloom filters, and so on. I just don't have a timeline yet.

@jhump
Author

jhump commented Jun 14, 2024

@zeroshade, thank you for the quick response! I don't have quite enough experience in this codebase to offer a PR yet, but maybe I will later, after using it more and doing more development with it.

@assignUser assignUser transferred this issue from apache/arrow Aug 30, 2024
@zeroshade zeroshade linked a pull request Dec 19, 2024 that will close this issue