[Parquet] Automatically split batches to prevent overflow #211

Open
zeroshade opened this issue Dec 9, 2024 · 0 comments
Labels
Type: enhancement (New feature or request)

Comments

@zeroshade
Member

Describe the enhancement requested

#197 added an option to force columns to use the Large variants of String and Binary, so that columns holding more than 2GB of data can be handled. We should take this one step further: when the data for a column is too large to fit in a single String/Binary column (more than 2GB, the limit of int32 offsets), automatically reduce the batch size of records so that each batch fits within int32 offsets.
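
To make the idea concrete, here is a minimal sketch of the splitting logic in Python. It is not the library's implementation: the function name `split_row_ranges`, its parameters, and the assumption that the byte length of every value is known up front are all hypothetical and chosen only for illustration. It walks the values of one String/Binary column and closes a batch whenever the next value would push the accumulated data past the int32 offset limit, or past the requested row count.

```python
INT32_MAX = 2**31 - 1  # largest offset an int32-indexed String/Binary column can hold


def split_row_ranges(value_lengths, max_rows, max_bytes=INT32_MAX):
    """Return (start, stop) row ranges whose total value bytes fit in max_bytes.

    Hypothetical helper, for illustration only.
    value_lengths: byte length of each value in one String/Binary column.
    max_rows: the requested batch size in rows (must be positive).
    """
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    ranges = []
    start = 0
    batch_bytes = 0
    for row, length in enumerate(value_lengths):
        if length > max_bytes:
            # A single value this large cannot be addressed with int32 offsets.
            raise ValueError(f"row {row} alone exceeds {max_bytes} bytes")
        # Close the current batch if adding this row would overflow the
        # offsets or exceed the requested row count.
        if batch_bytes + length > max_bytes or row - start >= max_rows:
            ranges.append((start, row))
            start = row
            batch_bytes = 0
        batch_bytes += length
    # Emit the final, possibly partial, batch.
    if start < len(value_lengths):
        ranges.append((start, len(value_lengths)))
    return ranges


# Tiny limit so the split is visible: four 4-byte values, at most 8 bytes per batch.
print(split_row_ranges([4, 4, 4, 4], max_rows=10, max_bytes=8))  # [(0, 2), (2, 4)]
```

With the default `max_bytes`, batches are only shortened when a column would actually exceed the int32 range; otherwise the requested batch size is used unchanged. A single value larger than the limit cannot be split this way and would still need the Large variants from #197.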

Component(s)

Parquet

@zeroshade added the "Type: enhancement" label Dec 9, 2024