Parquet Modular Encryption support #6637
Currently this is a rough rebase of work done by @ggershinsky. As
@rok let me know if you want any help shoehorning this into
Is there any help, input or contribution needed here?
Thanks for the offer @etseidl & @brainslush! I'm making some progress and would definitely appreciate a review! I'll ping once I push.
@etseidl could you please do a quick pass to say if this makes sense in respect to
Only looking at the metadata bits for now...looks good to me so far. Just a few minor nits. Thanks @rok!
@@ -52,13 +53,16 @@ pub fn parse_metadata<R: ChunkReader>(chunk_reader: &R) -> Result<ParquetMetaDat
/// Decodes [`ParquetMetaData`] from the provided bytes.
///
/// Typically this is used to decode the metadata from the end of a parquet
/// file. The format of `buf` is the Thift compact binary protocol, as specified
/// file. The format of `buf` is the Thrift compact binary protocol, as specified
❤️
/// by the [Parquet Spec].
///
/// [Parquet Spec]: https://github.com/apache/parquet-format#metadata
#[deprecated(since = "53.1.0", note = "Use ParquetMetaDataReader::decode_metadata")]
pub fn decode_metadata(buf: &[u8]) -> Result<ParquetMetaData> {
ParquetMetaDataReader::decode_metadata(buf)
pub fn decode_metadata(
I'm not sure we should be updating a deprecated function. If encryption is desired I'd say force use of the new API so we don't have to maintain this one. Just pass `None` to `ParquetMetaDataReader::decode_metadata`.
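To make the suggestion concrete, here is a minimal sketch of the shape I have in mind. The types below (`FileDecryptionProperties`, `MetaData`, `MetaDataReader`) are simplified stand-ins invented for illustration, not the crate's actual definitions, and the two-argument `decode_metadata` signature is an assumption about where this PR ends up.

```rust
// Stand-in types for illustration only; NOT the parquet crate's definitions.
#[derive(Debug, Clone, PartialEq)]
pub struct FileDecryptionProperties {
    pub footer_key: Vec<u8>,
}

#[derive(Debug, PartialEq)]
pub struct MetaData {
    pub footer_len: usize,
    pub decrypted: bool,
}

pub struct MetaDataReader;

impl MetaDataReader {
    /// Hypothetical new-style entry point: decryption is opt-in via `Option`.
    pub fn decode_metadata(
        buf: &[u8],
        decryption: Option<&FileDecryptionProperties>,
    ) -> Result<MetaData, String> {
        if buf.is_empty() {
            return Err("empty footer".to_string());
        }
        // A real implementation would decrypt the footer here when
        // `decryption` is `Some`; this sketch only records the choice.
        Ok(MetaData {
            footer_len: buf.len(),
            decrypted: decryption.is_some(),
        })
    }
}

/// The deprecated wrapper keeps its original signature and never decrypts,
/// so it needs no further maintenance as the encryption API evolves.
#[deprecated(note = "Use MetaDataReader::decode_metadata")]
pub fn decode_metadata(buf: &[u8]) -> Result<MetaData, String> {
    MetaDataReader::decode_metadata(buf, None)
}
```

Callers that need encrypted footers would then have to move to the reader API, while existing callers of the deprecated function keep compiling unchanged.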
parquet/src/file/metadata/reader.rs
Outdated
&mut fetch,
file_size,
self.get_prefetch_size(),
self.file_decryption_properties.clone(),
Very minor nit: I understand that `file_decryption_properties` needs to be cloned eventually...just wondering if we could pass references down into `decode_metadata` and do the clone there where it's more obviously needed.
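Roughly what I mean, as a sketch only: the types and the `Reader::load` method below are hypothetical stand-ins rather than the actual code in `reader.rs`. The outer layers borrow with `as_ref()`, and the one function that needs an owned value calls `cloned()`.

```rust
// Stand-in types for illustration only; NOT the parquet crate's definitions.
#[derive(Debug, Clone, PartialEq)]
pub struct FileDecryptionProperties {
    pub footer_key: Vec<u8>,
}

#[derive(Debug)]
pub struct DecodedMetaData {
    pub footer_len: usize,
    // The decoded metadata owns its copy of the properties.
    pub decryptor_props: Option<FileDecryptionProperties>,
}

/// Innermost function: the only place that needs ownership, so the clone
/// happens here, where the reason for it is visible.
pub fn decode_metadata(
    buf: &[u8],
    props: Option<&FileDecryptionProperties>,
) -> DecodedMetaData {
    DecodedMetaData {
        footer_len: buf.len(),
        decryptor_props: props.cloned(),
    }
}

pub struct Reader {
    file_decryption_properties: Option<FileDecryptionProperties>,
}

impl Reader {
    pub fn new(file_decryption_properties: Option<FileDecryptionProperties>) -> Self {
        Self { file_decryption_properties }
    }

    /// Hypothetical caller: passes a borrowed view down instead of cloning.
    pub fn load(&self, buf: &[u8]) -> DecodedMetaData {
        decode_metadata(buf, self.file_decryption_properties.as_ref())
    }
}
```

The number of clones is the same on the path that decodes successfully; the difference is that intermediate calls no longer take ownership they do not use.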
Which issue does this PR close?
This PR is based on branch and an internal patch and aims to provide basic modular encryption support. Closes #3511.
Rationale for this change
See #3511.
What changes are included in this PR?
TBD
Are there any user-facing changes?
TBD