No elegant way to stream hash given a hash code #141

Open
CBenoit opened this issue Sep 28, 2021 · 5 comments

CBenoit commented Sep 28, 2021

Hi,

In previous multihash versions, we used to be able to compute the digest in a streamed manner using MultihashDigest::input, and it was possible to get a boxed MultihashDigest given a multihash.
I currently see no way of doing the same, which is an issue in some use cases.

For example, I need to validate a digest computed from a file. Since the file can be big, I want to use the new StatefulHasher trait. However, I found no way to get a trait object.

Here’s my code:

```rust
use std::convert::TryFrom;
use std::io::{Error, ErrorKind};
use std::path::Path;

use multihash::{Multihash, MultihashDigest};

pub fn validate_file_checksum(expected_digest: &str, file_path: &Path) -> std::io::Result<bool> {
    // Decode the multibase string, parse the multihash, and recover its hash code.
    let (_, hash_data) = multibase::decode(expected_digest).map_err(|e| Error::new(ErrorKind::InvalidInput, e))?;
    let expected_digest = Multihash::from_bytes(&hash_data).map_err(|e| Error::new(ErrorKind::InvalidInput, e))?;
    let hash_code = multihash::Code::try_from(expected_digest.code()).map_err(|e| Error::new(ErrorKind::InvalidInput, e))?;

    // FIXME: multihash new API is breaking this code for streaming hashing (checked for version 0.14)
    //
    //const BUF_SIZE: usize = 1024 * 128;
    //let file = File::open(file_path)?;
    //let mut reader = BufReader::with_capacity(BUF_SIZE, file);
    //
    //let mut hasher = todo!("get an appropriate trait object hasher given the hash code");
    //
    //loop {
    //    let length = {
    //        let buffer = reader.fill_buf()?;
    //        hasher.update(buffer);
    //        buffer.len()
    //    };
    //    if length == 0 {
    //        break;
    //    }
    //    reader.consume(length);
    //}
    //
    //let digest_found = hasher.finalize();
    //
    // So instead, we read the whole file in memory (as raw bytes, so non-UTF-8 files work too):

    let file_content = std::fs::read(file_path)?;
    let digest_found = hash_code.digest(&file_content);

    Ok(expected_digest == digest_found)
}
```

If I overlooked something, please let me know!

Thank you
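For illustration, the streaming check that the FIXME describes can be sketched without multihash at all. This is a minimal sketch under stated assumptions: `DynHasher` is a hypothetical object-safe trait (not multihash API) whose `finalize` returns owned bytes, `Fnv1a64` and `Identity` are toy stand-ins for real algorithms, and the code values `0x00` and `0x01` are illustrative only. Because every hasher returns the same `Vec<u8>`, one can be picked at runtime and boxed, much like the old boxed `MultihashDigest`.

```rust
use std::io::{self, BufRead};

/// Hypothetical object-safe streaming hasher: `finalize` returns owned bytes,
/// so every algorithm shares one return type and fits behind `Box<dyn DynHasher>`.
trait DynHasher {
    fn update(&mut self, input: &[u8]);
    fn finalize(&self) -> Vec<u8>;
}

/// Toy stand-in for a real algorithm: 64-bit FNV-1a.
struct Fnv1a64(u64);

impl DynHasher for Fnv1a64 {
    fn update(&mut self, input: &[u8]) {
        for &b in input {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
        }
    }
    fn finalize(&self) -> Vec<u8> {
        self.0.to_be_bytes().to_vec()
    }
}

/// Toy stand-in for the identity "hash": the digest is the input itself.
struct Identity(Vec<u8>);

impl DynHasher for Identity {
    fn update(&mut self, input: &[u8]) {
        self.0.extend_from_slice(input);
    }
    fn finalize(&self) -> Vec<u8> {
        self.0.clone()
    }
}

/// Picks a hasher from a code only known at runtime (hypothetical code values).
fn hasher_for(code: u64) -> Option<Box<dyn DynHasher>> {
    match code {
        0x00 => Some(Box::new(Identity(Vec::new()))),
        0x01 => Some(Box::new(Fnv1a64(0xcbf2_9ce4_8422_2325))), // FNV offset basis
        _ => None,
    }
}

/// Feeds the reader to the hasher one buffer at a time instead of reading it all.
fn stream_digest(code: u64, mut reader: impl BufRead) -> io::Result<Vec<u8>> {
    let mut hasher = hasher_for(code)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "unsupported hash code"))?;
    loop {
        let length = {
            let buffer = reader.fill_buf()?;
            hasher.update(buffer);
            buffer.len()
        };
        if length == 0 {
            break;
        }
        reader.consume(length);
    }
    Ok(hasher.finalize())
}
```

With a real library, the file check would compare `stream_digest(code, BufReader::new(file))` against the expected digest bytes.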

Contributor

mriise commented Sep 28, 2021

It is a bit confusing, as both Hasher and StatefulHasher implement Default, but if you are explicit about what you want, Rust will give it to you.

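```rust
use multihash::{Identity256, StatefulHasher}; // bring the trait into scope for update/finalize
let mut hasher = Identity256::default();      // a concrete type that implements StatefulHasher
```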

hopefully this works for you :)

Author

CBenoit commented Sep 29, 2021

Hi 🙂

Thank you for the answer, but this is not what I’m looking for.
I need to get a hasher from a hash code I can’t know ahead of time (see my snippet above).
The issue is precisely that we can’t use StatefulHasher except with a specific algorithm known at compile time, like you mentioned, which kind of defeats the purpose of multihash to some extent :/
The new API is very nice when one-shot hashing with digest is acceptable, though!
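A sketch of the compile-time-only case, under assumptions: `Streaming` is a hypothetical trait shaped like a stateful hasher with an associated `Digest` type, and `Fnv1a64` is a toy stand-in algorithm (neither is multihash API). The generic function streams any reader, but the hasher type has to be named when the code is written.

```rust
use std::io::{self, BufRead};

/// Hypothetical trait shaped like a stateful hasher with an associated digest type.
trait Streaming: Default {
    type Digest;
    fn update(&mut self, input: &[u8]);
    fn finalize(&self) -> Self::Digest;
}

/// Toy 64-bit FNV-1a hasher (stand-in for a real algorithm) with an 8-byte digest.
struct Fnv1a64(u64);

impl Default for Fnv1a64 {
    fn default() -> Self {
        Fnv1a64(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Streaming for Fnv1a64 {
    type Digest = [u8; 8];
    fn update(&mut self, input: &[u8]) {
        for &b in input {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
        }
    }
    fn finalize(&self) -> [u8; 8] {
        self.0.to_be_bytes()
    }
}

/// Works for any hasher type `H`, but `H` is fixed at compile time:
/// the caller names it, e.g. `digest_reader::<Fnv1a64, _>(reader)`.
fn digest_reader<H: Streaming, R: BufRead>(mut reader: R) -> io::Result<H::Digest> {
    let mut hasher = H::default();
    loop {
        let length = {
            let buffer = reader.fill_buf()?;
            hasher.update(buffer);
            buffer.len()
        };
        if length == 0 {
            break;
        }
        reader.consume(length);
    }
    Ok(hasher.finalize())
}
```

Choosing `H` from a runtime code, e.g. `match code { 0x00 => digest_reader::<A, _>(r), _ => digest_reader::<B, _>(r) }`, only compiles if `A::Digest` and `B::Digest` are the same type, because every `match` arm must have a single type.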

Member

vmx commented Sep 30, 2021

I had a look. I currently see no way of doing it with the current code. The way things currently work, you cannot return a StatefulHasher based on the Code, as the StatefulHashers depend on specific Digests (please correct me if I'm wrong).

I have one idea, though. Lots of the code is generated. So perhaps we could generate a companion struct to the Code enum, which implements the StatefulHasher functionality for all the Codes. That struct would be returned by a Code::hasher() call. I'm not sure if that would work, but it might be worth a try.
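A hand-written sketch of that idea, assuming a two-variant `Code` enum, a matching `CodeHasher` enum, and toy algorithms (an identity digest and 64-bit FNV-1a) in place of real ones. None of these names are multihash API, the code values are hypothetical, and in a real crate the companion would be generated alongside `Code`. Each method dispatches with a `match`, so no trait object is needed and the runtime code picks the variant.

```rust
use std::convert::TryFrom;

/// Hypothetical hash codes; the numeric values below are for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Code {
    Identity,
    Fnv1a64,
}

impl TryFrom<u64> for Code {
    type Error = u64; // the unsupported code is handed back as the error
    fn try_from(code: u64) -> Result<Self, Self::Error> {
        match code {
            0x00 => Ok(Code::Identity),
            0x01 => Ok(Code::Fnv1a64),
            other => Err(other),
        }
    }
}

/// Hypothetical companion to `Code`: one variant per code, each holding that
/// algorithm's running state, so a single concrete type covers every algorithm.
enum CodeHasher {
    Identity(Vec<u8>),
    Fnv1a64(u64),
}

impl Code {
    /// The `Code::hasher()` call suggested above (hypothetical API).
    fn hasher(self) -> CodeHasher {
        match self {
            Code::Identity => CodeHasher::Identity(Vec::new()),
            Code::Fnv1a64 => CodeHasher::Fnv1a64(0xcbf2_9ce4_8422_2325), // FNV offset basis
        }
    }

    /// One-shot hashing built on the same state machine.
    fn digest(self, input: &[u8]) -> Vec<u8> {
        let mut hasher = self.hasher();
        hasher.update(input);
        hasher.finalize()
    }
}

impl CodeHasher {
    /// Feeds more input to whichever algorithm this variant wraps.
    fn update(&mut self, input: &[u8]) {
        match self {
            CodeHasher::Identity(buf) => buf.extend_from_slice(input),
            CodeHasher::Fnv1a64(state) => {
                for &b in input {
                    *state ^= u64::from(b);
                    *state = state.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
                }
            }
        }
    }

    /// Returns the digest bytes for the wrapped algorithm.
    fn finalize(&self) -> Vec<u8> {
        match self {
            CodeHasher::Identity(buf) => buf.clone(),
            CodeHasher::Fnv1a64(state) => state.to_be_bytes().to_vec(),
        }
    }
}
```

Usage: `let mut h = Code::try_from(code)?.hasher();` followed by repeated `h.update(chunk)` calls and a final `h.finalize()`.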

Author

CBenoit commented Sep 30, 2021

> I had a look. I currently see no way of doing it with the current code. The way things currently work, you cannot return a StatefulHasher based on the Code, as the StatefulHashers depend on specific Digests (please correct me if I'm wrong).

Exactly!

> I have one idea, though. Lots of the code is generated. So perhaps we could generate a companion struct to the Code enum, which implements the StatefulHasher functionality for all the Codes. That struct would be returned by a Code::hasher() call. I'm not sure if that would work, but it might be worth a try.

This would be really helpful!
However, it might not be straightforward, because StatefulHasher has associated types and the implementing structs use different types (because different digest size).

Member

vmx commented Sep 30, 2021

> (because different digest size).

When you derive a Multihash via #[derive(Multihash)], all digests should have the same size. So at least that part should work (others may not ;))
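A sketch of why a single digest size helps, under assumptions: `Streaming` is a hypothetical trait with an associated `Digest` type, `FixedDigest<S>` is a hypothetical fixed-capacity digest loosely modeled on the single size a derived Multihash uses, and the hashers and code values are toys. Once every implementation pins `Digest = FixedDigest<MAX_SIZE>`, they all fit one trait-object type, `dyn Streaming<Digest = FixedDigest<MAX_SIZE>>`, which can then be picked at runtime.

```rust
/// Hypothetical fixed-capacity digest: every algorithm writes into `[u8; S]`
/// and records how many bytes it used, so all share one type for a given `S`.
#[derive(Debug, PartialEq)]
struct FixedDigest<const S: usize> {
    len: usize,
    bytes: [u8; S],
}

impl<const S: usize> FixedDigest<S> {
    /// Copies `digest` into the buffer; panics if it exceeds the capacity `S`.
    fn new(digest: &[u8]) -> Self {
        assert!(digest.len() <= S, "digest larger than the allocated size");
        let mut bytes = [0u8; S];
        bytes[..digest.len()].copy_from_slice(digest);
        FixedDigest { len: digest.len(), bytes }
    }

    fn as_bytes(&self) -> &[u8] {
        &self.bytes[..self.len]
    }
}

/// Hypothetical streaming trait with an associated digest type.
trait Streaming {
    type Digest;
    fn update(&mut self, input: &[u8]);
    fn finalize(&self) -> Self::Digest;
}

/// One size shared by every code (hypothetical value).
const MAX_SIZE: usize = 64;

struct Fnv1a64(u64);
struct Identity(Vec<u8>);

impl Streaming for Fnv1a64 {
    type Digest = FixedDigest<MAX_SIZE>;
    fn update(&mut self, input: &[u8]) {
        for &b in input {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
        }
    }
    fn finalize(&self) -> FixedDigest<MAX_SIZE> {
        FixedDigest::new(&self.0.to_be_bytes())
    }
}

impl Streaming for Identity {
    type Digest = FixedDigest<MAX_SIZE>;
    fn update(&mut self, input: &[u8]) {
        self.0.extend_from_slice(input);
    }
    fn finalize(&self) -> FixedDigest<MAX_SIZE> {
        FixedDigest::new(&self.0)
    }
}

/// Both impls pin the same `Digest` type, so they coerce to one trait-object type.
fn hasher_for(code: u64) -> Option<Box<dyn Streaming<Digest = FixedDigest<MAX_SIZE>>>> {
    match code {
        0x00 => Some(Box::new(Identity(Vec::new()))),
        0x01 => Some(Box::new(Fnv1a64(0xcbf2_9ce4_8422_2325))), // FNV offset basis
        _ => None,
    }
}
```

If the two impls used different digest types (say `[u8; 8]` and `[u8; 32]`), no single `dyn Streaming<Digest = ...>` type could hold both, which is the concern raised above.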
