Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add nbt::from_slice and zero copy support for strings #63

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Freax13
Copy link
Contributor

@Freax13 Freax13 commented May 16, 2021

This pr adds nbt::from_slice to enable zero copy deserialization for strings. The implementation is inspired by serde_json which uses a trait to abstract over multiple sources.

All of the read_* functions in raw.rs got moved into a trait. This trait is implemented for std::io::Read and &[u8].

The read_bare_string function was modified to also take a scratch buffer and return a borrowed, copied or owned string. In a lot of cases this enables reading a string without allocating. This is useful beyond zero copy as it's also used for identifiers in structs. This optimization is not used for Value and Blob as they always allocate anyways.

Even though zero support is added Cow<'de, str> has to be used because modified utf-8 does always allow zero copy.

Example:

#[derive(Deserialize)]
#[serde(rename_all = "PascalCase")]
pub struct Section<'a> {
    #[serde(serialize_with = "nbt::i64_array", default)]
    pub block_states: Option<Vec<i64>>,
    #[serde(borrow)]
    pub palette: Option<Vec<Block<'a>>>,
    pub y: i8,
}

#[derive(Deserialize, Hash)]
#[serde(rename_all = "PascalCase")]
pub struct Block<'a> {
    #[serde(borrow)]
    pub name: Cow<'a, str>,
    #[serde(default, borrow)]
    pub properties: Option<BTreeMap<Cow<'a, str>, Cow<'a, str>>>,
}

I had some trouble getting reliable results from the benchmarks, but there usually were only small regressions of up to 5% and improvements of up to 10-20%. (The benchmarks were not modified, I suspect all the speed up was from prevent allocations for identifiers in structs). I got varying but mostly positive results in my personal project for parsing anvil chunks with zero copy.

@atheriel
Copy link
Collaborator

I'm not sure I'm 100% following the changes, but it looks like internally you use a Cursor to wrap the byte slice. Since Cursor implements io::Read, wouldn't it be possible to implement from_slice() by wrapping the byte slice in a Cursor at the top level and passing that to from_reader()? That could greatly simplify the changes needed. Or am I missing something?

@Freax13
Copy link
Contributor Author

Freax13 commented May 19, 2021

They are very similar but not quite the same: read_bare_string is different because it needs access to the slice to borrow from it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants