block: reduce zstd compression allocations #4092

Open
jbowens opened this issue Oct 21, 2024 · 2 comments

jbowens commented Oct 21, 2024

I haven't looked at the API yet, but it seems like we're holding ZSTD wrong. It's heap allocating several times every time we compress something. Presumably we can recycle these allocations by using the API correctly.

ROUTINE ======================== github.com/DataDog/zstd.NewWriterLevelDict in /Users/jackson/go/pkg/mod/github.com/!data!dog/[email protected]/zstd_stream.go
   3228945    3228945 (flat, cum) 43.07% of Total
         .          .    120:func NewWriterLevelDict(w io.Writer, level int, dict []byte) *Writer {
         .          .    121:   var err error
         .          .    122:   ctx := C.ZSTD_createCStream()
         .          .    123:
         .          .    124:   // Load dictionnary if any
         .          .    125:   if dict != nil {
         .          .    126:           err = getError(int(C.ZSTD_CCtx_loadDictionary(ctx,
         .          .    127:                   unsafe.Pointer(&dict[0]),
         .          .    128:                   C.size_t(len(dict)),
         .          .    129:           )))
         .          .    130:   }
         .          .    131:
         .          .    132:   if err == nil {
         .          .    133:           // Only set level if the ctx is not in error already
         .          .    134:           err = getError(int(C.ZSTD_CCtx_setParameter(ctx, C.ZSTD_c_compressionLevel, C.int(level))))
         .          .    135:   }
         .          .    136:
   1024124    1024124    137:   return &Writer{
         .          .    138:           CompressionLevel: level,
         .          .    139:           ctx:              ctx,
         .          .    140:           dict:             dict,
         .          .    141:           srcBuffer:        make([]byte, 0),
   1046993    1046993    142:           dstBuffer:        make([]byte, CompressBound(1024)),
         .          .    143:           firstError:       err,
         .          .    144:           underlyingWriter: w,
   1157828    1157828    145:           resultBuffer:     new(C.compressStream2_result),
         .          .    146:   }
         .          .    147:}
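
For context, a minimal sketch of the pattern the profile points at (assuming the caller constructs a streaming Writer per block, which is what the allocation counts above suggest; the block contents and level are placeholders):

package main

import (
	"bytes"
	"log"

	"github.com/DataDog/zstd"
)

// compressBlock mirrors the allocation-heavy pattern in the profile: every
// call constructs a fresh streaming Writer, which allocates a new
// ZSTD_CStream, srcBuffer, dstBuffer, and compressStream2_result.
func compressBlock(block []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := zstd.NewWriterLevel(&buf, 3)
	if _, err := w.Write(block); err != nil {
		w.Close()
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	out, err := compressBlock([]byte("example block contents"))
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("compressed to %d bytes", len(out))
}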

Jira issue: PEBBLE-282

@petermattis

We should be using zstd.CompressLevel (https://github.com/DataDog/zstd/blob/1.x/zstd.go#L95) which offers a similar API to snappy.Encode.
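
A minimal sketch of what the one-shot API looks like (the reused destination buffer and level here are illustrative, not Pebble's actual call site):

package main

import (
	"log"

	"github.com/DataDog/zstd"
)

func main() {
	src := []byte("example block contents")

	// Like snappy.Encode, CompressLevel accepts an optional destination
	// buffer; holding onto one across calls avoids the per-call allocations
	// made on the streaming Writer path.
	dst := make([]byte, zstd.CompressBound(len(src)))
	compressed, err := zstd.CompressLevel(dst, src, 3)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("compressed %d -> %d bytes", len(src), len(compressed))
}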


jbowens commented Nov 18, 2024

> We should be using zstd.CompressLevel (https://github.com/DataDog/zstd/blob/1.x/zstd.go#L95) which offers a similar API to snappy.Encode.

We might be able to get better performance by using (and reusing) a zstd.Ctx manually.

https://github.com/DataDog/zstd/blob/091c219d0f2a0cbb662944b65ca0838920d2f857/zstd.h#L251-L254
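
A sketch of what reusing a context could look like (the sync.Pool and helper are hypothetical, just to illustrate amortizing the ZSTD_CCtx allocation across blocks):

package main

import (
	"log"
	"sync"

	"github.com/DataDog/zstd"
)

// ctxPool recycles compression contexts so the underlying ZSTD_CCtx is
// allocated once per idle worker rather than once per compressed block.
var ctxPool = sync.Pool{
	New: func() interface{} { return zstd.NewCtx() },
}

// compressBlock is a hypothetical helper that borrows a context, compresses
// src into dst, and returns the context to the pool.
func compressBlock(dst, src []byte) ([]byte, error) {
	ctx := ctxPool.Get().(zstd.Ctx)
	defer ctxPool.Put(ctx)
	return ctx.CompressLevel(dst, src, 3)
}

func main() {
	src := []byte("example block contents")
	out, err := compressBlock(make([]byte, zstd.CompressBound(len(src))), src)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("compressed %d -> %d bytes", len(src), len(out))
}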

On a tangentially related note, we currently prefix zstd payloads with a varint-encoded length of the decompressed payload. I don't think this should be necessary, because zstd encodes a Frame_Content_Size field that indicates the uncompressed content size. We could contribute a func upstream to extract the frame content size, allowing us to allocate an appropriately-sized buffer in the cache.
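
For illustration, a sketch of what such an upstream helper might look like, wrapping ZSTD_getFrameContentSize via cgo (the Go function name FrameContentSize is hypothetical; the sentinel values mirror ZSTD_CONTENTSIZE_UNKNOWN/ZSTD_CONTENTSIZE_ERROR from zstd.h):

package zstd

/*
#include "zstd.h"
*/
import "C"

import (
	"errors"
	"unsafe"
)

const (
	contentSizeUnknown = ^uint64(0)     // ZSTD_CONTENTSIZE_UNKNOWN
	contentSizeError   = ^uint64(0) - 1 // ZSTD_CONTENTSIZE_ERROR
)

// FrameContentSize is a hypothetical helper returning the decompressed size
// recorded in a zstd frame header, letting a caller allocate an
// appropriately-sized destination buffer without a separate varint prefix.
func FrameContentSize(src []byte) (uint64, error) {
	if len(src) == 0 {
		return 0, errors.New("zstd: empty input")
	}
	size := uint64(C.ZSTD_getFrameContentSize(unsafe.Pointer(&src[0]), C.size_t(len(src))))
	switch size {
	case contentSizeUnknown:
		return 0, errors.New("zstd: frame content size not recorded")
	case contentSizeError:
		return 0, errors.New("zstd: invalid frame")
	}
	return size, nil
}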

@nicktrav nicktrav self-assigned this Nov 19, 2024