"Application tail latency is critical for services to meet their latency expectations. We have shown that the thread-per-core approach can reduce application tail latency of a key-value store by up to 71% compared to baseline Memcached running on commodity hardware and Linux."1
This library is mainly made for io-uring and monoio. There is no dependency on the runtime, so you should be able to use it with other runtimes and also without io-uring.
The purpose of this library is to provide a performant way to send data between threads when they follow a thread-per-core architecture. Even though the aim is to be performant, remember that this is core-to-core (or thread-to-thread) passing, which is really slow compared with keeping the work on a single core.
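To make the idea of thread-to-thread passing concrete, here is a minimal sketch of a full mesh built only on the standard library's `std::sync::mpsc` channels. It is not this library's API: the names `build_mesh` and `run_mesh` are hypothetical, the threads are not pinned to cores, and no async runtime is involved.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical helper: build a full mesh of channels.
/// Each shard gets its own inbox plus a sender to every shard's inbox.
fn build_mesh<T>(shards: usize) -> Vec<(Vec<Sender<T>>, Receiver<T>)> {
    let (senders, receivers): (Vec<_>, Vec<_>) =
        (0..shards).map(|_| channel::<T>()).unzip();
    receivers
        .into_iter()
        .map(|inbox| (senders.clone(), inbox))
        .collect()
}

/// Hypothetical demo: every shard sends its id to all other shards,
/// then sums the ids it receives from its peers.
fn run_mesh(shards: usize) -> Vec<usize> {
    let handles: Vec<_> = build_mesh::<usize>(shards)
        .into_iter()
        .enumerate()
        .map(|(id, (peers, inbox))| {
            thread::spawn(move || {
                // Send our id to every peer except ourselves.
                for (peer_id, peer) in peers.iter().enumerate() {
                    if peer_id != id {
                        peer.send(id).expect("peer inbox closed");
                    }
                }
                // Wait for exactly one message from each peer.
                (0..shards - 1)
                    .map(|_| inbox.recv().expect("mesh closed"))
                    .sum::<usize>()
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|handle| handle.join().expect("shard panicked"))
        .collect()
}
```

Calling `run_mesh(3)` returns `[3, 2, 1]`: shard 0 receives the ids 1 and 2, shard 1 receives 0 and 2, and shard 2 receives 0 and 1. Every one of those messages crosses a thread boundary, which is the cost the paragraph above warns about.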
Thanks to Glommio for the inspiration.
Originally, the library was written for the case where multiple threads listen on the same TcpStream and, depending on what is sent through the TcpStream, you might want to change the thread handling the stream.
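The handoff described above can be sketched as follows, again with standard-library channels rather than this library's API. Everything here is an assumption made for illustration: `Conn` stands in for an accepted connection (a real server would hold a TcpStream), `owner_of` picks the owning shard with an FNV-1a hash of a key read from the connection, and a single accepting thread does the routing.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Stand-in for an accepted connection; a real server would hold a TcpStream.
struct Conn {
    /// Hypothetical routing key read from the first bytes of the stream.
    key: String,
}

/// Hypothetical routing rule: FNV-1a hash of the key, modulo the shard count.
fn owner_of(key: &str, shards: usize) -> usize {
    let mut hash: u64 = 0xcbf29ce484222325;
    for byte in key.bytes() {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    (hash % shards as u64) as usize
}

/// Hand each connection to the shard that owns its key and return how many
/// connections each shard ended up handling.
fn handoff(keys: &[&str], shards: usize) -> Vec<usize> {
    let mut senders: Vec<Sender<Conn>> = Vec::new();
    let mut workers = Vec::new();
    for _ in 0..shards {
        let (tx, rx) = channel::<Conn>();
        senders.push(tx);
        // Each shard counts the connections moved into it.
        workers.push(thread::spawn(move || rx.iter().count()));
    }
    for key in keys {
        let conn = Conn { key: key.to_string() };
        let owner = owner_of(&conn.key, shards);
        // Ownership of the connection moves to the owning shard's thread.
        senders[owner].send(conn).expect("owner shard stopped");
    }
    // Dropping the senders closes every inbox so the workers can finish.
    drop(senders);
    workers
        .into_iter()
        .map(|worker| worker.join().expect("shard panicked"))
        .collect()
}
```

Because the same key always maps to the same shard, all traffic for that key ends up on one thread, and the data behind it never needs a lock.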
You can check some examples in the tests.
These benchmarks are only indicative: they run in GA. You should run your own on the target hardware.
They show that sharded-thread, which is based on utility.sharded_queue, is about 6% faster than a mesh built on flume.
- Glommio example on their sharding
- The original monoio issue
- Sharded Queue - the fastest concurrent collection
Licensed under either of
- Apache License, Version 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.