Instead of sizing the local transfer thread pool and BookKeeper thread pool at 4096, they should be sized dynamically based on the formula that @stagraqubole outlined here:
rubix.pool.size.max=P
number-of-nodes=N
max-threads=P*N
So in a 100-node cluster with rubix.pool.size.max=4, this value could be lowered to 400.
You could introduce a config instead that expresses a percentage increase/decrease from this dynamically calculated size.
Having two thread pools of 4096 threads on top of the work already being done by a worker node leads to worker nodes becoming unresponsive.
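A minimal sketch of the proposed calculation, with the percentage-offset idea folded in; the class and method names below are illustrative only and not actual RubiX code:

```java
// Hypothetical sketch of dynamic pool sizing: max-threads = P * N,
// where P = rubix.pool.size.max and N = number of worker nodes.
public final class DynamicPoolSizer {
    public static int calculatePoolSize(int poolSizeMaxPerNode, int numberOfNodes) {
        return poolSizeMaxPerNode * numberOfNodes;
    }

    public static void main(String[] args) {
        // Example from the issue: rubix.pool.size.max=4 on a 100-node cluster -> 400 threads
        System.out.println(calculatePoolSize(4, 100)); // prints 400
    }
}
```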
JamesRTaylor changed the title from "Size the local transfer thread pool and bookeeper thread pool dynamically" to "Prevent overwhelming of worker nodes by dynamically sizing thread pools" on Oct 9, 2020.
The correct sizing of the pool is really related to the number of worker nodes. Sizing too small causes many more queries to time out, while sizing too large can cause the node to become unresponsive. With @stagraqubole's help, we tuned our cluster of 110 worker nodes with the following config values to find the right balance and solve this issue:
This took a lot of trial and error, though. To improve the out-of-the-box experience, it'd be good if the thread pool sizes were dynamically determined with a config value expressed not as an absolute size, but as a percentage above/below the calculated size.
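For illustration, a hedged sketch of how such a percentage offset might be applied on top of the calculated P*N size; the class, method, and the idea of a signed percent value are assumptions for this example, not existing RubiX APIs or config names:

```java
// Hypothetical sketch: apply a signed percentage adjustment to the
// dynamically calculated P*N size (e.g. +10 = 10% larger, -25 = 25% smaller).
public final class AdjustedPoolSizer {
    public static int adjustedPoolSize(int calculatedSize, double adjustmentPercent) {
        int adjusted = (int) Math.round(calculatedSize * (1.0 + adjustmentPercent / 100.0));
        return Math.max(1, adjusted); // never shrink below a single thread
    }

    public static void main(String[] args) {
        // e.g. a calculated size of 400 (4 * 100 nodes) reduced by 25% -> 300 threads
        System.out.println(adjustedPoolSize(400, -25.0)); // prints 300
    }
}
```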