opal_progress guidelines
Per discussion at the 2018 OMPI dev meeting, we agreed to move forward with making opal_progress() multithreaded. This guideline explains how to bring your component into compliance with that change.
OMPI will no longer serialize calls to opal_progress(). This means your component's progress function might be invoked simultaneously from multiple threads. This change allows a communication component to be more efficient in multithreaded scenarios if it chooses to be (i.e., by parallelizing the component with multiple working lanes).
At this stage, we still serialize opal_progress() to give components a grace period to adjust. We will announce on the mailing list the deadline after which this change takes effect.
Not all OMPI MCA components are affected by this change. Essentially, all communication components are affected, as is any component that has registered a progress function. If your component works with timed triggers, you should also be careful: the completion event might now be called from another thread, which opens opportunities for race conditions.
Whether you need to change your component depends on how it works. Let's break it down.
- If your component is thread-safe:
  - If you want to enhance your multithreaded performance: YES. Take a look at btl/uct or btl/ofi to see how to really take advantage of this change.
  - If you just want to get by:
    - If calling progress from multiple threads will not affect performance: NO.
    - If calling progress from multiple threads will affect performance: YES, go to HOW.
- If your component is not thread-safe:
  - Maybe it is time to make it so?
  - If you want your component to be compliant: YES, go to HOW.
  - If not, your component will have a hard time running in multithreaded mode. You might not care, but someone will. Your component might create problems for others.
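For the "enhance your multithreaded performance" case, one possible shape of "multiple working lanes" is sketched below. This is an assumption-laden illustration, not how btl/uct or btl/ofi are implemented: `lane_t`, `NUM_LANES`, `lanes_init()`, `lanes_fini()` and `lanes_progress()` are hypothetical names, and plain `pthread_mutex_t` stands in for `opal_mutex_t` so the sketch compiles on its own. Each lane has its own lock, and a caller progresses the first lane it can acquire without waiting.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_LANES 4   /* hypothetical lane count */

/* Hypothetical per-lane state: each lane owns its own lock and work queue. */
typedef struct lane_t {
    pthread_mutex_t lock;   /* stand-in for opal_mutex_t */
    int pending;            /* stand-in for outstanding work items */
} lane_t;

static lane_t lanes[NUM_LANES];

/* Initialize every lane lock once, before any progress call. */
static void lanes_init(void)
{
    for (int i = 0; i < NUM_LANES; i++) {
        int rc = pthread_mutex_init(&lanes[i].lock, NULL);
        assert(0 == rc);
        (void) rc;
        lanes[i].pending = 0;
    }
}

/* Destroy the lane locks once progress calls have stopped. */
static void lanes_fini(void)
{
    for (int i = 0; i < NUM_LANES; i++) {
        pthread_mutex_destroy(&lanes[i].lock);
    }
}

/* Progress the first lane whose lock is free, starting the search at `hint`.
 * Returns the index of the lane progressed, or -1 if every lane was busy. */
static int lanes_progress(int hint)
{
    for (int n = 0; n < NUM_LANES; n++) {
        int i = (hint + n) % NUM_LANES;
        if (0 == pthread_mutex_trylock(&lanes[i].lock)) {
            lanes[i].pending = 0;   /* drain this lane's work here */
            pthread_mutex_unlock(&lanes[i].lock);
            return i;
        }
    }
    return -1;   /* never block; report that nothing was progressed */
}
```

With per-lane locks, two threads calling progress at the same time can each drain a different lane instead of one of them returning empty-handed. The `hint` could be derived from a thread identifier so that threads tend to start on different lanes; that choice is left open here.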
HOW
Again, this is the minimum requirement for a component.
- Create a component mutex.
- Wrap your original component_progress() in a TRYLOCK. DO NOT USE LOCK.
typedef struct yourcomponent_t {
    ...
    ...
    ...
    opal_mutex_t component_lock; /* Add a new mutex_t here and initialize it at init. */
} yourcomponent_t;

void yourcomponent_component_progress(void)
{
    if (!OPAL_THREAD_TRYLOCK(&component->component_lock)) {
        /* YOUR ORIGINAL PROGRESS ROUTINES HERE */
        /* YOUR ORIGINAL PROGRESS ROUTINES HERE */
        /* YOUR ORIGINAL PROGRESS ROUTINES HERE */
        OPAL_THREAD_UNLOCK(&component->component_lock);
    }
}
Don't forget to initialize/cleanup the mutex!
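The initialize/cleanup step might look roughly like the following self-contained sketch. It is not OMPI code: `mycomponent_t`, `mycomponent_init()`, `mycomponent_progress()` and `mycomponent_fini()` are hypothetical names, and `pthread_mutex_t` stands in for `opal_mutex_t` so that the example builds without the OMPI tree.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical component; pthread_mutex_t stands in for opal_mutex_t. */
typedef struct mycomponent_t {
    int initialized;                /* set once the lock is usable */
    int events_handled;             /* placeholder for real component state */
    pthread_mutex_t component_lock;
} mycomponent_t;

static mycomponent_t mycomponent;

/* Called once when the component is initialized. Returns 0 on success. */
static int mycomponent_init(void)
{
    int rc = pthread_mutex_init(&mycomponent.component_lock, NULL);
    if (0 == rc) {
        mycomponent.events_handled = 0;
        mycomponent.initialized = 1;
    }
    return rc;
}

/* Progress function: does work only if the lock is free right now.
 * Returns 1 if it ran the progress routines, 0 if it skipped them. */
static int mycomponent_progress(void)
{
    int done = 0;
    assert(mycomponent.initialized);   /* catches a forgotten init */
    if (0 == pthread_mutex_trylock(&mycomponent.component_lock)) {
        mycomponent.events_handled++;  /* original progress routines go here */
        done = 1;
        pthread_mutex_unlock(&mycomponent.component_lock);
    }
    return done;
}

/* Called once at finalize, after all progress calls have stopped. */
static int mycomponent_fini(void)
{
    mycomponent.initialized = 0;
    return pthread_mutex_destroy(&mycomponent.component_lock);
}
```

Inside Open MPI itself, the OPAL object macros `OBJ_CONSTRUCT(&lock, opal_mutex_t)` and `OBJ_DESTRUCT(&lock)` are the usual counterparts of the init and destroy calls above; check the current tree for the exact idiom your component should follow.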
Please refrain from blocking or waiting on the mutex in component_progress(), as this can stall the progression of every other component and effectively acts as a serializer. You should return immediately if you cannot acquire the mutex (use TRYLOCK, as the example suggests). Components that do not comply will get called out and have to do trial by combat at the next meeting. Note that this is general advice: blocking in a progress function is bad practice regardless of this change.
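To see the "return immediately" behaviour under contention, here is a small stand-alone experiment, again using pthreads and C11 atomics rather than the OPAL macros. All names (`demo_progress()`, `caller()`, `run_demo()`, the counters) are made up for illustration. Several threads hammer the progress function; each call either gets the lock and does the work, or counts itself as skipped and returns without waiting.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NUM_THREADS 4
#define CALLS_PER_THREAD 10000

static pthread_mutex_t component_lock = PTHREAD_MUTEX_INITIALIZER;
static long work_done;        /* protected by component_lock */
static atomic_long entered;   /* calls that acquired the lock */
static atomic_long skipped;   /* calls that returned immediately */

/* Trylock-guarded progress: never waits for the lock. */
static void demo_progress(void)
{
    if (0 == pthread_mutex_trylock(&component_lock)) {
        work_done++;          /* safe: only the lock holder gets here */
        pthread_mutex_unlock(&component_lock);
        atomic_fetch_add(&entered, 1);
    } else {
        atomic_fetch_add(&skipped, 1);   /* busy: return, do not wait */
    }
}

/* Thread body: call the progress function in a tight loop. */
static void *caller(void *arg)
{
    (void) arg;
    for (int i = 0; i < CALLS_PER_THREAD; i++) {
        demo_progress();
    }
    return NULL;
}

/* Runs the experiment; returns 0 if every call was accounted for. */
static int run_demo(void)
{
    pthread_t tid[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, caller, NULL);
        assert(0 == rc);
        (void) rc;
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(tid[i], NULL);
    }
    long e = atomic_load(&entered);
    long s = atomic_load(&skipped);
    assert(work_done == e);   /* no update was lost inside the lock */
    return (e + s == (long) NUM_THREADS * CALLS_PER_THREAD) ? 0 : 1;
}
```

How many calls end up skipped depends on timing and will differ from run to run. The point is only that a skipped call costs one failed trylock, whereas a blocking LOCK would make every caller queue up behind the current holder.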