Let FieldDef lazy-calculate and cache hash code #892
Thanks for the contribution, Patrick! Based on the usage of Here are a few things to consider:
Hi Zack, thank you for the comment:
Regarding the potential cost to the constructor, it can be solved by lazily initialize the Let me know what you think, thanks!
I chatted with some folks about this internally and our recommendation is to compute the value lazily so that it is computed the first time How does this sound?
Sure, I just changed it to compute lazily.
Change looks good to me. We should look out for any performance regressions due to auto-boxing/unboxing from Integer to int. If that happens, we can revisit this.
if (_hashCode == null) {
  // If this method is called by multiple threads, there might be multiple concurrent writes
  // here, but since the hashCode should be the same, that is tolerable
  _hashCode = computeHashCode();
}
return _hashCode;
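For anyone who wants to try the pattern in the diff above in isolation, here is a minimal sketch. `CachedHashField` and its two fields are hypothetical stand-ins, not the real FieldDef. One deliberate difference from the diff: the cached value is read into a local exactly once, so the null check and the return cannot observe two different values of the field.

```java
import java.util.Objects;

// Hypothetical stand-in for FieldDef; the class and field names are assumptions.
final class CachedHashField {
  private final String _name;
  private final String _typeName;
  // null means "not computed yet". Integer is immutable, so a racy write
  // by several threads can only ever store the same value.
  private Integer _hashCode;

  CachedHashField(String name, String typeName) {
    _name = name;
    _typeName = typeName;
  }

  private int computeHashCode() {
    return Objects.hash(_name, _typeName);
  }

  @Override
  public int hashCode() {
    // Single read of the field into a local variable.
    Integer cached = _hashCode;
    if (cached == null) {
      cached = computeHashCode(); // boxes int to Integer
      _hashCode = cached;
    }
    return cached; // unboxes Integer to int
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    if (!(other instanceof CachedHashField)) {
      return false;
    }
    CachedHashField that = (CachedHashField) other;
    return _name.equals(that._name) && _typeName.equals(that._typeName);
  }
}
```

The boxing and unboxing mentioned in the review happen on the two commented lines in `hashCode()`.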
Leaving this comment here for my future self and/or other maintainers:
We compute the hash code of DataMap lazily using synchronized (this) and a form of double-checked locking:
public int dataComplexHashCode()
{
  if (_dataComplexHashCode != 0)
  {
    return _dataComplexHashCode;
  }
  synchronized (this)
  {
    if (_dataComplexHashCode == 0)
    {
      _dataComplexHashCode = DataComplexHashCode.nextHashCode();
    }
  }
  return _dataComplexHashCode;
}
This approach avoids auto-boxing/unboxing between Integer and int but makes the assumption that the computed hash code will never be 0. Doing something like this is also an option we can take in the future if we want to explore this more.
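To make the "never 0" assumption concrete, here is a rough sketch of a primitive `int` cache without any lock. `PrimitiveCachedHash` and its `computeCount` counter are hypothetical and exist only for illustration. In this variant a computed hash of 0 is still returned correctly; it just never gets cached, so it is recomputed on every call.

```java
// Hypothetical example class; not part of the project under discussion.
final class PrimitiveCachedHash {
  private final String _name;
  // 0 doubles as the "not computed yet" sentinel, so no boxing is needed.
  private int _hash;
  // Instrumentation for this example only: counts real computations.
  int computeCount;

  PrimitiveCachedHash(String name) {
    _name = name;
  }

  private int computeHashCode() {
    computeCount++;
    return _name.hashCode();
  }

  @Override
  public int hashCode() {
    int h = _hash; // single read of the field
    if (h == 0) {
      h = computeHashCode();
      _hash = h; // storing 0 leaves the cache looking empty
    }
    return h;
  }

  @Override
  public boolean equals(Object other) {
    return other instanceof PrimitiveCachedHash
        && _name.equals(((PrimitiveCachedHash) other)._name);
  }
}
```

With a name whose hash is non-zero, two calls to `hashCode()` trigger one computation. With the empty string, whose `String.hashCode()` is 0, every call recomputes.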
A similar but more general idea might be to use a separate boolean to track whether the hash code has been computed. It might be even better to use atomics instead of a synchronized block, like:
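// Assumes _computed is an AtomicBoolean and _hashCode is an AtomicInteger initialized to INIT_VALUE
public int hashCode() {
  if (!_computed.get()) {
    int hashCode = computeHashCode();
    if (!_computed.get()) {
      // compareAndSet stores the value only if no other thread has stored one yet
      _hashCode.compareAndSet(INIT_VALUE, hashCode);
      _computed.set(true);
    }
  }
  return _hashCode.get();
}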
FWIW here there's little guarantee the write will ever be visible to other threads. Have you considered making _hashCode at least volatile:
private volatile Integer _hashCode;
It preserves the original desired behavior of not wanting a full lock but at least makes it memory-safe.
Alternatively, why not just compute it at construction and make it final? Presumably instances of FieldDef aren't dynamically generated; they come from parsed .pdl and .pdsc schemas, right? You could just eat the cost exactly once and forget about it, especially since it will happen very early on and likely completely outside the query-serving code path.
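A minimal sketch of the compute-at-construction option, for comparison. `EagerHashField` and its fields are hypothetical, not the real FieldDef. Because every field that feeds the hash is final and copied defensively, the cached value can itself be final, which sidesteps the visibility question entirely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for FieldDef; names and field types are assumptions.
final class EagerHashField {
  private final String _name;
  private final List<String> _schemaFields;
  // Computed exactly once in the constructor; final fields are safely
  // published to other threads once construction completes.
  private final int _hashCode;

  EagerHashField(String name, List<String> schemaFields) {
    _name = name;
    // Defensive copy so later changes to the caller's list cannot
    // invalidate the cached hash code.
    _schemaFields = Collections.unmodifiableList(new ArrayList<>(schemaFields));
    _hashCode = 31 * _name.hashCode() + _schemaFields.hashCode();
  }

  @Override
  public int hashCode() {
    return _hashCode;
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    if (!(other instanceof EagerHashField)) {
      return false;
    }
    EagerHashField that = (EagerHashField) other;
    return _name.equals(that._name) && _schemaFields.equals(that._schemaFields);
  }
}
```

The trade-off is the one raised earlier in this thread: the cost moves into the constructor and is paid even for instances whose hash code is never requested.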
Hi @PapaCharlie, thanks for chiming in:
On the memory visibility problem: the concern is valid, but since we never set the value based on its previous value, every thread that cannot see the cached value will simply compute and set the same value. So in the end it is not a safety issue but a performance issue (we may compute it once per thread rather than once overall). I prefer computing it in the constructor (and that is how I first implemented it), but @zackthehuman was worried about the impact on constructor performance, so we changed it to lazy later on.
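The argument above can be checked with a small, hedged experiment. `RacyLazyHash`, its `computations` counter, and the `hashFromThreads` helper are all hypothetical names made up for this sketch. Every thread should observe the same hash code, while the number of computations may land anywhere between 1 and the thread count, depending on timing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class; equals() is omitted to keep the sketch short.
final class RacyLazyHash {
  private final String _name;
  private Integer _hashCode; // deliberately neither volatile nor locked
  // Instrumentation only: how many times the hash was actually computed.
  final AtomicInteger computations = new AtomicInteger();

  RacyLazyHash(String name) {
    _name = name;
  }

  @Override
  public int hashCode() {
    Integer cached = _hashCode;
    if (cached == null) {
      computations.incrementAndGet();
      cached = _name.hashCode();
      _hashCode = cached; // racing threads all write the same value
    }
    return cached;
  }

  // Calls hashCode() from `threads` threads and returns every observed value.
  static List<Integer> hashFromThreads(RacyLazyHash target, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Callable<Integer>> tasks = new ArrayList<>();
      for (int i = 0; i < threads; i++) {
        tasks.add(target::hashCode);
      }
      List<Integer> seen = new ArrayList<>();
      for (Future<Integer> result : pool.invokeAll(tasks)) {
        seen.add(result.get());
      }
      return seen;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }
}
```

Note that the counter itself is an atomic, which adds synchronization the real code would not have, so this only illustrates the "same value, possibly computed more than once" behavior rather than measuring it.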
Thanks for the long wait! Let's goooooo!
Sometimes _dataSchema can be very complex, which makes computing the hash code for this class extremely slow. Since all fields that participate in the hash code calculation are final, we should be able to calculate it at construction time and avoid paying the cost repeatedly.
* Update FieldDef to lazily calculate and cache hash code
---------
Co-authored-by: Patrick Zhai <[email protected]>