Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let FieldDef lazy-calculate and cache hash code #892

Merged
merged 3 commits into from
Nov 3, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,9 @@ and what APIs have changed, if applicable.

## [Unreleased]

## [29.46.9] - 2023-11-02
- Update FieldDef so that it will lazily cache the hashCode.

## [29.46.8] - 2023-10-11
- add metrics about xds connection status and count

Expand Down Expand Up @@ -5554,7 +5557,8 @@ patch operations can re-use these classes for generating patch messages.

## [0.14.1]

[Unreleased]: https://github.com/linkedin/rest.li/compare/v29.46.8...master
[Unreleased]: https://github.com/linkedin/rest.li/compare/v29.46.9...master
[29.46.9]: https://github.com/linkedin/rest.li/compare/v29.46.8...v29.46.9
[29.46.8]: https://github.com/linkedin/rest.li/compare/v29.46.7...v29.46.8
[29.46.7]: https://github.com/linkedin/rest.li/compare/v29.46.6...v29.46.7
[29.46.6]: https://github.com/linkedin/rest.li/compare/v29.46.5...v29.46.6
Expand Down
11 changes: 11 additions & 0 deletions data/src/main/java/com/linkedin/data/template/FieldDef.java
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@ public class FieldDef<T>
private final DataSchema _dataSchema;
private final Class<?> _dataClass;
private final RecordDataSchema.Field _field;
private Integer _hashCode;

public FieldDef(String name, Class<T> type)
{
Expand Down Expand Up @@ -126,6 +127,16 @@ public boolean equals(Object object)

@Override
public int hashCode()
{
if (_hashCode == null) {
// If this method is called by multiple thread, there might be multiple concurrent write
// here, but since the hashCode should be the same it is tolerable
_hashCode = computeHashCode();
}
return _hashCode;
Comment on lines +131 to +136
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving this comment here for my future self and/or other maintainers:

We compute the hash code of DataMap lazily using synchronize(this) and a form of double-checked locking:

public int dataComplexHashCode()
{
  if (_dataComplexHashCode != 0)
  {
    return _dataComplexHashCode;
  }

  synchronized (this)
  {
    if (_dataComplexHashCode == 0)
    {
      _dataComplexHashCode = DataComplexHashCode.nextHashCode();
    }
  }

  return _dataComplexHashCode;
}

This approach avoids auto boxing/unboxing between Integer and int but makes the assumption that the computed hash code will never be 0. Doing something like this is also an option we can take in the future if we want to explore this more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A similar but more general idea maybe just use another boolean to track whether the hashcode is computed. And better maybe using Atomics instead of synchronized block, like

public int hashCode() {
  if (_computed.get() == false) {
    int hashCode = computeHashCode();
    if (_computed.get() == false) {
      _hashCode.compareAndSwap(INIT_VALUE, hashCode);
      _computed.set(true);
    }
  }
  return _hashCode.get();
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW here there's little guarantee the write will ever be visible to other threads. Have you considered making _hashCode at least volatile:

private volatile Integer _hashCode;

It preserves the original desired behavior of not wanting a full lock but at least makes it memory-safe.

Alternatively, why not just compute it at construction and make it final? Presumably instances of FieldDef aren't dynamically generated, they come from parsed .pdl and .pdsc schemas right? You could just eat the cost exactly once and forget about it, especially since it will happen very early on and likely completely out of the query serving codepath

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @PapaCharlie , thanks for chiming in:
For the flushing to memory problem: it's valid but since we'll not set it based on it's previous value, essentially all threads will set it to the same value if they cannot see it. So eventually that's not a safety issue but rather a performance issue (we compute it # of thread times rather than once). I like to compute it in ctor better (and I actually first implemented it in this way) but @zackthehuman is worrying about impacting the performance of the ctor so we changed it to lazy later on.

}

private int computeHashCode()
{
return 13*_name.hashCode() + 17*_type.hashCode() + 23*(_dataSchema == null? 1 :_dataSchema.hashCode());
}
Expand Down
2 changes: 1 addition & 1 deletion gradle.properties
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
version=29.46.8
version=29.46.9
group=com.linkedin.pegasus
org.gradle.configureondemand=true
org.gradle.parallel=true
Expand Down
Loading