-
Notifications
You must be signed in to change notification settings - Fork 546
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
D2 client side implementation of server-reported health #617
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great progress! Did a first pass review. I think we can discuss more on up/downStep model vs. P2C. Sticky routing is currently best effort. It's not guaranteed to reach the same host when the pointsMap changes anyways. I still think that 5s maybe too long for server reported load to be effective.
d2-schemas/src/main/pegasus/com/linkedin/d2/D2RelativeStrategyProperties.pdl
Outdated
Show resolved
Hide resolved
@@ -322,22 +327,38 @@ public void record() | |||
@Override | |||
public void endCall() | |||
{ | |||
endCall(false, null); | |||
endCall(Collections.emptyMap()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to just pass the server reported score to endCall() instead of the entire wireAttributes map? It seems that only private void endCall(boolean hasError, ErrorType errorType, int serverLoadScore)
uses wireAttributes. Also it prevents creating an empty map for each endCall() invocation.
degrader/src/main/java/com/linkedin/util/degrader/CallTrackerImpl.java
Outdated
Show resolved
Hide resolved
d2/src/main/java/com/linkedin/d2/balancer/strategies/relative/StateUpdater.java
Outdated
Show resolved
Hide resolved
d2/src/main/java/com/linkedin/d2/balancer/strategies/relative/StateUpdater.java
Outdated
Show resolved
Hide resolved
|
||
/** | ||
* If the latest server reported load score is under this specified factor of the cluster average, | ||
* we will increase the health score by upStep. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can have more discussion on up/downStep model vs. P2C. Sticky routing is currently best effort. It's not guaranteed to reach the same host when the pointsMap changes. I still think that 5s maybe too long for server reported load to be effective.
degrader/src/main/java/com/linkedin/util/degrader/CallTracker.java
Outdated
Show resolved
Hide resolved
@@ -192,11 +192,11 @@ public void onResponse(TransportResponse<RestResponse> response) | |||
if (response.hasError()) | |||
{ | |||
Throwable throwable = response.getError(); | |||
handleError(_callCompletion, throwable); | |||
handleError(_callCompletion, throwable, response.getWireAttributes()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need to parse server load from wireAttributes when enableServerReportedLoad
is set to false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 3 places that we can potentially control the behavior on the client side:
- TrackerClient - whether to pass wire attributes
- CallTracker - whether to update the server-reported load in each interval
- load balancer strategy - whether to take the server-reported load into load balancing decision.
When I designed this I also debated where I should put the control, in the end I only put one control in load balancer strategy to make the control logic all in one place.
Now TrackerClient and CallTracker just update the server reported load whenever it's ready, and in the load balancing strategy there are 2 cases we need to watch out:
- Server enables reporting, but client did not enable, which we respect the client side config flag
- Server does not enable yet, but client enables, which means the reported load will always be -1. Either case, we control the RLB to not consider server reported load.
There is definitely a way to add control in TrackerClient, where we can always report -1 as the score. There will be more code we need to cleanup later if we add a control here. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just changed this logic in my latest commit. I guarded the code with the config on both TrackerClient and the hash ring selector logic.
Implemented the following components in the D2 client side: