Error out waiting clients on close, handle too many connections #18
Conversation
Looks legit, nice work on the script to reproduce the issue
export const parseMessage = function (dataBuf: Buffer): Message | false {
  if (dataBuf.length < 24) {
    return false;
  }

  if (dataBuf.length === ERROR_TOO_MANY_OPEN_CONNECTIONS.length) {
Is there any known way we would get a partial buffer?
It seems unlikely. I don't think a 33-byte write, which would be the only thing the server writes, would get split. Even assuming that happens, the cases are (see the sketch after this list):
- parseMessage gets a chunk of "ERROR Too many", treats it as an incomplete message and waits for more data, then gets another chunk of " open connections\r\n" and handles it normally
- we get a partial message and wait for more data that never comes, so the command timeout is hit
- we get a partial message and the socket is closed, so the newly fixed close handler is hit
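Roughly the accumulation pattern being assumed in the first case; parseMessage is simplified here, and attachDataHandler/handleMessage are hypothetical names rather than the client's actual internals:

```typescript
import { Socket } from "net";

type Message = { header: Buffer; body: Buffer };

// Stand-in parser: returns false until a complete message is buffered.
const parseMessage = (dataBuf: Buffer): Message | false => {
  if (dataBuf.length < 24) {
    return false;
  }
  return { header: dataBuf.subarray(0, 24), body: dataBuf.subarray(24) };
};

const attachDataHandler = (
  socket: Socket,
  handleMessage: (m: Message) => void
): void => {
  let responseBuffer = Buffer.alloc(0);
  socket.on("data", (chunk: Buffer) => {
    // Accumulate chunks until parseMessage can read a complete message.
    responseBuffer = Buffer.concat([responseBuffer, chunk]);
    const message = parseMessage(responseBuffer);
    if (message === false) {
      // Partial data (e.g. only "ERROR Too many" so far): keep waiting.
      // The command timeout or the new close handler covers the case where
      // the rest never arrives.
      return;
    }
    responseBuffer = Buffer.alloc(0);
    handleMessage(message);
  });
};
```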
const parseMessage = function (dataBuf) {
  if (dataBuf.length < 24) {
    return false;
  }
  if (dataBuf.length === ERROR_TOO_MANY_OPEN_CONNECTIONS.length) {
Hmmm, do we need to throw the error and churn the stack? Seems like a good way to start; just thinking out loud while sipping coffee.
Good question. I initially wrote it to return ERROR_TOO_MANY_OPEN_CONNECTIONS and then checked for that value in the responseHandler, but that felt a little awkward, and it introduces a (super minimal) cost on the happy path for something that should be very rare. The action initiated by the error is rather expensive anyway: fully destroy the socket, reject a bunch of promises, and probably cause the error to be thrown by downstream code. I'm interested in your opinion, though.
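For illustration, the two shapes being compared might look something like this; identifiers other than ERROR_TOO_MANY_OPEN_CONNECTIONS are illustrative rather than the client's actual API:

```typescript
const ERROR_TOO_MANY_OPEN_CONNECTIONS = Buffer.from(
  "ERROR Too many open connections\r\n"
);

// Shape 1: throw from the parser. The rare error path pays for the stack
// unwind; the happy path stays untouched.
const checkByThrowing = (dataBuf: Buffer): void => {
  if (
    dataBuf.length === ERROR_TOO_MANY_OPEN_CONNECTIONS.length &&
    dataBuf.equals(ERROR_TOO_MANY_OPEN_CONNECTIONS)
  ) {
    throw new Error("Too many open connections");
  }
};

// Shape 2: return a sentinel and have the responseHandler compare against it,
// which adds a (tiny) check to every successful response.
const checkByReturning = (dataBuf: Buffer): Buffer | null =>
  dataBuf.equals(ERROR_TOO_MANY_OPEN_CONNECTIONS)
    ? ERROR_TOO_MANY_OPEN_CONNECTIONS
    : null;
```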
Seems good to me, and I like the repro. I only had two questions, neither of which is blocking; they're more just clarifications :). Ship it!
One interesting thing is that if I change the
I would perhaps expect some race conditions and flipping between the two possibilities (since either the data or the close event could be delivered first), but AFAICT it consistently does one or the other based on retries. I noticed this when testing against the main app. This should still achieve the desired aim. I'm not confident that there aren't other ways that these clients can degrade or get stuck, especially in the thick of our app.
The previous behavior was to hang forever on an unexpected close or an unhandled error response.
This change explicitly handles "ERROR Too many open connections" when parsing the data response, though the server also immediately closes the connection.
It also adds a catch-all error handler on socket close that activates if there are outstanding errorCallbacks (which means there are also responseCallbacks), since that indicates clients are still waiting on results. On a socket close with outstanding callbacks there is no mechanism for retry or even failure, so those clients would stall forever. Unfortunately there is very little debugging info available in this case, and we can't just dump the current state of the responseBuffer, since it may contain arbitrary (sensitive) data.
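A minimal sketch of that close behavior, assuming the client keeps an array of pending error callbacks; the names here (attachCloseHandler, errorCallbacks) are assumptions, not the actual field names:

```typescript
import { Socket } from "net";

type ErrorCallback = (err: Error) => void;

const attachCloseHandler = (
  socket: Socket,
  errorCallbacks: ErrorCallback[]
): void => {
  socket.on("close", () => {
    if (errorCallbacks.length === 0) {
      return; // nothing is waiting on this socket
    }
    // No retry or failure path exists at this point, so fail every waiting
    // client instead of letting it stall forever. Deliberately no dump of the
    // response buffer here, since it may hold sensitive data.
    const err = new Error("socket closed with outstanding requests");
    for (const errorCallback of errorCallbacks.splice(0)) {
      errorCallback(err);
    }
  });
};
```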
Added lotsofconns.js to smoketest this behavior. With max conns on the server at 1024, the server starts rejecting requests at 1000 (index 998 plus my one observer connection). Previously the test would stall forever at that point; after this change the waiting clients error out and the test proceeds past the rejected connections.
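For reference, a rough sketch of what a script like lotsofconns.js might do; the real smoketest drives the client library itself, so the host, port, connection count, and the raw `version` probe here are all assumptions:

```typescript
import * as net from "net";

const HOST = "127.0.0.1";
const PORT = 11211;
const TOTAL = 1005; // a little past the point where the server starts rejecting

for (let i = 0; i < TOTAL; i++) {
  const socket = net.connect(PORT, HOST, () => {
    // Send a trivial request so over-limit connections get the error response.
    socket.write("version\r\n");
  });
  socket.on("data", (chunk) => {
    // Over-limit connections see "ERROR Too many open connections\r\n" and are
    // then closed by the server; the rest answer normally.
    console.log(`conn ${i}: ${chunk.toString().trim()}`);
  });
  socket.on("close", () => console.log(`conn ${i} closed`));
  socket.on("error", (err) => console.log(`conn ${i} error: ${err.message}`));
}
```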
Testing isn't as thorough as would be desired, but given how the code is currently structured I'd need to add a heavier-duty mocking library to override net.connect, or implement a test memcached server for it to actually talk to (which would not be that difficult). That's probably the next step as we exhaustively test the client.
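A rough sketch of the test-server direction, under the assumption that the first thing worth simulating is the over-limit error; a real test server would also need to speak enough of the protocol for happy-path responses, and the names here are illustrative:

```typescript
import * as net from "net";

const ERROR_TOO_MANY_OPEN_CONNECTIONS = Buffer.from(
  "ERROR Too many open connections\r\n"
);

export const startFakeMemcached = (
  port: number,
  rejectAll: boolean
): net.Server => {
  const server = net.createServer((socket) => {
    socket.once("data", () => {
      if (rejectAll) {
        // Mimic the observed behavior: answer the request with the error,
        // then immediately close the connection.
        socket.end(ERROR_TOO_MANY_OPEN_CONNECTIONS);
      }
      // Otherwise, happy-path protocol responses would be written here in a
      // fuller test server.
    });
  });
  server.listen(port);
  return server;
};
```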