
Error out waiting clients on close, handle too many connections #18

Merged
schleyfox merged 2 commits into master from benjamin--handle-close-and-too-many-conns on Oct 3, 2023

Conversation

schleyfox

The previous behavior was to hang forever on an unexpected close or an unhandled error response.

This explicitly handles "ERROR Too many open connections" when parsing the data response, though the server also immediately closes the connection.

This also adds a catch-all error handler on socket close that activates if there are outstanding errorCallbacks (which implies there are also outstanding responseCallbacks), since that indicates clients are still waiting on results. When the socket closes with outstanding callbacks there is no mechanism for retry or even failure, so those clients would stall forever. Unfortunately there is very little debugging info available in this case, and we can't just dump the current state of the responseBuffer since it may contain arbitrary (and potentially sensitive) data.
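For context, here is a minimal sketch of the idea (not the actual memjs implementation; it assumes an internal map of errorCallbacks keyed by request sequence, as described above):

    // Hypothetical sketch: fail every outstanding callback when the socket
    // closes, instead of leaving the waiting clients to stall forever.
    type ErrorCallback = (err: Error) => void;

    class ServerConnection {
      private errorCallbacks = new Map<number, ErrorCallback>();

      // Wired up as socket.on("close", ...) when the connection is created.
      onSocketClose(): void {
        if (this.errorCallbacks.size === 0) {
          return; // nothing is waiting, a quiet close is fine
        }
        const err = new Error("socket closed while requests were outstanding");
        for (const callback of this.errorCallbacks.values()) {
          callback(err); // reject instead of stalling; no retry is possible here
        }
        this.errorCallbacks.clear();
      }
    }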

Added lotsofconns.js to smoke-test this behavior. With max connections on the server set to 1024, the server starts rejecting requests at 1000 (index 998 plus my one observer connection), so before this change the test would stall forever at

connected 997
997
created 998
connected 998

after this change it proceeds like

created 997
connected 997
997
created 998
connected 998
MemJS: Server <localhost:11211> failed after (1) retries with error - ERROR Too many open connections
error Error: ERROR Too many open connections
    at Object.parseMessage (/Users/ben/projects/memjs/lib/memjs/utils.js:124:19)
    at Server.responseHandler (/Users/ben/projects/memjs/lib/memjs/server.js:101:32)
    at Socket.<anonymous> (/Users/ben/projects/memjs/lib/memjs/server.js:154:26)
    at Socket.emit (node:events:514:28)
    at addChunk (node:internal/streams/readable:324:12)
    at readableAddChunk (node:internal/streams/readable:297:9)
    at Readable.push (node:internal/streams/readable:234:10)
    at TCP.onStreamRead (node:internal/stream_base_commons:190:23) 998
created 999
closed 998 false
connected 999
MemJS: Server <localhost:11211> failed after (1) retries with error - ERROR Too many open connections
error Error: ERROR Too many open connections
    at Object.parseMessage (/Users/ben/projects/memjs/lib/memjs/utils.js:124:19)
    at Server.responseHandler (/Users/ben/projects/memjs/lib/memjs/server.js:101:32)
    at Socket.<anonymous> (/Users/ben/projects/memjs/lib/memjs/server.js:154:26)
    at Socket.emit (node:events:514:28)
    at addChunk (node:internal/streams/readable:324:12)
    at readableAddChunk (node:internal/streams/readable:297:9)
    at Readable.push (node:internal/streams/readable:234:10)
    at TCP.onStreamRead (node:internal/stream_base_commons:190:23) 999

and so on.
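For reference, a rough reconstruction of what a lotsofconns.js-style smoke test could look like (this is a sketch, not the actual script; it assumes memjs's Client.create(server, options), a retries option, and the callback form of client.get):

    // Hypothetical sketch: open far more connections than the server allows
    // and log what each client observes, roughly mirroring the output above.
    import * as memjs from "memjs";

    const TOTAL = 1100; // comfortably above the server's connection limit

    for (let i = 0; i < TOTAL; i++) {
      const client = memjs.Client.create("localhost:11211", { retries: 0 });
      console.log("created", i);
      client.get("foo", (err, value) => {
        if (err) {
          console.log("error", err, i);
        } else {
          console.log("connected", i);
        }
      });
    }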

Testing isn't as thorough as would be desired, but given how the code is currently structured, I'd need to add a heavier-duty mocking library to override net.connect, or implement a test memcached server for it to actually talk to (which would not be that difficult). That's probably the next step as we exhaustively test the client.
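As an aside, a very small test server along those lines might look something like this (a hypothetical sketch, not part of this PR), just enough to provoke the rejection path the same way memcached does:

    // Hypothetical sketch: accept a fixed number of sockets and reject the
    // rest with memcached's error line, then close, so the client's new
    // error handling can be exercised without a real memcached instance.
    import * as net from "net";

    const MAX_CONNS = 4;
    let open = 0;

    const server = net.createServer((socket) => {
      if (open >= MAX_CONNS) {
        socket.end("ERROR Too many open connections\r\n");
        return;
      }
      open++;
      socket.on("close", () => {
        open--;
      });
      // A real test server would also have to answer binary-protocol requests.
    });

    server.listen(11211);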


@garrettheel left a comment


Looks legit, nice work on the script to reproduce the issue

export const parseMessage = function (dataBuf: Buffer): Message | false {
  if (dataBuf.length < 24) {
    return false;
  }

  if (dataBuf.length === ERROR_TOO_MANY_OPEN_CONNECTIONS.length) {


is there any known way we would get a partial buffer?

schleyfox (Author)

it seems unlikely. I don't think a 33-byte write that would be the only thing the server writes would get split.

Even assuming that happens, the cases are:

  1. parseMessage gets a chunk of "ERROR Too many", treats it as an incomplete message and waits for more data, then gets another chunk of " open connections\r\n" = handles it normally
  2. we get a partial message and wait for more data that never comes = the command timeout is hit
  3. we get a partial message and the socket is closed = the newly fixed close handler is hit.
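
For illustration only (a generic sketch, not memjs's actual responseHandler), the buffering pattern that case 1 relies on looks roughly like this:

    // Hypothetical sketch: keep accumulating chunks until the parser can
    // produce a complete message; parse returns false while data is missing.
    function makeAccumulator(
      parse: (buf: Buffer) => object | false,
      handle: (msg: object) => void
    ) {
      let pending = Buffer.alloc(0);
      return function onData(chunk: Buffer): void {
        pending = Buffer.concat([pending, chunk]);
        const msg = parse(pending);
        if (msg === false) {
          return; // wait for more data (case 1), a timeout (2), or close (3)
        }
        pending = Buffer.alloc(0);
        handle(msg);
      };
    }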

const parseMessage = function (dataBuf) {
  if (dataBuf.length < 24) {
    return false;
  }
  if (dataBuf.length === ERROR_TOO_MANY_OPEN_CONNECTIONS.length) {


hmmm, do we need to throw the error and churn the stack? Seems like a good way to start, just thinking out loud while sipping coffee.

schleyfox (Author)

good question. I initially wrote it to return ERROR_TOO_MANY_OPEN_CONNECTIONS and then checked for that value in the responseHandler, but it felt a little awkward, and it introduced a (super minimal) cost to the happy path for something that should be very rare. The action initiated by the error is rather expensive anyway: fully destroy the socket, reject a bunch of promises, and probably cause the error to be thrown by downstream code. I'm interested in your opinion, though.
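
To make the trade-off concrete, here is a hypothetical sketch of the two shapes being discussed (illustrative only, not the actual memjs code):

    // Hypothetical sketch contrasting the two approaches discussed above.
    const ERROR_TOO_MANY_OPEN_CONNECTIONS = Buffer.from(
      "ERROR Too many open connections\r\n" // assumed contents of the constant
    );

    // Option A (roughly what the PR does): throw, so the server's existing
    // error path destroys the socket and rejects the waiting callbacks.
    function parseThrowing(dataBuf: Buffer): Buffer | false {
      if (dataBuf.equals(ERROR_TOO_MANY_OPEN_CONNECTIONS)) {
        throw new Error("ERROR Too many open connections");
      }
      return dataBuf.length >= 24 ? dataBuf : false;
    }

    // Option B (the alternative tried first): return the sentinel and make the
    // responseHandler check for it on the happy path of every response.
    function parseWithSentinel(dataBuf: Buffer): Buffer | false {
      if (dataBuf.equals(ERROR_TOO_MANY_OPEN_CONNECTIONS)) {
        return ERROR_TOO_MANY_OPEN_CONNECTIONS; // caller compares by identity
      }
      return dataBuf.length >= 24 ? dataBuf : false;
    }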

@nathannotion left a comment

seems good to me, like the repro. Only had two questions, neither of which is blocking at all; they're more just clarifications :). Ship it!

@schleyfox (Author)

One interesting thing: if I change lotsofconns.js to enable in-client retries (which default to 2, i.e. one try and one retry), the error consistently changes from "Too many open connections" to "socket closed unexpectedly".

created 1024
connected 1024
closed 1024 false
MemJS: Server <localhost:11211> failed after (2) retries with error - socket closed unexpectedly.
error Error: socket closed unexpectedly.
    at Socket.<anonymous> (/Users/ben/projects/memjs/lib/memjs/server.js:171:32)
    at Socket.emit (node:events:514:28)
    at TCP.<anonymous> (node:net:323:12) 1024

I would perhaps expect some race conditions and flipping between the two possibilities (since either the data or the close event could be delivered first), but AFAICT it consistently does one or the other based on retries.

I noticed this when testing against the main app. This should still achieve the desired aim.

I'm not confident that there aren't other ways that these clients can degrade or get stuck, especially in the thick of our app.

@schleyfox merged commit 5da51e6 into master on Oct 3, 2023
1 check passed
@schleyfox deleted the benjamin--handle-close-and-too-many-conns branch on October 3, 2023 at 20:25