Error out waiting clients on close, handle too many connections #18
Conversation
Looks legit, nice work on the script to reproduce the issue
export const parseMessage = function (dataBuf: Buffer): Message | false {
  if (dataBuf.length < 24) {
    return false;
  }

  if (dataBuf.length === ERROR_TOO_MANY_OPEN_CONNECTIONS.length) {
Is there any known way we would get a partial buffer?
It seems unlikely. I don't think a 33-byte write, which would be the only thing the server writes, would get split. Even assuming that happens, the cases are (see the sketch after this list):
- parseMessage gets a chunk of "ERROR Too many", treats it as an incomplete message and waits for more data, then gets another chunk of " open connections\r\n" and handles it normally
- we get a partial message and wait for more data that never comes, so the command timeout is hit
- we get a partial message and the socket is closed, so the newly fixed close handler is hit
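Roughly the accumulation pattern being assumed in the first case; parseMessage is simplified here, and attachDataHandler/handleMessage are hypothetical names rather than the client's actual internals:

```typescript
import { Socket } from "net";

type Message = { header: Buffer; body: Buffer };

// Stand-in parser: returns false until a complete message is buffered.
const parseMessage = (dataBuf: Buffer): Message | false => {
  if (dataBuf.length < 24) {
    return false;
  }
  return { header: dataBuf.subarray(0, 24), body: dataBuf.subarray(24) };
};

const attachDataHandler = (
  socket: Socket,
  handleMessage: (m: Message) => void
): void => {
  let responseBuffer = Buffer.alloc(0);
  socket.on("data", (chunk: Buffer) => {
    // Accumulate chunks until parseMessage can read a complete message.
    responseBuffer = Buffer.concat([responseBuffer, chunk]);
    const message = parseMessage(responseBuffer);
    if (message === false) {
      // Partial data (e.g. only "ERROR Too many" so far): keep waiting.
      // The command timeout or the new close handler covers the case where
      // the rest never arrives.
      return;
    }
    responseBuffer = Buffer.alloc(0);
    handleMessage(message);
  });
};
```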
const parseMessage = function (dataBuf) {
  if (dataBuf.length < 24) {
    return false;
  }
  if (dataBuf.length === ERROR_TOO_MANY_OPEN_CONNECTIONS.length) {
Hmmm, do we need to throw the error and churn the stack? Seems like a good way to start; just thinking out loud while sipping coffee.
Good question. I initially wrote it to return ERROR_TOO_MANY_OPEN_CONNECTIONS and then checked for that value in the responseHandler, but that felt a little awkward, and it introduces a (super minimal) cost on the happy path for something that should be very rare. The action initiated by the error is rather expensive anyway: fully destroy the socket, reject a bunch of promises, and probably cause the error to be thrown by downstream code. I'm interested in your opinion, though.
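For illustration, the two shapes being compared might look something like this; identifiers other than ERROR_TOO_MANY_OPEN_CONNECTIONS are illustrative rather than the client's actual API:

```typescript
const ERROR_TOO_MANY_OPEN_CONNECTIONS = Buffer.from(
  "ERROR Too many open connections\r\n"
);

// Shape 1: throw from the parser. The rare error path pays for the stack
// unwind; the happy path stays untouched.
const checkByThrowing = (dataBuf: Buffer): void => {
  if (
    dataBuf.length === ERROR_TOO_MANY_OPEN_CONNECTIONS.length &&
    dataBuf.equals(ERROR_TOO_MANY_OPEN_CONNECTIONS)
  ) {
    throw new Error("Too many open connections");
  }
};

// Shape 2: return a sentinel and have the responseHandler compare against it,
// which adds a (tiny) check to every successful response.
const checkByReturning = (dataBuf: Buffer): Buffer | null =>
  dataBuf.equals(ERROR_TOO_MANY_OPEN_CONNECTIONS)
    ? ERROR_TOO_MANY_OPEN_CONNECTIONS
    : null;
```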
Seems good to me, and I like the repro. I only had two questions, neither of which is blocking; they're more just clarifications :). Ship it!
One interesting thing is that if I change the
I would perhaps expect some race conditions and flipping between the two possibilities (since either the data or the close event could be delivered first), but AFAICT it consistently does one or the other based on retries. I noticed this when testing against the main app. This should still achieve the desired aim. I'm not confident that there aren't other ways that these clients can degrade or get stuck, especially in the thick of our app.
The previous behavior was to hang forever on an unexpected close or an unhandled error response.
This change explicitly handles "ERROR Too many open connections" when parsing the data response, though the server also immediately closes the connection.
It also adds a catch-all error handler on socket close that activates if there are outstanding errorCallbacks (which means there are also responseCallbacks), since that indicates clients are still waiting on results. On a socket close with outstanding callbacks there is no mechanism for retry or even failure, so those clients would stall forever. Unfortunately there is very little debugging info available in this case, and we can't just dump the current state of the responseBuffer, since it may contain arbitrary (sensitive) data.
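A minimal sketch of that close behavior, assuming the client keeps an array of pending error callbacks; the names here (attachCloseHandler, errorCallbacks) are assumptions, not the actual field names:

```typescript
import { Socket } from "net";

type ErrorCallback = (err: Error) => void;

const attachCloseHandler = (
  socket: Socket,
  errorCallbacks: ErrorCallback[]
): void => {
  socket.on("close", () => {
    if (errorCallbacks.length === 0) {
      return; // nothing is waiting on this socket
    }
    // No retry or failure path exists at this point, so fail every waiting
    // client instead of letting it stall forever. Deliberately no dump of the
    // response buffer here, since it may hold sensitive data.
    const err = new Error("socket closed with outstanding requests");
    for (const errorCallback of errorCallbacks.splice(0)) {
      errorCallback(err);
    }
  });
};
```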
Added lotsofconns.js to smoketest this behavior. With max conns on the server at 1024, the server starts rejecting requests at 1000 (index 998 plus my one observer connection). Previously the test would stall forever at that point; after this change the waiting clients error out and the test proceeds past the rejected connections.
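For reference, a rough sketch of what a script like lotsofconns.js might do; the real smoketest drives the client library itself, so the host, port, connection count, and the raw `version` probe here are all assumptions:

```typescript
import * as net from "net";

const HOST = "127.0.0.1";
const PORT = 11211;
const TOTAL = 1005; // a little past the point where the server starts rejecting

for (let i = 0; i < TOTAL; i++) {
  const socket = net.connect(PORT, HOST, () => {
    // Send a trivial request so over-limit connections get the error response.
    socket.write("version\r\n");
  });
  socket.on("data", (chunk) => {
    // Over-limit connections see "ERROR Too many open connections\r\n" and are
    // then closed by the server; the rest answer normally.
    console.log(`conn ${i}: ${chunk.toString().trim()}`);
  });
  socket.on("close", () => console.log(`conn ${i} closed`));
  socket.on("error", (err) => console.log(`conn ${i} error: ${err.message}`));
}
```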
Testing isn't as thorough as would be desired, but given how the code is currently structured I'd need to add a heavier-duty mocking library to override net.connect, or implement a test memcached server for it to actually talk to (which would not be that difficult). That's probably the next step as we exhaustively test the client.
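A rough sketch of the test-server direction, under the assumption that the first thing worth simulating is the over-limit error; a real test server would also need to speak enough of the protocol for happy-path responses, and the names here are illustrative:

```typescript
import * as net from "net";

const ERROR_TOO_MANY_OPEN_CONNECTIONS = Buffer.from(
  "ERROR Too many open connections\r\n"
);

export const startFakeMemcached = (
  port: number,
  rejectAll: boolean
): net.Server => {
  const server = net.createServer((socket) => {
    socket.once("data", () => {
      if (rejectAll) {
        // Mimic the observed behavior: answer the request with the error,
        // then immediately close the connection.
        socket.end(ERROR_TOO_MANY_OPEN_CONNECTIONS);
      }
      // Otherwise, happy-path protocol responses would be written here in a
      // fuller test server.
    });
  });
  server.listen(port);
  return server;
};
```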