Close websocket when keepalive are no longer received #347

fraukappayuwe · 2024-11-19T11:50:59Z

Hi,

I sometimes have network failures, and I noticed that when my connection drops, the websocket may not close itself, and then libsignal-service stops working forever.

This is TCP behavior: when the server disconnects without sending a TCP reset (RST) (for example, if your VPN is killed and you have a kill switch enabled), the client does not "know" that the server is no longer reachable, and the TCP connection remains open.

There are several ways to fix this: enable TCP keepalive (tcp_keepalive) at the OS or socket level, or add a keepalive at the application (websocket) level. I saw that you already implement Signal keepalives, but there is actually an issue with the implementation:

When the connection drops, the websocket is not automatically closed;
libsignal-service keeps sending keepalives over the websocket, and the sends succeed (even though no server is listening anymore);
we keep reading from a "zombie" websocket, so we no longer receive new messages;
worse: when the connection comes back, libsignal-service keeps using this "zombie" websocket and can no longer send or receive messages...

My proposal is to close the websocket if no keepalive response has been received for more than 6 minutes, i.e. when there are more than 6 keepalive IDs in the outgoing_keep_alive_set HashSet.
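A minimal sketch of that check, assuming a tracker that owns the set of unanswered keepalive IDs and inspects it before sending the next keepalive on each interval tick. KeepAliveTracker, TickAction, on_tick and on_response are hypothetical names for illustration, not libsignal-service's actual types.

```rust
use std::collections::HashSet;

/// What the keepalive loop should do on an interval tick (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum TickAction {
    /// Send a new keepalive request carrying this id.
    SendKeepAlive(u64),
    /// Too many keepalives are unanswered: treat the socket as dead and close it.
    CloseSocket,
}

/// Hypothetical tracker playing the role of `outgoing_keep_alive_set`.
struct KeepAliveTracker {
    /// Ids of keepalives sent but not yet acknowledged.
    outstanding: HashSet<u64>,
    next_id: u64,
    /// Close once more than this many keepalives are unanswered.
    max_outstanding: usize,
}

impl KeepAliveTracker {
    fn new(max_outstanding: usize) -> Self {
        Self { outstanding: HashSet::new(), next_id: 0, max_outstanding }
    }

    /// Called on every keepalive interval tick (55 s in this thread).
    fn on_tick(&mut self) -> TickAction {
        // Assumption: the set is checked *before* the next keepalive is sent.
        if self.outstanding.len() > self.max_outstanding {
            return TickAction::CloseSocket;
        }
        let id = self.next_id;
        self.next_id += 1;
        self.outstanding.insert(id);
        TickAction::SendKeepAlive(id)
    }

    /// Called when the server acknowledges keepalive `id`.
    fn on_response(&mut self, id: u64) {
        self.outstanding.remove(&id);
    }
}
```

With max_outstanding = 6, the socket is closed once seven keepalives are unanswered, which at a 55-second interval is roughly six minutes after the first missed response. With max_outstanding = 0 (the "replace a 6 with a 0" change agreed later in the thread), it is closed on the first tick that finds the previous keepalive still unanswered.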

Thank you

rubdos · 2024-11-19T15:03:13Z

Hey, thank you for tackling this! This has been an annoyance for literally every Whisperfish user (see e.g. https://forum.sailfishos.org/t/whisperfish-the-unofficial-sailfishos-signal-client/3337/1200?u=rubdos), and I haven't found the energy to start on it myself yet.

> My proposition is to close the websocket if we have not received keepalive responses for more than 6 minutes: when there are more than 6 keepalive ids in the hashset outgoing_keep_alive_set

I would honestly make it more aggressive than that. I was thinking of dropping the socket if a keepalive is not acknowledged within ~5 seconds or so, although that might be too aggressive. Six minutes is an eternity in DM-town. Maybe we can make it a configurable parameter, but I wouldn't wait for a second or third failed KA, let alone a sixth.

I'm curious: what is your reasoning behind 6 KA's?

rubdos · 2024-11-19T15:10:53Z

(don't bother with the nightly CI step that fails; that's on us)

fraukappayuwe · 2024-11-19T15:17:16Z

> I'm curious: what is your reasoning behind 6 KA's?

Well, the thing is, I have a very poor network, and 5 seconds would be waaaay too aggressive for me: the WS would keep restarting all the time...

So why 6 KAs? I just wanted to kill the WS after 6 minutes of downtime, and implementing it like that was a "quick win" (maybe a dirty one?). IMHO, 1 minute would be a good trade-off between 5 seconds and 6 minutes, so we could implement it like this:

on the next KA interval tick (every 55 seconds), first check whether a KA is still left in the HashSet, and if so, close the WS.

But yes, if we want to make it more "generic", we could add a new check_interval.tick() => arm to the select! loop, so we can set a custom Duration for this interval.
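A time-based variant of that idea might look like this, assuming each keepalive's send time is recorded so that a separate check interval can enforce its own timeout Duration. TimedKeepAlives, record_sent, record_response and is_timed_out are hypothetical names; wiring is_timed_out into the real select! loop (for example from an async interval timer) depends on the async runtime and is not shown. The current time is passed in explicitly to keep the logic deterministic.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical tracker: remembers when each keepalive was sent, so a
/// separate check interval can enforce a configurable timeout.
struct TimedKeepAlives {
    sent_at: HashMap<u64, Instant>,
    timeout: Duration,
}

impl TimedKeepAlives {
    fn new(timeout: Duration) -> Self {
        Self { sent_at: HashMap::new(), timeout }
    }

    /// Record that keepalive `id` was sent at `now`.
    fn record_sent(&mut self, id: u64, now: Instant) {
        self.sent_at.insert(id, now);
    }

    /// The server acknowledged keepalive `id`; stop tracking it.
    fn record_response(&mut self, id: u64) {
        self.sent_at.remove(&id);
    }

    /// Meant to be called from the check-interval arm: true if any keepalive
    /// has been waiting longer than `timeout`, i.e. the socket should be closed.
    fn is_timed_out(&self, now: Instant) -> bool {
        self.sent_at
            .values()
            .any(|&sent| now.saturating_duration_since(sent) > self.timeout)
    }
}
```

Decoupling the check interval from the keepalive interval lets the timeout (5 seconds, 1 minute, 6 minutes) be tuned without changing how often keepalives are sent.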

What works best for you?

fraukappayuwe · 2024-11-19T15:28:05Z

But if you really want a new configurable parameter, I can look into it (also, feel free to commit to this PR if needed 👍).

rubdos · 2024-11-19T15:46:46Z

Okay, so a quick win in this case is to just kill the socket if more than one KA is in the set. That makes the set a very fancy Option, but we don't really need to care... it just means that in your logic, you can replace a 6 with a 0, and it's mergeable! Then, in the future we can make a configurable parameter. Does that sound agreeable to you?

rubdos · 2024-11-19T15:48:26Z

It also makes semantic sense to pick 1 KA interval for killing the socket, I suppose... and we can make "smarter" algorithms in the future that take into account round trip of the previous packets, to create some kind of smart, adaptive, but more aggressive timeout.
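One possible starting point for such an adaptive timeout, borrowed from TCP rather than taken from libsignal-service, is the retransmission-timeout estimator of RFC 6298: keep a smoothed round-trip time (SRTT) and its variation (RTTVAR) from acknowledged keepalives, and set the timeout to SRTT + 4 * RTTVAR. This sketch is an assumption throughout: AdaptiveTimeout and its min/max bounds are hypothetical, the RFC's clock-granularity term is omitted, and before any sample exists it falls back to the maximum (RFC 6298 itself starts at 1 second).

```rust
use std::time::Duration;

/// Hypothetical adaptive keepalive timeout, using the smoothed-RTT
/// estimator that TCP uses for its retransmission timeout (RFC 6298).
struct AdaptiveTimeout {
    srtt: Option<f64>, // smoothed round-trip time, in seconds
    rttvar: f64,       // round-trip time variation, in seconds
    min: Duration,
    max: Duration,
}

impl AdaptiveTimeout {
    const ALPHA: f64 = 1.0 / 8.0;
    const BETA: f64 = 1.0 / 4.0;
    const K: f64 = 4.0;

    fn new(min: Duration, max: Duration) -> Self {
        Self { srtt: None, rttvar: 0.0, min, max }
    }

    /// Feed the measured round trip of an acknowledged keepalive.
    fn observe(&mut self, rtt: Duration) {
        let r = rtt.as_secs_f64();
        match self.srtt {
            // First sample: SRTT = R, RTTVAR = R / 2.
            None => {
                self.srtt = Some(r);
                self.rttvar = r / 2.0;
            }
            // Later samples: update RTTVAR using the old SRTT, then SRTT.
            Some(srtt) => {
                self.rttvar =
                    (1.0 - Self::BETA) * self.rttvar + Self::BETA * (srtt - r).abs();
                self.srtt = Some((1.0 - Self::ALPHA) * srtt + Self::ALPHA * r);
            }
        }
    }

    /// Current timeout: SRTT + K * RTTVAR, clamped to [min, max];
    /// falls back to `max` until a round trip has been measured.
    fn timeout(&self) -> Duration {
        match self.srtt {
            None => self.max,
            Some(srtt) => {
                Duration::from_secs_f64(srtt + Self::K * self.rttvar).clamp(self.min, self.max)
            }
        }
    }
}
```

On a slow but steady link this keeps the timeout long enough to avoid constant reconnects, while on a fast, stable link it shrinks toward the minimum, giving the more aggressive timeout discussed above.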

fraukappayuwe · 2024-11-19T16:00:15Z

it should be good now 👍

rubdos · 2024-11-20T10:24:36Z

Been testing this a bit on my phone, and I haven't had to restart Whisperfish since :-) Thanks!

close websocket when keepalive are no longer received

9502d4c

restart websocket at first missing keepalive

e73644b

gferon approved these changes Nov 19, 2024


rubdos enabled auto-merge November 20, 2024 07:15

rubdos merged commit e6affcc into whisperfish:main Nov 20, 2024
5 of 6 checks passed
