Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Close websocket when keepalive are no longer received #347

Merged
merged 2 commits into from
Nov 20, 2024

Conversation

fraukappayuwe
Copy link
Contributor

Hi,

I sometimes have network failures, and I noticed that when my connection is dropped, the websocket might not close itself, and then libsignal-service stop working forever.

It's a TCP behavior: when the server disconnect without sending a tcp reset (RST) (for example if your VPN is killed and you have kill-switch enabled), the client will not "know" that the server is not reachable anymore, and the tcp connection will remain open.

To fix this, we have different solutions, implement tcp_keepalive at OS or software level, or add a keepalive at application (websocket) level. I saw that you already implement Signal keepalives, but actually there is an issue with the implementation :

  • When the connection drop, the websocket is not automatically closed ;
  • libsignal-service continue to send keepalives in the ws, with success (although there is no longer any server listening) ;
  • we continue to receive on a "zombie" websocket so we don't receive new messages anymore ;
  • worse: when the connection comes back, libsignal-service continue to use this "zombie" ws, and will no longer be able to send nor receive new messages...

My proposition is to close the websocket if we have not received keepalive responses for more than 6 minutes: when there are more than 6 keepalive ids in the hashset outgoing_keep_alive_set

Thank you

@rubdos
Copy link
Member

rubdos commented Nov 19, 2024

Hey, thank you for tackling this! This has been an annoyance of literally every Whisperfish user (see e.g. https://forum.sailfishos.org/t/whisperfish-the-unofficial-sailfishos-signal-client/3337/1200?u=rubdos), and I haven't found the energy to start it myself yet.

My proposition is to close the websocket if we have not received keepalive responses for more than 6 minutes: when there are more than 6 keepalive ids in the hashset outgoing_keep_alive_set

I would honestly opt to make it more aggressive than that. I was thinking to dump the socket if a keep alive is not acknowledged within ~5 seconds or so, although that might be too aggressive. Six minutes is an eternity in DM-town. Maybe we can make it a configurable parameter, but I wouldn't way for a second or third, let alone a sixth failing KA.

I'm curious: what is your reasoning behind 6 KA's?

@rubdos
Copy link
Member

rubdos commented Nov 19, 2024

(don't bother with the nightly CI step that fails; that's on us)

@fraukappayuwe
Copy link
Contributor Author

I'm curious: what is your reasoning behind 6 KA's?

Well, the thing is I have a very poor network, and 5 seconds would be waaaay too aggressive for me, the ws would keep restarting all the time...

So why 6 KA ? I just wanted to kill the WS after 6 min of downtime, and it was a "quick win" (maybe dirty?) to implement it like that. imho, 1 minute would be a good trade-off btw 5 secs and 6 min, so we could implement it like that :

  • in the next KA interval tick (every 55 seconds), start by checking if there is 1 remaining KA in the hashset, and if so close the WS

But yes, if we want to make it more "generic", we may want to add a new check_interval.tick() => in the future::select! , so we can set a custom Duration for this interval

What's best for you

@fraukappayuwe
Copy link
Contributor Author

fraukappayuwe commented Nov 19, 2024

But if you really want a new configurable parameter, I can look into it (also, you can commit to this PR, if needed 👍 )

@rubdos
Copy link
Member

rubdos commented Nov 19, 2024

Okay, so a quick win in this case is to just kill the socket if more than one KA is in the set. That makes the set a very fancy Option, but we don't really need to care... it just means that in your logic, you can replace a 6 with a 0, and it's mergeable! Then, in the future we can make a configurable parameter. Does that sound agreeable to you?

@rubdos
Copy link
Member

rubdos commented Nov 19, 2024

It also makes semantic sense to pick 1 KA interval for killing the socket, I suppose... and we can make "smarter" algorithms in the future that take into account round trip of the previous packets, to create some kind of smart, adaptive, but more aggressive timeout.

@fraukappayuwe
Copy link
Contributor Author

fraukappayuwe commented Nov 19, 2024

it should be good now 👍

@rubdos rubdos merged commit e6affcc into whisperfish:main Nov 20, 2024
5 of 6 checks passed
@rubdos
Copy link
Member

rubdos commented Nov 20, 2024

Been testing this a bit on my phone, and I haven't had to restart Whisperfish since :-) Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants