Cannot connect with client after abrupt disconnection by killing process #4592

Open
1 of 4 tasks
aohotnik opened this issue Oct 4, 2024 · 6 comments

aohotnik commented Oct 4, 2024

Describe the bug

I have a small .NET 8 test client on my local dev machine that connects to a QUIC server (also .NET 8) on a remote host. If I kill the client process, e.g. by stopping the debugger, I can't reconnect to the server for long periods (5-15 min). I installed Wireshark, and while the client is trying to connect and timing out, there are zero packets on the network interface, as if the QUIC driver never passed them on to the network. Wireshark does show the packet exchange when the connection works.

Affected OS

  • Windows
  • Linux
  • macOS
  • Other (specify below)

Additional OS information

Client app: Windows 11 Home 10.0.22631
Server app: Windows Server Standard 2022 10.0.20348

MsQuic version

main

Steps taken to reproduce bug

  1. Run the .NET 8 QUIC client and connect to the server.
  2. Wait a bit so that some data is exchanged.
  3. Close the console without any connection/stream cleanup.

Expected behavior

I would expect a new connection to be opened.

Actual outcome

The connection attempt times out, and according to Wireshark no packets are generated on the network interface.

Additional details

The server is hosted on a separate machine on another continent. When the client and server are co-located on the same machine I don't see any issues, so this only affects remote connections. I tried replacing the msquic.dll shipped with the .NET 8.0.8 runtime with the latest build (2.4.5) and made sure the client loaded it, but it didn't help. Disabling Windows Firewall didn't help either. I'm not sure whether Windows Defender on the client machine can interfere here; that's the only thing I haven't tried removing.

output.txt

Member

ManickaP commented Oct 4, 2024

Hi, can you collect .NET logs (e.g. programmatically like this: https://gist.github.com/ManickaP/678f79592d30c616515f6f3588da8898) and also share the repro code?

Author

aohotnik commented Oct 4, 2024

> Hi, can you collect .NET logs (e.g. programmatically like this: https://gist.github.com/ManickaP/678f79592d30c616515f6f3588da8898) and also share the repro code?

Hello @ManickaP

Here is the output of the log collection on the client side: https://github.com/user-attachments/files/17263224/output.txt

You can see some payload events within the 10 seconds when the connection was supposed to be opening and was timing out, but these don't reach the read calls. If those are indeed messages from the server, they are coming from some kind of lingering buffer, because there is zero activity in Wireshark while this bug manifests. By zero I mean nothing before the connection attempt or after its timeout. They can only be coming from the driver buffers.

Member

ManickaP commented Oct 7, 2024

Can you share the repro code? Do you own the server? Is the server reachable? Can you connect to it with other QUIC clients? Could you try to collect msquic logs as well?

From the logs, the client tries to connect to the server, but the attempt fails with a handshake timeout. That looks more like a server problem than a client one. But this is a guess; I don't have enough info yet to say with 100% certainty what the cause is.

Author

aohotnik commented Oct 7, 2024

@ManickaP Yes, the server is reachable, because connections work after a while, but I can only make one or a few more before the issue manifests again. We also have a client on macOS using Apple's QUIC implementation, and that one works without any problem. What you see in the logs is not occurring on the network; as I mentioned before, there is zero activity on the network interface when this happens. When the connection works I see plenty of packets, as expected.

I'm not sure why the code you gave me didn't collect enough logs; if you point me to the instructions I can gather more. I don't know how to make a repro code for a msquic based c# client which needs to be killed a few times while communicating with a remote msquic based server.
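
The "kill the client a few times" part of the repro can be scripted from outside the client. The following is a minimal sketch, not the reporter's setup: `run_and_kill` is a hypothetical helper, and the command passed to it is assumed to be the path to the already-built client executable.

```python
import subprocess
import time


def run_and_kill(cmd, run_seconds, rounds):
    """Start cmd, let it run, then kill it without any cleanup; repeat.

    Returns the list of exit codes observed after each kill.
    """
    exit_codes = []
    for _ in range(rounds):
        proc = subprocess.Popen(cmd)
        # Give the client time to connect and exchange some data.
        time.sleep(run_seconds)
        # Hard kill: TerminateProcess on Windows, SIGKILL on POSIX.
        # The child gets no chance to close connections or streams.
        proc.kill()
        exit_codes.append(proc.wait())
    return exit_codes
```

Calling it as `run_and_kill([r"path\to\client.exe"], run_seconds=10, rounds=5)` (the path is a placeholder) would approximate steps 1 to 3 of the report five times in a row. Because the kill targets only the process that was started, the command should be the client binary itself rather than a launcher that spawns it.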

Member

ManickaP commented Oct 7, 2024

> I don't know how to make a repro code for a msquic based c# client

Just share your code that causes you the trouble.

Do you own the server?

Could you try to collect msquic logs as well?

See https://github.com/microsoft/msquic/blob/main/docs/Diagnostics.md

> What you see in the logs is not occurring on the network

That is expected, the 10 seconds is an internal timeout for handshake.

> zero activity on the network interface when this happens

Not even outgoing packets? That would mean something fishy with your environment.
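
One quick environment check is to ask the OS which local address it would use for UDP traffic to the server. This is a minimal sketch under assumptions: `local_address_for` is a hypothetical helper, it only covers IPv4, and the port used in the example below is a placeholder rather than the server's real port.

```python
import socket


def local_address_for(host, port):
    """Return the local (ip, port) the OS selects for UDP traffic to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # connect() on a UDP socket sends no packets. It only records the
        # peer and makes the OS choose a route and a source address.
        sock.connect((host, port))
        return sock.getsockname()
```

For example, `local_address_for("server.example", 4433)` returns a pair whose first element is the chosen source IP. If that address does not belong to the adapter being captured in Wireshark, outgoing packets may be leaving (or being dropped) on a different interface.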

Author

aohotnik commented Oct 7, 2024

Since there were no packets on the network interface before, during, or after the connection attempt, I started digging into the network interface and any software that could prevent the app from reaching it. I noticed that changing the network interface's settings caused it to reload, which allowed at least one connection to go through before the timeouts started again. So I completely removed Windows Defender from the Windows Home machine running the client. Connections were now easier to establish but were still getting blocked for long intervals. Then I noticed that TCP Optimizer reported a second NIC, a Microsoft USB wireless stick, even though the device was not listed in Device Manager and wasn't plugged in. So I plugged it in and uninstalled it along with its driver. Connections are now smooth. It seems like all three Microsoft products were getting in each other's way, because until I started developing with msquic I never had any issues on this machine, with or without that USB stick plugged in.

So I have no idea what to do with this or which product is causing the problem. I'm happy to be unblocked. I managed to spin up about 85 clients with little CPU usage and tons of free RAM, but all of their processes die at the same time when a new instance spins up and the count gets close to 90. I can't spend any more time figuring out which Microsoft toy is doing it this time, but if I obtain enough evidence that it is actually msquic, I'll report it separately.
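
A first look at which adapters the OS reports can also be scripted. This is only a sketch using Python's standard library (`socket.if_nameindex()`, available on Windows since Python 3.8); `list_interfaces` is a hypothetical helper, and whether a stale adapter like the one described above would show up in this list is not something the thread confirms.

```python
import socket


def list_interfaces():
    """Return (index, name) pairs for the network interfaces the OS reports."""
    return sorted(socket.if_nameindex())


for index, name in list_interfaces():
    print(index, name)
```

An entry that does not correspond to any known adapter would be a reason to look further with the OS's own tools.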
