qca-nss-ecm: causes some network problems #6

Open
NukeMania opened this issue Apr 12, 2023 · 6 comments

Comments

@NukeMania

Hi,
after your latest commits I can't download some files or git clone some repos.
For example:
wget https://downloads.sourceforge.net/lzmautils/xz-5.4.2.tar.bz2 (the download reaches 99% and then hangs indefinitely)
https://github.com/bitthief/openwrt.git (cloning your repo gets stuck at a random percentage)

Workaround: /etc/init.d/qca-nss-ecm stop, then everything works fine.
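
To compare behaviour with ECM running and with ECM stopped, a small check along these lines may help. It is only a sketch: `check_download` and the `FETCH` variable are hypothetical names, and the default command assumes a wget that accepts `-q`, `-T` and `-O`.

```shell
#!/bin/sh
# Hypothetical helper: fetch each URL once and report success or failure.
# FETCH can be overridden; the default assumes wget supports -q, -T and -O.
FETCH="${FETCH:-wget -q -T 20 -O /dev/null}"

check_download() {
    for url in "$@"; do
        # $FETCH is left unquoted on purpose so its options are split into words.
        if $FETCH "$url"; then
            echo "ok   $url"
        else
            echo "FAIL $url (exit $?)"
        fi
    done
}
```

Run `check_download https://downloads.sourceforge.net/lzmautils/xz-5.4.2.tar.bz2` once with ECM running and once after `/etc/init.d/qca-nss-ecm stop`. A stalled transfer should eventually show up as a failure once the timeout expires, though retry behaviour differs between wget implementations.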

last working version : OpenWrt SNAPSHOT r22494-49348d7178 / LuCI Master git-23.074.82619-6ad6a24
build date: 31.03.2023 / kernel: 5.15.104

@derekw36

I had similar issues and solved them by disabling TCP Segmentation Offload on all Ethernet ports. Try putting this in your /etc/rc.local (or via LuCI System->Startup->Local Startup):

/usr/sbin/ethtool -K wan tso off
/usr/sbin/ethtool -K lan1 tso off
/usr/sbin/ethtool -K lan2 tso off
/usr/sbin/ethtool -K lan3 tso off
/usr/sbin/ethtool -K lan4 tso off

and reenable qca-nss-ecm, which gives you back NSS offloading.
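
The same five commands can be written as a loop that also prints the resulting TSO state. This is a sketch, not a tested fix: `tso_off` is a made-up function name, and the port names are simply the ones listed above.

```shell
#!/bin/sh
# Sketch: turn TSO off on each listed port, then show the resulting state.
# Port names are taken from the comment above; adjust them to your board.
PORTS="${PORTS:-wan lan1 lan2 lan3 lan4}"
ETHTOOL="${ETHTOOL:-/usr/sbin/ethtool}"

tso_off() {
    for port in "$@"; do
        # Skip ports that do not exist on this device.
        if [ ! -d "/sys/class/net/$port" ]; then
            echo "skip $port (no such interface)"
            continue
        fi
        "$ETHTOOL" -K "$port" tso off
        # Lowercase -k lists the current offload settings.
        "$ETHTOOL" -k "$port" | grep tcp-segmentation-offload
    done
}
```

Calling `tso_off $PORTS` from /etc/rc.local would then replace the individual lines; each existing port should report `tcp-segmentation-offload: off` afterwards.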

@NukeMania
Author

NukeMania commented Apr 22, 2023

@derekw36 thanks for the tip.
I am no longer using bitthief's sources; my current build is from AgustinLorenzo's repo. I did find that enabling software flow offloading fixed my issue.
I will also test your suggestion with bitthief's sources.

Edit: your suggestion didn't work.

@bitthief
Owner

bitthief commented May 8, 2023

Hi guys,

Thanks for reporting this. It's actually a much older issue that I've been tracking down for a while.
Ironically, this is also why I started the NSS fork in the first place, hoping ECM would fix it. You can find some of my posts about this on the AX3600 thread from about 2 years ago.

I first saw these SSL/TLS corruption issues months ago on the 5.15 codebase, only happening on the wired / Ethernet ports of course (why does Wi-Fi work - is it because it's not offloaded by NSS?):

+ curl -f --connect-timeout 20 --retry 5 --location https://cdn.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/v1.47.0/e2fsprogs-1.47.0.tar.xz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 37 6893k   37 2575k    0     0  2827k      0  0:00:02 --:--:--  0:00:02 2827k
curl: (56) OpenSSL SSL_read: OpenSSL/1.1.1t: error:1408F119:SSL routines:ssl3_get_record:decryption failed or bad record mac, errno 0
curl -sSL https://mirrors.dotsrc.org/tails/stable/tails-amd64-5.10/tails-amd64-5.10.img -O --retry 100 --retry-delay 2 --retry-all-errors --http1.1 -v -C -
*   Trying 130.225.254.116:443...
* Connected to mirrors.dotsrc.org (130.225.254.116) port 443 (#0)
* schannel: disabled automatic use of client certificate
* ALPN: offers http/1.1
* ALPN: server accepted http/1.1
> GET /tails/stable/tails-amd64-5.10/tails-amd64-5.10.img HTTP/1.1
> Host: mirrors.dotsrc.org
> Range: bytes=6307127-
> User-Agent: curl/7.83.1
> Accept: */*
>
* schannel: failed to decrypt data, need more data
* schannel: failed to decrypt data, need more data
* schannel: failed to decrypt data, need more data
* schannel: failed to decrypt data, need more data
* schannel: failed to decrypt data, need more data
* schannel: failed to decrypt data, need more data
* Mark bundle as not supporting multiuse
< HTTP/1.1 206 Partial Content
< Server: nginx/1.18.0 (Ubuntu)
< Date: Tue, 21 Feb 2023 23:54:11 GMT
< Content-Type: application/octet-stream
< Content-Length: 1335870153
< Last-Modified: Wed, 15 Feb 2023 10:20:14 GMT
< Connection: keep-alive
< ETag: "63ecb1de-50000000"
< X-Frame-Options: SAMEORIGIN
< Referrer-Policy: strict-origin
< Content-Range: bytes 6307127-1342177279/1342177280
<
{ [16006 bytes data]
* schannel: failed to decrypt data, need more data
* schannel: failed to decrypt data, need more data
* schannel: failed to decrypt data, need more data
* schannel: failed to decrypt data, need more data
* schannel: failed to decrypt data, need more data
* schannel: failed to decrypt data, need more data
* schannel: failed to decrypt data, need more data
{ [16384 bytes data]
* schannel: failed to decrypt data, need more data
* schannel: failed to decrypt data, need more data
* schannel: failed to decrypt data, need more data
* schannel: failed to decrypt data, need more data
* schannel: failed to decrypt data, need more data
* schannel: failed to decrypt data, need more data
{ [32768 bytes data]
* schannel: failed to decrypt data, need more data
* schannel: failed to decrypt data, need more data
{ [32768 bytes data]
* schannel: failed to decrypt data, need more data
{ [98304 bytes data]
* schannel: failed to decrypt data, need more data
{ [32768 bytes data]
* schannel: failed to decrypt data, need more data
{ [16384 bytes data]
* schannel: failed to decrypt data, need more data
{ [81920 bytes data]
* schannel: failed to decrypt data, need more data
{ [81920 bytes data]
* schannel: failed to decrypt data, need more data
{ [49152 bytes data]
* schannel: failed to decrypt data, need more data
* schannel: failed to decrypt data, need more data
* schannel: failed to read data from server: SEC_E_DECRYPT_FAILURE (0x80090330) - The specified data could not be decrypted.
* Closing connection 0
* schannel: shutting down SSL/TLS connection with mirrors.dotsrc.org port 443
curl: (56) Failure when receiving data from the peer

I've been debugging it behind the scenes with @AgustinLorenzo's help; that's why we recently bumped all the QCA NSS modules (ECM, SSDK, DP etc.) to 12.3r2, hoping it would fix it. Software offloading doesn't seem to influence it at all, and I've tried many different scenarios.

So far, my findings point to an issue with the MTU configured for the wired interfaces: as the debug logs I pasted above show, bytes are overlapping. I also had this behaviour on the stock firmware when using VLANs / bridged networks, and I had to lower the MTU to 1492 on both the client and the router interface to fix it. It might also be related to NSS DP or the internal switch (something hardcoded?), or it could depend on your client OS network config.
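
For reference, the arithmetic behind the 1492 figure can be sketched as below. 1492 is the usual PPPoE MTU (1500 minus 8 bytes of PPPoE/PPP overhead). The function names are made up for illustration, and the numbers assume IPv4 with no IP or TCP options.

```shell
#!/bin/sh
# Illustrative arithmetic only; function names are made up for this sketch.
# Assumes IPv4 (20-byte header) and TCP without options (20-byte header).

# Largest TCP payload per segment for a given MTU.
mss_for_mtu() {
    echo $(( $1 - 20 - 20 ))
}

# Largest ICMP echo payload that fits in one unfragmented IPv4 packet
# (20-byte IPv4 header + 8-byte ICMP header).
ping_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

mss_for_mtu 1500           # prints 1460
mss_for_mtu 1492           # prints 1452
ping_payload_for_mtu 1492  # prints 1464
```

On a Linux client with iputils ping, `ping -M do -s 1464 <host>` sends a packet of exactly 1492 bytes with fragmentation prohibited. If that succeeds while `-s 1472` (a 1500-byte packet) fails, the path MTU is below 1500. BusyBox ping may not offer `-M`.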

I also use WireGuard tunnels (with MTUs != 1500) and mwan3 / policy-based routing, which could interfere. I just pushed openwrt/packages#20923 in my own local packages repo and also a new version of openwrt/openwrt#12112

I'll test these changes and debug more. If you find a more robust fix, please do post it here!

@NukeMania
Author

@bitthief I have the same problem but different results: I don't get errors with curl, it just gets stuck at the end.
Probably every network has its own configuration, so it affects everyone differently.

When I first hit the problem, my first suspect was the MTU too,
because my ISP requires an MTU of 1492 for PPPoE, otherwise packets get dropped or lost.

My guess is that the MTU value causes the problem in a different way, or that MSS behaves oddly when NSS offloading is enabled.
I played with the MTU across the entire network, setting it to 1492 on every device on both the client and server side, and it didn't work.
For now my solution is enabling software offloading, after which things are back to normal.

If I can debug NSS I can help, because my problem started with 8b673c.
Previous builds didn't affect me and I archived them, so I can debug with them if any debugging is available, of course.
We can compare those builds.

I also captured network traffic and noticed in Wireshark that there are many "TCP segment of a reassembled PDU" entries with length 1462.

@bitthief
Owner

So I had some time to play with this and I have a partial hack/workaround for now.

@derekw36 was definitely on the right track, the bug appears to be triggered by the interface offload (TCP? GSO? others?).

I have been using a script to disable ALL the offloads for all the interfaces and it fixes the issue instantly. No reboots necessary or even restarting ECM etc.
No idea precisely which offload is causing the issue and for which interface (or bridge?), but this works for now and I can live with it. If you guys want to investigate more and figure out exactly which offload is causing it, we can definitely apply a more elegant solution.
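
One way to narrow down which offload is responsible is to generate the ethtool commands per feature and apply them one feature at a time. The sketch below is not the script from the commit mentioned in this thread. `offload_off_cmds` and `FEATURES` are hypothetical names, and whether a given driver supports each feature is an assumption to check with `ethtool -k <iface>`.

```shell
#!/bin/sh
# Generic sketch, not the script referenced in this thread.
# Prints one "ethtool -K <iface> <feature> off" line per interface/feature
# pair so the list can be reviewed, trimmed, and then piped to sh.
# FEATURES uses ethtool's short feature names.
FEATURES="${FEATURES:-rx tx sg tso gso gro}"

offload_off_cmds() {
    for iface in "$@"; do
        for feat in $FEATURES; do
            echo "ethtool -K $iface $feat off"
        done
    done
}
```

`offload_off_cmds wan lan1 | sh` would apply every listed feature. Setting `FEATURES=gso` before the call limits the run to a single feature, so the download test can be repeated after each one.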

Please look at commit e4447f3

Just running the disable_offloads.sh script should be enough once the device is booted (or just place it in /etc/rc.local with a sleep before).

I also wrote a hotplug script, but it doesn't appear to trigger on my device. I think I missed something there and it needs to be debugged (this would allow automatic activation, controlled by options in the ECM config file).
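
For anyone experimenting with the hotplug route, a generic handler could look like the sketch below. It is not the hotplug script mentioned above. It assumes the file lives under /etc/hotplug.d/iface/, where OpenWrt sets `ACTION`, `INTERFACE` (logical name) and `DEVICE` (netdev name). The function name is made up, and only TSO is handled to keep it short.

```shell
#!/bin/sh
# Generic sketch, not the hotplug script mentioned in this thread.
# Assumes it is installed under /etc/hotplug.d/iface/, where OpenWrt sets
# ACTION, INTERFACE (logical name) and DEVICE (netdev name).
ETHTOOL="${ETHTOOL:-/usr/sbin/ethtool}"

on_iface_event() {
    # Only act when an interface comes up and a device name is known.
    [ "$ACTION" = "ifup" ] || return 0
    [ -n "$DEVICE" ] || return 0
    "$ETHTOOL" -K "$DEVICE" tso off
}

on_iface_event
```

For a bridged LAN, `DEVICE` is typically the bridge rather than its member ports, so the individual ports may need an explicit list or a handler under /etc/hotplug.d/net/ instead. That could be one reason a handler seems not to trigger for the ports.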

More than happy to look at any other suggestions / ideas / fixes etc. related to this.

@NukeMania
Author

NukeMania commented May 31, 2023

@bitthief
I think I found the problem, or at least my version of the problem has been fixed: I don't have to enable software offloading anymore.
I was messing with the stock firmware when I noticed a custom sysctl, net.netfilter.nf_conntrack_tcp_no_window_check.
I rebuilt the firmware with the patch and everything is back to normal without software offloading.

At the beginning I didn't mention that I was using some custom sysctl variables, so those settings may affect the problem, and it may be that software offloading only helped in combination with them,
especially "net.ipv4.tcp_window_scaling=1" or "net.ipv4.tcp_tw_reuse=1".

You may want to look into this, or anyone can test with:

sysctl -w net.core.rmem_default=256960
sysctl -w net.core.rmem_max=513920
sysctl -w net.core.wmem_default=256960
sysctl -w net.core.wmem_max=513920
sysctl -w net.core.netdev_max_backlog=2000
sysctl -w net.core.somaxconn=2048
sysctl -w net.core.optmem_max=81920
sysctl -w net.ipv4.tcp_mem="131072 262144 524288"
sysctl -w net.ipv4.tcp_rmem="8760 256960 4088000"
sysctl -w net.ipv4.tcp_wmem="8760 256960 4088000"
sysctl -w net.ipv4.tcp_keepalive_intvl=30
sysctl -w net.ipv4.tcp_keepalive_probes=3
sysctl -w net.ipv4.tcp_window_scaling=1
sysctl -w net.ipv4.tcp_tw_reuse=1
sysctl -w net.ipv4.tcp_max_syn_backlog=2048
sysctl -w net.ipv4.tcp_fastopen=3
sysctl -w net.ipv4.tcp_low_latency=1
sysctl -w net.ipv4.tcp_mtu_probing=1
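
Before applying a list like this, it can be useful to record the current values so they can be restored or changed one at a time. The helper below is a sketch: `snapshot_sysctls` and `SYSCTL_ROOT` are hypothetical names, and it reads /proc/sys directly instead of relying on any particular sysctl binary.

```shell
#!/bin/sh
# Sketch: print the current value of each key as a "sysctl -w" line so the
# settings can be restored later. Keys missing on this kernel are reported
# as comments. SYSCTL_ROOT is a hypothetical override, mainly for testing.
SYSCTL_ROOT="${SYSCTL_ROOT:-/proc/sys}"

snapshot_sysctls() {
    for key in "$@"; do
        # net.ipv4.tcp_rmem -> /proc/sys/net/ipv4/tcp_rmem
        path="$SYSCTL_ROOT/$(echo "$key" | tr . /)"
        if [ -r "$path" ]; then
            # Turn tabs in multi-value entries into single spaces.
            value=$(tr -s '\t' ' ' < "$path")
            echo "sysctl -w $key=\"$value\""
        else
            echo "# $key: not present on this kernel"
        fi
    done
}
```

For example, `snapshot_sysctls net.ipv4.tcp_window_scaling net.ipv4.tcp_tw_reuse net.netfilter.nf_conntrack_tcp_no_window_check > /tmp/sysctl-before.sh` saves the starting point. The last key is only expected to exist on builds that carry the patch described above.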


3 participants