Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NAT Signalling (Hole Punching) should be Fire & Forget, Coalesced, and Secured with Signatures #555

Merged
merged 1 commit into from
Oct 24, 2023

Conversation

tegefaulkes
Copy link
Contributor

@tegefaulkes tegefaulkes commented Aug 21, 2023

Description

Currently the way NAT signalling is implemented is pretty simplistic. Some optimisations and security improvements needs to be made. There are 3 main improvements that needs to be made.

  1. fire and forget
  2. Signal attempt coalescing.
  3. Secure signal requests
fire and forget

Currently, when a signalling request is made, the RPC call remains active for the duration of the signalling request. This forms a chain of active RPC requests between the requester and the relay, as well as between the relay and the target. We want to refactor this so that the RPC request is only active while initiating the relay request.

To this end, signalling and hole punching attempts need to be back-grounded. any RPC request to relay a signal needs to return immediately and attempt the signal relay in the background.

In the same vein, when the target receives the request, it needs to return and do the hole punch attempts in the background.

Signal attempt coalescing

If multiple requests are made for the same hole punch attempt, then these attempts need to be coalesced together. This can be applied to both the signalling and hole punching attempts.

For each stage we need to track the signalling and punching attempts in a map. The map can be uniquely keyed with the nodeIds of the requester and requestee node concatinated together as a string. If the domain is shut down then these need to be cancelled.

Secure signal requests

Related: #148

We need a way to securely verify that the information of the request is correct and that the target of the hole punch messages made the request in the first place. There has been some discussion on this in #148.

Issues Fixed

Tasks

Final checklist

  • Domain specific tests
  • Full tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

@tegefaulkes tegefaulkes self-assigned this Aug 21, 2023
@ghost
Copy link

ghost commented Aug 21, 2023

👇 Click on the image for a new way to code review

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map legend

@tegefaulkes
Copy link
Contributor Author

I'm thinking of breaking up the NodesHolePunchMessageSendHandler into two separate handlers. A signal request and a hole punch request. Right now both kinds of request are done in the same handler. The action it takes switches based on the message contents.

Right now these requests are only made through a single hop on a seed node. Later on with the #182 plan a hole punch will only be done with a single hop. So splitting the logic into two kinds of requests should simplify things.

@tegefaulkes
Copy link
Contributor Author

Mostly done, Just need to fix some tests and then work on the authentication logic. I think this may need to be speced out a little more.

@tegefaulkes
Copy link
Contributor Author

Looking into authentication now.

@tegefaulkes
Copy link
Contributor Author

New issue at #556 for proof of work limiting.

@tegefaulkes
Copy link
Contributor Author

Pretty much done now, I'm going to do a review.

Copy link
Contributor Author

@tegefaulkes tegefaulkes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General review for reference. Just added some notes for changes.

src/agent/handlers/nodesHolePunchRequest.ts Outdated Show resolved Hide resolved
src/agent/handlers/serverManifest.ts Outdated Show resolved Hide resolved
src/nodes/NodeConnectionManager.ts Outdated Show resolved Hide resolved
src/nodes/NodeConnectionManager.ts Outdated Show resolved Hide resolved
src/nodes/NodeConnectionManager.ts Outdated Show resolved Hide resolved
src/nodes/NodeManager.ts Outdated Show resolved Hide resolved
src/utils/RateLimiter.ts Outdated Show resolved Hide resolved
tests/agent/handlers/nodesHolePunchRequest.test.ts Outdated Show resolved Hide resolved
tests/agent/handlers/nodesHolePunchSignal.test.ts Outdated Show resolved Hide resolved
@tegefaulkes
Copy link
Contributor Author

There are some tmp errors that need to be cleaned up.

@tegefaulkes tegefaulkes force-pushed the feature-nat-signalling branch from adaa1fe to 32066fa Compare August 25, 2023 06:50
@tegefaulkes
Copy link
Contributor Author

Pending CI and merge. I'm considering this done now.

I'll move on to the configuration PR next. Then MDNS, node graph changes and final codebase cleanup. From there its testnet testing time.

@CMCDragonkai
Copy link
Member

Tests pass but there's a resource leak in PK. This is probably the problem that occurred before. Resource leaks is a critical problem you need to have a look into this at a deeper level.

image

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Aug 26, 2023

I think this PR requires more work.

  1. I don't understand how your fire and forget is achieved here. Where is this background processing happening if even required?
  2. I think the resource leak needs to solved before committing any more code to staging.
  3. I don't see any authentication happening right now so Authenticate the sender of a hole-punching signalling message #148 is not being solved.
  4. Make the rate limiter a directory with 2 files so it can be used generically.

@tegefaulkes tegefaulkes changed the base branch from staging to feature-config-defaults August 28, 2023 03:54
@CMCDragonkai
Copy link
Member

CMCDragonkai commented Aug 28, 2023

Names of the 2 separate handlers:

  • nodesConnectionSignalingInitial
  • nodesConnectionSignalingFinal

Best names we could come up for now, and fits the SOV order.

@CMCDragonkai
Copy link
Member

Changed to:

  • nodesConnectionSignalInitial
  • nodesConnectionSignalFinal

More verby.

@tegefaulkes
Copy link
Contributor Author

I just need to fix one of the tests files not exiting the process. Besides that the rest that needs to be done like the message signing, authentication and handling background promises can be addressed later. We're putting this PR on hold until we work some things out.

@tegefaulkes
Copy link
Contributor Author

Ok, this PR is on hold for now.

@tegefaulkes tegefaulkes force-pushed the feature-config-defaults branch from debf777 to 44609b9 Compare October 4, 2023 04:44
@CMCDragonkai
Copy link
Member

This needs to be rebased and work should start on this.

I'm getting wary of all the stop-gap solutions that are introducing resource leaks and memory leaks here. I think it's important any stop-gap solution isn't resulting a resource leak for quic and ws.

Also here too if there are background processes running. Make sure if you're not using the tasks domain that you have proper tracking of these promises plus signal cancellation.

We should make sure to have tests that test the signal cancellations/abrupt aborts of the code.

@CMCDragonkai
Copy link
Member

Needs rebase. What's the status of this?

@tegefaulkes tegefaulkes force-pushed the feature-nat-signalling branch from 485e506 to 8ccacd3 Compare October 23, 2023 04:57
@tegefaulkes tegefaulkes changed the base branch from feature-config-defaults to staging October 23, 2023 04:57
@tegefaulkes
Copy link
Contributor Author

Rebased on staging now. It was tricky given the number of changes to staging since this was worked on. I had to branch off of staging and pull the changes needed from a diff.

I haven't fully tested it yet and it likely still needs some polish.

Currently it isn't actually doing any authentication because at the time I concluded that it's too easy to work around if you're a bad actor. But if we still want that I need to apply signing and verification to the requests.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Oct 23, 2023

Currently it isn't actually doing any authentication because at the time I concluded that it's too easy to work around if you're a bad actor. But if we still want that I need to apply signing and verification to the requests.

We still need auditability here so you should add in the signed requests. The signed requests is a quick add, and should be important for our metrics and observability.

@CMCDragonkai CMCDragonkai changed the title NAT signalling refactor NAT Signalling (Hole Punching) should be Fire & Forget, Coalesced, and Secured with Signatures Oct 23, 2023
@CMCDragonkai CMCDragonkai mentioned this pull request Oct 23, 2023
14 tasks
@tegefaulkes
Copy link
Contributor Author

Finished cleaning up and testing. I'll add signing and verification now.

@tegefaulkes tegefaulkes force-pushed the feature-nat-signalling branch from a7f4c78 to 9cd245a Compare October 24, 2023 02:48
@tegefaulkes
Copy link
Contributor Author

Everything is done now. I need to go over the existing review comments to see if they're still relevant.

@tegefaulkes
Copy link
Contributor Author

Ok, this is ready to merge.

@CMCDragonkai
Copy link
Member

We need an explanation preferably a diagram on how all of the rate limiting, semaphores, active promise tracking actually work together, this code is quite tricky and will easily tech debt if we don't understand it.

* general refactor of the signalling protocol.
* Added signatures and verification to the signalling requests and relay messages. #148
@tegefaulkes tegefaulkes force-pushed the feature-nat-signalling branch from 9ebf9a7 to 6123329 Compare October 24, 2023 05:37
@tegefaulkes tegefaulkes merged commit 479f8b6 into staging Oct 24, 2023
@CMCDragonkai
Copy link
Member

@tegefaulkes diagram?

@tegefaulkes
Copy link
Contributor Author

I've been meaning to get to this. Just been side-tracked with other issues at the moment.

@CMCDragonkai CMCDragonkai added the r&d:polykey:core activity 4 End to End Networking behind Consumer NAT Devices label Aug 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
r&d:polykey:core activity 4 End to End Networking behind Consumer NAT Devices
Development

Successfully merging this pull request may close these issues.

Authenticate the sender of a hole-punching signalling message
2 participants