Tests for NAT-Traversal and Hole-Punching #381
Some important things here in order to help debugging:
I'm going to check how to do a quick executable check in Node.js and also which exec sync you should be using.
I think you should be using https://nodejs.org/api/child_process.html#child_processexecfilesyncfile-args-options when you run a synchronous command. And you want to use Then you want to make sure to catch the exception from it, and deal with it accordingly. |
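A minimal sketch of the pattern suggested above: run the command with `execFileSync` (no shell involved) and catch the exception it throws on a missing binary or non-zero exit. The helper name `runSync` and the choice to return `undefined` on failure are assumptions for illustration, not code from this PR.

```typescript
import { execFileSync } from 'child_process';

/**
 * Hypothetical helper: runs `file` with `args` synchronously, without a shell.
 * Returns trimmed stdout, or undefined if the command could not be spawned
 * (e.g. ENOENT) or exited with a non-zero status.
 */
function runSync(file: string, args: string[] = []): string | undefined {
  try {
    return execFileSync(file, args, {
      encoding: 'utf-8',
      // Capture stderr instead of letting it leak into the test output
      stdio: ['ignore', 'pipe', 'pipe'],
    }).trim();
  } catch (e) {
    // execFileSync throws on spawn failure and on non-zero exit codes;
    // a real caller might log `e` or rethrow a more specific error here
    return undefined;
  }
}
```

A caller can then branch on `undefined` rather than wrapping every call site in its own `try`/`catch`.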
To check if a command exists, we can do it in 2 ways:
if (!shell.which('git')) {
// ...
}
The first case is nice so that we are not reliant on more and more packages. The second case is more extensible if we need to deal with more platform constraints in the future, like on Windows. I reckon since Note that this should not be embedded into the
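The first, dependency-free approach is not shown in the thread as captured here. One possible version, assumed rather than taken from the PR, is to scan the `PATH` directories with Node's built-in `fs` module. The function name `commandExists` is hypothetical.

```typescript
import * as fs from 'fs';
import * as path from 'path';

/**
 * Hypothetical dependency-free check: returns true if an executable file
 * named `cmd` exists in one of the directories listed in `envPath`.
 */
function commandExists(
  cmd: string,
  envPath: string = process.env.PATH ?? '',
): boolean {
  for (const dir of envPath.split(path.delimiter)) {
    if (dir === '') continue;
    const candidate = path.join(dir, cmd);
    try {
      // statSync throws if the path does not exist; accessSync throws
      // if the current user is not allowed to execute the file
      if (fs.statSync(candidate).isFile()) {
        fs.accessSync(candidate, fs.constants.X_OK);
        return true;
      }
    } catch {
      // Not present or not executable in this directory; keep looking
    }
  }
  return false;
}
```

This sketch ignores Windows specifics such as `PATHEXT`, which is part of why a library like `shelljs` is the more extensible option.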
Once it is running @emmacasolin please post a log of running the tests, change your log level to |
As I explained in-person, we're not using |
I think we can reuse This is just If you do that, you should put that in just the |
Then where we are using |
Task 6 was resolved by introducing conditional testing rather than separating the NAT tests. A describe/test can be skipped by using |
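As a rough illustration of conditional testing, the sketch below picks between a runner and its `.skip` variant based on a condition. The name `runIf` and its signature are assumptions; the helper actually used in this PR is not visible in the captured text. Jest's global `describe` and `test` both expose a `.skip` member, so either could be passed as `runner`.

```typescript
type Body = () => void;

// Shape shared by Jest's `describe` and `test`: callable, with a `.skip` variant
interface SkippableRunner {
  (name: string, fn: Body): void;
  skip: (name: string, fn: Body) => void;
}

/**
 * Hypothetical helper: registers the block normally when `condition` holds,
 * otherwise registers it as skipped so it still shows up in the report.
 */
function runIf(
  condition: boolean,
  runner: SkippableRunner,
  name: string,
  fn: Body,
): void {
  if (condition) {
    runner(name, fn);
  } else {
    runner.skip(name, fn);
  }
}

// Example use inside a Jest test file (not executed here):
// runIf(process.platform === 'linux', describe, 'no NAT', () => { /* ... */ });
```

Skipping rather than silently omitting the suite keeps it visible on platforms where the NAT tooling is unavailable.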
Looking good. It looks like we do need to add So we just need to add it to our gitlab runner and rebuild the image. Once gitlab runners upgrade their instances to use nftables, we will have to update gitlab runner, changing |
Proceed to recheck the tests @emmacasolin, and if all good, start squashing.
NAT tests are passing on the pipeline now! Will proceed to start squashing.
Can you create a new issue addressing the migration from iptables to nftables, and its dependencies:
Such an issue can be addressed in the future so we can use nft instead. |
});
});
test(
'Node1 behind EIM NAT connects to Node2',
Can you make your test descriptions lowercase? All of our test descriptions are lowercase at the moment, and it should say `node 1 behind ...`.
Also, what is EIM? I'm not familiar with this acronym, perhaps it should be expanded?
tests/nat/noNAT.test.ts
shell.which('iptables') &&
shell.which('nsenter') &&
shell.which('unshare'),
'no NAT',
I would prefer calling this `DMZ`, to indicate no-NAT situations.
Why did you use "endpoint-independent" and "endpoint-dependent" terminology over using port-restricted and symmetric terms? Is it a more precise name for these?
@emmacasolin can you copy paste the test results from all the tests and also annotate how they map to the spec's required test cases? I'm a bit confused about how each test corresponds to the situations in the spec. Because it's so easy to forget how NAT works, I'm thinking we should add more documentation, or much more descriptive names to our tests (and structure them) so that it's clearer what each situation is testing.
Yeah, endpoint-independent mapping (EIM) and endpoint-dependent mapping (EDM) seem to be the preferred terms over port-restricted and symmetric. Port-restricted is a specific implementation of EIM and symmetric is a specific implementation of EDM, but there are other implementations so it's better to use the broader terms.
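To make the EIM/EDM distinction concrete, here is a toy model of how each kind of NAT chooses an external port. It is a simplified sketch for intuition only: the class `NatSim`, its port numbering, and its string-keyed addresses are all hypothetical and have nothing to do with how the tests configure iptables.

```typescript
type MappingMode = 'EIM' | 'EDM';

/** Toy NAT: maps internal endpoints to external ports. Illustration only. */
class NatSim {
  private mode: MappingMode;
  private mappings = new Map<string, number>();
  private nextPort = 40000; // arbitrary starting port for the sketch

  constructor(mode: MappingMode) {
    this.mode = mode;
  }

  /** External port used when `internal` sends a packet to `destination`. */
  public translate(internal: string, destination: string): number {
    // EIM: the mapping depends only on the internal endpoint.
    // EDM: the mapping also depends on where the packet is going.
    const key =
      this.mode === 'EIM' ? internal : `${internal}->${destination}`;
    let port = this.mappings.get(key);
    if (port === undefined) {
      port = this.nextPort++;
      this.mappings.set(key, port);
    }
    return port;
  }
}
```

Under EIM a peer that learns the external port from a third party can reuse it, which is what makes hole punching feasible; under EDM the port seen by one destination says nothing about the port another destination will see.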
And I reckon it'd be a good idea to draw some simple ASCII diagrams with asciiflow and just embed it in the tests themselves so we have a quick reference to each architecture being tested. |
It's obviously not a fix, but the "sweet spot" for the sleep time is 500ms - this is enough to stop the problem test from failing and not so much that other tests time out.
Based on some discussion, there are some architectural issues we need to review from the ground up. However, in order to merge this there are some quick fixes that we need to apply in this PR.
Long term issues (non-exhaustive) to be addressed in staging:
Either way we need to merge this to staging, and then work on the staging directly and divide and conquer each of these issues. |
Replacing the proxy address in a relay hole punch message with the address in your own node graph (if this exists) has gotten rid of all of the |
As for point 2, it doesn't look like there's any check for an existing open node connection when you attempt to ping another node, but there is a check for an existing proxy connection, so we just need to add a check for a node connection and then that should resolve that issue. I'm assuming here that if we have an open node connection with another agent, then that implies the agent we're connecting to is online? So if |
We don't need to check for an existing node connection. If there was an existing node connection then it would be using the proxy connection. So we only have to check for the proxy connection. |
In that case then nothing needs to be done for point 2, because we already check for an existing forward connection before establishing a new one in |
I set up a test mimicking the 3 node setup Emma has to check how the refresh buckets queue behaves. It seems to be working as intended. It will iterate over the higher buckets performing a find node operation. In this case it will look like about 2 connections per bucket. Sometimes that can be 8 or more buckets, so we can end up with about 24 connections in the logs depending on how far apart the node Ids are. We need to add a way for the refresh bucket queue to detect that it's getting no new information due to the small network and skip doing refresh operations. This might need some thinking about and specifying out in a new issue. For these tests we can just pause the refresh buckets queue. I may need to add a way to pause it though. Here is a log of the queue
I've verified by looking through the logs that each agent only creates one forward, reverse, and node connection per other agent. They will always check for the existence of a connection before creating a new one and won't create a new one if one exists for that address (or node Id in the case of node connections). Notably, the address fix seems to have solved the timeout issue - there aren't any other outstanding issues with the NAT tests, and they can run with the refresh buckets queue enabled. |
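The "check before creating" behaviour described above can be sketched as a keyed pool that only calls its factory when no connection exists for the key. This is a simplified, synchronous illustration with hypothetical names (`ConnectionPool`, `getOrCreate`); the real connection management is asynchronous and is not reproduced here.

```typescript
/** Hypothetical pool keyed by address (or node Id for node connections). */
class ConnectionPool<C> {
  private connections = new Map<string, C>();
  private create: (key: string) => C;

  constructor(create: (key: string) => C) {
    this.create = create;
  }

  /** Returns the existing connection for `key`, creating one only if absent. */
  public getOrCreate(key: string): C {
    const existing = this.connections.get(key);
    if (existing !== undefined) return existing;
    const connection = this.create(key);
    this.connections.set(key, connection);
    return connection;
  }

  public get size(): number {
    return this.connections.size;
  }
}
```

An asynchronous version would additionally need to guard against two callers creating a connection for the same key concurrently, for example by storing the pending promise in the map.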
If an error was logged out by a service handler it would previously appear as `createClientService:`, which was too vague. The logger will now state the name of the handler.
The testnet PR brought in some changes that affect the NAT tests, so they needed to be modified slightly. - Adding a node to the node graph pings the node by default. We want to disable this in the NAT tests so we have more control over the pings. - Nodes add the details of any node that pings them. We can now remove some additional `nodes add` calls that were required to imitate this previously missing functionality.
Now that nodes add the details of a node that contacts them, we no longer need the `edmSimple` configuration to do this manually.
General linting, using capitals for constants in NAT utils, and lowercase test descriptions
A relay node for a hole punch message was previously not modifying the proxy address in the message (which is the "return address" used to contact the source node). For nodes behind a NAT, who do not know their own public address, they rely on this overwriting so that nodes do not try to contact them on their private, inaccessible address.
This will allow us to disable the queue for testing.
The composed flag was set at the beginning of the compose function, causing another function to throw an error due to an undefined property if it was called at the same time.
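The relay fix described in the commit messages above can be sketched as follows: before forwarding a hole punch message, the relay replaces the "return address" with the address it holds for the source node in its own node graph, if one exists. The message shape, field names, and the `Map`-based node graph are hypothetical stand-ins, not the project's actual types.

```typescript
interface Address {
  host: string;
  port: number;
}

// Hypothetical message shape for illustration
interface HolePunchMessage {
  srcId: string;
  targetId: string;
  proxyAddress?: Address; // "return address" used to contact the source node
}

/**
 * Returns the message to forward. If the relay knows an address for the
 * source node, that address overrides whatever the source put in the
 * message (which may be a private, unreachable address behind a NAT).
 */
function relayHolePunch(
  message: HolePunchMessage,
  nodeGraph: Map<string, Address>,
): HolePunchMessage {
  const known = nodeGraph.get(message.srcId);
  if (known === undefined) return message;
  return { ...message, proxyAddress: known };
}
```

The relay is the right place for this rewrite because it observes the source node's public address, which the source node itself cannot know from behind the NAT.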
Description
This PR description was copied from #357
The architecture for our NAT busting will be completed in #326. Once this is done, we should be able to run fully simulated tests for communication between nodes using hole-punching across NAT. This will involve writing Jest tests that create namespaces with iptables rules to simulate different kinds of NAT and creating nodes inside of these namespaces. Specifically, we are only looking at endpoint-independent and endpoint-dependent NAT mapping, since our NAT busting should be able to traverse all different types of firewalls. Thus, these tests will only be simulating port-restricted cone (endpoint-independent) and symmetric (endpoint-dependent) NAT.
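As a rough sketch of how the two mapping behaviours might be expressed as iptables rules, the helper below builds the argument list for a `MASQUERADE` rule in the `nat` table. Pinning the source port with `--to-ports` approximates endpoint-independent mapping, while `--random` approximates endpoint-dependent mapping. The function name, interface name, and port are hypothetical, and the exact rules used by this PR's test utilities are an assumption not confirmed by the text here.

```typescript
type NatKind = 'EIM' | 'EDM';

/**
 * Hypothetical helper: builds iptables arguments for a UDP masquerade rule
 * on the router namespace's outward-facing interface.
 */
function masqueradeArgs(
  kind: NatKind,
  outInterface: string,
  mappedPort: number,
): string[] {
  const base = [
    '--table', 'nat',
    '--append', 'POSTROUTING',
    '--protocol', 'udp',
    '--out-interface', outInterface,
    '--jump', 'MASQUERADE',
  ];
  return kind === 'EIM'
    ? [...base, '--to-ports', String(mappedPort)] // same external port for every destination
    : [...base, '--random']; // randomised external port per connection
}
```

The resulting array could be passed to `iptables` from inside the router's network namespace. Building the arguments separately from executing them keeps the rule construction testable without root privileges.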
The test cases that need to be considered here are:
These tests should cover all possible NAT combinations and should be repeated both with and without the use of a seed node as a signalling server (as such, some tests will be expected to fail, since the seed nodes are used in our NAT busting).
Relates to #148
Issues Fixed
Fixes NAT-Traversal Testing with testnet.polykey.io #159
Tasks
[ ] 5. Write tests using a seed node - To be done in a new PR
[...] (`npm run test`), and make sure they check that the operating system is Linux before running
[ ] 7. Investigate no STDERR logs when running `pk agent status` or `pk agent start` in the beginning Tests for NAT-Traversal and Hole-Punching #357 (comment) could be resolved in Testnet Deployment #326 - resolved here: Tests for NAT-Traversal and Hole-Punching #357 (comment)
[...] `unshare` instead of `ip netns`, which reduces the overhead of using `sudo` and makes our tests more accessible in other platforms like CI/CD.
[ ] 9. Authenticate the sender of hole-punch messages Authenticate the sender of a hole-punching signalling message #148 - Not relevant to this PR, we will need to review the security of the P2P system after testnet deployment
Final checklist