Consolidating Configuration Entropy and Nodes and Client Services Refactoring #552

Merged
merged 65 commits into from
Oct 4, 2023

CMCDragonkai
Member

@CMCDragonkai CMCDragonkai commented Aug 16, 2023

Description

The configuration of Polykey has suffered from a lot of entropy. This PR attempts to consolidate our configuration structure.

It's based on this comment: MatrixAI/Polykey-CLI#4 (comment)

A note about task 5. It's important to understand that in PolykeyAgent, we have a bunch of dependencies that can be injected, but their injection is really intended for mocking, which is only for testing, not intended for production usage. This is different from optional dependencies, which have legitimate use cases for both injecting and not injecting. If we were to follow task 5. strictly, this would add a lot of boilerplate to managing optional injected dependencies for everything in PolykeyAgent. And right now, there's no easy way to manage this without manually setting up booleans and conditional checks for all these optional dependencies. Therefore... task 5. should only be applied to situations where injecting or not injecting is a legitimate use case, such as between QUICSocket and QUICServer and QUICClient and between EncryptedFS and DB. But optional dependencies in PolykeyAgent are the exception. IN FACT, because JS allows module mocking... we may even remove the ability to inject these dependencies and use jest.mock instead.

Issues Fixed

Tasks

  • 1. Move *base to config.paths. As these are path constants. They are not defaults to be configured.
  • 2. Separate "user config" from "system config". There is a bunch of constants that control how the network works, these are not supposed to be changed by the user. These are constants that we developers of Polykey tune empirically for optimal operation. Remember at this point of the abstraction, PK is an application, no longer a library.
    • config.defaultsUser
    • config.defaultsSystem
  • 3. Create a type based on config.defaultsUser and have it be referenced everywhere there needs to be some spread of the parameters that will be defaulted by config.defaultsUser. This prevents entropy where parameter names get changed and we forget that they need to be consistent.
    • What happens if you only use a subset of the default parameters? Do you end up creating a subtype?
    • All relevant types should be exposed in src/config.ts so that way it is always consistent, even if it is a bit weird. It gives us one place to ensure compliance between parameters that should have the same name and same meaning.
  • 4. Restructure configuration properties into a special config parameter, and do a deep merge
    • PolykeyAgent.createPolykeyAgent - create a PolykeyAgentConfig type
    • bootstrapState - also needs to use BootstrapConfig, but it's a far simpler type
    • Fix up NodeConnectionManager config
    • Fix up RPCServer config
    • Fix up Websocket config
  • 5. Create issue for injected optional dependencies not having their lifecycle managed by the target system. This was changed for EFS. It should also be the case in PK. This means only encapsulated dependencies are managed. This does impact a lot of tests.
  • 6. Move all QUIC client and server crypto into NodeConnectionManager and convert to using keys/utils/symmetric utilities rather than the HMAC SHA256.
  • 7. Move QUICSocket lifecycle into NodeConnectionManager. All of the QUIC transport will be now encapsulated within nodes domain.
  • [ ] 8. Move all of RPC and websocket into the client domain, so that way both transports are encapsulated in their respective domain. - this is done in Consolidating ClientService and js-ws and js-rpc integration #560
  • 9. Flesh out all the connection and stream error codes for the application of the QUIC connection and streams to the node usage.
  • 10. Agent handlers should be moved into nodes domain if client domain contains handlers. We end up having "client service" and "node service", respectively meaning serving clients and serving nodes.
  • 11. Ensure that webcrypto polyfill has the correct Crypto type due to changes in Node's interface. Do this by using as Crypto.
  • 12. The NodeConnectionManager and NodeConnection have both pull and push flows.
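
Tasks 3 and 4 can be sketched roughly as below. This is a minimal illustration and not the code in this PR: `defaultConfig`, the nested `rpc` group, `AgentConfig`, `resolveConfig` and `mergeDeep` are hypothetical names. The point is that the type is derived from the defaults object, a subset of parameters is a `Pick` of that type rather than a hand-written subtype, and overrides are deep merged over the defaults.

```typescript
// Hypothetical miniature of src/config.ts. The nesting is an assumption
// made for illustration so that the deep merge has something to do.
const defaultConfig = {
  agentServiceHost: '::',
  agentServicePort: 0,
  rpc: {
    timeoutTime: 15_000,
    parserBufferSize: 64 * 1024,
  },
};

// The type is derived from the defaults, so renaming a key in the defaults
// becomes a compile error everywhere the old name is still used.
type AgentConfig = typeof defaultConfig;

// Callers may override any subset of the parameters at any depth.
type DeepPartial<T> = {
  [K in keyof T]?: T[K] extends object ? DeepPartial<T[K]> : T[K];
};

// A consumer that needs only some parameters picks them from the same type.
type AgentServiceConfig = Pick<
  AgentConfig,
  'agentServiceHost' | 'agentServicePort'
>;

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Recursively merges `override` over `base` without mutating either.
// An `undefined` override keeps the base value.
function mergeDeep(base: unknown, override: unknown): unknown {
  if (override === undefined) return base;
  if (!isPlainObject(base) || !isPlainObject(override)) return override;
  const result: Record<string, unknown> = { ...base };
  for (const [key, value] of Object.entries(override)) {
    result[key] = mergeDeep(base[key], value);
  }
  return result;
}

function resolveConfig(overrides: DeepPartial<AgentConfig> = {}): AgentConfig {
  return mergeDeep(defaultConfig, overrides) as AgentConfig;
}

function agentServiceAddress(config: AgentServiceConfig): string {
  return `[${config.agentServiceHost}]:${config.agentServicePort}`;
}

// Only the RPC timeout is overridden; everything else keeps its default.
const resolved = resolveConfig({ rpc: { timeoutTime: 5_000 } });
console.log(resolved.rpc.timeoutTime, resolved.rpc.parserBufferSize);
console.log(agentServiceAddress(resolved));
```

A shallow spread would have replaced the whole `rpc` group and dropped `parserBufferSize`, which is why the merge has to be deep once the config is nested.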

Final checklist

  • Domain specific tests
  • Full tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

@CMCDragonkai CMCDragonkai changed the title Feature config defaults Consolidating Configuration Entropy Aug 16, 2023
@CMCDragonkai CMCDragonkai mentioned this pull request Aug 16, 2023
11 tasks
@CMCDragonkai
Member Author

CMCDragonkai commented Aug 16, 2023

This PR is focused on having a consistent user-friendly "application-level" configuration of PK.

The fact that upstream libraries/domains may require a specific naming that doesn't match is not relevant for this. Upstream can change their naming (like aligning js-ws and js-quic), or PK can just map the relevant values.

@CMCDragonkai CMCDragonkai self-assigned this Aug 18, 2023
@CMCDragonkai CMCDragonkai force-pushed the feature-config-defaults branch 3 times, most recently from 66dc5c3 to 3064e24 Compare August 18, 2023 12:15
@CMCDragonkai
Member Author

@tegefaulkes there are TODO comments left in code. These should not be left here, they must all be raised to issues. If you're in the midst of coding, I suspect keeping track as PR tasks can work too.

// TODO: finish off agent migration work after CLI migration is done.
// TODO: check all locking and add cancellation for it.

Removing these from NodeConnectionManager.

@CMCDragonkai
Member Author

Rebased on staging, now to continue.

@CMCDragonkai
Member Author

All paths that were in the defaults are now in:

config.paths
  /**
   * File/directory paths
   */
  paths: {
    statusBase: 'status.json',
    statusLockBase: 'status.lock',
    stateBase: 'state',
    stateVersionBase: 'version',
    dbBase: 'db',
    keysBase: 'keys',
    vaultsBase: 'vaults',
    efsBase: 'efs',
    tokenBase: 'token',
  },

These are basically application constants, they are not really user config, no need to put in defaults anywhere.

@CMCDragonkai
Member Author

There are a bunch of system parameters that I'm moving to config.defaultsSystem.

These parameters are tuned by the developers. They are not to be specified by the user. We are only raising it to the top level to have one place acting as the main constant specification.

The naming of these parameters should be consistent and should have consistent meaning. And I'm already finding some strange aspects of these that I need your input on @tegefaulkes.

Firstly for agent related networking. I'm noticing that there's overlap between node connection related parameters and the agent transport parameters. I've currently got them named like this in this PR:

  • nodesConnectionConnectTime
  • nodesConnectionTimeoutTime
  • nodesConnectionHolePunchTimeoutTime
  • nodesPingTimeoutTime
  • agentConnectionKeepAliveIntervalTime
  • agentConnectionMaxIdleTimeoutTime

Firstly right now the entire agent transport is being encapsulated inside the nodes domain. This tells me that we should be abstracting the configuration of the agent transport entirely to the nodes domain.

Secondly, isn't there overlapping behaviour here? We have a connect time and timeout time for node connections. But doesn't this end up encapsulating the entire agent connection keep alive interval time and max idle timeout time?

Let's say nodesConnectionConnectTime is 2000. That means 2000ms is the timeout to create a NodeConnection. This is being used by the @timedCancellable decorator used in:

  • NodeConnectionManager.withConnF
  • NodeConnectionManager.getConnection
  • NodeConnectionManager.getConnectionWithAddress
  • NodeConnectionManager.getRemoteNodeClosestNodes
  • NodeConnectionManager.sendSignalingMessage
  • NodeConnectionManager.relaySignalingMessage

In all these cases, this 2000ms is used across all the asynchronous operations and finally fed into NodeConnection.createNodeConnection. Which itself uses QUICClient.createQUICClient.

So in terms of timing hierarchy:

  • node connection time - is the entire time allowed to "establish a connection" (however the whole thing is being used by default for a variety of operations that may end up establishing a connection)
  • agent connection max idle timeout time - this actually controls the dialling time allowed, as well as how much time is available during idleness at the agent connection

Right now the max idle time is set to 60000ms. While the node connection connect time is 2000ms. That actually means the total setup time is limited to 2000ms right?

Then there's also the nodesConnectionTimeoutTime which is 60000ms. It governs how long a node connection will be kept around until it times out, unless there's activity. This activity is specific to node connections right? However this is also limited by the agentConnectionMaxIdleTimeoutTime.

  • nodesConnectionTimeoutTime - 60000
  • agentConnectionMaxIdleTimeoutTime - 60000

Then this means, that after 60s without packet activity, the transport connection will be killed. But even if there is 60s of packet activity, if there is no "node" activity on a node connection, after 60s, the node connection will be killed. This means that as long as the keep alive is running, the underlying agent connection will continue running, but if no node-related operations are running for 60s, then the node connection is terminated.

Then there's also the nodesPingTimeoutTime which I don't understand how it relates to the connection time.

So there are some overlaps and also specific constraints. Let's see if we can simplify all of this.

  • The node connection manager has a bunch of async operations that all require some form of timeout applied. The default however can just be infinite to allow the caller to decide when to stop doing things. It may not actually be necessary to encode defaults into the creation of NodeConnectionManager. I may be wrong here because I haven't had time to fully review the NCM. The point is that if I run a particular command, I have the responsibility of cancelling it, and we can bubble up this requirement all the way to the top.
  • What is the top? Who calls NCM? It's mostly 2 things: PolykeyAgent during bootstrapping and RPC calls that trigger node connections to form to do agent to agent calls. There are internal recursive calls that occur due to finding nodes, and maybe the NodeGraph requires regular updates.
  • However there should definitely be "idle timeouts" and "keepalive" mechanisms applied. These should indeed be defaults applied to the NCM. In fact, in these cases, this would be the main way in which you configure this. Most method calls would have no need to be able to override these 2 configuration parameters.

If this is true, then we can:

  1. Remove any usages of default timeouts in the ctx being used by method calls, as NCM by itself would delegate this to the caller. And the caller can delegate this upwards. Therefore connection startup time could technically take forever. I argue that would mean an infinite timeout, which is actually the default value of quiche (a default value of 0 means infinite right?). I think we talked about this before @tegefaulkes but why didn't we end up simply selecting 0 or infinite max timeout for QUIC as the default?
  2. The 2 things that do require default configuration would be keep alive mechanisms and idle timeout mechanisms. To start, it's important to differentiate "idle timeout" from startup timeout. In 1., we are talking about startup timeout because that's applied to the ctx of operations. By default it should be infinite until the caller decides it has taken enough time. Here an idle timeout only applies once the connection has started.

Now I do remember we talked about this. I was asking why not set the underlying connection to have an infinite idle timeout, then rely on the NodeConnection to manage the startup timeout and the idle timeout separately? What was the reason?

Furthermore such defaults don't need to be specified by the NodeConnectionManager directly. One could instead ask for these parameters, and have PolykeyAgent directly inject this during bootstrapping. If they were to have defaults, the default for keep alive interval time would be 1000, and the default for idle timeout would also be infinite.

I'm a fan of "terminal defaults" - either the minimum value, or the maximum value. That means either:

  1. A minimum value representing disabledness. That could just simply be undefined as in the case of keep alive interval time, if that simply disables keepalive mechanism.
  2. A maximum value representing no-timeout. That could be 0 or Infinity.

Such defaults ensure no assumptions on the end-user. Keep-alives become opt-in, and timeouts become opt-in. Applying this principle ensures that the default configuration from a library perspective is always the minimal behaviour, which is disabling keepalives and disabling timeouts.

However applications are different and they should have tuned convention. This is because applications have contextual understanding, libraries don't.

So what does this mean for the config? I think:

  • Set keep alive defaults to undefined
  • Set timeout defaults to Infinity
  • Only use the config.defaultsSystem at the application-bootstrapping level
  • We should be able to rely upon 2 timeouts: 1. node connection startup timeout, 2. node connection ttl timeout, in both cases... this shouldn't require the underlying transport, except a way to cancel things.
  • On the topic of hole punching, we should be symmetric to the dialling behaviour. I think that means making use of the node connection startup timeout, and interval time. Maybe interval time should also be exponential (2x each packet)? Again this should only be configured at the top level.
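
A minimal sketch of the "terminal defaults" idea, assuming a hypothetical `ConnectionTimes` shape and a hypothetical `createConnectionTimes` helper (neither is taken from js-quic or Polykey). The library-level defaults disable keep alive and never time out; the tuned values live only at the application bootstrapping level and are passed in explicitly.

```typescript
// Hypothetical shape of the time-related parameters of a transport library.
type ConnectionTimes = {
  connectTimeoutTime: number;
  keepAliveTimeoutTime: number;
  // `undefined` means the keep alive mechanism is disabled.
  keepAliveIntervalTime: number | undefined;
};

// Library perspective: terminal defaults make no assumptions about the
// end-user. Timeouts are at their maximum and keep alive is off.
const libraryDefaults: ConnectionTimes = {
  connectTimeoutTime: Infinity,
  keepAliveTimeoutTime: Infinity,
  keepAliveIntervalTime: undefined,
};

// Application perspective: tuned values, defined once at the top level.
// These numbers are illustrative assumptions.
const applicationDefaults: ConnectionTimes = {
  connectTimeoutTime: 15_000,
  keepAliveTimeoutTime: 30_000,
  keepAliveIntervalTime: 10_000,
};

// The library only fills in what the caller did not provide.
function createConnectionTimes(
  overrides: Partial<ConnectionTimes> = {},
): ConnectionTimes {
  return { ...libraryDefaults, ...overrides };
}

function keepAliveEnabled(times: ConnectionTimes): boolean {
  return times.keepAliveIntervalTime !== undefined;
}

// Used as a library: nothing is opted into.
const bare = createConnectionTimes();
console.log(bare.connectTimeoutTime, keepAliveEnabled(bare));

// Used by the application: bootstrapping injects the tuned values.
const tuned = createConnectionTimes(applicationDefaults);
console.log(tuned.connectTimeoutTime, keepAliveEnabled(tuned));
```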

I'm ordering them from higher level to lower level. So config is ordered: RPC, nodes, client transport, then agent transport.

Another thing is that if defaults are specified at the application level, then in each domain, their defaults are specified right now directly. That's not quite correct. They should just directly refer to the src/config.ts for their defaults to avoid default entropy.

@CMCDragonkai
Member Author

I think there's also the problem of https://github.com/MatrixAI/MatrixAI-Graph/issues/43. We also have to decide on the structuring of the timeout especially if there is a fan-out behaviour in the async code paths. We should label them correctly and ensure that we have specified behaviour here.

@CMCDragonkai CMCDragonkai force-pushed the feature-config-defaults branch from 3064e24 to a2e0692 Compare August 18, 2023 13:33
@CMCDragonkai
Member Author

Ok I did talk about this earlier about the issue with using maxIdleTimeout.

MatrixAI/js-quic#26 (comment)

Need to reconcile this.

@CMCDragonkai CMCDragonkai force-pushed the feature-config-defaults branch from a2e0692 to 3af5c05 Compare August 18, 2023 13:36
@tegefaulkes
Contributor

Max idle timeout is a bit of a technical limitation of QUIC. It's the best way we can detect and handle a connection failure, so I'd prefer if it wasn't infinite. But for QUIC, it's used for both the idle timeout and the connection establishment timeout, so we can't really separate the two.

If we set the idle timeout to infinite, then we wouldn't have a nice way to detect if a connection has failed without re-implementing a bunch of logic for checking connection level idleness. At the time we decided it was better to let quic handle this.

As for the startup timeout, that needs to be less than the maxIdleTimeoutTime since idle time is essentially the timeout time for a connection in quic. We coded it so that if the start timeout was more than the idle timeout then the connection creation would throw.

In my opinion, the idle timeout QUIC provides is too useful not to use, but given how it works, if we use it we can't have the start timeout be longer or infinite.

I suppose we could default both to being infinite and have anything using them specify these parameters. But it seems like a bad idea to have any connections default to never timing out.

@CMCDragonkai
Member Author

I'm going to be trying to abstract the agent transport parameters to the nodes abstraction level. To do this I need to review how the nodes system is working, since it appears that all of agent transport is embedded into the nodes system. The nodes domain basically is encapsulating all of the agent transport.

@CMCDragonkai
Member Author

CMCDragonkai commented Aug 21, 2023

Here's the revised configuration for the system:

    rpcTimeoutTime
    rpcParserBufferSize
    clientConnectTimeoutTime
    clientKeepAliveTimeoutTime
    clientKeepAliveIntervalTime
    nodesConnectionConnectTimeoutTime
    nodesConnectionKeepAliveTimeoutTime
    nodesConnectionKeepAliveIntervalTime
    nodesConnectionHolePunchTimeoutTime
    nodesConnectionHolePunchIntervalTime
    nodesConnectionFindConcurrencyLimit
    nodesConnectionIdleTimeoutTime

All other parameters should be subsumed to this.

In order to achieve this both js-ws and js-quic should eventually converge to only using 3 parameters regarding time:

connectTimeoutTime
keepAliveTimeoutTime
keepAliveIntervalTime

The connectTimeoutTime in particular should actually be used on both connecting and accepting a connection.

Right now js-quic is lacking this, and instead relies on a single maxIdleTimeout to do both connect timeout and keep alive timeout. We should set it to 0 and instead make use of JS level timeouts, to separate the connecting timeout (for both client and server) from the idle timeout. However it may be a bit more complicated for the idle timeout, because we would have to hook into the send and recv calls... remember that our keep alive mechanism is done inside quiche. So instead, we can do something hybrid. We could continue using maxIdleTimeout, but set it to keepAliveTimeoutTime, and then enforce that connectTimeoutTime cannot be larger than keepAliveTimeoutTime (throw an exception if this is the case).

This is all due to the implementation of js-quic which relies on quiche and its internal timeout and keepalive mechanism.

Correspondingly it makes sense to throw an exception if keepAliveIntervalTime was larger than keepAliveTimeoutTime.

So for js-quic here are the config constraints:

connectTimeoutTime <= keepAliveTimeoutTime
keepAliveIntervalTime <= keepAliveTimeoutTime

As for js-ws, it only needs this constraint.

keepAliveIntervalTime <= keepAliveTimeoutTime

However if js-ws is limited also in its "timeout" mechanism, then also add in the first constraint.
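
The two constraints can be prechecked with something like the sketch below. `TransportTimes`, `checkTransportTimes` and `assertTransportTimes` are hypothetical names, not existing js-quic or js-ws functions; the flag models the js-ws case where only the second constraint applies.

```typescript
type TransportTimes = {
  connectTimeoutTime: number;
  keepAliveTimeoutTime: number;
  keepAliveIntervalTime: number;
};

// Returns a list of violated constraints, empty when the config is valid.
// `connectBoundedByKeepAlive` is true for a transport like js-quic where the
// connect timeout cannot exceed the keep alive timeout.
function checkTransportTimes(
  times: TransportTimes,
  connectBoundedByKeepAlive: boolean = true,
): Array<string> {
  const violations: Array<string> = [];
  if (
    connectBoundedByKeepAlive &&
    times.connectTimeoutTime > times.keepAliveTimeoutTime
  ) {
    violations.push('connectTimeoutTime must be <= keepAliveTimeoutTime');
  }
  if (times.keepAliveIntervalTime > times.keepAliveTimeoutTime) {
    violations.push('keepAliveIntervalTime must be <= keepAliveTimeoutTime');
  }
  return violations;
}

// Throwing variant for use as a startup precheck.
function assertTransportTimes(
  times: TransportTimes,
  connectBoundedByKeepAlive: boolean = true,
): void {
  const violations = checkTransportTimes(times, connectBoundedByKeepAlive);
  if (violations.length > 0) {
    throw new RangeError(violations.join('; '));
  }
}

// Illustrative values: connect 15s, keep alive timeout 30s, interval 10s.
assertTransportTimes({
  connectTimeoutTime: 15_000,
  keepAliveTimeoutTime: 30_000,
  keepAliveIntervalTime: 10_000,
});
console.log('transport times are consistent');
```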

@amydevs @tegefaulkes

@CMCDragonkai
Member Author

Furthermore I'm removing some parameters that should be auto-derived based on https://github.com/MatrixAI/MatrixAI-Graph/issues/43.

Things like the node ping timeout time, should really be automatically derived.

@CMCDragonkai
Member Author

Note that nodesConnectionIdleTimeoutTime is special as it represents the garbage collection trigger for the node connection when nothing in the program is using it. It's not the same as a timeout for inactivity on the connection (data-wise), it's more of a logical timeout, equivalent to a sort of time-based cache.
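
To make the distinction concrete, here is a minimal sketch of a logical idle timeout driven by a usage counter rather than by data on the wire. `IdleTracker` is a hypothetical class and not how NodeConnectionManager is implemented; time is passed in explicitly so the behaviour is deterministic.

```typescript
// Tracks whether anything in the program holds a reference to a connection.
// The idle clock only runs while the usage count is 0, regardless of how
// much data is flowing on the underlying transport.
class IdleTracker {
  private useCount: number = 0;
  private idleSince: number | undefined;
  private readonly idleTimeoutTime: number;

  constructor(idleTimeoutTime: number, now: number) {
    this.idleTimeoutTime = idleTimeoutTime;
    this.idleSince = now;
  }

  // Called when a user starts using the connection, stops the idle clock.
  public acquire(): void {
    this.useCount += 1;
    this.idleSince = undefined;
  }

  // Called when a user is done, restarts the idle clock on the last release.
  public release(now: number): void {
    if (this.useCount === 0) {
      throw new Error('release called without a matching acquire');
    }
    this.useCount -= 1;
    if (this.useCount === 0) {
      this.idleSince = now;
    }
  }

  // True once the connection has been unused for the full idle timeout.
  public shouldCollect(now: number): boolean {
    return (
      this.idleSince !== undefined &&
      now - this.idleSince >= this.idleTimeoutTime
    );
  }
}

const tracker = new IdleTracker(60_000, 0);
tracker.acquire();
// In use: never collected, however long the operation takes.
console.log(tracker.shouldCollect(1_000_000));
tracker.release(1_000_000);
// Unused for 60 seconds after the release: eligible for collection.
console.log(tracker.shouldCollect(1_060_000));
```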

@CMCDragonkai
Member Author

@tegefaulkes the constraint can be checked in config.ts as just a precheck. But you cannot check the constraint in QUICConnection.createQUICConnection because it's possible I may want to override with a larger timeout due to creating a quic connection within a larger operation.

@CMCDragonkai
Member Author

So simply because maxIdleTimeout has to continue to be used, that is fundamentally the behaviour we will get. You can add a logger warning whenever the timeout is larger... but that might be too noisy.

Also if it turns out it's possible to actually mutate the config after you create the connection, that might actually solve the problem. I asked this but you need to experiment to see if it is true.

https://chat.openai.com/share/5a302ea7-6134-4020-84fc-015200de4aa2

@CMCDragonkai
Member Author

Regarding interval time, their definitions all depend on underlying implementation. In some simple cases it's just a fixed interval time. In other cases it is more efficient as in QUIC.

I believe any implementation we do should default to the simple interval and let's not do anything fancy yet.

So nodesConnectionHolePunchIntervalTime should just be a fixed interval, not be doubled each time.
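
As a rough illustration of the difference, the sketch below computes when hole punch packets would be sent within a timeout window. `holePunchSendTimes` is a hypothetical helper, not part of the nodes domain; a `backoffFactor` of 1 gives the fixed interval, while 2 gives the doubling behaviour that was floated earlier.

```typescript
// Returns the offsets (in ms from the start) at which a hole punch packet
// would be sent, stopping once the timeout is reached.
function holePunchSendTimes(
  intervalTime: number,
  timeoutTime: number,
  backoffFactor: number = 1,
): Array<number> {
  if (!(intervalTime > 0) || !Number.isFinite(timeoutTime) || backoffFactor < 1) {
    throw new RangeError('invalid hole punch schedule parameters');
  }
  const times: Array<number> = [];
  let time = 0;
  let gap = intervalTime;
  while (time < timeoutTime) {
    times.push(time);
    time += gap;
    gap *= backoffFactor;
  }
  return times;
}

// Fixed 1 second interval over a 5 second window.
console.log(holePunchSendTimes(1_000, 5_000));
// Doubling interval over the same window sends fewer packets.
console.log(holePunchSendTimes(1_000, 5_000, 2));
```

With a 1 second interval and a 15 second window, the fixed schedule sends 15 packets, which is simple to reason about and cheap enough that the doubling variant does not seem necessary yet.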

@CMCDragonkai
Member Author

Ok so inside the quiche library, it is actually copying/cloning a portion of the config into the connection struct.

It looks like this:

            local_transport_params: config.local_transport_params.clone(),

Located here: https://docs.quic.tech/src/quiche/lib.rs.html#1726

Anyway this means if we do this:

          clientQuicheConfig.setMaxIdleTimeout(10000);

          clientConn = quiche.Connection.connect(
            null,
            clientScid,
            clientHost,
            serverHost,
            clientQuicheConfig,
          );

          clientQuicheConfig.setMaxIdleTimeout(10000);

The first call works because it's done before the connection copies it. The second one doesn't. The call succeeds, but the mutation doesn't do anything because it doesn't affect the underlying property.
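
The same copy-at-construction semantics can be reproduced with a toy model. `ToyConfig` and `ToyConnection` below are hypothetical stand-ins written for illustration and are not the quiche or js-quic API; they only mimic the fact that the connection clones the transport params when it is created.

```typescript
type ToyTransportParams = { maxIdleTimeout: number };

class ToyConfig {
  public params: ToyTransportParams = { maxIdleTimeout: 0 };

  public setMaxIdleTimeout(ms: number): void {
    this.params.maxIdleTimeout = ms;
  }
}

class ToyConnection {
  // Private snapshot, analogous to a cloned local_transport_params.
  private readonly localTransportParams: ToyTransportParams;

  constructor(config: ToyConfig) {
    // The params are copied here, so later config changes are not observed.
    this.localTransportParams = { ...config.params };
  }

  public get maxIdleTimeout(): number {
    return this.localTransportParams.maxIdleTimeout;
  }
}

const toyConfig = new ToyConfig();
toyConfig.setMaxIdleTimeout(10_000); // Before connect: takes effect
const toyConn = new ToyConnection(toyConfig);
toyConfig.setMaxIdleTimeout(60_000); // After connect: call succeeds, no effect
console.log(toyConn.maxIdleTimeout, toyConfig.params.maxIdleTimeout);
```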

@CMCDragonkai
Member Author

Furthermore the local_transport_params is a private property of quiche's Connection struct so it cannot be mutated.

@CMCDragonkai
Member Author

The solution is:

  1. Create a minIdleTimeout parameter in js-quic as the minimum boundary corresponding to maxIdleTimeout.
  2. This is to be a static property exposed in src/config.ts in js-quic.
  3. Additionally our QUICConfig should expose a keepAliveIntervalTime.
  4. Default minIdleTimeout to Infinity and maxIdleTimeout to 0 and keepAliveIntervalTime to undefined.
  5. Use minIdleTimeout for the decorator in QUICClient.createQUICClient.
  6. Use minIdleTimeout for the QUICConnection.createQUICConnection inside QUICServer.connectionNew.
  7. Application-wise we always set nodesConnectionConnectTimeoutTime to be less than or equal to nodesConnectionKeepAliveTimeoutTime.

@CMCDragonkai
Member Author

CMCDragonkai commented Aug 22, 2023

@amydevs for js-ws, I'm creating these 3 parameters:

    /**
     * Timeout for the transport connecting to the client service.
     *
     * This bounds the amount of time that the client transport will wait to
     * establish a connection to the client service of a Polykey Agent.
     */
    clientConnectTimeoutTime: 15_000, // 15 seconds
    /**
     * Timeout for the keep alive of the transport connection to the client
     * service.
     *
     * It is reset upon sending or receiving any data on the client service
     * transport connection.
     *
     * This is the default for both sides (client and server) of the connection.
     *
     * This should always be greater than the connect timeout.
     */
    clientKeepAliveTimeoutTime: 30_000, // 30 seconds (3x of interval time)
    /**
     * Interval for the keep alive of the transport connection to the client
     * service.
     *
     * This is the minimum interval time because transport optimisations may
     * increase the effective interval time when a keep alive message is not
     * necessary, possibly due to other data being sent or received on the
     * connection.
     */
    clientKeepAliveIntervalTime: 10_000, // 10 seconds

Note that clientKeepAliveTimeoutTime should then be usable on client and server side, meaning the server side should also terminate connections that idle for too long. Can you make that possible in js-ws?

Remember that in js-quic we defaulted everything to infinity because it's a library. While in PK the application, we are controlling for user experience.

@CMCDragonkai
Member Author

This PR can install the 0.0.19 QUIC implementation as it will have to reconfigure everything to use the new system defaults.

This is the current state now with all comments explaining their meaning.

  defaultsSystem: {
    /**
     * Timeout for each RPC stream.
     *
     * The semantics of this timeout changes depending on the context of how it
     * is used.
     *
     * It is reset upon sending or receiving any data on the stream. This is a
     * one-shot timer on unary calls. This repeats for every chunk of data on
     * streaming calls.
     *
     * This is the default for both client calls and server handlers. Both the
     * client callers and server handlers can optionally override this default.
     *
     * When the server handler receives a desired timeout from the client call,
     * the server handler will always choose the minimum of the timeouts between
     * the client call and server handler.
     *
     * With respect to client calls, this timeout bounds the time that the client
     * will wait for responses from the server, as well as the time to wait for
     * additional data to be sent to the server.
     *
     * With respect to server handlers, this timeout bounds the time that the
     * server waits to send data back to the client, as well as the time to wait
     * for additional client data.
     *
     * Therefore it is expected that specific client calls and server handlers
     * will override this timeout to cater to their specific circumstances.
     */
    rpcTimeoutTime: 15_000, // 15 seconds
    /**
     * Buffer size of the JSON RPC parser.
     *
     * This limits the largest parseable JSON message. Any JSON RPC message
     * greater than this byte size will be rejected by closing the RPC stream
     * with an error.
     *
     * This has no effect on raw streams as raw streams do not use any parser.
     */
    rpcParserBufferSize: 64 * 1024, // 64 KiB
    /**
     * Timeout for the transport connecting to the client service.
     *
     * This bounds the amount of time that the client transport will wait to
     * establish a connection to the client service of a Polykey Agent.
     */
    clientServiceConnectTimeoutTime: 15_000, // 15 seconds
    /**
     * Timeout for the keep alive of the transport connection to the client
     * service.
     *
     * It is reset upon sending or receiving any data on the client service
     * transport connection.
     *
     * This is the default for both sides (client and server) of the connection.
     *
     * This should always be greater than the connect timeout.
     */
    clientServiceKeepAliveTimeoutTime: 30_000, // 30 seconds (3x of interval time)
    /**
     * Interval for the keep alive of the transport connection to the client
     * service.
     *
     * This is the minimum interval time because transport optimisations may
     * increase the effective interval time when a keep alive message is not
     * necessary, possibly due to other data being sent or received on the
     * connection.
     */
    clientServiceKeepAliveIntervalTime: 10_000, // 10 seconds
    /**
     * Concurrency pool limit when finding other nodes.
     *
     * This is the parallel constant in the kademlia algorithm. It controls
     * how many parallel connections are made when attempting to find a node
     * across the network.
     */
    nodesConnectionFindConcurrencyLimit: 3,
    /**
     * Timeout for idle node connections.
     *
     * A node connection is idle, if nothing is using the connection. A
     * connection is being used when its resource counter is above 0.
     *
     * The resource counter of node connections is incremented above 0
     * when a reference to the node connection is maintained, usually with
     * the bracketing pattern.
     *
     * This has nothing to do with the data being sent or received on the
     * connection. It's intended as a way of garbage collecting unused
     * connections.
     *
     * This should always be greater than the keep alive timeout.
     */
    nodesConnectionIdleTimeoutTime: 60_000, // 60 seconds
    /**
     * Timeout for establishing a node connection.
     *
     * This applies to both normal "forward" connections and "reverse"
     * connections started by hole punching. Reverse connections
     * are started by signalling requests that result in hole punching.
     *
     * This is the default for both client and server sides of the connection.
     *
     * Due to transport layer implementation peculiarities, this should never
     * be greater than the keep alive timeout.
     */
    nodesConnectionConnectTimeoutTime: 15_000, // 15 seconds
    /**
     * Timeout for the keep alive of the node connection.
     *
     * It is reset upon sending or receiving any data on the connection.
     *
     * This is the default for both sides (client and server) of the connection.
     *
     * This should always be greater than the connect timeout.
     */
    nodesConnectionKeepAliveTimeoutTime: 30_000, // 30 seconds (3x of interval time)
    /**
     * Interval for the keep alive of the node connection.
     *
     * This is the minimum interval time because transport optimisations may
     * increase the effective interval time when a keep alive message is not
     * necessary, possibly due to other data being sent or received on the
     * connection.
     */
    nodesConnectionKeepAliveIntervalTime: 10_000, // 10 seconds
    /**
     * Interval for hole punching reverse node connections.
     */
    nodesConnectionHolePunchIntervalTime: 1_000, // 1 second
  },
  /**
   * Default user configuration.
   * These are meant to be changed by the user.
   * However the defaults here provide the average user experience.
   */
  defaultsUser: {
    nodePath: getDefaultNodePath(),
    rootCertDuration: 31536000,
    /**
     * If using dual stack `::`, then this forces only IPv6 bindings.
     */
    ipv6Only: false,
    /**
     * Agent host defaults to `::` dual stack.
     * This is because the agent service is supposed to be public.
     */
    agentServiceHost: '::',
    agentServicePort: 0,
    /**
     * Client host defaults to `localhost`.
     * This will depend on the OS configuration.
     * Usually it will be IPv4 `127.0.0.1` or IPv6 `::1`.
     * This is because the client service is private most of the time.
     */
    clientServiceHost: 'localhost',
    clientServicePort: 0,
  },
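
The rpcTimeoutTime comment above says the server handler chooses the minimum of its own timeout and the one requested by the client call. A minimal sketch of that rule, using a hypothetical `resolveHandlerTimeout` helper that is not part of the RPC code:

```typescript
// Picks the timeout a server handler should apply to a call.
// `handlerTimeoutTime` is the handler's own timeout (the system default
// unless the handler overrides it). `clientTimeoutTime` is the timeout the
// client asked for, if any.
function resolveHandlerTimeout(
  handlerTimeoutTime: number,
  clientTimeoutTime?: number,
): number {
  if (clientTimeoutTime === undefined) return handlerTimeoutTime;
  return Math.min(handlerTimeoutTime, clientTimeoutTime);
}

// No client preference: the handler timeout applies.
console.log(resolveHandlerTimeout(15_000));
// The client wants a shorter timeout: the handler honours it.
console.log(resolveHandlerTimeout(15_000, 5_000));
// The client wants a longer timeout: the handler timeout still caps it.
console.log(resolveHandlerTimeout(15_000, 60_000));
```

Taking the minimum means neither side can make the other wait longer than it is willing to.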

@tegefaulkes tegefaulkes force-pushed the feature-config-defaults branch from debf777 to 44609b9 Compare October 4, 2023 04:44
@tegefaulkes
Contributor

Ok, so I've fixed up the history and pushed that up. #560 has been rebased on the new history as well. I'm going to do a final check and then merge this.

@tegefaulkes
Contributor

I'm deferring fixing tests to a new PR; we need a pre-release of this sooner rather than later.

Off the top of my head, the following needs to be addressed still.

  1. everything that depends on RPCServerAgent and NCM needs to be updated since the RPCServer was moved into NCM
  2. all tests need to be checked if they're working. I haven't checked everything yet so there may be breaks in other domains.
  3. some NCM and nodeManager tests are still failing.
  4. There are some TODOs and FIXMEs that need to be resolved. These are pretty minor things and spot checks that need to be removed.

@CMCDragonkai
Member Author

The 1. still implies code in src?

Well I guess you could work on that independent of the WS and RPC. But #560 will need to do some testing to make sure that's working for Polykey-CLI.

@tegefaulkes
Contributor

I updated anything in src for 1., that just affects tests depending on it.

@tegefaulkes tegefaulkes merged commit 25bba9d into staging Oct 4, 2023
@CMCDragonkai
Member Author

Was task 5. in the OP done too?

A note about task 5. It's important to understand that in PolykeyAgent, we have a bunch of dependencies that can be injected, but their injection is really intended for mocking, which is only for testing, not intended for production usage. This is different from optional dependencies, which have legitimate use cases for both injecting and not injecting. If we were to follow task 5. strictly, this would add a lot of boilerplate to managing optional injected dependencies for everything in PolykeyAgent. And right now, there's no easy way to manage this without manually setting up booleans and conditional checks for all these optional dependencies. Therefore... task 5. should only be applied to situations where injecting or not injecting is a legitimate use case, such as between QUICSocket and QUICServer and QUICClient and between EncryptedFS and DB. But optional dependencies in PolykeyAgent are the exception. IN FACT, because JS allows module mocking... we may even remove the ability to inject these dependencies and use jest.mock instead.

No link to an issue if there was even one.

@tegefaulkes
Contributor

No issue was made, it was just changed directly in this PR.

@CMCDragonkai
Member Author

Ok

@tegefaulkes tegefaulkes mentioned this pull request Oct 10, 2023
7 tasks