Federation #19
Comments
I don't have time to work on this right now, but here's an old thread from a similar initiative I worked on depjs/dep#8
Thanks a lot @maxogden, will check that out.
Wonder how far we can get with cloudflare + cloud storage. Will be experimenting with this in the coming weeks and will report back :-)
@retrohacker thanks, appreciate it, ping here once you have some results to share :) That said, I do think that even if we find the fastest CDN, we can make it faster for people by having a federated model. But CDN in front of the metadata registry would still be a good idea.
I've updated the initial issue here with an updated version of the proposed federation, will also bump it on the roadmap. Old proposal can be found here: https://gist.github.com/victorb/82ace9e6fe7adf578833527b8b94f914
Build it using Holochain, it's exactly what you need for distributed (fully sharded) storage and cryptographic security
@victorb as promised, circling back to report on cost. Self-hosting on cloud providers turned out to be reasonable. Our GCP mirror ended up costing ~$300 to do the initial mirroring (pulling 5TB of data through cloud functions and into cloud storage). Once the files are sitting in storage (multi-zone within the US), the cost is ~$6 a day. That includes the instance that is sitting there watching the CouchDB stream from the npm registry to keep the mirror fresh. The breakdown is $3.46 per day for storage and $2.28 for the compute instance. Cloudflare functions (where we are doing our load balancing) cost $0.50 per million invocations. BTW the service is up and running if you want to give it a try: https://freajs.com
I opened a preliminary PR (#10) for Federation, but it's probably best to go via an issue first, to better enable discussions around it. Here is what I've been thinking so far.
Old proposal: https://gist.github.com/victorb/82ace9e6fe7adf578833527b8b94f914
New proposal:
Open-Registry Federation
Summary
Open-Registry, as a crowdfunded registry, won't be able to reach the same scale
as the npm Inc. registry without raising a significant amount of funds. What we can do,
however, is set up a federation of registries, which would significantly lower our
operating costs and also give users the benefit of faster performance and
local resource sharing.
The model of federation proposed here will decentralize the storage and
transfer of tarballs first, as that is the easier way of getting started with
federation for Open-Registry.
Once that is implemented and in use, we can start focusing on research into
federated publishing as well.
Motivation
Constraints
url (DNS/HTTP federation)
Use Cases
Security
aren't honest anymore
clients (npm + yarn) check the checksums before extracting, so mutating
served tarballs is hard without the client detecting it
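To illustrate the checksum point, here is a minimal sketch of how a client could verify a tarball against an integrity string of the form `sha512-<base64 digest>` (the Subresource Integrity style). The function names `verify_integrity` and `make_integrity` are made up for this example, and this is not the actual npm or yarn implementation.

```python
import base64
import hashlib


def make_integrity(tarball: bytes, algorithm: str = "sha512") -> str:
    """Build an SRI-style string for some bytes (hypothetical helper)."""
    digest = hashlib.new(algorithm, tarball).digest()
    return f"{algorithm}-" + base64.b64encode(digest).decode("ascii")


def verify_integrity(tarball: bytes, integrity: str) -> bool:
    """Return True only if the tarball bytes match the expected digest."""
    algorithm, _, expected_b64 = integrity.partition("-")
    if algorithm not in ("sha256", "sha384", "sha512"):
        raise ValueError(f"unsupported algorithm: {algorithm}")
    actual = hashlib.new(algorithm, tarball).digest()
    return base64.b64encode(actual).decode("ascii") == expected_b64


# A node that serves modified bytes fails the check on the client side.
original = b"pretend these are tarball bytes"
integrity = make_integrity(original)
tampered = original + b" plus something malicious"
```

As long as the integrity string comes from a trusted metadata index, the tarball bytes themselves can come from any node.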
Practical steps
Ok, so the working plan is the following:
find other nodes via those bootstrap nodes
packages
This is the small, MVP version to ensure the idea is viable in the wild.
The first step towards federation is keeping the metadata index centralized with
Open-Registry while tarballs can be served from anywhere and by anyone.
The plan is to use ipfs-lite by @hsanjuan to start an embedded libp2p node that will
expose the traditional registry interface as HTTP endpoints.
The software will connect to the central registry to find out the latest root
hash and also listen for any changes, automatically updating its local pointer
when Open-Registry's pointer changes.
The root hash can be found in multiple different ways, depending on the
environment of the software.
The software will basically be a resolver for (packageName, packageVersion) =>
IPFS hash via its local proxy.
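The resolver idea can be sketched roughly as follows. This is only an illustration under assumptions: the `Resolver` class, the in-memory dictionary index, and the placeholder hash strings are all hypothetical, and a real node would read the mapping from the data behind the root hash rather than from a dict.

```python
from typing import Dict, Optional, Tuple

PackageKey = Tuple[str, str]  # (packageName, packageVersion)


class Resolver:
    """Maps (name, version) to a content hash under the current root hash."""

    def __init__(self, root_hash: str, index: Dict[PackageKey, str]) -> None:
        self.root_hash = root_hash
        self._index = dict(index)

    def resolve(self, name: str, version: str) -> Optional[str]:
        # None means the package is not reachable from the current root.
        return self._index.get((name, version))

    def update_root(self, root_hash: str, index: Dict[PackageKey, str]) -> None:
        # Called when the central registry announces a new root hash.
        self.root_hash = root_hash
        self._index = dict(index)


# Placeholder values, not real IPFS hashes.
resolver = Resolver("root-v1", {("class-is", "1.1.0"): "hash-of-class-is-1.1.0"})
before = resolver.resolve("class-is", "1.2.0")
resolver.update_root(
    "root-v2",
    {
        ("class-is", "1.1.0"): "hash-of-class-is-1.1.0",
        ("class-is", "1.2.0"): "hash-of-class-is-1.2.0",
    },
)
after = resolver.resolve("class-is", "1.2.0")
```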
CLI interface
Example usage:
Pointing your package manager to
http://localhost:6736
should now allow you to download and install packages on demand, while caching them and serving
them to other users who are trying to download them too.
Federation Protocol
When the federation software starts on the user's device, it connects to
the main registry.
Once the connection has been established, it asks for the latest version of the
registry (just a pointer) and saves it for future use.
Concurrently, it starts an HTTP server locally.
Now the user can point their client to the local HTTP server.
Requests will be proxied via the latest root hash the federation software knows
about, and fetched data will be cached.
When the root hash of the main registry changes, it publishes it via the
following ways:
hash
in response to a GET request to npm.open-registry.dev
on the used libp2p network
If the local client makes a request for a package that doesn't exist in the
local root hash, the client needs to make a request to the central registry to
download the package. After this is done, the package will be included in the
new root hash, and can therefore be downloaded by the local client without any
further requests to the central registry.
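The miss-then-cache behaviour described above could look something like the sketch below. It is a simplification under stated assumptions: `FederationNode` and `fake_central` are invented names, the central registry is replaced by a plain function, and the root-hash update is reduced to a local cache insert.

```python
from typing import Callable, Dict, Tuple

PackageKey = Tuple[str, str]  # (packageName, packageVersion)


class FederationNode:
    """Serves tarballs locally and falls back to the central registry on a miss."""

    def __init__(self, fetch_from_central: Callable[[str, str], bytes]) -> None:
        self._fetch_from_central = fetch_from_central
        self._cache: Dict[PackageKey, bytes] = {}
        self.central_requests = 0  # counts how often we had to go upstream

    def get_tarball(self, name: str, version: str) -> bytes:
        key = (name, version)
        if key in self._cache:
            # Already known locally: no request to the central registry.
            return self._cache[key]
        self.central_requests += 1
        data = self._fetch_from_central(name, version)
        self._cache[key] = data
        return data


def fake_central(name: str, version: str) -> bytes:
    # Stand-in for a request to the central registry (hypothetical).
    return f"{name}@{version}".encode()


node = FederationNode(fake_central)
first = node.get_tarball("class-is", "1.1.0")
second = node.get_tarball("class-is", "1.1.0")
```

Only the first request reaches the central registry; the second is answered from the node's own copy.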
Simulator
The first step of the federation setup is creating a suitable testing environment
where we can measure how well the federation is working.
The simulator should start by running the following scenarios:
for one project. Run two times and ensure second is faster than first
Download packages without being connected to the Internet in the second one.
Ensure second node is faster than first node as connection should now be
local.
project. Compare to starting five nodes where only one is connected to the
Internet. The second phase should be less bandwidth-intensive, as packages are
only downloaded from the Internet once instead of once for each node.
More elaborate schemes can be created in the future.
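As a back-of-the-envelope version of the five-node scenario, the expected Internet traffic can be modelled before any real simulator exists. Everything here is assumed for illustration: the function name `internet_bytes`, the package names, and the sizes are invented, and the model ignores metadata requests and protocol overhead.

```python
from typing import Dict


def internet_bytes(package_sizes: Dict[str, int], nodes: int, shared: bool) -> int:
    """Bytes pulled from the Internet when `nodes` nodes install the same project."""
    total = sum(package_sizes.values())
    # With sharing, only the one Internet-connected node downloads each package;
    # the other nodes fetch it from that node over the local network.
    return total if shared else total * nodes


# Invented package names and sizes in bytes.
sizes = {"pkg-a": 4_000, "pkg-b": 2_000}
isolated = internet_bytes(sizes, nodes=5, shared=False)
federated = internet_bytes(sizes, nodes=5, shared=True)
```

A real simulator run would then be compared against this kind of idealised figure.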
Bootstrap nodes
Open-Registry will run a couple of bootstrap nodes. These are responsible for
being accessible to the federation nodes and providing metadata and
tarballs if the federation nodes don't have them locally.
Metrics
Both the bootstrap nodes and the main registry index should publish metrics in
the Prometheus format to be collected by the metrics gatherer. These metrics
will eventually be made accessible via a public dashboard.
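For reference, the Prometheus text exposition format is line-based, so a counter could be rendered roughly like this. The metric name `openregistry_tarball_requests_total`, the `source` label, and the `render_counter` helper are hypothetical; a real node would more likely use an existing Prometheus client library than hand-rolled formatting.

```python
from typing import Dict


def render_counter(name: str, help_text: str, label: str, samples: Dict[str, int]) -> str:
    """Render one counter with a single label in Prometheus text format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for label_value, value in sorted(samples.items()):
        lines.append(f'{name}{{{label}="{label_value}"}} {value}')
    return "\n".join(lines) + "\n"


# Hypothetical metric: how many tarball requests were served from where.
text = render_counter(
    "openregistry_tarball_requests_total",
    "Tarball requests served.",
    "source",
    {"central": 3, "cache": 7},
)
```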
For the federation nodes, we can offer opt-in metrics in the future, so we can
see the health of the federation.
Existing infrastructure migration
The current Open-Registry is just one instance which is the main Open-Registry
index. With federation, the architecture would change to add another component
which would be the federated instances. We have more flexibility on where to
place these but are in no rush to add them currently.
Potential Issues
Drawbacks
wanting to take advantage of it
be centralized in that case and require internet connectivity
Alternatives
registry for both tarball and metadata
Unresolved Problems
other nodes
pretty much sure to be faster, would be interesting to see how much)
Future
done on how metadata can be federated as well
namespace)
class-is
directly in the package.json
and lockfiles
/registry.npmjs.org/class-is
instead. More verbose, but more accurate and flexible