Egress, authorized by UCAN #36

Open. Wants to merge 3 commits into base: main

Member

@Peeja Peeja commented Oct 7, 2024

📚 Preview

A proposal for managing billable egress using UCAN, distilled from a conversation between @hannahhoward, @travis, and myself (@Peeja) on 10/04/24. Builds on and evolves prior egress proposals.

Open questions and missing bits are currently in the document, at the bottom.

@Peeja Peeja force-pushed the egress-with-ucan branch from e8da73f to f839d13 on October 7, 2024 16:27
Comment on lines 129 to 136
1. Look up the Location Commitment for the given CID. If not found, respond with 404 Not Found.
2. If there is no token given, serve the request through the standard rate-limiter. (This behavior is likely to change in the future.)
3. If there is a token given (eg, `abcde12345`), build its corresponding DID (eg, `did:bearer:abcde12345`).
4. Look up all delegations with the token DID as audience.
5. Attempt to prove the ability to invoke `/space/content/retrieve` on the Space listed in Location Commitment, with the given CID. If more than one Location Commitment is found, attempt each in turn: a CID may be stored in multiple Spaces, and the token may be able to retrieve the content through one and not another.
6. If no such proof chain is available, respond with 401 Unauthorized. [or 404 Not Found?]
7. If a proof chain is found, execute the invocation on the Executor.
8. Using the information in the receipt, fetch the content and proxy it as the response.
Member

@fforbeck fforbeck Oct 7, 2024

question: How will the customer ID be identified to log the egress event in the Accounting Service? At which step in this process will the customer ID be retrieved and associated with the request for accurate billing?

Member Author

The billing is tied to the Space, so once we have the Location Commitment (which specifies the Space), we can bill the right customer. That probably deserves a callout in here.

Member

Got it. Thanks. Yes, it would be great to have that stated in the RFC as well.

Member

I think we're missing a step here! We need to look up the "provider" registered with the Space, which is a lookup in a Dynamo table that Freeway does not have access to. I think this needs to be handled either in w3up or w3infra.

Member

Thanks for flagging that, @travis. I will update w3infra to execute that query and find the provider.
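
A minimal sketch of the billing lookup discussed in this thread, assuming the Location Commitment names a Space and that some service with access to the provider table (w3up or w3infra, per the comments above) can map a Space to its provider and customer. Every type name, field, and the in-memory `Map` here is a hypothetical stand-in, not the actual schema.

```typescript
// Hypothetical shapes; the real tables live in w3up/w3infra and may differ.
type SpaceDID = string;

interface LocationCommitment {
  cid: string;
  space?: SpaceDID; // assumed optional: a commitment may not name a Space
}

interface SpaceRegistration {
  provider: string; // the provider registered with the Space
  customer: string; // the customer billed for the Space
}

interface EgressEvent {
  customer: string;
  provider: string;
  space: SpaceDID;
  cid: string;
  bytes: number;
}

// Resolve who to bill for serving `bytes` of the committed content.
// Returns undefined when the commitment has no Space or the Space has no
// registration, leaving the caller to decide how to treat such requests.
function toEgressEvent(
  commitment: LocationCommitment,
  registrations: Map<SpaceDID, SpaceRegistration>,
  bytes: number
): EgressEvent | undefined {
  if (commitment.space === undefined) return undefined;
  const registration = registrations.get(commitment.space);
  if (registration === undefined) return undefined;
  return {
    customer: registration.customer,
    provider: registration.provider,
    space: commitment.space,
    cid: commitment.cid,
    bytes,
  };
}
```

The point of the sketch is only the order of lookups: Location Commitment, then Space, then provider and customer. Where that second lookup runs is the open question raised above.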

Member

@travis travis left a comment

I'm a HUGE fan of this design, really nice, very excited to get this into prod!

}
```

The delegation must be available to the Executor at invocation-time. Since the Invoker will be using a token and not speaking UCAN, they will not be able to deliver the proof, so the Executor must have access to it in a store. The Client should therefore invoke `access/delegate` (UCAN 1.0 equivalent TBD) to store the delegation with Storacha.
Member

praise: love that we can reuse access/delegate here! how we actually stored these delegations was a question that was making me un-comfy and this is a very elegant solution.

rfc/egress-with-ucan.md

The Space may then delegate this to another Principal to give them authority to access the Space's content. Typically, this will not be done directly (though it may), but indirectly through an Account and an Agent: the Space will delegate all of its capability to an Account, which will delegate all of *its* capability to an Agent when it logs in. Then the Agent (ie, the logged-in customer) can share access to the content as they see fit.

## A new DID method: `did:bearer`
Member

thought: this is so interesting - on one hand, the DID spec says that "the design [of DIDs] enables the controller of a DID to prove control over it" and it feels a little weird to make a DID where this proof is not possible, but that's sort of the idea with bearer token auth in CDNs! nobody can really "prove" control over a bearer token, and that's ok because it's a fairly lightweight form of "security" that can be easily "broken" but in practice is not because the payoff isn't very big. I definitely balked a little at this but upon further reflection I kind of think it's genius?


The Gateway will offer an HTTP endpoint. Currently, the Storacha Gateway's endpoint takes the form of `https://<cid>.ipfs.w3s.link/`. The Gateway will accept a token as part of the URL. To serve the request, the Gateway will perform the following steps:

1. Look up the Location Commitment for the given CID. If not found, respond with 404 Not Found.
Member

question: can we also look up delegations to the did:bearer in this step? I'm a little concerned that adding another network request will slow down reads too much, but I'm not totally clear on whether location commitments and delegations to bearer tokens are even queryable in a single network request, so maybe this is moot


## [To Come]

* Rather than serve non-token content rate-limited by default, require a delegation of `/space/content/retrieve` to some DID representing "anyone".
Member

question: is did:bearer: (ie, empty-string bearer token) a valid DID? maybe that or did:bearer:* would make sense, but implying "glob" semantics is maybe a slippery slope? then again I guess we are free to interpret did:bearer:* however we want so maybe this is not a big deal...

## [To Come]

* Rather than serve non-token content rate-limited by default, require a delegation of `/space/content/retrieve` to some DID representing "anyone".
* Bitswap should execute a `/space/content/retrieve` as well to respond to requests. Bitswap should be authorized by delegating to some DID representing Bitswap/Hoverboard.
Member

praise: love this - delegating to a hoverboard DID feels allllmost analogous to "pinning" in IPFS if you squint at it - ie, it's our system saying that we'll make a piece of content available via bitswap to the IPFS network - feels like the right vibe of "we're a storage system compatible with IPFS"

## Open questions

* What does the `/space/content/retrieve` receipt look like?
* Can you enumerate the contents of a space? Is that in scope here?
Member

thought: we do have upload/list and space/blob/list for this - I don't think we need to worry about it here as the gateway doesn't expose any way to do this


* What does the `/space/content/retrieve` receipt look like?
* Can you enumerate the contents of a space? Is that in scope here?
* What does a Gateway URL with a token look like?
Member

suggestion: I'd probably go with something like https://<cid>.ipfs.w3s.link?auth=<token> - note that the current implementation actually looks in the Authorization header and will need to be extended to look in the query as well - imho both are important for different use-cases but the query is probably most important for our current efforts

* What does the `/space/content/retrieve` receipt look like?
* Can you enumerate the contents of a space? Is that in scope here?
* What does a Gateway URL with a token look like?
* What can be cached? This seems relatively cacheable, but we should be explicit in the design to make sure we're on a suitable path. This process needs to be *fast*, at least once the cache is warm.
Member

thought: once we've validated a token I think we can store something in KV or a similarly performant caching layer, probably with an expiry, that says it's valid. we might need a mechanism to invalidate this cache, but that can probably wait until after the initial implementation?

> * **Subject:** The Space from which content will be retrieved.
> * **Arguments *(0.9: `nb`)*:**
> * `cid`: The CID of the content which will be retrieved.
> * **Receipt:** [TBD, but must provide instructions to access the data (using HTTP?) without further UCAN authorization.]
Member

What if the user set the bearer token as caveats to this delegation?

"with": "did:key:zSpace",
"can": "space/content/retrieve",
"nb": {
"cid": "bafy...7pcu"
Member

Perhaps name it "root" to differentiate this as a DAG root CID, also to be consistent with our existing upload/add invocation.

Also, this will actually be a link not a string, so, assuming dag-json encoding we should specify as:

Suggested change:
- "cid": "bafy...7pcu"
+ "root": { "/": "bafy...7pcu" }

Member Author

I don't think we do want that to be a link, actually. We want to request the content using the CID as an argument; this would provide the content itself (by reference) as the argument. Importantly, I believe, policies would resolve against the resolved bafy...7pcu, not the string "bafy...7pcu". We want to match on the string.

Member

There's no obligation to include the content in the invocation - we don't for upload/add for example.

I'm almost certain you can match a link in a policy...

IDK maybe I'm not understanding right?

Member Author

The spec doesn't appear to say explicitly, but it does say:

Selecting on Bytes

Bytes MAY be selected into. When doing so, they MUST be treated as a byte array ([u8]), and MUST NOT be treated as a Base64 string or any other representation.

// DAG-JSON
{ "/": { "bytes": "1qnBjPjE" } }

// Hexadecimal
0xd6 0xa9 0xc1 0x8c 0xf8 0xc4

// Selector
".[3]"
// ⬆️  0x8c = 140

If the policy resolver understands DAG-JSON bytes, I assume it understands links as well. I'm not sure what the correct behavior would be when matching on a link, but I don't think it would be to treat it as a string, or as a literal JSON map { "/": "bafy...foo" }. My assumption would be that, if anything, it would attempt to inline the link before applying the policy, and perhaps bail if it didn't have access to the content.


The Space may then delegate this to another Principal to give them authority to access the Space's content. Typically, this will not be done directly (though it may), but indirectly through an Account and an Agent: the Space will delegate all of its capability to an Account, which will delegate all of *its* capability to an Agent when it logs in. Then the Agent (ie, the logged-in customer) can share access to the content as they see fit.

## A new DID method: `did:bearer`
Member

I really like this idea, but that said, I'm not super keen on inventing another new DID method...

To play devil's advocate, why is this better than delegating, say, space/content/serve to the gateway DID, with the bearer token that allows retrieval in the caveats?

One potential thing I can think of is when it comes to proving you served the data - without a specific delegation to the entity that claims to have served some data, there's scope for fraud if the entity can somehow get hold of the delegation to did:bearer, because it could then claim it served a billion petabytes of data (for example). It gives these tokens real value, and means that anyone holding them is perhaps more of an attack target.

Another potential issue I think is by delegating to did:bearer you're effectively allowing access on all/any gateways. I wonder if there may be a need to restrict to specific gateways in the future?

Delegating to the gateway makes it easier to expand this to bitswap for example - you could delegate to the DID of our bitswap peer. You can also easily delegate to just one, or both, or neither.

Delegating to did:bearer does not allow re-delegation. I'm not sure if that's necessary/desired, but I imagine you might want to re-delegate space/content/serve.

}
```

The delegation must be available to the Executor at invocation-time. Since the Invoker will be using a token and not speaking UCAN, they will not be able to deliver the proof, so the Executor must have access to it in a store. The Client should therefore invoke `access/delegate` (UCAN 1.0 equivalent TBD) to store the delegation with Storacha.
Member

Something makes me uneasy that the gateway is using UCAN delegations for which it is not the audience.


To make content available as on a traditional CDN, an Agent acts on authority of a Space to make that Space's contents, or some specific CID in it, available using a bearer token, an opaque, unguessable string. The Gateway then responds to requests which contain a token by validating the proof chain, finding the content, charging for egress, and proxying it to the requester.

This process does *not* include making and tracking Location Commitments. A Location Commitment is an attestation by a Storage Node that it holds a particular piece of content on behalf of a particular Space, and that it can provide it. In this proposal, we assume such a system already exists.
Member

One interesting thing to consider is that if the location commitment were to be cited in a space/content/retrieve delegation, in the happy path (where the content has not been moved), the gateway would already know where to fetch the content from, simply by reading the delegation that authorizes retrieval.

Probably not worthwhile but just thought I'd write it down :)


## [To Come]

* Rather than serve non-token content rate-limited by default, require a delegation of `/space/content/retrieve` to some DID representing "anyone".
Member

Right, I guess did:*/null audience?

Alternatively, with a space/content/serve delegation you might delegate to the gateway and specify an empty bearer token.

@Peeja
Member Author

Peeja commented Oct 8, 2024

did:bearer, or not?

Reasons not to delegate to the token:

  • Anyone can use the delegation without going through a trusted Gateway, which could allow a third party to issue invocations just to attack our customer with egress charges.
  • The Gateway ends up invoking on someone else's authority, which is weird.
  • It requires a new DID method, which is not to be undertaken lightly.
  • It leaves an open question: how do you delegate to no-token? Is it did:bearer:? A whole new DID method? (Please no…)

Reasons to delegate to the token:

  • A single delegation works for any Gateway in the network. That may not actually be a good thing, though.
  • We'd like to leave open the possibility of delegating to any DID, such as delegating to some Agent who could then issue their own Invocations to retrieve content. Delegating to the token is parallel to that. But then, maybe an entity which can issue Invocations of their own doesn't need to use a Gateway at all. Then what happens to egress, though…?
  • Our only index on the delegation store right now is by audience. That makes it really easy to look up delegations by token, when the token is the audience.

Given all that, I'm swayed by @alanshaw's argument: the most "correct" thing here is for the Gateway to be the audience of the delegation and the issuer of invocation, and to drop did:bearer: altogether. You'll have to delegate to each Gateway you want to use, but that's probably a good thing. We don't need a non-Gateway egress solution yet, so we can solve that later. And indexing these by something other than audience is an annoying change, but not insurmountable.

Some other things from the feedback

  • I like @travis's suggestion to include the origin.
  • @alanshaw makes a good point that it's the root CID we care about, not the individual file's CID. In fact, the Gateway takes not just a (root) CID, but also a path. We should include that in the args.

Proposed changes

  • Remove the did:bearer: method.
  • The Agent delegates /space/content/retrieve to the Gateway, not the token.
  • The args are now:
    • Root CID [string]
    • Path [string? array of strings, so we can match on segments?]
    • Token [string]
    • Origin (or Referer? or both, as separate args?)
  • The HTTP Gateway's sole job (on behalf of that role) is to translate HTTP requests into UCAN invocations, which it issues in its own name.
  • The Retrieval Service (which is likely tied to the HTTP Gateway, but is a separate role) accepts and executes the invocation, and could receive invocations directly from clients.
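
As a rough illustration of the "translate HTTP requests into UCAN invocations" role proposed above, the sketch below derives the proposed args (root CID, path, token, origin) from a subdomain-style Gateway URL. It assumes the token arrives either in an `auth` query parameter (suggested earlier in this review, not decided) or in an `Authorization: Bearer` header. The names `GatewayInvocationArgs`, `findHeader`, and `toInvocationArgs` are hypothetical, and the CID in the usage example is just a placeholder.

```typescript
// Hypothetical argument shape, following the proposed list of args.
interface GatewayInvocationArgs {
  root: string;    // root CID, taken from the subdomain
  path: string[];  // path segments, so a policy could match per segment
  token?: string;  // bearer token, if the request carried one
  origin?: string; // Origin header, if present
}

// Case-insensitive header lookup over a plain object of headers.
function findHeader(headers: Record<string, string>, name: string): string | undefined {
  const key = Object.keys(headers).find((k) => k.toLowerCase() === name);
  return key === undefined ? undefined : headers[key];
}

function toInvocationArgs(
  url: string,
  headers: Record<string, string> = {}
): GatewayInvocationArgs {
  const parsed = new URL(url);
  // Subdomain gateway form: <cid>.ipfs.<gateway host>
  const [root, marker] = parsed.hostname.split(".");
  if (!root || marker !== "ipfs") {
    throw new Error(`not a subdomain gateway URL: ${url}`);
  }
  const path = parsed.pathname
    .split("/")
    .filter((segment) => segment.length > 0)
    .map((segment) => decodeURIComponent(segment));

  // Assumption: the query parameter wins over the Authorization header.
  const bearer = /^Bearer\s+(.+)$/i.exec(findHeader(headers, "authorization") ?? "");
  const token = parsed.searchParams.get("auth") ?? bearer?.[1];

  const args: GatewayInvocationArgs = { root, path };
  if (token !== undefined) args.token = token;
  const origin = findHeader(headers, "origin");
  if (origin !== undefined) args.origin = origin;
  return args;
}

// Usage: the Gateway would place these args in an invocation it issues in its own name.
const example = toInvocationArgs(
  "https://bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi.ipfs.w3s.link/dir/file1?auth=abcde12345",
  { Origin: "https://app.example" }
);
console.log(example.path); // the path segments, not yet resolved to a file CID
```

Note that the sketch passes the path along unresolved, which matches the "literal translator" option in the open question below rather than settling it.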

Open questions

  • What is "egress" exactly? Is it Gateway → Retriever? Or is it Retrieval Service → Retriever? Or is it Storage Node → Retriever? Most saliently, if a client retrieved from the Retrieval Service directly by issuing an invocation, would the Retrieval Service need to charge for egress? Presumably yes. Does the Gateway need to charge extra for egress through it? What if you go straight to the Storage Node? What are we (who?) actually charging for here?

  • [Probably moot for the moment; see below.] How do we handle path references? Consider the following Space:

    <did:key:zSpace>
    ┖ [bafy...dir]
      ┠ file1: [bafy...file1]
      ┖ file2: [bafy...file2]
    

    We can access file1 as either bafy...dir/file1 or as bafy...file1. Should a single delegation be able to cover both? In other words, given a request for https://bafy...dir.ipfs.w3s.link/file1 should the Gateway resolve the reference down to bafy...file1 and invoke /space/content/retrieve to ask for that, or should it ask for something like { "root": "bafy...dir", "path": ["file1"] } and let the Retrieval Service (the Executor of the invocation) resolve it? TBH, I'm not completely clear on where that resolution happens today, and where it will happen after this indexer work.

    • It's much more clear cut for the Gateway to simply pass along the path, as this makes it literally a translator from HTTP requests to UCAN invocations. We probably also don't want it to know enough to be able to do the resolution.
    • But two ways to reference the same content means two different possible sets of args to match to a policy. I suppose we could try to cram all the different paths to the file into the policy as "or"s, but that seems pretty darn messy. Or we could say that a delegation only authorizes a single reference form to be used on the Gateway, but that seems counter-intuitive for the customer.
    • Underlying all of this is the fact that most of the time the policy won't care about the content at all: it'll just apply to an entire Space. So I don't want to spend too much effort on it. But it smells like something that might point to an important architectural decision.

@Peeja
Member Author

Peeja commented Oct 8, 2024

More on the last point: per @hannahhoward & @prodalex, we'll only support permissions by entire Space for the first go. That should mean we can skip over a lot of these questions for now. We're actually going to be doing this in UCAN 0.9, not 1.0, so we don't have proper policies, just the nb. We can include the root & path in the invocation, but not make them a thing you can attenuate by in a delegation capability at all.

Member

@hannahhoward hannahhoward left a comment

I'm going to leave a general comment before digging into specifics, one that vaguely has to do with @alanshaw's question about the DID.

So, I want to state my own thinking that there are two distinct and separate protocols for a retrieval:

  1. The raw read from a source location (simple HTTP GET with range request)
  2. The gateway query across potentially one or more providers to assemble a user level response, likely deserialized to a flat file. The second protocol uses the first protocol + the indexing service to accomplish its work.

Furthermore, the first protocol is close to a typical object storage interface, while the second, especially when run on a platform like Cloudflare, is much closer to a CDN (or a CDN + edge compute). For the remainder of this I'll use "Object storage layer" to refer to the first and "CDN layer" to refer to the second.

There are some other important distinctions -- the object storage layer protocol in concert with the indexing service can be run without a trust relationship between the retriever and the storage node, while the CDN layer requires a trust relationship between the end user and the gateway, unless you're specifically staying in the bounds of the trustless portion of the Gateway protocol spec. (once you serialize to a flat file, you're now trusting the gateway)

Up till now, the object storage protocol hasn't even really existed as a separate protocol because we're just reading our own databases and our own storage devices, but going forward it gets increasingly public. With the indexing service, in combination with the PDP storage nodes, it becomes a distinct layer (the November version will not be fully there, but it will get there soon).

I apologize for not making this super clear as part of my own thinking. I also see these as different economic units of billing for egress eventually -- the user pays x for a gateway retrieval, of which y goes to the gateway and z goes to the storage node, where y + z = x. None of this needs to be figured out right this second because we're running everything and our PDP nodes will probably just have an indirect accounting mechanism, but in a final product, the storage node would never get 100% of the user egress fee, because we incur a sizable cost for running the Cloudflare gateway that assembles the request, caches it, and serves it super quickly to the user.

When we talk about storage providers serving directly to end users, I generally don't see them running a gateway. Rather, I see a user with a native or server app who doesn't prioritize TTFB choosing to save money by running the freeway software directly, and only using the indexing service + object storage layer to get data.

Another use case could be someone building a custom retrieval gateway on top of the indexing service and object storage layer. Perhaps a product builder wants to build a query interface for archived blockchain data, and sell that independently. They would be storing on Storacha, but charging their users to use their retrieval service, which could build more complex IPLD queries against a blockchain than are available with the gateway protocol (or perhaps just mirror the established RPC API for querying their chosen blockchain).

Anyway, coming back to this PR, I want to understand whether space/content/retrieve refers to the CDN layer of retrieval or the object storage layer. A couple of things make this a bit confusing:

  1. There is no invoker for the CDN layer of the retrieval (i.e. it's not a proper UCAN request), at which the gateway is the executor.
  2. There's no proper executor for the storage layer of the retrieval for now, until the storage nodes do UCAN auth.

The RFC simply says the gateway is the invoker and the executor and doesn't make clear the part of the retrieval we're referring to.

I think ultimately space/content/retrieve makes sense as the CDN layer, and it's a weird one cause the invoker and executor are ALWAYS the gateway. That means the space delegates to the gateway in order for it to execute, and the token goes in the policy (caveats for now).

So, long and short, I agree with @alanshaw -- no did:bearer, because it's not a real entity. If we build a gateway that is called with real invocations, then it makes sense that the issuer becomes a real, verifiable entity with a traceable delegation to the space, and the gateway is just an executor. There are certainly use cases for that, but they're not worth worrying about for now.

What about the storage layer retrieval in the future? My suggestion is to actually make space/index/query and space/content/retrieve/blob for that, eventually. So when the CDN receives a request, it invokes space/content/retrieve with itself as the issuer + executor, using delegations it has stored. And it generates a receipt that looks something like:

{
  cmd: "/ucan/assert",
  args: {
    about: "bafy...cid", // space/content/retrieve invocation
    facts: {
      out: {
        ok: {
          statusCode: 200,
          bytesServed: 1000000
        }
      },
      run: [
        { cmd: "space/index/query" },
        { cmd: "space/content/retrieve/blob" },
        { cmd: "space/content/retrieve/blob" }
      ]
    }
  }
}

This wouldn't be sent back to the retriever but this provides nice instructions on how to bill the original user, that can be checked against the storage node and the indexer submitting receipts of their own for their parts of retrieval (along with the invocation that proves a retrieval/index was actually requested by the gateway). It all vaguely works for a trustless billing system :) (where even the indexing service could get paid)

That's my take.

But yeah, after further consideration I'm blocking on did:bearer: I'd like it removed. Is there any reason we can't just throw the token in caveats? We don't have policies, but we have ucanto, which can enforce this, no?
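
To make the billing idea in this comment concrete, here is a minimal sketch that turns a receipt shaped like the example above into a charge split between the gateway and the storage node (x = y + z). The receipt shape is copied from the example and is not final. The price per byte and the gateway's share are made-up parameters, and all names (`ServeReceipt`, `EgressCharge`, `chargeFor`) are hypothetical.

```typescript
// Hypothetical receipt shape, mirroring the example receipt in this comment.
interface ServeReceipt {
  cmd: string; // "/ucan/assert" in the example
  args: {
    about: string; // CID of the invocation the receipt describes
    facts: {
      out: { ok: { statusCode: number; bytesServed: number } };
      run: { cmd: string }[]; // sub-invocations made to serve the request
    };
  };
}

interface EgressCharge {
  invocation: string;
  bytesServed: number;
  blobReads: number;   // number of space/content/retrieve/blob steps
  total: number;       // x: what the user pays
  gateway: number;     // y: the gateway's share
  storageNode: number; // z: the remainder, so y + z = x
}

// Amounts are integers in some smallest billing unit to avoid float drift.
// `gatewayShareBps` is the gateway's share in basis points (assumed, 0..10000).
function chargeFor(
  receipt: ServeReceipt,
  unitsPerByte: number,
  gatewayShareBps: number
): EgressCharge {
  const { about, facts } = receipt.args;
  const bytesServed = facts.out.ok.bytesServed;
  const total = bytesServed * unitsPerByte;
  const gateway = Math.floor((total * gatewayShareBps) / 10000);
  return {
    invocation: about,
    bytesServed,
    blobReads: facts.run.filter((step) => step.cmd === "space/content/retrieve/blob").length,
    total,
    gateway,
    storageNode: total - gateway,
  };
}
```

The `blobReads` count is the kind of figure that could be cross-checked against receipts submitted by the storage node, as the comment suggests. How that reconciliation would actually work is not specified here.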

Member

@hannahhoward hannahhoward left a comment

The previous review ultimately requests a change to remove did:bearer and put the token in caveats.

Member

@travis travis left a comment

ok cool - after reading through all this I'm switching to "request changes" - sounds like we're all on the same page anyway!

@alanshaw
Member

alanshaw commented Oct 9, 2024

Just for clarity, in my comments above I suggested delegating space/content/serve (note "serve" not "retrieve") to authorize the gateway to serve data from the space with caveats appropriate to the gateway (origin, path etc.).

I propose space/content/retrieve/* to exist for the purpose of retrieving byte ranges. It could either be derived from space/content/serve or delegated explicitly.

Example of capability derivation in ucanto. 😱 I don't recall this being in the UCAN spec so perhaps explicit delegation is what we will do...

Separating the capabilities like this allows us to refer separately to the 2 protocols @hannahhoward called out above.

@Peeja on origin/path etc. I'd omit these unless we're ready to implement them.

@travis
Member

travis commented Oct 9, 2024

agree re: origin! just want to make sure there's space in the interim protocol if we do decide to go that route, but I'm honestly hoping we can upgrade to UCAN 1.0 before we need that, where it will be a fairly easy and natural extension thanks to pol

@Peeja
Member Author

Peeja commented Oct 9, 2024

@hannahhoward 💡❗

  1. The raw read from a source location (simple HTTP GET with range request)
  2. The gateway query across potentially one or more providers to assemble a user level response, likely deserialized to a flat file. The second protocol uses the first protocol + the indexing service to accomplish its work.

I hadn't considered that the Gateway would be composing multiple pieces of content across potentially multiple providers to assemble a single response. Given that, I agree wholeheartedly with @alanshaw: that command should be [/]space/content/serve. That would be defined as "serving a request", which involves making the underlying requests and assembling the response.

Question: What level does Bitswap operate on? Does it serve requests, or fetch blobs?

Question: What are the proper names for these things? Is a "Blob" specifically the same as a "Shard"? (This is non-obvious to someone with less context, as typically a "blob" is simply any set of bytes, and in Git in particular it generally means roughly "a file's contents, separate from any filename that might point to it".) What is the name for the thing that the CDN Layer (Gateway) serves, and what is the name for the thing that the Object Layer serves?

@hannahhoward
Member

  1. Bitswap will be /space/content/serve -- it's just another way to serve content, and it similarly can pull from multiple places

  2. Blob is the right name and we should be careful about use of shard maybe. What is interesting is a blob is a blob of bytes, but it is also a shard of a larger dag when you are uploading large data. @alanshaw thoughts?

@alanshaw
Member

alanshaw commented Oct 9, 2024

Blob is the right name and we should be careful about use of shard maybe. What is interesting is a blob is a blob of bytes, but it is also a shard of a larger dag when you are uploading large data. @alanshaw thoughts?

Yeah that's exactly right.

@Peeja Peeja force-pushed the egress-with-ucan branch 2 times, most recently from a672e64 to a1981ef on October 11, 2024 16:58
@Peeja
Member Author

Peeja commented Oct 11, 2024

Version 2 is ready for review. Open questions at the end of the doc, and also copied here:

Open questions

  • When auth fails, should we return 401 Unauthorized, or 404 Not Found to mask that the data exists?
  • Should token be gatewayToken? Or authToken? Is token too broad a term to be in the args of that invocation?
  • What if we find a Location Commitment, but it has no Space on it (since it's optional)?

Member

@hannahhoward hannahhoward left a comment

LGTM after method rename.

Removing "request changes" to expedite the process when done.

rfc/egress-with-ucan.md
2. If none are found, respond with `404 Not Found`.
3. Note if any Location Commitments were found with no Spaces. (If so, these are from before these changes, and mean we should fall back to the previous behavior later.)
4. Get the set of unique Spaces from those Location Commitments. (There will usually be one Location Commitment, with one Space.)
5. Repeat with each Space in any order, stopping if a successful response is produced:
Member

Is this going to potentially be slow? Is it a separate network request for each space we need to check? The performance is the only part of this spec I have concerns about, but I might be misunderstanding - has been a week or so since my head's been here so apologies if I'm missing something!

3. Note if any Location Commitments were found with no Spaces. (If so, these are from before these changes, and mean we should fall back to the previous behavior later.)
4. Get the set of unique Spaces from those Location Commitments. (There will usually be one Location Commitment, with one Space.)
5. Repeat with each Space in any order, stopping if a successful response is produced:
1. Look up delegations in the store where the audience is `did:web:w3s.link` and the subject is the Space.
Member

I guess relatedly - is "the store" here the indexing service or something else?
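
For readers following the performance question, the lookup steps quoted in this thread (steps 2 through 5.1) can be sketched roughly as below. This is only an illustration under assumptions: the delegation store is modeled as an in-memory array filtered by audience and subject, whereas what "the store" actually is remains the open question above. The steps after 5.1 are not shown in the quoted excerpt, so the sketch stops at collecting candidate delegations per Space. All names are hypothetical.

```typescript
interface LocationCommitment {
  cid: string;
  space?: string; // commitments from before these changes have no Space
}

interface Delegation {
  audience: string;
  subject: string;
}

const GATEWAY_DID = "did:web:w3s.link";

type LookupPlan =
  | { kind: "not-found" } // step 2: respond with 404 Not Found
  | {
      kind: "candidates";
      // step 5.1: delegations to the Gateway, per unique Space
      candidates: { space: string; delegations: Delegation[] }[];
      // step 3: fall back to the previous behavior later if true
      legacyFallback: boolean;
    };

function planLookup(commitments: LocationCommitment[], store: Delegation[]): LookupPlan {
  if (commitments.length === 0) return { kind: "not-found" };

  // Step 3: note any commitments that carry no Space.
  const legacyFallback = commitments.some((c) => c.space === undefined);

  // Step 4: the set of unique Spaces (usually exactly one).
  const spaces = new Set<string>();
  commitments.forEach((c) => {
    if (c.space !== undefined) spaces.add(c.space);
  });

  // Step 5.1: one store query per Space in this sketch. Whether that means
  // one network request per Space depends on what the store turns out to be.
  const candidates = Array.from(spaces).map((space) => ({
    space,
    delegations: store.filter((d) => d.audience === GATEWAY_DID && d.subject === space),
  }));

  return { kind: "candidates", candidates, legacyFallback };
}
```

Since the excerpt notes there will usually be one Location Commitment with one Space, the per-Space loop would typically run once. The worst case grows with the number of distinct Spaces holding the CID.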

5 participants