Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] MSC3898: Native Matrix VoIP signalling for cascaded foci (SFUs, MCUs...) #3898

Draft
wants to merge 30 commits into
base: main
Choose a base branch
from

Conversation

SimonBrandner
Copy link
Contributor

@SimonBrandner SimonBrandner commented Sep 25, 2022

Builds on/split out of #3401

Rendered

@SimonBrandner SimonBrandner added voip proposal A matrix spec change proposal kind:core MSC which is critical to the protocol's success needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. labels Sep 25, 2022
@SimonBrandner SimonBrandner changed the title [WIP] MSC0000: Native Matrix VoIP signalling for cascaded SFUs [WIP] MSC3898: Native Matrix VoIP signalling for cascaded SFUs Sep 25, 2022
Signed-off-by: Šimon Brandner <[email protected]>
@ara4n
Copy link
Member

ara4n commented Sep 25, 2022

i'm all in favour of splitting out the SFU bits from MSC3401, and leaving MSC3401 to focus on full mesh - but perhaps don't duplicate stuff between the two (e.g. the architecture diagrams, or m.call.* events?)

@SimonBrandner
Copy link
Contributor Author

SimonBrandner commented Sep 25, 2022

i'm all in favour of splitting out the SFU bits from MSC3401, and leaving MSC3401 to focus on full mesh - but perhaps don't duplicate stuff between the two (e.g. the architecture diagrams, or m.call.* events?)

I've duplicated the diagrams here mainly for context and the m.call. events are not duplicates, they just describe what needs to be added for SFUs to work (m.foci) (the whole event is included only as an example)

Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
proposals/3898-sfu.md Show resolved Hide resolved
proposals/3898-sfu.md Outdated Show resolved Hide resolved
proposals/3898-sfu.md Show resolved Hide resolved
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
proposals/3898-sfu.md Outdated Show resolved Hide resolved
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
Copy link

@daniel-abramov daniel-abramov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Another issue that I ran into while refactoring the SFU was that I noticed that we don't have any feedback regarding messages sent to (and from) the SFU failing. This made things hard to debug since the client never knows if the message sent to the SFU was processed and if the result was successful. I.e. if you try to subscribe to a track and you don't get any tracks back, you don't really know if you need to retry and if so, for how long. We probably need to introduce a mechanism to report errors from/to the SFU. And because of the async nature of the communication, each such message will have to have some sort of a transaction ID or something like this, so that if the client sends Subscribe, Subscribe, Subscribe and gets an error back, it knows which Subscribe command the error corresponds to.
  2. Also, we need to probably mention somewhere how to deal with the "obsolete" To-Device message. When testing the SFU I've noticed that sometimes there were cases when I ended the SFU, but clients continued to post messages to the SFU (not knowing that the SFU is down). That resulted in lots of messages accumulating somewhere in the home server. Once the SFU got restarted, it got lots of messages back (which triggered many actions that must not have happened). We need to somehow receive and drop all old messages on the SFU side once we start it, i.e. send a request to the home server with something like "Hey, give me the messages since time.Now() and drop the rest" (conceptually; we probably can achieve this using the current API of the client SDK).

proposals/3898-sfu.md Outdated Show resolved Hide resolved
proposals/3898-sfu.md Outdated Show resolved Hide resolved
Comment on lines 221 to 222
"stream_id": "streamId1",
"track_id": "trackId1",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, do we really need both track ID and stream ID for the SFU use case?

The track IDs that browsers generate seem to be GUIDs that are unique enough (i.e. it's unlikely that there would be 2 tracks with the same GUID). Does this mean that instead of using two values, we could just send the track_id? (the server anyway always knows the stream ID of each track).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any strong arguments here. @ara4n and @dbkr, do you have any thoughts?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To explain the reasoning here. When browsing via Pion docs, I've noticed that StreamID is said to be unique only within a single peer connection (but not globally), while TrackID is meant to be unique within a stream, but not globally. This means that in the case of the SFU, the combination of StreamID and TrackID as per Pion would not be enough to uniquely identify a track, so initially, I was worried that our implementation is not correct. However, when checking what the browsers actually send as TrackID and StreamID, I've noticed that both are randomly generated with TrackID being a GUID (it's also not that off from the official spec). If the TrackID is a GUID, then it would be enough to use the GUID as an identifier for tracks when trying to subscribe/unsubscribe from tracks. The rest (StreamID) would anyway be known to the client once the subscription is completed since they will get the remote way along with its stream ID.

Also, using only the TrackID would allow us to support streamless tracks that may potentially exist in the MatrixRTC use case.

Comment on lines 297 to 299
If a user is using their SFU in a call, it will need to know how to connect to
other SFUs present in order to participate in the full-mesh of SFU traffic (if
any). The client is responsible for doing this using the `connect` op.
Copy link

@daniel-abramov daniel-abramov Dec 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to specify the cascading specific logic in this MSC or would it be better to make a separate MSC for cascading?

Rationale: if we have a dedicated MSC for the SFU, we'll be able to finalize and merge it faster to master. Iterating with small MSCs might be a better idea given the amount of time it normally takes until the MSC is merged? (just a gut feeling)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that the event fields used in a single focus case are quite different from the cascading case. I wonder if there is a way to avoid that issue

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I did not fully get what you mean.

I think the reason why I initially commented is that it seems like we're not going to have the cascading implemented in the very nearest future (currently we don't really support it), so I thought maybe it would be faster to limit this MSC to the SFU and then create a cascading MSC after that (once we have a stable SFU). I was just afraid that otherwise the MSC would stay open (or in a draft state) for too long.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with that but I am not sure how to technically handle this - the MSC currently specifies an SFU selection algorithm and the fields it uses, if we wanted to split the MSC into two, we would need to completely different ways to specify the SFU, I think...

"content": {
"m.calls": [
{
"m.call_id": "cvsiu2893",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the call_id does not seem to be necessary.

When the SFU sends To-Device messages to the clients, the conf_id is specified and given that the conf_id is a unique identifier of a conference/call, there seem to be no need to have a call_id in addition to that.

Recently I've ran into an issue where I realized that call_id and conf_id are not the same (despite MSC3401 giving me an impression that they are identical). The conf_id was the ID of a conference (as expected), but the call_id was another random string that was different for each single participant which forced us to use both call_id and conf_id when sending messages back to the clients (otherwise they would be rejected).

It looks like call_id should either be removed or (if we want to keep it for the backward compatibility with the older MSC?) it must be equal to the conf_id.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment belongs on MSC3401 as this line is specified in other MSC, although I thinik the conclusion is just that there's confusion between call_id and conf_id and we should rename this to conf_id (there's no other conf ID in this event so it is necessary, not just for backwards compat).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I've also written a comment about it in MSC3401 😛

Basically, the problem is not only that they are called differently, but also that the value of call_id and conf_id is different, so they are different for some reason (and on the SFU we are obligated to take both into account: conf_id for a conference ID and the call_id to set a value in outgoing To-Device messages without which the client would discard the messages).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you agree that the correct resolution is to change this to m.conf_id?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that would be great! Though I wonder what the consequence of that would be (i.e. what is that value that the current call_id has? - It's not a conference ID, it's something different, or maybe it's a leftover from a previous implementation for 1:1s where call_id meant something?)

Copy link

@daniel-abramov daniel-abramov Dec 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really?

Yes 🙂 That's something that I discovered a week ago when deploying the first iteration of refactored SFU. I've just tried to join the SFU and the conf_id field is equal to 1668002318158qFQmBWgVHHXTZsPA, while the call_id is 1670443502134bOWVqa3btIfDQMjJ.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conf_id and call_id from where though? There will also be call_id in the individual calls which will definitely be different. Otherwise we need to work out what's going on here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conf_id and call_id from where though?

From To-Device messages that the participants of the conference send to the SFU. We then reply with To-Device messages back (e.g. when we generate an answer), in which case we also set both conf_id or call_id.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean https://github.com/matrix-org/matrix-js-sdk/blob/develop/src/webrtc/call.ts#L2252? conf_id is the ID of the conference call (state key of the m.call event), call_id is the ID of the 1:1 call between the individual group call participants.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, seems like this. But the thing is that, from the SFUs standpoint, the call_id does not have any semantics, but currently we're obligated to store both conf_id and call_id (which have different values), where the call_id is only used in order to send To-Device messages to the clients, i.e. when I e.g. want to send messages from the SFU to the client, I have to set both the conf_id (the ID of a conference) and the call_id (the ID of the 1:1 call between individual group call participants).

So my point is that we probably want to get rid of mandating call_id for the SFU calls since they don't seem any semantic value for this use case. And only use the conf_id instead?

@matrix-org matrix-org deleted a comment from daniel-abramov Dec 3, 2022
@SimonBrandner SimonBrandner changed the title [WIP] MSC3898: Native Matrix VoIP signalling for cascaded SFUs [WIP] MSC3898: Native Matrix VoIP signalling for cascaded foci Dec 3, 2022
@SimonBrandner SimonBrandner changed the title [WIP] MSC3898: Native Matrix VoIP signalling for cascaded foci [WIP] MSC3898: Native Matrix VoIP signalling for cascaded foci (SFUs, MCUs...) Dec 3, 2022

## Security considerations

Malicious users could try to DoS SFUs by specifying them as their foci.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are SFUs not (by default, with an option to the admin/operator to open it up) authenticated using one's matrix account? Shouldn't they be?
The cascaded decentralized SFU concept appears to be that there is one focus associated with each homeserver. Hence I would expect that I can ever only access my hs's SFU(s).

(by @HarHarLinks from #3401 (comment))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I learn more about this topic, foci seem to not be authenticated.
As a server admin, I would like if not anyone could use the focus I host. It would appear logical to allow only user of one or more associated homeservers and at most also temporarily their remote call members if the algorithm deems the focus favourable.

Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
like to start/stop subscribing to.

Upon receiving this event, a focus should make the subscribe changes based on
the `start` and `stop` arrays and respond with an `m.call.negotiate` event.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we always respond to the m.call.negotiate (we may re-use the transceiver if there is such a possibility)? Maybe we can just mention that the server may reply with the m.call.negotiate if it's practical/necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd stick with the current wording until we figure out something better and more specific

proposals/3898-sfu.md Outdated Show resolved Hide resolved
|Stable (post-FCP) |Unstable |
|------------------|-----------------------------------|
|`m.foci.active` |`org.matrix.msc3898.foci.active` |
|`m.foci.preferred`|`org.matrix.msc3898.foci.preferred`|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving this to a line comment so it can be a thread:

Also, we need to probably mention somewhere how to deal with the "obsolete" To-Device message. When testing the SFU I've noticed that sometimes there were cases when I ended the SFU, but clients continued to post messages to the SFU (not knowing that the SFU is down). That resulted in lots of messages accumulating somewhere in the home server. Once the SFU got restarted, it got lots of messages back (which triggered many actions that must not have happened). We need to somehow receive and drop all old messages on the SFU side once we start it, i.e. send a request to the home server with something like "Hey, give me the messages since time.Now() and drop the rest" (conceptually; we probably can achieve this using the current API of the client SDK).

Yep, this is broadly the same problem as a client starting back up and receiving old messages, some of which are call invites, and having to determine whether the calls have been and gone, in which case it should just ignore the events, or if the call is still ringing.

There's actually not much we can do about this at the client-server API level: to-device messages are just store-and-forward, so there's no real way to tell which messages are old and which are recent. To-device messages don't have timestamps either which room events do. I think the options are either to read & discard all to-device messages at startup or add a timestamp & expiry at the event level. This will also tie in to how we manage the lifetime of a focus and how we balance between foci both to distribute load and so we can restart individual foci.


#### Discovering foci

- **TODO: How does a client discover foci? We could use well-known or a custom endpoint**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts: how we load balance between SFUs and manage availability will have a bearing on this, ie. would we expect SFUs to become unavailable when they get restarted / updated and therefore how often a client expect its SFU (or list of SFUs?) to change?

proposals/3898-sfu.md Outdated Show resolved Hide resolved
"start": [
{
"stream_id": "streamId1",
"track_id": "trackId1",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should include the user ID of the user sending the track we want here? That way we're not relying on stream/track IDs being globally unique (plus it will make the the signalling much easier to understand when looking at it). The stream ID feels unnecessary in either case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, interesting point, perhaps device_id as well? So it would be (track_id, device_id, track_id)? @daniel-abramov, what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though I guess that if we have these we might as well leave the stream_id there for flexibility....

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like a part of a discussion that we've recently had about the stream/track IDs.

So far the trackIDs were unique regardless of the browser we used for the tests (we even changed the code of the waterfall to only rely on trackID when subscribing to tracks and it seems to work just fine and the handling is simpler and more elegant).

I think we have 2 options here:

  1. Either use trackID only (seems to be totally fine since trackIDs are GUIDs).
  2. Or use a tuple of track_id, device_id and stream_id that @SimonBrandner suggested in the comment above.

The current implementation in the SFU uses (1), which also it seems to be ok from the RFC's standpoint:

[..] A good practice is to use a UUID [rfc4122], which is 36 characters long in its canonical form. To avoid fingerprinting, implementations SHOULD use the forms in section 4.4 or 4.5 of RFC 4122 when generating UUIDs. [..]

I don't have a strong opinion, but I'm always biased toward elegant and simple solutions, so my personal preference would be an option (1).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, sorry - this is very similar, but github has hidden that comment as outdated. The RFC is only suggesting UUIDs as good practice though, so I'm not sure we can rely on it. Šimon's correct too in that we'd need the device ID too if we couldn't be sure that the track ID was globally unique.

Another thing we could do here is specify the SFU(s?) to get the stream from? I think this would mean we wouldn't need the the connect-to_focus message?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing we could do here is specify the SFU(s?) to get the stream from? I think this would mean we wouldn't need the the connect-to_focus message?

For cascading? - Yeah, probably, but I have not yet thought through the whole cascading thing yet (but probably we could approach the cascading topic similar to what we did with the SFU conferencing: experiment with things in code and update an MSC once we gathered more information on what works).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we shouldn't be using WebRTC track-ids at all (https://blog.mozilla.org/webrtc/the-evolution-of-webrtc/). We should identify by mids to the focus and either use this directly or make up our own ID to reference media here, mapping it to the mid on the focus with a stream_metadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing we could do here is specify the SFU(s?) to get the stream from? I think this would mean we wouldn't need the the connect-to_focus message?

It's not very clear to me how that would work, tbh

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we shouldn't be using WebRTC track-ids at all (https://blog.mozilla.org/webrtc/the-evolution-of-webrtc/). We should identify by mids to the focus and either use this directly or make up our own ID to reference media here, mapping it to the mid on the focus with a stream_metadata.

This is a good point, @dbkr. I also read this article in the past, but got confused and ignored the conclusion since I saw that the approach of using stream IDs and track IDs did seem to work for the EC despite that article from Mozilla stating that it's a no go (other, newer articles had similar conclusions).

I tried to correlate the information between that particle + another article on transceivers from Mozilla + webrtcforthecurious + WebRTC API docs from Mozilla to understand what's the correct way to tackle this problem.

Since my notes were rather large for a comment, I've created a discussion page for that as we agreed.

Please take a look: https://github.com/vector-im/voip-internal/discussions/79

proposals/3898-sfu.md Outdated Show resolved Hide resolved
proposals/3898-sfu.md Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:core MSC which is critical to the protocol's success needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. proposal A matrix spec change proposal voip
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants