-
Notifications
You must be signed in to change notification settings - Fork 382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] MSC3898: Native Matrix VoIP signalling for cascaded foci (SFUs, MCUs...) #3898
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
I've duplicated the diagrams here mainly for context and the |
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Another issue that I ran into while refactoring the SFU was that I noticed that we don't have any feedback regarding messages sent to (and from) the SFU failing. This made things hard to debug since the client never knows if the message sent to the SFU was processed and if the result was successful. I.e. if you try to subscribe to a track and you don't get any tracks back, you don't really know if you need to retry and if so, for how long. We probably need to introduce a mechanism to report errors from/to the SFU. And because of the async nature of the communication, each such message will have to have some sort of a transaction ID or something like this, so that if the client sends
Subscribe
,Subscribe
,Subscribe
and gets an error back, it knows whichSubscribe
command the error corresponds to. - Also, we need to probably mention somewhere how to deal with the "obsolete" To-Device message. When testing the SFU I've noticed that sometimes there were cases when I ended the SFU, but clients continued to post messages to the SFU (not knowing that the SFU is down). That resulted in lots of messages accumulating somewhere in the home server. Once the SFU got restarted, it got lots of messages back (which triggered many actions that must not have happened). We need to somehow receive and drop all old messages on the SFU side once we start it, i.e. send a request to the home server with something like "Hey, give me the messages since time.Now() and drop the rest" (conceptually; we probably can achieve this using the current API of the client SDK).
proposals/3898-sfu.md
Outdated
"stream_id": "streamId1", | ||
"track_id": "trackId1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, do we really need both track ID and stream ID for the SFU use case?
The track IDs that browsers generate seem to be GUIDs that are unique enough (i.e. it's unlikely that there would be 2 tracks with the same GUID). Does this mean that instead of using two values, we could just send the track_id
? (the server anyway always knows the stream ID of each track).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To explain the reasoning here. When browsing via Pion docs, I've noticed that StreamID
is said to be unique only within a single peer connection (but not globally), while TrackID
is meant to be unique within a stream, but not globally. This means that in the case of the SFU, the combination of StreamID
and TrackID
as per Pion would not be enough to uniquely identify a track, so initially, I was worried that our implementation is not correct. However, when checking what the browsers actually send as TrackID
and StreamID
, I've noticed that both are randomly generated with TrackID
being a GUID (it's also not that off from the official spec). If the TrackID
is a GUID, then it would be enough to use the GUID
as an identifier for tracks when trying to subscribe/unsubscribe from tracks. The rest (StreamID) would anyway be known to the client once the subscription is completed since they will get the remote way along with its stream ID.
Also, using only the TrackID
would allow us to support streamless tracks that may potentially exist in the MatrixRTC use case.
proposals/3898-sfu.md
Outdated
If a user is using their SFU in a call, it will need to know how to connect to | ||
other SFUs present in order to participate in the full-mesh of SFU traffic (if | ||
any). The client is responsible for doing this using the `connect` op. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to specify the cascading specific logic in this MSC or would it be better to make a separate MSC for cascading?
Rationale: if we have a dedicated MSC for the SFU, we'll be able to finalize and merge it faster to master. Iterating with small MSCs might be a better idea given the amount of time it normally takes until the MSC is merged? (just a gut feeling)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that the event fields used in a single focus case are quite different from the cascading case. I wonder if there is a way to avoid that issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I did not fully get what you mean.
I think the reason why I initially commented is that it seems like we're not going to have the cascading implemented in the very nearest future (currently we don't really support it), so I thought maybe it would be faster to limit this MSC to the SFU and then create a cascading MSC after that (once we have a stable SFU). I was just afraid that otherwise the MSC would stay open (or in a draft state) for too long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with that but I am not sure how to technically handle this - the MSC currently specifies an SFU selection algorithm and the fields it uses, if we wanted to split the MSC into two, we would need to completely different ways to specify the SFU, I think...
"content": { | ||
"m.calls": [ | ||
{ | ||
"m.call_id": "cvsiu2893", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the call_id
does not seem to be necessary.
When the SFU sends To-Device messages to the clients, the conf_id
is specified and given that the conf_id
is a unique identifier of a conference/call, there seem to be no need to have a call_id
in addition to that.
Recently I've ran into an issue where I realized that call_id
and conf_id
are not the same (despite MSC3401 giving me an impression that they are identical). The conf_id
was the ID of a conference (as expected), but the call_id
was another random string that was different for each single participant which forced us to use both call_id
and conf_id
when sending messages back to the clients (otherwise they would be rejected).
It looks like call_id
should either be removed or (if we want to keep it for the backward compatibility with the older MSC?) it must be equal to the conf_id
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment belongs on MSC3401 as this line is specified in other MSC, although I thinik the conclusion is just that there's confusion between call_id and conf_id and we should rename this to conf_id (there's no other conf ID in this event so it is necessary, not just for backwards compat).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I've also written a comment about it in MSC3401 😛
Basically, the problem is not only that they are called differently, but also that the value of call_id
and conf_id
is different, so they are different for some reason (and on the SFU we are obligated to take both into account: conf_id
for a conference ID and the call_id
to set a value in outgoing To-Device messages without which the client would discard the messages).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you agree that the correct resolution is to change this to m.conf_id
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that would be great! Though I wonder what the consequence of that would be (i.e. what is that value that the current call_id
has? - It's not a conference ID, it's something different, or maybe it's a leftover from a previous implementation for 1:1s where call_id
meant something?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really?
Yes 🙂 That's something that I discovered a week ago when deploying the first iteration of refactored SFU. I've just tried to join the SFU and the conf_id
field is equal to 1668002318158qFQmBWgVHHXTZsPA
, while the call_id
is 1670443502134bOWVqa3btIfDQMjJ
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
conf_id and call_id from where though? There will also be call_id in the individual calls which will definitely be different. Otherwise we need to work out what's going on here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
conf_id and call_id from where though?
From To-Device messages that the participants of the conference send to the SFU. We then reply with To-Device messages back (e.g. when we generate an answer), in which case we also set both conf_id
or call_id
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean https://github.com/matrix-org/matrix-js-sdk/blob/develop/src/webrtc/call.ts#L2252? conf_id
is the ID of the conference call (state key of the m.call event), call_id
is the ID of the 1:1 call between the individual group call participants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, seems like this. But the thing is that, from the SFUs standpoint, the call_id
does not have any semantics, but currently we're obligated to store both conf_id
and call_id
(which have different values), where the call_id
is only used in order to send To-Device messages to the clients, i.e. when I e.g. want to send messages from the SFU to the client, I have to set both the conf_id
(the ID of a conference) and the call_id
(the ID of the 1:1 call between individual group call participants).
So my point is that we probably want to get rid of mandating call_id
for the SFU calls since they don't seem any semantic value for this use case. And only use the conf_id
instead?
|
||
## Security considerations | ||
|
||
Malicious users could try to DoS SFUs by specifying them as their foci. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are SFUs not (by default, with an option to the admin/operator to open it up) authenticated using one's matrix account? Shouldn't they be?
The cascaded decentralized SFU concept appears to be that there is one focus associated with each homeserver. Hence I would expect that I can ever only access my hs's SFU(s).
(by @HarHarLinks from #3401 (comment))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I learn more about this topic, foci seem to not be authenticated.
As a server admin, I would like if not anyone could use the focus I host. It would appear logical to allow only user of one or more associated homeservers and at most also temporarily their remote call members if the algorithm deems the focus favourable.
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
like to start/stop subscribing to. | ||
|
||
Upon receiving this event, a focus should make the subscribe changes based on | ||
the `start` and `stop` arrays and respond with an `m.call.negotiate` event. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we always respond to the m.call.negotiate
(we may re-use the transceiver if there is such a possibility)? Maybe we can just mention that the server may reply with the m.call.negotiate
if it's practical/necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd stick with the current wording until we figure out something better and more specific
|Stable (post-FCP) |Unstable | | ||
|------------------|-----------------------------------| | ||
|`m.foci.active` |`org.matrix.msc3898.foci.active` | | ||
|`m.foci.preferred`|`org.matrix.msc3898.foci.preferred`| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving this to a line comment so it can be a thread:
Also, we need to probably mention somewhere how to deal with the "obsolete" To-Device message. When testing the SFU I've noticed that sometimes there were cases when I ended the SFU, but clients continued to post messages to the SFU (not knowing that the SFU is down). That resulted in lots of messages accumulating somewhere in the home server. Once the SFU got restarted, it got lots of messages back (which triggered many actions that must not have happened). We need to somehow receive and drop all old messages on the SFU side once we start it, i.e. send a request to the home server with something like "Hey, give me the messages since time.Now() and drop the rest" (conceptually; we probably can achieve this using the current API of the client SDK).
Yep, this is broadly the same problem as a client starting back up and receiving old messages, some of which are call invites, and having to determine whether the calls have been and gone, in which case it should just ignore the events, or if the call is still ringing.
There's actually not much we can do about this at the client-server API level: to-device messages are just store-and-forward, so there's no real way to tell which messages are old and which are recent. To-device messages don't have timestamps either which room events do. I think the options are either to read & discard all to-device messages at startup or add a timestamp & expiry at the event level. This will also tie in to how we manage the lifetime of a focus and how we balance between foci both to distribute load and so we can restart individual foci.
|
||
#### Discovering foci | ||
|
||
- **TODO: How does a client discover foci? We could use well-known or a custom endpoint** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts: how we load balance between SFUs and manage availability will have a bearing on this, ie. would we expect SFUs to become unavailable when they get restarted / updated and therefore how often a client expect its SFU (or list of SFUs?) to change?
"start": [ | ||
{ | ||
"stream_id": "streamId1", | ||
"track_id": "trackId1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should include the user ID of the user sending the track we want here? That way we're not relying on stream/track IDs being globally unique (plus it will make the the signalling much easier to understand when looking at it). The stream ID feels unnecessary in either case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, interesting point, perhaps device_id
as well? So it would be (track_id, device_id, track_id)
? @daniel-abramov, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though I guess that if we have these we might as well leave the stream_id
there for flexibility....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems like a part of a discussion that we've recently had about the stream/track IDs.
So far the trackID
s were unique regardless of the browser we used for the tests (we even changed the code of the waterfall
to only rely on trackID
when subscribing to tracks and it seems to work just fine and the handling is simpler and more elegant).
I think we have 2 options here:
- Either use
trackID
only (seems to be totally fine sincetrackID
s are GUIDs). - Or use a tuple of
track_id
,device_id
andstream_id
that @SimonBrandner suggested in the comment above.
The current implementation in the SFU uses (1), which also it seems to be ok from the RFC's standpoint:
[..] A good practice is to use a UUID [rfc4122], which is 36 characters long in its canonical form. To avoid fingerprinting, implementations SHOULD use the forms in section 4.4 or 4.5 of RFC 4122 when generating UUIDs. [..]
I don't have a strong opinion, but I'm always biased toward elegant and simple solutions, so my personal preference would be an option (1).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, sorry - this is very similar, but github has hidden that comment as outdated. The RFC is only suggesting UUIDs as good practice though, so I'm not sure we can rely on it. Šimon's correct too in that we'd need the device ID too if we couldn't be sure that the track ID was globally unique.
Another thing we could do here is specify the SFU(s?) to get the stream from? I think this would mean we wouldn't need the the connect-to_focus message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing we could do here is specify the SFU(s?) to get the stream from? I think this would mean we wouldn't need the the connect-to_focus message?
For cascading? - Yeah, probably, but I have not yet thought through the whole cascading thing yet (but probably we could approach the cascading topic similar to what we did with the SFU conferencing: experiment with things in code and update an MSC once we gathered more information on what works).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we shouldn't be using WebRTC track-ids at all (https://blog.mozilla.org/webrtc/the-evolution-of-webrtc/). We should identify by mids to the focus and either use this directly or make up our own ID to reference media here, mapping it to the mid on the focus with a stream_metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing we could do here is specify the SFU(s?) to get the stream from? I think this would mean we wouldn't need the the connect-to_focus message?
It's not very clear to me how that would work, tbh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we shouldn't be using WebRTC track-ids at all (https://blog.mozilla.org/webrtc/the-evolution-of-webrtc/). We should identify by mids to the focus and either use this directly or make up our own ID to reference media here, mapping it to the mid on the focus with a stream_metadata.
This is a good point, @dbkr. I also read this article in the past, but got confused and ignored the conclusion since I saw that the approach of using stream IDs and track IDs did seem to work for the EC despite that article from Mozilla stating that it's a no go (other, newer articles had similar conclusions).
I tried to correlate the information between that particle + another article on transceivers from Mozilla + webrtcforthecurious + WebRTC API docs from Mozilla to understand what's the correct way to tackle this problem.
Since my notes were rather large for a comment, I've created a discussion page for that as we agreed.
Please take a look: https://github.com/vector-im/voip-internal/discussions/79
Co-authored-by: David Baker <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
Signed-off-by: Šimon Brandner <[email protected]>
Builds on/split out of #3401
Rendered