MSC2746: Improved VoIP Signalling #2746

Merged: 81 commits, merged Apr 28, 2023
Changes from 2 commits

Commits
cf50137
Placeholder for reliable VoIP MSC
dbkr Aug 21, 2020
a9b17fc
First version written up
dbkr Aug 21, 2020
37c1f98
Typo
dbkr Aug 24, 2020
5156caf
Typo
dbkr Aug 24, 2020
fe8b1eb
Switch to `m.call.select_answer`
dbkr Aug 24, 2020
6c4a077
Make self-calling possible
dbkr Aug 25, 2020
25ed29a
Nobody spotted the deliberate typo
dbkr Sep 3, 2020
bec62ab
Fixes & clarifications from Brendan
dbkr Sep 4, 2020
019bcdd
answers ID -> party ID
dbkr Sep 4, 2020
66179f1
clarify party_id
dbkr Sep 4, 2020
9e8c829
require that the client tries to decrypt all events before ringing
dbkr Sep 4, 2020
e224af3
not all of these necessary's were necessary
dbkr Sep 4, 2020
2561820
Apply suggestions from code review
dbkr Sep 11, 2020
63cecd1
line break
dbkr Sep 11, 2020
c6f6ca1
workaround markdown being awful
dbkr Sep 11, 2020
e9fe3af
specify grammar for IDs
dbkr Sep 11, 2020
563dba5
document why not mandate the same device IDs
dbkr Sep 11, 2020
070451e
rejection is about what the caller sees, not what's been sent
dbkr Sep 11, 2020
7bca76e
Explain use of the age field
dbkr Sep 11, 2020
e361fa9
Clarify party_id / user_id tuple in negotiate events
dbkr Sep 15, 2020
722ee0d
Require end-of-candidates candidate
dbkr Sep 17, 2020
8e76616
Add alternatives note for trickle ICE discovery mechansim
dbkr Sep 17, 2020
7e742d4
add that chrome spits out `icegatheringstatechange`
dbkr Sep 21, 2020
8da4b7c
clients must accept string version
dbkr Oct 14, 2020
18200d0
Add text on unstable prefixng (and how/why we aren't)
dbkr Oct 20, 2020
9e22601
Specify what happens when someone leaves the room
dbkr Oct 20, 2020
a446629
Rejig m.call.negotiate
dbkr Oct 22, 2020
1326a22
Explain politness & glare in a simpler way (I hope)
dbkr Oct 23, 2020
a669828
Add note on why we don't allow for ICE before an answer.
dbkr Oct 26, 2020
0fa0770
Define WebRTC track & stream configs for calls
dbkr Oct 26, 2020
599ad3c
select_answer was missing a version
dbkr Dec 3, 2020
6478f9d
Fix old type
dbkr Dec 3, 2020
b62f842
Clarfy that whatever codecs webrtc say is what goes
dbkr Feb 15, 2021
9156c80
Typos
dbkr Feb 15, 2021
834bc3b
Typo
dbkr Mar 4, 2021
ec2c7fe
only allow the number zero as numeric version
dbkr Mar 4, 2021
a572eb8
Update 2746-reliable-voip.md
ara4n Apr 6, 2021
6592023
Add user_busy hangup / reject reason
dbkr May 26, 2021
996adab
Add capability for DTMF
dbkr Jun 21, 2021
91428c8
Be clear about versions
SimonBrandner Jul 11, 2022
e42dd41
Clarify clients must respond to `m.call.negotiate`
SimonBrandner Jul 11, 2022
29485c4
Give `m.call.negotiate` a version
SimonBrandner Jul 11, 2022
f69ae72
Remove repeated words
SimonBrandner Jul 11, 2022
51e02b2
Be clearer about types
SimonBrandner Jul 11, 2022
312ffe5
Avoid defining call types
SimonBrandner Jul 11, 2022
4617af5
Specify minimal `lifetime`
SimonBrandner Jul 11, 2022
289fb3f
Use MSC1597 grammar for call / party IDs.
dbkr Nov 8, 2022
1392eae
Add more rationale around voip event version
dbkr Feb 6, 2023
46bfbde
Change advice for calls in public rooms.
dbkr Feb 6, 2023
9dcda02
Typo
dbkr Feb 6, 2023
6dc85a8
Clarify reject/hangup sending
dbkr Feb 6, 2023
f138bfe
Merge branch 'dbkr/msc2746' of github.com:matrix-org/matrix-spec-prop…
dbkr Feb 6, 2023
2fd97c9
Clarify hangup reason backwards compat
dbkr Feb 6, 2023
04eaee2
Clarify party ID
dbkr Feb 6, 2023
c52a845
There is no sender field.
dbkr Feb 6, 2023
6dcf65c
Require ignoring negotiates not matching party ID
dbkr Feb 6, 2023
859cf6d
Word negotiate events better
dbkr Feb 6, 2023
340e769
Don't forget the txn ID is returned by the send call.
dbkr Feb 6, 2023
8be57ed
Enumerate all the current VoIP events in 'version' section.
dbkr Feb 6, 2023
a09be95
Clarify treatment of version numeric 1.
dbkr Feb 6, 2023
5d38f15
Clarify that track/stream layout is new.
dbkr Feb 6, 2023
3162911
Link to m.call.invite
dbkr Feb 6, 2023
097fa58
Suggestions from richvdh
dbkr Feb 6, 2023
4eaa0b4
More suggestions from richvdh
dbkr Feb 6, 2023
db5ca80
More suggestions from richvdh
dbkr Feb 6, 2023
48527fc
Clarify call invite
dbkr Feb 6, 2023
312cdf7
Pluralise
dbkr Feb 6, 2023
8286d10
Reflect that MSC1597 hasn't landed yet.
dbkr Feb 6, 2023
a5e963f
Politeness only applies to renegotiation
dbkr Feb 6, 2023
b09af73
s/Mandate/define/
dbkr Feb 6, 2023
1fc6b37
Remove DTMF capability section to move tov MSC2747.
dbkr Feb 6, 2023
c9f0574
Clarify backwards compat
dbkr Mar 28, 2023
0880475
Grammar
dbkr Mar 28, 2023
e45c1e0
Fix quotes
dbkr Mar 28, 2023
bdf9639
Typo
dbkr Mar 28, 2023
082f216
Remove sentence that I think is now just redundant
dbkr Mar 28, 2023
ce0e338
Clarify mre on type field
dbkr Mar 28, 2023
2919112
Clarify end-of-candidates
dbkr Mar 28, 2023
c949b32
Add comma
dbkr Mar 29, 2023
7d8d527
Update 2746-reliable-voip.md (#3992)
richvdh Apr 5, 2023
3925586
wording changes
anoadragon453 Apr 28, 2023
194 changes: 194 additions & 0 deletions proposals/2746-reliable-voip.md
@@ -0,0 +1,194 @@
# MSC2746: Improved Signalling for 1:1 VoIP
Member

as this is an MSC which touches event schemas while Extensible Events is on the battlefield, just a heads up that a v3 of call events is somewhat on the horizon to make them legal in extensible event-supported rooms. I don't think this MSC needs to do anything specific to solve the conflict (unless you want it to, but that means putting an even longer delay on it landing), but implementations of calls should be aware that calls will be changing again (sorry).

Member Author

Yeah, this is probably fine. I think we would want this version of calls defined in a version of the spec in any case.


Matrix has basic support for signalling 1:1 WebRTC calls, but has a number of shortcomings:

* If several devices try to answer the same call, there is no way for them to determine clearly
that the caller has set up the call with a different device, and no way for the caller to
determine which candidate events map to which answer.
* Hangup reasons are often incorrect.
* There is confusion and no clear guidance on how clients should determine whether an incoming
invite is stale or not.
* There is no support for renegotiation of SDP, for changing ICE candidates / hold/resume
functionality, etc.
* There is no distinction between rejecting a call and ending it, which means that in trying
to reject a call, a client can inadvertently cause a call that has been successfully set up
on a different device to be hung up.

## Proposal
### Change the `version` field in all VoIP events to `1`
This will be used to determine whether devices support this new version of the protocol.
If clients see events with `version` other than `0` or `1`, they should treat these the same as if they had
Contributor

So we treat 3 as version 1? Won't that cause issues in the future? Why do we do this?

Contributor

The intent is that newer versions will have their own MSC. This is describing what clients should do should they encounter an unknown version, 0 and 1 being the only known versions in this context.

Contributor

@deepbluev7, Mar 3, 2021

Well, it currently reads like version 3 should be treated as 1, which sounds wrong. If a client implements it that way now, you can't make a new version ever, but I guess that is not the intention here?

Contributor

New versions mean implementation changes - in which case you can also change what it considers to be the most recent known version from 1 to something else. It's a bit like with room versions: currently a homeserver will refuse to create/join a room in a version it doesn't know about, but that doesn't mean it will never be able to support this new version. The only difference here is the fallback; in the case of a room version the server refuses to perform the action, in the case of a call the client substitutes it with the most recent known version.

Contributor

So future versions will be compatible enough, that this doesn't cause issues?

Member Author

Yep, version 3 should be treated identically to 1 by anything implementing this spec. Something implementing a future spec may treat it differently.

Contributor

That sounds like it would cause issues, whenever we actually need to make a breaking change. How would that work? Shouldn't clients instead negotiate the lowest common version instead of treating every newer version as an older version?

Member Author

If we need to make a breaking change, probably just easier to move to a whole new set of event types.

`version` == `1`.
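
The fallback rule can be sketched in Python as follows. This is an illustration rather than normative text: `effective_voip_version` is a hypothetical helper, and treating a missing `version` field as `0` is an assumption that this proposal does not state.

```python
def effective_voip_version(content: dict) -> int:
    """Map an event's `version` onto the versions this proposal knows about."""
    # Assumption: a missing field is treated like legacy version 0.
    version = content.get("version", 0)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(version, int) and not isinstance(version, bool) and version == 0:
        return 0
    # Anything else (1, or an unknown value such as 3) is handled as version 1.
    return 1
```

A client implementing a later proposal would presumably replace this helper with one that recognises the newer versions it supports.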

### Add `answer_id` to `m.call.answer`
This is a random string generated by the sending device in the same way as `call_id` (ie. in this case,
it should be sufficiently unique in the context of this VoIP call). This identifies each answer sent to
a given `m.call.invite`.

*Also considered: use the `event_id` of the answer event: this is rejected for consistency with `call_id`,
where it is desirable to know the ID of a call before receiving the remote echo of the invite event (this
will be useful in future for call transfers where the transferor can assign a `call_id` for the transferee
to use).*
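
As a rough sketch of what generating such an ID could look like in Python: `new_answer_id` and `build_answer_content` are hypothetical names, the 16-byte length is an arbitrary choice, and the `answer` object follows the existing `m.call.answer` layout.

```python
import secrets


def new_answer_id() -> str:
    """Return a random ID; the length is an assumption, not mandated here."""
    return secrets.token_hex(16)


def build_answer_content(call_id: str, sdp: str) -> dict:
    """Build `m.call.answer` content carrying a fresh `answer_id`."""
    return {
        "version": 1,
        "call_id": call_id,
        "answer_id": new_answer_id(),
        "answer": {"type": "answer", "sdp": sdp},
    }
```

Because the ID is chosen locally, the answering device knows it before it sees the remote echo of its own event.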

### Add `answer_id` to `m.call.candidates`
This allows the caller to determine which candidate events correspond to which answer (for the callee,
all candidates with matching `call_id` not from its own user are from the caller party, of which there
is only one.)
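
On the caller's side, the mapping might be sketched like this. `candidates_by_answer` is a hypothetical helper, and skipping candidate events that carry no `answer_id` is an assumption made for the illustration.

```python
from collections import defaultdict


def candidates_by_answer(events: list, call_id: str) -> dict:
    """Group received ICE candidates by the `answer_id` they belong to."""
    grouped = defaultdict(list)
    for event in events:
        if event.get("type") != "m.call.candidates":
            continue
        content = event.get("content", {})
        if content.get("call_id") != call_id:
            continue  # belongs to a different call
        answer_id = content.get("answer_id")
        if answer_id is None:
            continue  # assumption: no answer_id means not from an answerer
        grouped[answer_id].extend(content.get("candidates", []))
    return dict(grouped)
```

Once the caller has chosen an answer, it would only feed the candidates stored under that `answer_id` to its peer connection.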

### Introduce `m.call.ack`
This event is sent by the caller once it has chosen an answer. Its `answer_id`
field indicates the answer it has chosen, and it carries the `call_id` too. If the callee
sees an ack for an answer ID other than the one it sent, it ends the call and
informs the user the call was answered elsewhere. It does not send any events.
Media can start flowing before this event is seen or even sent. Clients that
implement previous versions of this specification will ignore this event and
behave as they did before.

Example:
```
{
    "type": "m.call.ack",
    "content": {
        "call_id": "12345",
        "answer_id": "67890"
    }
}
```
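
The callee-side handling of an ack could be sketched as below. `handle_ack` and its string return values are hypothetical labels chosen for the illustration, not names from the proposal.

```python
def handle_ack(ack_content: dict, call_id: str, my_answer_id: str) -> str:
    """Decide what a callee device does on seeing an `m.call.ack`."""
    if ack_content.get("call_id") != call_id:
        return "ignore"  # ack for some other call
    if ack_content.get("answer_id") == my_answer_id:
        return "continue"  # the caller chose this device's answer
    # Another answer was chosen: end locally and send no events.
    return "answered_elsewhere"
```

For example, a device that sent answer `67890` and sees an ack for `11111` on the same call would end up in the `answered_elsewhere` branch.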

### Introduce `m.call.reject`

* If the `m.call.invite` event has `version` `1`, a client wishing to reject a call instead
sends an `m.call.reject` event. This rejects the call on all devices, but if another device
has already sent an answer, it disregards the reject and carries on. The reject has an
`answer_id` just like an answer, and the caller acks it just like an answer. If the other
client that had already sent an answer sees the caller ack the reject instead of its answer,
it ends the call.
* If the `m.call.invite` event has `version` `0`, the callee sends an `m.call.hangup` event as before.

Example:
```
{
    "type": "m.call.reject",
    "content": {
        "version": 1,
        "call_id": "12345",
        "answer_id": "67890"
    }
}
```

If the calling user chooses to end the call before setup is complete, the client sends `m.call.hangup`
as previously.
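
A callee's choice between the two events might look like the following sketch. `build_decline_event` is a hypothetical helper, and treating a missing `version` as `0` is an assumption.

```python
def build_decline_event(invite_content: dict, answer_id: str) -> dict:
    """Pick the event a callee sends to decline an incoming invite."""
    call_id = invite_content["call_id"]
    # Assumption: an invite without a version is treated as legacy version 0.
    if invite_content.get("version", 0) == 0:
        # Legacy caller: hang up as before.
        return {"type": "m.call.hangup",
                "content": {"version": 0, "call_id": call_id}}
    # Newer caller: send a reject that the caller can ack like an answer.
    return {"type": "m.call.reject",
            "content": {"version": 1, "call_id": call_id, "answer_id": answer_id}}
```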

### Clarify what actions a client may take in response to an invite
The client may:
* Attempt to accept the call by sending an answer
* Actively reject the call everywhere: reject the call as per above, which will stop the call from
ringing on all the user's devices and the caller's client will inform them that the user has
rejected their call.
* Ignore the call: send no events, but stop alerting the user about the call. The user's other
devices will continue to ring, and the caller's device will continue to indicate that the call
is ringing, and will time the call out in the normal way if no other device responds.

### Introduce more reason codes to `m.call.hangup`
* `ice_timeout`: The connection failed after some media was exchanged (as opposed to the current
`ice_failed`, which means no media connection could be established). Note that, in the case of
an ICE renegotiation, a client should be sure to send `ice_timeout` rather than `ice_failed` if
media had previously been received successfully, even if the ICE renegotiation itself failed.
* `user_hangup`: Clients must now send this code when the user chooses to end the call, although
for backwards compatibility, clients should treat an absence of the `reason` field as
`user_hangup`.
* `user_media_failed`: The client was unable to start capturing media in such a way that it is unable
to continue the call.
* `unknown_error`: Some other failure occurred that meant the client was unable to continue the call
rather than the user choosing to end it.
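
Two of these rules lend themselves to a short sketch: choosing between the ICE failure codes, and defaulting a missing `reason`. Both function names are hypothetical.

```python
def ice_failure_reason(media_previously_received: bool) -> str:
    """Choose the hangup reason to send when ICE fails."""
    # If media ever flowed, a later failure (including a failed ICE
    # renegotiation) is reported as a timeout rather than a setup failure.
    return "ice_timeout" if media_previously_received else "ice_failed"


def effective_hangup_reason(hangup_content: dict) -> str:
    """Interpret a received hangup; a missing reason means `user_hangup`."""
    return hangup_content.get("reason", "user_hangup")
```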

### Introduce `m.call.negotiate`
This introduces SDP negotiation semantics for media pause, hold/resume, ICE restarts and voice/video
call up/downgrading. Clients should implement & honour hold functionality as per WebRTC's
recommendation: https://www.w3.org/TR/webrtc/#hold-functionality

If both the invite event and the accepted answer event have `version` equal to `1`, either party may
send `m.call.negotiate` with an `sdp` field to offer new SDP to the other party. This event has
`call_id` with the ID of the call and those sent by the callee have `answer_id` equal to the ID
of the client's answer. The caller ignores any negotiate events with `answer_id` not equal to the
answer it accepted. Clients may either use the same mechanism used for remote echo of messages
to recognise and ignore their own negotiate messages (ie. txn id) or they may ignore messages
from their own user, or they may use the presence or absence of an `answer_id` field.
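
The third option, using the presence or absence of `answer_id`, could be sketched as follows. `should_process_negotiate` and its parameters are hypothetical names for the illustration.

```python
def should_process_negotiate(content: dict, call_id: str,
                             we_are_caller: bool, accepted_answer_id: str) -> bool:
    """Decide whether a received `m.call.negotiate` is meant for this party."""
    if content.get("call_id") != call_id:
        return False  # belongs to a different call
    has_answer_id = "answer_id" in content
    if we_are_caller:
        # Callee negotiates carry answer_id; accept only the chosen answer's.
        # Events without answer_id are the caller's own and are skipped.
        return has_answer_id and content["answer_id"] == accepted_answer_id
    # On the callee side, events with an answer_id come from callee devices
    # (including this one), so only events without it are from the caller.
    return not has_answer_id
```

Clients that prefer to match on transaction ID or sender would replace the `answer_id` checks accordingly.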

This has a `lifetime` field as in `m.call.invite`, after which the sender of the negotiate event
Contributor

@babolivier, Mar 24, 2021

As discussed on Matrix, I think it'd be better if lifetimes (both in m.call.negotiate and m.call.invite events) were replaced by absolute timestamps for when the call should time out. That way we don't rely on an arbitrary field set by the server to determine when the call should expire, and we have a single source of truth for when this expiration should happen (i.e. the content set by the client) rather than hoping the client and the server agree on the time (which they often do, but if they don't then it can get complicated). The concern then becomes that clients can get out-of-sync time-wise but in my experience client terminals (i.e. desktops, mobiles, tablets, etc) are much more likely to be time-synced out of the box than servers, so I'm not sure it should be that big of a concern.

Member Author

Yeah, we might also have covered this on Matrix at the time, but this is basically designed to avoid ever assuming the client clocks will be synced, but assuming those on the HSes are correct (at least within a second or so). It is somewhat interesting that since clients are usually on consumer devices which are managed by the device / OS vendor who sets up NTP, they're now often more likely to have correct clocks than servers which are subject to the server admin forgetting to start ntpd.

Contributor

@babolivier, what's your view on this?

Member Author

Since this copies what has already been done in a different place, I'd suggest sticking with this for now and rethinking it in either a new MSC or if we ultimately phase this out in favour of MSC3401.

should consider the negotiation failed (timed out) and the recipient should ignore it.

Example:
```
{
    "type": "m.call.negotiate",
    "content": {
        "call_id": "12345",
        "answer_id": "67890",
        "sdp": "[some sdp]",
        "lifetime": 10000
    }
}
```

### Designate one party as 'polite'
In line with WebRTC perfect negotiation (https://w3c.github.io/webrtc-pc/#perfect-negotiation-example)
we introduce rules to establish which party is polite. By default, the callee is the polite party.
In a glare situation, if the client receives an invite whilst preparing to send, it becomes the callee
and therefore becomes the polite party. If an invite is received after the client has sent one, the
party whose invite had the lexicographically greater call ID becomes the polite party.
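
These rules can be condensed into a small sketch. `is_polite` is a hypothetical helper; it assumes that "lexicographically greater" can be approximated by Python's code-point string comparison, and it does not address the case of two identical call IDs.

```python
from typing import Optional


def is_polite(our_invite_call_id: Optional[str],
              their_invite_call_id: Optional[str]) -> bool:
    """Return True if this client is the polite party.

    Pass None for an invite that was never sent (ours) or received (theirs).
    """
    if our_invite_call_id is None:
        return True  # we never sent an invite, so we are the callee
    if their_invite_call_id is None:
        return False  # only we sent an invite, so we are the caller
    # Glare after both invites were sent: the greater call ID is polite.
    return our_invite_call_id > their_invite_call_id
```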

### Add explicit recommendations for call event liveness.
`m.call.invite` contains a `lifetime` field that indicates how long the offer is valid for. When
a client receives an invite, it should use the `age` field of the event plus the time since it
received the event from the homeserver to determine whether the invite is still valid. If the
invite is still valid *and will remain valid for long enough for the user to accept the call*,
it should signal an incoming call. The amount of time allowed for the user to accept the call may
vary between clients, for example, it may be longer on a locked mobile device than on an unlocked
desktop device.
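
A minimal sketch of this check, assuming all values are in milliseconds: `invite_is_ringable` and the `min_ring_ms` parameter (the time the client wants to leave for the user to answer) are hypothetical.

```python
def invite_is_ringable(lifetime_ms: int, age_ms: int,
                       ms_since_received: int, min_ring_ms: int) -> bool:
    """Return True if the invite is valid and leaves enough time to answer."""
    # Elapsed time is the server-reported age plus local time since receipt.
    elapsed_ms = age_ms + ms_since_received
    remaining_ms = lifetime_ms - elapsed_ms
    return remaining_ms > 0 and remaining_ms >= min_ring_ms
```

A locked mobile device might pass a larger `min_ring_ms` than an unlocked desktop client, matching the example above.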

The client should only signal an incoming call once it has completed processing the entire sync
response. This ensures that if the sync response contains subsequent events that indicate the call
has been hung up, rejected, or answered elsewhere, the client does not signal it.

If on startup, after processing locally stored events, the client determines that there is an invite
that is still valid, it should still signal it but only after it has completed a sync from the homeserver.

### Introduce recommendations for batching of ICE candidates
Clients should aim to send a small number of candidate events, following these guidelines:
* ICE candidates which can be discovered immediately or almost immediately should be sent in the invite/answer
event itself (eg. host candidates). If server reflexive or relay candidates can be gathered in
a sufficiently short period of time, these should be sent here too. A delay of around 200ms is
suggested as a starting point.
* The client should then allow some time for further candidates to be gathered in order to batch them,
rather than sending each candidate as it arrives. A starting point of 2 seconds after sending the
invite or 500ms after sending the answer is suggested (since a delay is natural
anyway after the invite whilst the client waits for the user to accept it).
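
One way to picture the suggested timings is the simplified sketch below. It assumes each candidate is tagged with the number of milliseconds since gathering started and that the invite or answer is sent exactly at the initial delay; `plan_candidate_batches` is a hypothetical helper and a real client would use timers rather than sort candidates after the fact.

```python
def plan_candidate_batches(gathered: list, is_invite: bool,
                           initial_delay_ms: int = 200):
    """Split (ms_since_start, candidate) pairs into three groups."""
    # Suggested starting points: 2 s after an invite, 500 ms after an answer.
    batch_delay_ms = 2000 if is_invite else 500
    cutoff_ms = initial_delay_ms + batch_delay_ms
    # Candidates ready in time go into the invite/answer event itself.
    in_signalling_event = [c for t, c in gathered if t <= initial_delay_ms]
    # Candidates arriving during the batching window share one event.
    first_batch = [c for t, c in gathered if initial_delay_ms < t <= cutoff_ms]
    # Anything later would need a further candidates event.
    later = [c for t, c in gathered if t > cutoff_ms]
    return in_signalling_event, first_batch, later
```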

### Add DTMF
Add that Matrix clients can send DTMF as specified by WebRTC. The WebRTC standard as of August
2020 does not support receiving DTMF but a Matrix client can receive and interpret the DTMF sent
in the RTP payload.

### Deprecate `type` in `m.call.invite` and `m.call.answer`
These are redundant: clients should continue to send them but must not require
them to be present on events they receive.

## Potential issues
* It remains explicitly impossible to place a call to yourself. Matrix uses a shared medium for
signalling so a client will always see invites from other devices. We would need to introduce
a way for a client to signal to other devices that they should treat the invite as an incoming
call and mechanisms to clarify what events were from which party in a call. This would mean a
significant amount of protocol dedicated to just this feature, so this MSC omits it.

## Alternatives
* We could use event IDs for `call_id` and `answer_id` as discussed above.
* The event type of `m.call.ack` mirrors that of SIP, although it gives few other clues as to its purpose.
`m.call.choose_answer` was considered but is quite verbose.

## Security considerations
* IP addresses remain in the room in candidates, as they did in the previous version of the spec.
This is not ideal, but alternatives were either sending candidates over to-device messages
(would slow down call setup because a target device would have to be established before sending
candidates) or redacting them afterwards (the volume of events sent during calls can already
cause rate limiting issues and this would exacerbate this).