Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drops packets, after re-invite sdp port changed, only when using "asymmetric+strict-source" flags together #1679

Open
smititelu opened this issue Jun 16, 2023 · 19 comments

Comments

@smititelu
Copy link
Contributor

smititelu commented Jun 16, 2023

Hi, we noticed an issue with RTPEngine dropping RTP packets after re-invite SDP port changed, only when using "asymmetric+strict-source" flags together. Traced back issue starts happening after this commit: 900d2be.

RTPEngine logs:

[core] Drop due to strict-source attribute; got X.X.X.X:60066, expected X.X.X.X:60064

... and packets come in on the new port but rtpengine do not forward them.

A fix for this issue would be either:

  • don't use "asymmetric" anymore but keep using "strict-source"
  • don't use that commit, but keep using "asymmetric+strict-source" flags

I raised this issue, maybe that new commit has some bugs? Please have a look. Please close this if there is no bug with that commit, and any of the above fixes should be applied.

Thank you,
Stefan

@smititelu
Copy link
Contributor Author

smititelu commented Jun 16, 2023

Full rtpengine logs for the issue:

[control] Received command 'offer' from CC.CC.CC.CC:42815
[control] Replying to 'offer' from CC.CC.CC.CC:42815 (elapsed time 0.001055 sec)
[control] Received command 'answer' from CC.CC.CC.CC:35846
[control] Replying to 'answer' from CC.CC.CC.CC:35846 (elapsed time 0.002622 sec)
[core] RTP packet with unknown payload type 13 received from XX.XX.XX.XX:60064
[core] Kernelizing media stream: XX.XX.XX.XX:60064 -> YY.YY.YY.YY:51562 | YY.YY.YY.YY:36662 -> ZZ.ZZ.ZZ.ZZ:42590
[core] Kernelizing media stream: ZZ.ZZ.ZZ.ZZ:42590 -> YY.YY.YY.YY:36662 | YY.YY.YY.YY:51562 -> XX.XX.XX.XX:60064
[control] Received command 'offer' from CC.CC.CC.CC:59469
[core] Removing media stream from kernel: local YY.YY.YY.YY:36662 (dialogue signalling event)
[core] Removing media stream from kernel: local YY.YY.YY.YY:51562 (dialogue signalling event)
[control] Replying to 'offer' from CC.CC.CC.CC:59469 (elapsed time 0.000424 sec)
[core] Kernelizing media stream: ZZ.ZZ.ZZ.ZZ:42590 -> YY.YY.YY.YY:36662 | YY.YY.YY.YY:51562 -> XX.XX.XX.XX:60064
[core] Kernelizing media stream: XX.XX.XX.XX:60064 -> YY.YY.YY.YY:51562 | YY.YY.YY.YY:36662 -> ZZ.ZZ.ZZ.ZZ:42590
[control] Received command 'answer' from CC.CC.CC.CC:59469
[core] Removing media stream from kernel: local YY.YY.YY.YY:51562 (dialogue signalling event)
[core] Removing media stream from kernel: local YY.YY.YY.YY:36662 (dialogue signalling event)
[control] Replying to 'answer' from CC.CC.CC.CC:59469 (elapsed time 0.002543 sec)
[control] Received command 'answer' from CC.CC.CC.CC:41620
[control] Replying to 'answer' from CC.CC.CC.CC:41620 (elapsed time 0.002230 sec)
[core] Kernelizing media stream: ZZ.ZZ.ZZ.ZZ:42590 -> YY.YY.YY.YY:11102 | YY.YY.YY.YY:51562 -> XX.XX.XX.XX:60066
[core] Drop due to strict-source attribute; got XX.XX.XX.XX:60066, expected XX.XX.XX.XX:60064
[core] Removing media stream from kernel: local YY.YY.YY.YY:11102 (SSRC changed)
[core] Kernelizing media stream: ZZ.ZZ.ZZ.ZZ:42590 -> YY.YY.YY.YY:11102 | YY.YY.YY.YY:51562 -> XX.XX.XX.XX:60066
[core] RTP packet with unknown payload type 13 received from XX.XX.XX.XX:60066
[control] Received command 'delete' from CC.CC.CC.CC:57964
[core] Deleting call branch 'v4q45uec-CC-1065-OFC-1773' (via-branch '')
[core] Removing media stream from kernel: local YY.YY.YY.YY:11102 (destroying monologue)
[core] Deleting call branch 'IvZRlCvhDRqecV.Gu30DFchfTwrb5Iwl' (via-branch 'main0')
[core] Deleting entire call
[core] Final packet stats:
[core] --- Tag 'v4q45uec-CC-1065-OFC-1773', created 0:20 ago for branch ''
[core] ---     subscribed to 'IvZRlCvhDRqecV.Gu30DFchfTwrb5Iwl'
[core] ---     subscription for 'IvZRlCvhDRqecV.Gu30DFchfTwrb5Iwl'
[core] ------ Media #1 (audio over RTP/AVP) using PCMA/8000
[core] --------- Port   YY.YY.YY.YY:11102 <>   ZZ.ZZ.ZZ.ZZ:42590, SSRC 5a9a0b19, in 466 p, 80152 b, 0 e, 0 ts, out 8 p, 185 b, 0 e
[core] --------- Port   YY.YY.YY.YY:11103 <>   ZZ.ZZ.ZZ.ZZ:42591 (RTCP), SSRC 0, in 0 p, 0 b, 0 e, 11 ts, out 0 p, 0 b, 0 e
[core] --- Tag 'IvZRlCvhDRqecV.Gu30DFchfTwrb5Iwl', created 0:20 ago for branch 'main0'
[core] ---     subscribed to 'v4q45uec-CC-1065-OFC-1773'
[core] ---     subscription for 'v4q45uec-CC-1065-OFC-1773'
[core] ------ Media #1 (audio over RTP/AVP) using PCMA/8000
[core] --------- Port   YY.YY.YY.YY:51562 <>    XX.XX.XX.XX:60066, SSRC c56f860, in 409 p, 63193 b, 446 e, 0 ts, out 461 p, 688 b, 0 e
[core] --------- Port   YY.YY.YY.YY:51563 <>    XX.XX.XX.XX:60067 (RTCP), SSRC 0, in 0 p, 0 b, 0 e, 10 ts, out 0 p, 0 b, 0 e
[control] Replying to 'delete' from CC.CC.CC.CC:57964 (elapsed time 0.001367 sec)

@pkuzak
Copy link
Contributor

pkuzak commented Jun 16, 2023

Seems to relate to this discussion with @emvondo on google groups. As @smititelu noted we found a possible fix for this by removing the asymmetric flag, but we wanted to open this just to make sure the code works like intended and does not break anything by coincidence.
In the logs you can see that the new stream gets kernelized with the new port, but still that "Drop due to strict-source" log appears.

@emvondo
Copy link
Contributor

emvondo commented Jun 16, 2023

Seems to relate to this discussion with @emvondo on google groups. As @smititelu noted we found a possible fix for this by removing the asymmetric flag, but we wanted to open this just to make sure the code works like intended and does not break anything by coincidence. In the logs you can see that the new stream gets kernelized with the new port, but still that "Drop due to strict-source" log appears.

yes i wrote an update for rtperngine here for asymmetric flag with strict-source but endpoint learning does not apply for asymmetric currently. so probably need to add a case to handle it.

@rfuchs
Copy link
Member

rfuchs commented Jun 16, 2023

Are you saying that the port change from the SDP is not being honoured for source checking purposes?

@pkuzak
Copy link
Contributor

pkuzak commented Jun 16, 2023

If this question is addressed to us, I'd say yes, this is what we observe and this is how we interpret the call flow and the logs.

@rfuchs
Copy link
Member

rfuchs commented Jun 16, 2023

Ok, I can see how 900d2be would break that.

It changes the source check to check against the learned endpoint address instead of the destination endpoint address when the stream is asymmetric (why? @emvondo )

and then sets the learned endpoint address to the source address of a received packet when the stream is asymmetric, but only once (if learned address is empty) (why? @emvondo )

@emvondo
Copy link
Contributor

emvondo commented Jun 16, 2023

Ok, I can see how 900d2be would break that.

It changes the source check to check against the learned endpoint address instead of the destination endpoint address when the stream is asymmetric (why? @emvondo )

and then sets the learned endpoint address to the source address of a received packet when the stream is asymmetric, but only once (if learned address is empty) (why? @emvondo )

in asymmetric mode destination and source address are not the same. so its the source address of the packet that we check for the strict-source flag.

but because by default endpoint (SDP endpoint) is checked for asymmetric too. that check will always fail and no packets will ever be relayed if source packet ip is different from the one in SDP. Thats why we need to save the first ip of received packet for asymmetric mode in the learned_endpoint to make the check further.

Now the point is for asymmetric mode there is no way currently to ask to reset or re-learn the ip saved in learned_endpoint as it is for symmetric mode.

learned_endpoint is never set for asymmetric mode so i used it to store the ip of first packet in asymmetric mode and then make the comparison.

now thats true may be the comparison done once is not the ideal case. thats also the point i raised in the mailing list to have a way to reset it like with symmetric mode.

@rfuchs
Copy link
Member

rfuchs commented Jun 16, 2023

If you want to use asymmetric and source checking and endpoint learning all together, then I suppose it would be best to retain the normal endpoint learning mechanism even with asymmetric set (i.e. not immediately set the "confirmed" flag, as it now happens in https://github.com/sipwise/rtpengine/blob/master/daemon/media_socket.c#L2355), and then just not use the learned address as destination address.

@pkuzak
Copy link
Contributor

pkuzak commented Jun 16, 2023

I kind of understand @emvondo thoughts and arguments and I don't know how a correct implementation for this should look like. The SDP just carries the address where RTP is expected to be received, not where it will come from. So symmetric just seems to be a subset or special case of asymmetric, where RTP simply does come from the same address it is expected to be received.
We actually don't have asymmetric media in our network. In our case, the use of the asymmetric flag has historical reasons. Back the when we introduced rtpengine into our platform, performance was an important topic. I remember we wanted to disable the learning phase as it was happening in user space and we wanted the stream to be kernelized as soon as possible (ideally immediately). I consulted @rfuchs and I was told to use asymmetric flag to achieve this.
I think meanwhile the implementation has changed and I have the impression streams get kernelized fast, even without the asymmetric flag (please correct me @rfuchs if I am wrong). So for us, it is just reasonable to remove the asymmetric flag to solve this issue.
As @smititelu wrote in the description, we mainly opened this issue to document our finding and to ask whether the behavior is expected like it is now or not. We don't rate this behavior whether it is correct or not.

@emvondo
Copy link
Contributor

emvondo commented Jun 16, 2023

If you want to use asymmetric and source checking and endpoint learning all together, then I suppose it would be best to retain the normal endpoint learning mechanism even with asymmetric set (i.e. not immediately set the "confirmed" flag, as it now happens in https://github.com/sipwise/rtpengine/blob/master/daemon/media_socket.c#L2355), and then just not use the learned address as destination address.

hum OK i see. it could be a start of solution because right now i dont see the right way to implement it because in our call flows we have asymmetric RTP and in the same time we also want in asymmetric mode to reject incoming streams that dont match the first source ip catched. But its a dilemma because even with this asymmetric mode you dont know wether the first packet you received comes from authorized source or not so. i dont think there is an ultime solution for that.

when there is no forking its ok. but with forked calls that issue appears.

so ->el_flags is reset everytime you call rtpengien_offer') or rtpengine_answer() ?

@emvondo
Copy link
Contributor

emvondo commented Jun 16, 2023

Just want to ask @rfuchs there is no flag to reset asymmetric and strict source flags through rtpengine_offer() and rtpengine_answer() ? or may be its the same answer you provided me on the mailing list

@rfuchs
Copy link
Member

rfuchs commented Jun 16, 2023

As @smititelu wrote in the description, we mainly opened this issue to document our finding and to ask whether the behavior is expected like it is now or not. We don't rate this behavior whether it is correct or not.

Honestly I'm not sure what the expected behaviour ought to be because this combination (asymmetric + strict-source) is a very odd use case.

it could be a start of solution because right now i dont see the right way to implement it because in our call flows we have asymmetric RTP and in the same time we also want in asymmetric mode to reject incoming streams that dont match the first source ip catched.

That's why I think that perhaps retaining the usual endpoint learning mechanism even with asymmetric streams could be the way forward. This mechanism uses a delay of ~2 seconds after a signalling event and then locks in to a source address only after that delay. But of course that would negate @smititelu's use case of wanting to achieve an immediate push of the stream to the kernel module, so that should be done only when source checking is also wanted (or perhaps using an extra flag).

(There is possibility to improve the usual endpoint learning mechanism for all regular cases here. The stream could be pushed to the kernel immediately and still use a delay of a few seconds to perform endpoint learning at the same time. But this needs to be implemented and can be tricky to get right.)

Just want to ask @rfuchs there is no flag to reset asymmetric and strict source flags through rtpengine_offer() and rtpengine_answer() ? or may be its the same answer you provided me on the mailing list

Yes it is the same answer. There is no such flag. Endpoint learning is normally always re-enabled after a signalling event (that's what the CONFIRMED flag is for, which gets reset during an offer and an answer). So AFAIC the same thing should happen with asymmetric streams when endpoint learning should still be done for source checking.

@emvondo
Copy link
Contributor

emvondo commented Jun 19, 2023

Honestly I'm not sure what the expected behaviour ought to be because this combination (asymmetric + strict-source) is a very odd use case.

exactly . its rather a question of setting at this point. if you know you need it you activate it knowing rtp source ip will never change afterward

That's why I think that perhaps retaining the usual endpoint learning mechanism even with asymmetric streams could be the way forward. This mechanism uses a delay of ~2 seconds after a signalling event and then locks in to a source address only after that delay.

yes im also aligned with it. do you see a benefit or penalty to make that delay time dynamic (with a flag passed through offer & answer commands) instead of having it statically configured like today ?

(that's what the CONFIRMED flag is for, which gets reset during an offer and an answer)

ok thanks did not know that. i will make somes changes on my local copy to test and push a merge request concerning delay for asymmetric RTP.

@emvondo
Copy link
Contributor

emvondo commented Jun 19, 2023

(that's what the CONFIRMED flag is for, which gets reset during an offer and an answer)

@rfuchs it seems not cleared for ASYMMERIC currently isnt it ? https://github.com/sipwise/rtpengine/blob/master/daemon/media_socket.c#L1883

@pkuzak
Copy link
Contributor

pkuzak commented Jun 19, 2023

@emvondo and @rfuchs , regarding the time delay and the kernelization: In my opinion, with the flags "trust-address" and "symmetric" (which seems to be default so you don't need to explicitly pass them) rtpengine should kernelize the stream immediately as it already has all information required to kernelize the stream. No endpoint learning is required.

@pkuzak
Copy link
Contributor

pkuzak commented Jun 19, 2023

Also we experience segmentation faults and core dumps in a line touched by this commit:

rtpengine logs:

INFO: [lJDOBnnJR3.3xZoWny8ECYA0YTMGgbqo]: [control] Received command 'offer' from CC.CC.CC.CC:51548
INFO: [lJDOBnnJR3.3xZoWny8ECYA0YTMGgbqo]: [control] Replying to 'offer' from CC.CC.CC.CC:51548 (elapsed time 0.001356 sec)
INFO: [lJDOBnnJR3.3xZoWny8ECYA0YTMGgbqo]: [control] Received command 'answer' from CC.CC.CC.CC:49248
INFO: [lJDOBnnJR3.3xZoWny8ECYA0YTMGgbqo]: [control] Replying to 'answer' from CC.CC.CC.CC:49248 (elapsed time 0.002557 sec)
INFO: [lJDOBnnJR3.3xZoWny8ECYA0YTMGgbqo/xxrrsmj5-CC-1063-OFC-1682/1 port 55036]: [core] Kernelizing media stream: ZZ.ZZ.ZZ.ZZ:42408 -> YY.YY.YY.YY:55036 | YY.YY.YY.YY:16136 -> XX.XX.XX.XX:7088
INFO: [lJDOBnnJR3.3xZoWny8ECYA0YTMGgbqo/iMYm7cDu-oZBsvszlBN2DhDD4NPOC5Y1/1 port 16136]: [ice] ICE negotiated: peer for component 1 is XX.XX.XX.XX:7088
INFO: [lJDOBnnJR3.3xZoWny8ECYA0YTMGgbqo/iMYm7cDu-oZBsvszlBN2DhDD4NPOC5Y1/1 port 16136]: [ice] ICE negotiated: local interface YY.YY.YY.YY
INFO: [lJDOBnnJR3.3xZoWny8ECYA0YTMGgbqo/iMYm7cDu-oZBsvszlBN2DhDD4NPOC5Y1/1 port 16136]: [ice] ICE negotiated: peer for component 2 is XX.XX.XX.XX:7089
INFO: [lJDOBnnJR3.3xZoWny8ECYA0YTMGgbqo/iMYm7cDu-oZBsvszlBN2DhDD4NPOC5Y1/1 port 16136]: [core] Removing media stream from kernel: local YY.YY.YY.YY:55036 (ICE negotiation event)

Back trace:

(gdb) bt
#0 0x00005629befae0c9 in __re_address_translate_ep (ep=0x7fc79c136810, o=0x7fc85052c818) at media_socket.c:1332
#1 kernelize_one (reti=reti@entry=0x7fc85052c800, outputs=outputs@entry=0x7fc85052c7e0, stream=stream@entry=0x7fc79c136600, sink_handler=0x7fc64bc62e60, sinks=sinks@entry=0x7fc79c1366f0,
payload_types=payload_types@entry=0x7fc85052c7d8) at media_socket.c:1418
#2 0x00005629befae5db in kernelize (stream=0x7fc79c136600) at media_socket.c:1626
#3 0x00005629befaea0b in media_packet_kernel_check (phc=phc@entry=0x7fc85052cd90) at media_socket.c:2486
#4 0x00005629befaeeb2 in stream_packet (phc=phc@entry=0x7fc85052cd90) at media_socket.c:2652
#5 0x00005629befb1472 in stream_fd_readable (fd=12552, p=0x7fc7e42144f0, u=) at media_socket.c:3029
#6 0x00005629bef4d4e8 in poller_poll (p=0x5629c013b770, timeout=) at poller.c:440
#7 0x00005629bef4dc9d in poller_loop2 (d=0x5629c013b770) at poller.c:619
#8 0x00005629bef4de6c in thread_detach_func (d=0x5629c0a54640) at aux.c:252
#9 0x00007fc85bb2dea7 in start_thread (arg=) at pthread_create.c:477
#10 0x00007fc85ba4da2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

We do not support ICE in our platform and we don't know why the client is trying ICE as we use ICE=remove in all offer/answer requests to rtpengine. Nonetheless it should not crash. Currently it kind of crashes consistently when that client calls, always (4x since yesterday) with the same back trace.

Also I am not sure if the crash is caused by this commit or it would also crash without this commit. Just added it here cause it at least is related to that commit.

We are running a release based on commit 177b411.

@rfuchs
Copy link
Member

rfuchs commented Jun 19, 2023

@emvondo and @rfuchs , regarding the time delay and the kernelization: In my opinion, with the flags "trust-address" and "symmetric" (which seems to be default so you don't need to explicitly pass them) rtpengine should kernelize the stream immediately as it already has all information required to kernelize the stream. No endpoint learning is required.

trust address does not negate endpoint learning. It simply means that the address from the SDP should be used as initial endpoint address and not the source address of the SIP message.

You can use endpoint learning=off to disable endpoint learning, which has a similar effect as setting asymmetric

(that's what the CONFIRMED flag is for, which gets reset during an offer and an answer)

@rfuchs it seems not cleared for ASYMMERIC currently isnt it ? https://github.com/sipwise/rtpengine/blob/master/daemon/media_socket.c#L1883

Right, because the combination of asymmetric + endpoint learning was never considered.

@pkuzak
Copy link
Contributor

pkuzak commented Jun 19, 2023

Related to the flags and the Kamailio rtpengine module documentation: What do you think about removing the possible flags from that module documentation and instead give a link to corresponding part of rtpengine project documentation? I think this is a good idea as basically rtpengine defines what flags it supports and it is simpler to maintain one documentation.

@rfuchs
Copy link
Member

rfuchs commented Jun 19, 2023

Related to the flags and the Kamailio rtpengine module documentation: What do you think about removing the possible flags from that module documentation and instead give a link to corresponding part of rtpengine project documentation? I think this is a good idea as basically rtpengine defines what flags it supports and it is simpler to maintain one documentation.

Yeah can probably do that. The rtpengine docs are already linked AFAIK, but can probably remove the description of most of the flags and just link to the docs. Some basic usage examples should remain though.

(The flags handling in the module need a serious overhaul anyway, but that's a topic for another day)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants