
CoAP/LWM2M: Clean Packet Retransmission Concept #28117

Closed
broglep-work opened this issue Sep 7, 2020 · 2 comments · Fixed by #31406
Labels: area: Networking, Enhancement

broglep-work (Contributor) commented Sep 7, 2020

Summary
Currently, CoAP packet retransmission is not explicitly modelled in the code base. In the CoAP layer, the retransmission logic is handled via `next_timeout` and `coap_pending_cycle`. A second retransmission concept is only partially implemented in the LWM2M layer. Clean packet retransmission handling within and across layers is needed.

Details
`next_timeout` doubles the previous timeout 3 times; after that it signals the end of retransmission by returning the same timeout as before. This implementation hides how retransmissions are done behind timeout generation. It also makes it difficult to fix the already existing TODO about the randomly generated initial ACK timeout. In the LWM2M layer there is a `send_attempts` field in `lwm2m_message`, but it is not actually used (it is currently only incremented, never read).
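For illustration, here is a minimal sketch of the doubling-and-saturating pattern described above (the 2000 ms base and the exact shape are assumptions for this example, not necessarily the current Zephyr source):

```c
#include <stdint.h>

#define INIT_ACK_TIMEOUT_MS 2000 /* assumed base ACK timeout */

static int32_t next_timeout(int32_t previous)
{
	switch (previous) {
	case INIT_ACK_TIMEOUT_MS:
	case INIT_ACK_TIMEOUT_MS * 2:
	case INIT_ACK_TIMEOUT_MS * 4:
		/* double the timeout for the next retransmission */
		return previous << 1;
	default:
		/* an unchanged timeout is the only "end of retries" signal */
		return previous;
	}
}
```

The retransmission count exists only implicitly in this chain of timeout values, which is why it is hard to make it configurable or to randomize the initial timeout.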

Preferred Solution

  • CoAP Layer: the number of times a packet has been sent is tracked in the `coap_pending` struct. `next_timeout` uses the number of transmissions to calculate the timeout. `coap_pending_cycle` should ideally only handle the timeout; the decision to stop retransmitting happens at a higher level (in the CoAP server and the LWM2M engine). If `coap_pending_cycle` should also handle stopping retransmissions, it should use the number of times a packet has been sent to decide whether transmission should stop (see the sketch after this list).

  • LWM2M Layer: the LWM2M engine properly uses its own retransmission on top of CoAP, e.g. let CoAP retransmit 3 times and, if that fails, retransmit the `lwm2m_message` once. Or, if this is not deemed useful, completely remove the LWM2M message retransmission concept (remove `send_attempts` from `lwm2m_message`).
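A hypothetical sketch of the CoAP-layer part of this proposal; the field and function shapes are illustrative, not an actual API:

```c
#include <stdbool.h>
#include <stdint.h>

#define INIT_ACK_TIMEOUT_MS 2000 /* assumed base timeout */

struct coap_pending {
	/* ... existing fields (packet, address, ...) ... */
	uint32_t timeout;      /* current retransmission timeout (ms) */
	uint8_t transmissions; /* how many times the packet was sent */
	uint8_t max_retries;   /* configurable per transaction */
};

/* The timeout is derived from the transmission count instead of being
 * chained from the previous value, which also makes the existing TODO
 * about a randomized initial ACK timeout straightforward to address.
 */
static uint32_t next_timeout(uint32_t init_timeout, uint8_t transmissions)
{
	return init_timeout << transmissions; /* exponential back-off */
}

/* Cycles the timeout and reports whether another retransmission is
 * allowed; the caller (CoAP server / LWM2M engine) decides what to do
 * once false is returned.
 */
bool coap_pending_cycle(struct coap_pending *pending)
{
	if (pending->transmissions >= pending->max_retries) {
		return false;
	}

	pending->timeout = next_timeout(INIT_ACK_TIMEOUT_MS,
					pending->transmissions);
	pending->transmissions++;
	return true;
}
```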

Motivation
A clean retransmission concept across layers makes it easier to understand the code base and how retransmissions work in the Zephyr client. It also helps with implementing more advanced features such as custom retransmission schemes for different LWM2M messages (e.g. bootstrap requests).

broglep-work added the Enhancement label on Sep 7, 2020
rlubos (Contributor) commented Sep 10, 2020

I agree that the retransmission logic in CoAP could be more explicit, especially since the retransmission count is currently hardcoded (in the aforementioned `next_timeout()` function); with a proper retransmission counter it could be made configurable (the CoAP spec does not put a hard requirement on the MAX_RETRANSMIT value).

> LWM2M Engine properly uses its own re-transmission on top of CoAP

Could you clarify what you meant? The LwM2M spec does not specify any retransmission mechanism other than the one at the CoAP level (please correct me if I'm wrong).

In general, my feeling is that Zephyr would benefit from an additional CoAP layer, built on top of the current APIs, that would take care of the mechanics defined in the CoAP RFC (CON retransmission, request/response matching, message deduplication). Currently the burden of implementing these CoAP-specific features is on the application (or the LwM2M lib), while the implementation could be common for all CoAP users.
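Purely for illustration, such a layer's API could look roughly like this (all names are invented; no such API existed in Zephyr at the time of writing):

```c
struct coap_packet; /* from the existing CoAP headers */

/* Invoked with the matched response, or with NULL once all CON
 * retransmissions have been exhausted without a reply.
 */
typedef void (*coap_response_cb_t)(const struct coap_packet *response,
				   void *user_data);

/* Sends a confirmable request and internally handles retransmission,
 * request/response matching (token/message ID) and deduplication.
 */
int coap_client_send(int sock, struct coap_packet *request,
		     coap_response_cb_t cb, void *user_data);
```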

broglep-work (Contributor, Author) commented:
> > LWM2M Engine properly uses its own re-transmission on top of CoAP
>
> Could you clarify what you meant? The LwM2M spec does not specify any retransmission mechanism other than the one at the CoAP level (please correct me if I'm wrong).

While the LwM2M spec does not specify any retransmission, it could be useful for the LwM2M lib to have its own retransmission mechanism for when the underlying CoAP transmission does not work. Something like this is present in the bootstrapping client: it creates one lwm2m message and, if the corresponding CoAP packet transmission fails, it restarts the bootstrap process (by restarting the whole LwM2M engine) and creates a new lwm2m message. You could also have this without an engine restart, where one could configure retries for sending an lwm2m message; I assume `send_attempts` was meant for this (see the sketch below). But if the CoAP retransmission is flexible enough, there is no need for a retransmission concept one level higher in LwM2M.
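A hypothetical sketch of such an LwM2M-level retry (all helper names are invented for illustration; this is not the engine's actual API):

```c
#include <stdint.h>

#define LWM2M_MAX_SEND_ATTEMPTS 2 /* assumed: one LwM2M-level retry */

struct lwm2m_message; /* opaque here; the real struct lives in the engine */

/* Invented helpers standing in for engine internals. */
extern uint8_t lwm2m_msg_attempts(struct lwm2m_message *msg);
extern int lwm2m_resend_message(struct lwm2m_message *msg);
extern void lwm2m_engine_restart(void);

/* Called once CoAP has exhausted its own retransmissions. */
static void on_coap_retries_exhausted(struct lwm2m_message *msg)
{
	if (lwm2m_msg_attempts(msg) < LWM2M_MAX_SEND_ATTEMPTS) {
		/* Re-queue the message; the CoAP retries start over. */
		lwm2m_resend_message(msg);
	} else {
		/* Current bootstrap behaviour: restart the whole engine. */
		lwm2m_engine_restart();
	}
}
```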

rlubos added a commit to rlubos/zephyr that referenced this issue Jan 18, 2021
Introduce a retransmission counter to the coap_pending structure. This
allows simplifying the retransmission logic and keeping track of the
number of remaining retransmissions.

Additionally, extend the `coap_pending_init()` function with a `retries`
parameter, which allows setting the retransmission count individually
for each confirmable transaction.

Fixes zephyrproject-rtos#28117

Signed-off-by: Robert Lubos <[email protected]>
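Based on this commit description, usage might look roughly like the following (a sketch assuming `retries` was appended to the existing `coap_pending_init()` parameters; the actual header is authoritative):

```c
#include <errno.h>
#include <net/coap.h>   /* include paths of that era; may differ today */
#include <net/socket.h>
#include <sys/util.h>   /* ARRAY_SIZE() */

static struct coap_pending pendings[4];

int send_confirmable(struct coap_packet *request,
		     const struct sockaddr *addr)
{
	struct coap_pending *pending;

	pending = coap_pending_next_unused(pendings, ARRAY_SIZE(pendings));
	if (pending == NULL) {
		return -ENOMEM;
	}

	/* New: per-transaction retransmission count (here 4 retries). */
	return coap_pending_init(pending, request, addr, 4);
}
```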
nashif pushed a commit that referenced this issue Jan 19, 2021