-
Notifications
You must be signed in to change notification settings - Fork 518
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
communication: cleans up CoAP timeout/parameters/retransmission stuff #1775
base: develop
Are you sure you want to change the base?
Conversation
|
||
|
||
CoAPMessage(message_id_t id_) : next(nullptr), timeout(0), id(id_), transmit_count(0), delivered(nullptr), data_len(0) { | ||
CoAPMessage(message_id_t id_) : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please be sure the clang_format file has these formatting changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for going back and cleaning up. On the whole this is a positive change, but I am concerned about some of the very long timeouts, especially since the server does not match these timeouts so they are in practice redundant while causing increase memory usage.
timeout = now + transmit_timeout(transmit_count); | ||
transmit_count++; | ||
return transmit_count <= MAX_RETRANSMIT+1; | ||
if (coapType == CoAPType::CON) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this technically is closer to the spec (taking the timeout with a pre-applied random factor and doubling it vs doubling the timeout and then applying a random factor), the difference in practice is not significant, especially considering the device service uses an entirely different set of parameters. Both approaches are within the operational constraints of how long the server will wait since the maximum timeout ends up being the same.
I'd like to drive this change from observed problems rather than from a strict adherence to the spec since this is a non-trivial change, so I want to be sure the gain outweighs the risk.
In terms of gain, I appreciate the inline references to the spec, which adds clarity. Also welcome other voices here about which is easiest to comprehend since that is a key factor here too. I am on the fence, but can imagine others might find this easier to grasp.
// IMPORTANT: the second argument is the timeout waiting for the ping | ||
// acknowledgment. This SHOULD be set to MAX_TRANSMIT_WAIT, which is 93s | ||
// for default transmission parameters defined in RFC7252. If required we might lower it. | ||
initialize_ping(23 * 60 * 1000, CoAPMessage::MAX_TRANSMIT_WAIT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bravo, I never liked the arbitrary choice of 30s. Ideally these magic values and the CoAP retries and timeouts should be parametrizable from the system layer.
@@ -150,7 +151,7 @@ ProtocolError CoAPMessageStore::receive(Message& msg, Channel& channel, system_t | |||
return INSUFFICIENT_STORAGE; | |||
// the timeout here is ideally purely academic since the application will respond immediately with an ACK/RESET | |||
// which will be stored in place of this message, with it's own timeout. | |||
coapmsg->set_expiration(time+CoAPMessage::MAX_TRANSMIT_SPAN); | |||
coapmsg->set_expiration(time + CoAPMessage::EXCHANGE_LIFETIME); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exchange lifetime is much larger than the max transmit span, meaning we will have to keep many more messages in memory so that their responses can be resent on retransmission from the server. Just something to thing about - we should ideally run memory profiling over a busy CoAP connection with very slow acknowledgements and see how much memory is required.
Expecting an application to wait some 300 seconds (5 minutes) for a response seems unrealistic. Reducing the MAX_LATENCY from 100s to 5-10s seems reasonable given our use case. and would reduce the EXCHANGE_LIFETIME to something more manageable.
Going forward, I propose that we aim for shorter timeouts and more frequent retransmits, which we can have as a negotiation step between the device and the server. That way we can tailor the timeouts to the network characteristics, data cost, and the application needs.
// RFC7252 4.8.2. Time Values Derived from Transmission Parameters | ||
static constexpr const auto MAX_TRANSMIT_SPAN = (ACK_TIMEOUT * ((1 << MAX_RETRANSMIT) - 1) * ACK_RANDOM_FACTOR) / ACK_RANDOM_DIVISOR; | ||
static constexpr const auto MAX_TRANSMIT_WAIT = (ACK_TIMEOUT * ((1 << (MAX_RETRANSMIT + 1)) - 1) * ACK_RANDOM_FACTOR) / ACK_RANDOM_DIVISOR; | ||
static constexpr const auto MAX_LATENCY = 100000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100s is not realistic for our use case.
POC. DO NOT MERGE AS-IS
Description
This is something I made some time ago while investigating CoAP issues for Gen 3 devices, so putting it here so it's not lost in time.
This PR:
MAX_TRANSMIT_SPAN
MAX_TRANSMIT_SPAN
->EXCHANGE_LIFETIME
again to follow the RFCPING
ACK timeout based onMAX_TRANSMIT_SPAN
instead of arbitrary30s