Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

communication: cleans up CoAP timeout/parameters/retransmission stuff #1775

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

avtolstoy
Copy link
Member

@avtolstoy avtolstoy commented May 14, 2019

POC. DO NOT MERGE AS-IS

Description

This is something I made some time ago while investigating CoAP issues for Gen 3 devices, so putting it here so it's not lost in time.

This PR:

  1. Cleans up (IMHO) CoAP parameter calculations
  2. Fixes retransmission logic to exactly follow RFC (random part is calculated only once), additional logging, keeping retransmissions within MAX_TRANSMIT_SPAN
  3. MAX_TRANSMIT_SPAN -> EXCHANGE_LIFETIME again to follow the RFC
  4. Makes PING ACK timeout based on MAX_TRANSMIT_SPAN instead of arbitrary 30s



CoAPMessage(message_id_t id_) : next(nullptr), timeout(0), id(id_), transmit_count(0), delivered(nullptr), data_len(0) {
CoAPMessage(message_id_t id_) :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please be sure the clang_format file has these formatting changes.

Copy link
Contributor

@m-mcgowan m-mcgowan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for going back and cleaning up. On the whole this is a positive change, but I am concerned about some of the very long timeouts, especially since the server does not match these timeouts so they are in practice redundant while causing increase memory usage.

timeout = now + transmit_timeout(transmit_count);
transmit_count++;
return transmit_count <= MAX_RETRANSMIT+1;
if (coapType == CoAPType::CON) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this technically is closer to the spec (taking the timeout with a pre-applied random factor and doubling it vs doubling the timeout and then applying a random factor), the difference in practice is not significant, especially considering the device service uses an entirely different set of parameters. Both approaches are within the operational constraints of how long the server will wait since the maximum timeout ends up being the same.

I'd like to drive this change from observed problems rather than from a strict adherence to the spec since this is a non-trivial change, so I want to be sure the gain outweighs the risk.

In terms of gain, I appreciate the inline references to the spec, which adds clarity. Also welcome other voices here about which is easiest to comprehend since that is a key factor here too. I am on the fence, but can imagine others might find this easier to grasp.

// IMPORTANT: the second argument is the timeout waiting for the ping
// acknowledgment. This SHOULD be set to MAX_TRANSMIT_WAIT, which is 93s
// for default transmission parameters defined in RFC7252. If required we might lower it.
initialize_ping(23 * 60 * 1000, CoAPMessage::MAX_TRANSMIT_WAIT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bravo, I never liked the arbitrary choice of 30s. Ideally these magic values and the CoAP retries and timeouts should be parametrizable from the system layer.

@@ -150,7 +151,7 @@ ProtocolError CoAPMessageStore::receive(Message& msg, Channel& channel, system_t
return INSUFFICIENT_STORAGE;
// the timeout here is ideally purely academic since the application will respond immediately with an ACK/RESET
// which will be stored in place of this message, with it's own timeout.
coapmsg->set_expiration(time+CoAPMessage::MAX_TRANSMIT_SPAN);
coapmsg->set_expiration(time + CoAPMessage::EXCHANGE_LIFETIME);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exchange lifetime is much larger than the max transmit span, meaning we will have to keep many more messages in memory so that their responses can be resent on retransmission from the server. Just something to thing about - we should ideally run memory profiling over a busy CoAP connection with very slow acknowledgements and see how much memory is required.

Expecting an application to wait some 300 seconds (5 minutes) for a response seems unrealistic. Reducing the MAX_LATENCY from 100s to 5-10s seems reasonable given our use case. and would reduce the EXCHANGE_LIFETIME to something more manageable.

Going forward, I propose that we aim for shorter timeouts and more frequent retransmits, which we can have as a negotiation step between the device and the server. That way we can tailor the timeouts to the network characteristics, data cost, and the application needs.

// RFC7252 4.8.2. Time Values Derived from Transmission Parameters
static constexpr const auto MAX_TRANSMIT_SPAN = (ACK_TIMEOUT * ((1 << MAX_RETRANSMIT) - 1) * ACK_RANDOM_FACTOR) / ACK_RANDOM_DIVISOR;
static constexpr const auto MAX_TRANSMIT_WAIT = (ACK_TIMEOUT * ((1 << (MAX_RETRANSMIT + 1)) - 1) * ACK_RANDOM_FACTOR) / ACK_RANDOM_DIVISOR;
static constexpr const auto MAX_LATENCY = 100000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100s is not realistic for our use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants