communication: cleans up CoAP timeout/parameters/retransmission stuff #1775

avtolstoy · 2019-05-14T09:14:16Z

POC. DO NOT MERGE AS-IS

Description

This is something I made some time ago while investigating CoAP issues for Gen 3 devices, so putting it here so it's not lost in time.

This PR:

Cleans up (IMHO) CoAP parameter calculations
Fixes retransmission logic to exactly follow RFC (random part is calculated only once), additional logging, keeping retransmissions within MAX_TRANSMIT_SPAN
MAX_TRANSMIT_SPAN -> EXCHANGE_LIFETIME again to follow the RFC
Makes PING ACK timeout based on MAX_TRANSMIT_SPAN instead of arbitrary 30s

m-mcgowan · 2019-05-23T12:17:51Z

communication/src/coap_channel.h



-	CoAPMessage(message_id_t id_) : next(nullptr), timeout(0), id(id_), transmit_count(0), delivered(nullptr), data_len(0) {
+	CoAPMessage(message_id_t id_) :


Please be sure the clang_format file has these formatting changes.

m-mcgowan

Thanks for going back and cleaning up. On the whole this is a positive change, but I am concerned about some of the very long timeouts, especially since the server does not match these timeouts so they are in practice redundant while causing increase memory usage.

m-mcgowan · 2019-05-23T12:21:31Z

communication/src/coap_channel.h

-			timeout = now + transmit_timeout(transmit_count);
-			transmit_count++;
-			return transmit_count <= MAX_RETRANSMIT+1;
+		if (coapType == CoAPType::CON) {


While this technically is closer to the spec (taking the timeout with a pre-applied random factor and doubling it vs doubling the timeout and then applying a random factor), the difference in practice is not significant, especially considering the device service uses an entirely different set of parameters. Both approaches are within the operational constraints of how long the server will wait since the maximum timeout ends up being the same.

I'd like to drive this change from observed problems rather than from a strict adherence to the spec since this is a non-trivial change, so I want to be sure the gain outweighs the risk.

In terms of gain, I appreciate the inline references to the spec, which adds clarity. Also welcome other voices here about which is easiest to comprehend since that is a key factor here too. I am on the fence, but can imagine others might find this easier to grasp.

m-mcgowan · 2019-05-23T12:22:47Z

communication/src/dtls_protocol.cpp

+	// IMPORTANT: the second argument is the timeout waiting for the ping
+	// acknowledgment. This SHOULD be set to MAX_TRANSMIT_WAIT, which is 93s
+	// for default transmission parameters defined in RFC7252. If required we might lower it.
+	initialize_ping(23 * 60 * 1000, CoAPMessage::MAX_TRANSMIT_WAIT);


Bravo, I never liked the arbitrary choice of 30s. Ideally these magic values and the CoAP retries and timeouts should be parametrizable from the system layer.

m-mcgowan · 2019-05-23T12:33:15Z

communication/src/coap_channel.cpp

@@ -150,7 +151,7 @@ ProtocolError CoAPMessageStore::receive(Message& msg, Channel& channel, system_t
 				return INSUFFICIENT_STORAGE;
 			// the timeout here is ideally purely academic since the application will respond immediately with an ACK/RESET
 			// which will be stored in place of this message, with it's own timeout.
-			coapmsg->set_expiration(time+CoAPMessage::MAX_TRANSMIT_SPAN);
+			coapmsg->set_expiration(time + CoAPMessage::EXCHANGE_LIFETIME);


The exchange lifetime is much larger than the max transmit span, meaning we will have to keep many more messages in memory so that their responses can be resent on retransmission from the server. Just something to thing about - we should ideally run memory profiling over a busy CoAP connection with very slow acknowledgements and see how much memory is required.

Expecting an application to wait some 300 seconds (5 minutes) for a response seems unrealistic. Reducing the MAX_LATENCY from 100s to 5-10s seems reasonable given our use case. and would reduce the EXCHANGE_LIFETIME to something more manageable.

Going forward, I propose that we aim for shorter timeouts and more frequent retransmits, which we can have as a negotiation step between the device and the server. That way we can tailor the timeouts to the network characteristics, data cost, and the application needs.

m-mcgowan · 2019-05-23T12:39:37Z

communication/src/coap_channel.h

+	// RFC7252 4.8.2. Time Values Derived from Transmission Parameters
+	static constexpr const auto MAX_TRANSMIT_SPAN = (ACK_TIMEOUT * ((1 << MAX_RETRANSMIT) - 1) * ACK_RANDOM_FACTOR) / ACK_RANDOM_DIVISOR;
+	static constexpr const auto MAX_TRANSMIT_WAIT = (ACK_TIMEOUT * ((1 << (MAX_RETRANSMIT + 1)) - 1) * ACK_RANDOM_FACTOR) / ACK_RANDOM_DIVISOR;
+	static constexpr const auto MAX_LATENCY = 100000;


100s is not realistic for our use case.

communication: cleans up CoAP timeout/parameters/retransmission stuff

b771f12

avtolstoy added the DO NOT MERGE label May 14, 2019

avtolstoy requested a review from m-mcgowan May 14, 2019 09:14

m-mcgowan reviewed May 23, 2019

View reviewed changes

avtolstoy force-pushed the develop branch from c253104 to 55f59c1 Compare June 3, 2020 19:02

avtolstoy force-pushed the develop branch from 77b2534 to 5c09a50 Compare December 9, 2020 22:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

communication: cleans up CoAP timeout/parameters/retransmission stuff #1775

communication: cleans up CoAP timeout/parameters/retransmission stuff #1775

avtolstoy commented May 14, 2019 •

edited

Loading

m-mcgowan May 23, 2019

m-mcgowan left a comment

m-mcgowan May 23, 2019

m-mcgowan May 23, 2019

m-mcgowan May 23, 2019

m-mcgowan May 23, 2019



		CoAPMessage(message_id_t id_) : next(nullptr), timeout(0), id(id_), transmit_count(0), delivered(nullptr), data_len(0) {
		CoAPMessage(message_id_t id_) :

communication: cleans up CoAP timeout/parameters/retransmission stuff #1775

Are you sure you want to change the base?

communication: cleans up CoAP timeout/parameters/retransmission stuff #1775

Conversation

avtolstoy commented May 14, 2019 • edited Loading

POC. DO NOT MERGE AS-IS

Description

m-mcgowan May 23, 2019

Choose a reason for hiding this comment

m-mcgowan left a comment

Choose a reason for hiding this comment

m-mcgowan May 23, 2019

Choose a reason for hiding this comment

m-mcgowan May 23, 2019

Choose a reason for hiding this comment

m-mcgowan May 23, 2019

Choose a reason for hiding this comment

m-mcgowan May 23, 2019

Choose a reason for hiding this comment

avtolstoy commented May 14, 2019 •

edited

Loading