
net/gcoap: add functions to read and set the next message id #16730

Closed

Conversation

janosbrodbeck
Member

Contribution description

This PR adds two new functions to gcoap which allow reading the next message ID and setting it to a given value.
Currently, the next message ID is generated randomly during initialization, but that is not always desired; I need to set it manually.

I intentionally named the function "read" to make it indirectly clear that this is not part of the regular gcoap use case. Well, I hope the naming achieves that :D
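For context, a minimal sketch of what the two functions could look like; this is not the PR's actual implementation, and the internal counter variable is only an assumption about how gcoap tracks its next message ID:

```c
/* Sketch only, not taken from the PR: assumes gcoap keeps the next message ID
 * in an internal 16-bit counter that is randomized during gcoap's init. */
#include <stdint.h>

static uint16_t _next_message_id;   /* assumed internal gcoap state */

uint16_t gcoap_read_next_message_id(void)
{
    /* expose the counter without advancing it */
    return _next_message_id;
}

void gcoap_set_next_message_id(uint16_t msg_id)
{
    /* overwrite the counter, e.g. with a value restored after a reboot */
    _next_message_id = msg_id;
}
```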

Testing procedure

Apply this small patch for the gcoap example:

diff --git a/examples/gcoap/gcoap_cli.c b/examples/gcoap/gcoap_cli.c
index f089a65c7b..ab3080b947 100644
--- a/examples/gcoap/gcoap_cli.c
+++ b/examples/gcoap/gcoap_cli.c
@@ -328,6 +328,7 @@ int gcoap_cli_cmd(int argc, char **argv)
 #endif
         printf(" CLI requests sent: %u\n", req_count);
         printf("CoAP open requests: %u\n", open_reqs);
+        printf("Next message ID: %u\n", gcoap_read_next_message_id());
         printf("Configured Proxy: ");
         if (_proxied) {
             char addrstr[IPV6_ADDR_MAX_STR_LEN];
@@ -488,6 +489,6 @@ void gcoap_cli_init(void)
         printf("gcoap: cannot add credential to DTLS sock: %d\n", res);
     }
 #endif
-
+    gcoap_set_next_message_id(10);
     gcoap_register_listener(&_listener);
 }
 1. coap info shows the next message ID (see the example output sketch below); with this patch it is initially set to 10.
 2. Send a request to a server (the request does not need to succeed).
 3. The next message ID should now be 11.
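For reference, the relevant part of the `coap info` output would then look roughly like this; the line wording comes from the printf calls in the patch above, the counter values are only examples:

```
 CLI requests sent: 0
CoAP open requests: 0
Next message ID: 10
```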

Issues/PRs references

@github-actions github-actions bot added the Area: CoAP, Area: network, and Area: sys labels Aug 13, 2021
@benpicco benpicco requested a review from chrysn August 13, 2021 16:31
@benpicco
Contributor

Under what circumstances does it make sense to control the message id? Isn't it like a sequence number that is only used internally?

@chrysn
Member

chrysn commented Aug 13, 2021 via email

@janosbrodbeck
Member Author

Under what circumstances does it make sense to control the message id? Isn't it like a sequence number that is only used internally?

When you want to preserve the next message ID across reboots or anything else that reinitializes gcoap. I ran into a scenario where a server maintained a large message cache. After a reboot, I randomly got a recently used message ID -> the server replied with its cached response to that earlier request because it carried the same message ID. gcoap then couldn't match the response because of the token mismatch.

Technically, the server's behavior is not wrong, even though it could anticipate the situation from the given token. According to the RFC, however, the token does not play a role here, only the message ID.

It may be a rather specific problem, and you may have to question the server here as well... but my main thought is simply "I want to preserve the next message ID across reboots/hibernations".
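To illustrate the intended use (this is not part of the PR), an application could save the counter on a controlled shutdown and restore it right after gcoap is initialized; persist_store_u16() and persist_load_u16() are hypothetical placeholders for whatever non-volatile storage the application already has:

```c
/* Hypothetical sketch: preserve gcoap's next message ID across a reboot using
 * the two functions added by this PR. The persist_* helpers are placeholders
 * for application-specific non-volatile storage (EEPROM, flash page, FRAM). */
#include <stdint.h>
#include "net/gcoap.h"

#define MSG_ID_SLOT 0U   /* hypothetical storage slot */

/* hypothetical storage helpers provided by the application */
void persist_store_u16(unsigned slot, uint16_t value);
int persist_load_u16(unsigned slot, uint16_t *value);

void on_controlled_shutdown(void)
{
    /* remember where the message ID counter currently stands */
    persist_store_u16(MSG_ID_SLOT, gcoap_read_next_message_id());
}

void on_boot_after_gcoap_init(void)
{
    uint16_t msg_id;
    /* restore the counter so the first request does not reuse a recent ID */
    if (persist_load_u16(MSG_ID_SLOT, &msg_id) == 0) {
        gcoap_set_next_message_id(msg_id);
    }
}
```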

@benpicco
Contributor

Hm, but how many messages does the server cache that a collision becomes likely even with just a 16-bit message ID?
To me that sounds more like a case of a poor source of randomness.

Do you still have the issue if you use prng_hwrng?
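As a rough back-of-the-envelope illustration of that point (the cache size is an assumed number, not one from this thread): with a uniformly random 16-bit starting ID, the first request after a reboot only collides with a cached ID with probability about N/65536, so the cache has to be fairly large before a collision becomes likely.

```c
/* Rough estimate, assumptions only: probability that a freshly randomized
 * 16-bit starting message ID matches one of the N message IDs a server still
 * has cached, for the first request sent after the reboot. */
#include <stdio.h>

int main(void)
{
    unsigned cached = 1000;                   /* assumed cache size */
    double p = (double)cached / 65536.0;      /* 16-bit message ID space */
    printf("first-request collision probability: %.4f\n", p);   /* ~0.0153 */
    return 0;
}
```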

@chrysn
Member

chrysn commented Aug 13, 2021 via email

@benpicco
Contributor

I would really prefer if we could avoid such a workaround and instead make sure the PRNG is properly seeded across reboots if no HWRNG is available. If we can store a next message ID, we could just as well store a next seed - and that would benefit all users of this functionality.
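The "store a next seed" alternative could look roughly like this; random_init() and random_uint32() are RIOT's PRNG API as far as I know, the persist_* helpers are hypothetical placeholders, and deriving the next seed from the PRNG output is just one simple scheme:

```c
/* Sketch of the "persist a seed instead of a message ID" idea: on boot, load
 * the stored seed, seed the PRNG with it, and immediately store a fresh value
 * so the next boot does not reuse the same seed. Storage helpers are
 * hypothetical; this must run before any module draws random numbers. */
#include <stdint.h>
#include "random.h"   /* RIOT PRNG API: random_init(), random_uint32() */

uint32_t persist_load_seed(void);          /* hypothetical */
void persist_store_seed(uint32_t seed);    /* hypothetical */

void seed_prng_from_storage(void)
{
    random_init(persist_load_seed());
    persist_store_seed(random_uint32());
}
```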

@chrysn
Member

chrysn commented Aug 14, 2021 via email

@janosbrodbeck
Member Author

I've checked, and the prng_hwrng module is used. Two weeks ago I had also looked at the distribution of the generated message IDs after reboots, and it certainly looked random to me, not pseudorandom with a fixed seed.

The message cache also does not hold a fixed number of messages, but keeps them for a fixed period of time (the server is based on Californium). This scenario occurs very, very rarely, but it did happen, and we are not even necessarily talking about a high number of messages. Reducing the cache time will probably solve the problem completely for my case (a test is already scheduled for next week), but you would have to choose your traffic, cache, and reboot timings wisely to have no chance of hitting this problem at all.

I don't consider restoring the previous message ID a workaround. I see it as a complete solution, because there is a probability, small as it is, that after a reboot you get a message ID identical to one of the last X used. The consequence is that, for CON messages, the client retries the request several times (depending on the number of retransmissions) without success.

I'll take another look at it; I just don't have the hardware for it right now.

@chrysn
Member

chrysn commented Aug 16, 2021 via email

@chrysn
Member

chrysn commented Sep 12, 2021

How shall we continue on this?

I already have a pretty concrete idea of how system-wide extensible persistence with minimal wear on the flash could be done, but I won't get around to implementing it now.

@janosbrodbeck, what were your plans for backing up the numbers? When would you have planned to extract them? Presumably on controlled shutdowns ... how about uncontrolled ones? Would that also work if the CoAP stack (really, in the long term, RIOT as a whole) called some "please persist this now" event?

@janosbrodbeck
Member Author

@janosbrodbeck, what were your plans for backing up the numbers? When would you have planned to extract them? Presumably on controlled shutdowns ... how about uncontrolled ones? Would that also work if the CoAP stack (really, in the long term, RIOT as a whole) called some "please persist this now" event?

Only controlled shutdowns were planned initially. A persist event would IMHO only help if a generic API existed for it, one which then tells all supported/persist-enabled modules to persist now; that would replace module-specific persistence calls before a shutdown. If I understood your idea correctly.
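To make that idea a bit more concrete, such a generic API might look roughly like the following; everything here is hypothetical and nothing like it exists in RIOT today:

```c
/* Hypothetical sketch of a generic "persist now" API as discussed above:
 * persist-enabled modules register a callback, and the system invokes all
 * callbacks before a controlled shutdown (or whenever the application asks). */
#include <stddef.h>

typedef void (*persist_cb_t)(void *arg);

typedef struct persist_handler {
    struct persist_handler *next;   /* singly linked list of handlers */
    persist_cb_t cb;                /* called when persisting is requested */
    void *arg;                      /* module-specific context */
} persist_handler_t;

static persist_handler_t *_handlers;

void persist_register(persist_handler_t *handler)
{
    handler->next = _handlers;
    _handlers = handler;
}

void persist_now(void)
{
    for (persist_handler_t *h = _handlers; h != NULL; h = h->next) {
        h->cb(h->arg);
    }
}
```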

How shall we continue on this?

I won't be working on this topic for the foreseeable future, so I would close this PR if that is okay. If someone else comes up with another idea/implementation, that would be great.
