
net/gcoap: add functions to read and set the next message id #16730

Closed

Conversation

janosbrodbeck
Member

Contribution description

This PR adds two new functions to gcoap which allow reading the next message ID and setting it to a given value.
Currently, the next message ID is generated randomly during initialization, but that is not always desired; I need to set it manually.

I intentionally named the function "read" to make it indirectly clear that this is not part of the regular gcoap use case. Well, I hope the naming achieves that :D
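For context, a minimal sketch of what the two functions could look like; this is not the PR's actual implementation, and the internal counter variable is only an assumption about how gcoap tracks its next message ID:

```c
/* Sketch only, not taken from the PR: assumes gcoap keeps the next message ID
 * in an internal 16-bit counter that is randomized during gcoap's init. */
#include <stdint.h>

static uint16_t _next_message_id;   /* assumed internal gcoap state */

uint16_t gcoap_read_next_message_id(void)
{
    /* expose the counter without advancing it */
    return _next_message_id;
}

void gcoap_set_next_message_id(uint16_t msg_id)
{
    /* overwrite the counter, e.g. with a value restored after a reboot */
    _next_message_id = msg_id;
}
```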

Testing procedure

Apply this small patch for the gcoap example:

diff --git a/examples/gcoap/gcoap_cli.c b/examples/gcoap/gcoap_cli.c
index f089a65c7b..ab3080b947 100644
--- a/examples/gcoap/gcoap_cli.c
+++ b/examples/gcoap/gcoap_cli.c
@@ -328,6 +328,7 @@ int gcoap_cli_cmd(int argc, char **argv)
 #endif
         printf(" CLI requests sent: %u\n", req_count);
         printf("CoAP open requests: %u\n", open_reqs);
+        printf("Next message ID: %u\n", gcoap_read_next_message_id());
         printf("Configured Proxy: ");
         if (_proxied) {
             char addrstr[IPV6_ADDR_MAX_STR_LEN];
@@ -488,6 +489,6 @@ void gcoap_cli_init(void)
         printf("gcoap: cannot add credential to DTLS sock: %d\n", res);
     }
 #endif
-
+    gcoap_set_next_message_id(10);
     gcoap_register_listener(&_listener);
 }
 1. coap info shows the next message ID (see the example output sketch below); with this patch it is initially set to 10.
 2. Send a request to a server (the request does not need to succeed).
 3. The next message ID should now be 11.
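For reference, the relevant part of the `coap info` output would then look roughly like this; the line wording comes from the printf calls in the patch above, the counter values are only examples:

```
 CLI requests sent: 0
CoAP open requests: 0
Next message ID: 10
```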

Issues/PRs references

@github-actions github-actions bot added the Area: CoAP, Area: network, and Area: sys labels Aug 13, 2021
@benpicco benpicco requested a review from chrysn August 13, 2021 16:31
@benpicco
Contributor

Under what circumstances does it make sense to control the message id? Isn't it like a sequence number that is only used internally?

@chrysn
Member

chrysn commented Aug 13, 2021 via email

@janosbrodbeck
Member Author

Under what circumstances does it make sense to control the message id? Isn't it like a sequence number that is only used internally?

When you want to preserve the next message ID across reboots or anything else that reinitializes gcoap. I ran into a scenario where a server maintained a large message cache. After a reboot, I randomly got a recently used message ID -> the server replied with its cached response to that earlier request because it carried the same message ID. gcoap then couldn't match the response because of the token mismatch.

Technically, the server's behavior is not wrong, even though it could anticipate the situation from the given token. According to the RFC, however, the token does not play a role here, only the message ID.

It may be a rather specific problem, and you may have to question the server here as well... but my main thought is simply "I want to preserve the next message ID across reboots/hibernations".
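To illustrate the intended use (this is not part of the PR), an application could save the counter on a controlled shutdown and restore it right after gcoap is initialized; persist_store_u16() and persist_load_u16() are hypothetical placeholders for whatever non-volatile storage the application already has:

```c
/* Hypothetical sketch: preserve gcoap's next message ID across a reboot using
 * the two functions added by this PR. The persist_* helpers are placeholders
 * for application-specific non-volatile storage (EEPROM, flash page, FRAM). */
#include <stdint.h>
#include "net/gcoap.h"

#define MSG_ID_SLOT 0U   /* hypothetical storage slot */

/* hypothetical storage helpers provided by the application */
void persist_store_u16(unsigned slot, uint16_t value);
int persist_load_u16(unsigned slot, uint16_t *value);

void on_controlled_shutdown(void)
{
    /* remember where the message ID counter currently stands */
    persist_store_u16(MSG_ID_SLOT, gcoap_read_next_message_id());
}

void on_boot_after_gcoap_init(void)
{
    uint16_t msg_id;
    /* restore the counter so the first request does not reuse a recent ID */
    if (persist_load_u16(MSG_ID_SLOT, &msg_id) == 0) {
        gcoap_set_next_message_id(msg_id);
    }
}
```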

@benpicco
Contributor

Hm, but how many messages does the server cache that a collision becomes likely even with just a 16-bit message ID?
To me that sounds more like a case of a poor source of randomness.

Do you still have the issue if you use prng_hwrng?
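As a rough back-of-the-envelope illustration of that point (the cache size is an assumed number, not one from this thread): with a uniformly random 16-bit starting ID, the first request after a reboot only collides with a cached ID with probability about N/65536, so the cache has to be fairly large before a collision becomes likely.

```c
/* Rough estimate, assumptions only: probability that a freshly randomized
 * 16-bit starting message ID matches one of the N message IDs a server still
 * has cached, for the first request sent after the reboot. */
#include <stdio.h>

int main(void)
{
    unsigned cached = 1000;                   /* assumed cache size */
    double p = (double)cached / 65536.0;      /* 16-bit message ID space */
    printf("first-request collision probability: %.4f\n", p);   /* ~0.0153 */
    return 0;
}
```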

@chrysn
Member

chrysn commented Aug 13, 2021 via email

@benpicco
Contributor

I would really prefer if we could avoid such a workaround and instead make sure the PRNG is properly seeded across reboots if no HWRNG is available. If we can store a next message ID, we could just as well store a next seed - and that would benefit all users of this functionality.
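The "store a next seed" alternative could look roughly like this; random_init() and random_uint32() are RIOT's PRNG API as far as I know, the persist_* helpers are hypothetical placeholders, and deriving the next seed from the PRNG output is just one simple scheme:

```c
/* Sketch of the "persist a seed instead of a message ID" idea: on boot, load
 * the stored seed, seed the PRNG with it, and immediately store a fresh value
 * so the next boot does not reuse the same seed. Storage helpers are
 * hypothetical; this must run before any module draws random numbers. */
#include <stdint.h>
#include "random.h"   /* RIOT PRNG API: random_init(), random_uint32() */

uint32_t persist_load_seed(void);          /* hypothetical */
void persist_store_seed(uint32_t seed);    /* hypothetical */

void seed_prng_from_storage(void)
{
    random_init(persist_load_seed());
    persist_store_seed(random_uint32());
}
```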

@chrysn
Member

chrysn commented Aug 14, 2021 via email

@janosbrodbeck
Member Author

I've checked, and the prng_hwrng module is used. Two weeks ago I had also looked at the distribution of the generated message IDs after reboots, and it certainly looked random to me, not pseudorandom with a fixed seed.

The message cache also does not hold a fixed number of messages, but keeps them for a fixed period of time (the server is based on Californium). This scenario occurs very, very rarely, but it did happen, and we are not even necessarily talking about a high number of messages. Reducing the cache time will probably solve the problem completely for my case (a test is already scheduled for next week), but you would have to choose your traffic, cache, and reboot timings wisely to have no chance of hitting this problem at all.

I don't consider restoring the previous message ID a workaround. I see it as a complete solution, because there is a probability, small as it is, that after a reboot you get a message ID identical to one of the last X used. The consequence is that, for CON messages, the client retries the request several times (depending on the number of retransmissions) without success.

I'll take another look at it; I just don't have the hardware for it right now.

@chrysn
Member

chrysn commented Aug 16, 2021 via email

@chrysn
Member

chrysn commented Sep 12, 2021

How shall we continue on this?

I already have a pretty concrete idea of how system-wide extensible persistence with minimal wear on the flash could be done, but I won't get around to implementing it now.

@janosbrodbeck, what were your plans for backing up the numbers? When would you have planned to extract them? Presumably on controlled shutdowns ... how about uncontrolled ones? Would that also work if the CoAP stack (really, in the long term, RIOT as a whole) called some "please persist this now" event?

@janosbrodbeck
Member Author

@janosbrodbeck, what were your plans for backing up the numbers? When would you have planned to extract them? Presumably on controlled shutdowns ... how about uncontrolled ones? Would that also work if the CoAP stack (really, in the long term, RIOT as a whole) called some "please persist this now" event?

Only controlled shutdowns were planned initially. A persist event would IMHO only help if a generic API existed for it, one which then tells all supported/persist-enabled modules to persist now; that would replace module-specific persistence calls before a shutdown. If I understood your idea correctly.
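To make that idea a bit more concrete, such a generic API might look roughly like the following; everything here is hypothetical and nothing like it exists in RIOT today:

```c
/* Hypothetical sketch of a generic "persist now" API as discussed above:
 * persist-enabled modules register a callback, and the system invokes all
 * callbacks before a controlled shutdown (or whenever the application asks). */
#include <stddef.h>

typedef void (*persist_cb_t)(void *arg);

typedef struct persist_handler {
    struct persist_handler *next;   /* singly linked list of handlers */
    persist_cb_t cb;                /* called when persisting is requested */
    void *arg;                      /* module-specific context */
} persist_handler_t;

static persist_handler_t *_handlers;

void persist_register(persist_handler_t *handler)
{
    handler->next = _handlers;
    _handlers = handler;
}

void persist_now(void)
{
    for (persist_handler_t *h = _handlers; h != NULL; h = h->next) {
        h->cb(h->arg);
    }
}
```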

How shall we continue on this?

I won't be working on this topic for the foreseeable future, so I would close this PR if that is okay. If someone else comes up with another idea/implementation, that would be great.
