Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gcoap: fixes around empty ACKs #18429

Merged
merged 3 commits into from
Aug 10, 2022
Merged

Conversation

miri64
Copy link
Member

@miri64 miri64 commented Aug 9, 2022

Contribution description

This adds two fixes around the handling and sending of empty ACKs:

  1. Send empty ACKs on CON response, even if there is no memo that matches this response.
  2. Expire request on empty ACK when there is no response handler to wait for.

The 1. fix may be disputable. In my opinion it makes more sense to send a few short empty ACKs more than having some server sending a potentially large CON response multiple times just because the server missed a previous ACK, but I can remove this if this is for some reason not a good idea.

Testing procedure

Not sure how to test this... I saw it while running some experiments with #18386.

Issues/PRs references

None, but improves behavior in #18386.

@miri64 miri64 added the Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) label Aug 9, 2022
@miri64 miri64 requested a review from chrysn August 9, 2022 14:36
@github-actions github-actions bot added Area: CoAP Area: Constrained Application Protocol implementations Area: network Area: Networking Area: sys Area: System labels Aug 9, 2022
@miri64 miri64 added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Aug 9, 2022
Comment on lines 490 to 505
if (coap_get_type(&pdu) == COAP_TYPE_CON) {
/* we might run into this if an ACK to a sender got lost
* => best to ACK (again) */
messagelayer_emptyresponse_type = COAP_TYPE_ACK;
DEBUG("gcoap: Answering unknown CON response with ACK to "
"shut up sender\n");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A reset should be send:

In case a message carrying a response is unexpected (the client is
not waiting for a response from the identified endpoint, at the
endpoint addressed, and/or with the given token), the response is
rejected (Sections 4.2 and 4.3).

Implementation Note: A client that receives a response in a CON
message may want to clean up the message state right after sending
the ACK. If that ACK is lost and the server retransmits the CON,
the client may no longer have any state to which to correlate this
response, making the retransmission an unexpected message; the
client will likely send a Reset message so it does not receive any
more retransmissions. This behavior is normal and not an
indication of an error. (Clients that are not aggressively
optimized in their state memory usage will still have message
state that will identify the second CON as a retransmission.
Clients that actually expect more messages from the server
[OBSERVE] will have to keep state in any case.)

https://datatracker.ietf.org/doc/html/rfc7252#section-5.3.2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, an RST is the intended behavior. (I thought we already did this, at least that's what I remember of last time touching the empty-ACK code).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, an RST is the intended behavior. (I thought we already did this, at least that's what I remember of last time touching the empty-ACK code).

I saw a server keep sending transmissions even though the CON was already ACK'd. I did not see any RST messages.

If the RFC says RST should be used, I use RST.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: 2cd7857. From a quick search through the gcoap code, I am not sure, if RST is handled at all oO. The only other occurence of COAP_TYPE_RST is for the handling of "CoAP pings".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I just failed to pay attention to the "else /* no memo */" branch, thanks for the fix.

@miri64 miri64 force-pushed the gcoap/fix/empty-ack branch from 1636723 to df29505 Compare August 9, 2022 15:10
@chrysn
Copy link
Member

chrysn commented Aug 9, 2022

Ad testing: For aiocoap I do have run a few test cases with a "non-CoAP" peer that sends crafted packets that no CoAP stack would ever send. Maybe we should have something like that, but demanding it here would be excessive.

Did you do any tests? I don't have any idea on how to test this easily. Do we have any code that sets a NULL response handler at all, currently?

Ad 2: Expiring the memo has the side effect of sending an RST rather than an ACK on the actual response. That's not completely unreasonable and can also happen regularly (eg. when the ACK gets lost, the response is handled, and consequently the second response CON hits case 1), even when the server didn't retransmit (packet could have been duplicated), but it's an information sent to the server which we should be aware of. I think it's fine.

@miri64
Copy link
Member Author

miri64 commented Aug 9, 2022

Did you do any tests? I don't have any idea on how to test this easily. Do we have any code that sets a NULL response handler at all, currently?

I ran this in my experiments, however, only with the ACK variant in OP. I can do some runs with the RST variant tomorrow.

@miri64
Copy link
Member Author

miri64 commented Aug 10, 2022

Mhh as I feared in #18429 (comment) there seems to happen no handling of the RST case. With another ACK, the server ceases sending, with the RST it just continues to send :-/. Sorry that was wrong. The messages I were seeing were a bunch of RST messages with the same message ID... Will investigate.

@miri64
Copy link
Member Author

miri64 commented Aug 10, 2022

Fixed: Two problems still existed: expiring a memo does not delete the timer => doing that now, also I implementing the handling of RST message, by deleting the timeout and expiring the memo.

@miri64 miri64 force-pushed the gcoap/fix/empty-ack branch from 1895996 to f11c9e8 Compare August 10, 2022 14:09
@miri64
Copy link
Member Author

miri64 commented Aug 10, 2022

And squashed.

@miri64
Copy link
Member Author

miri64 commented Aug 10, 2022

To test I ran 2 clients that speak via a proxy with #18386 to a server with ~5 requests per second. This causes enough loss to both create scenarios (especially with POST request which are not cached or when the cache is not present in the proxy) where 1. the request may get lost en route between proxy and server (i.e. the server empty-ACK'd the request to the client and uses CON responses to send the server response back to the client) and 2. the ACK for the CON response by the proxy gets lost. Before the proxy will just continue sending CON responses until it times out, if the ACK gets lost. With these changes this does not happen anymore.

Copy link
Member

@maribu maribu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK. Code change looks good and I trust your testing.

@miri64 miri64 added the Process: needs backport Integration Process: The PR is required to be backported to a release or feature branch label Aug 10, 2022
@miri64 miri64 enabled auto-merge August 10, 2022 16:42
@miri64 miri64 merged commit 60bd3ca into RIOT-OS:master Aug 10, 2022
@miri64 miri64 deleted the gcoap/fix/empty-ack branch August 10, 2022 21:28
@miri64
Copy link
Member Author

miri64 commented Aug 10, 2022

Backport provided in #18436

@maribu maribu added this to the Release 2022.10 milestone Oct 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: CoAP Area: Constrained Application Protocol implementations Area: network Area: Networking Area: sys Area: System CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Process: needs backport Integration Process: The PR is required to be backported to a release or feature branch Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants