Improve parsing of Set-Cookie headers #2329

chrisvest · 2022-08-22T21:33:29Z

Motivation:
The parsing of Set-Cookie headers deviates from the standard in several ways:

- Cookie values are allowed to contain the equals sign character.
- Percentage-prefixed hexadecimal literals are not part of the cookie grammar.
- Only cookie-values need to adhere to the cookie-octet rule.
- Defined but empty values should decode to the empty string rather than null, so we can distinguish between the empty and absent cases.
- When parsing Set-Cookie values, every ';' must be followed by a space.
  See the set-cookie-string rule in https://www.rfc-editor.org/rfc/rfc6265#section-4.1.1
  The parser should check for this, since a missing space can lead to mis-parsing later data and produce confusing error messages, or produce a cookie object with wrong data.

Modification:
When the parser decodes the '=' character, it now no longer updates the 'begin' index if the parse state is ParsingValue.
This means the parsed '=' character will be part of the cookie value.
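
As a rough illustration of the intended behaviour (this is not the actual DefaultHttpSetCookie code), a cookie-pair can be split at the first '=' only, so that any later '=' stays in the value. The class and method names below are hypothetical.

```java
/** Hypothetical helper, for illustration only; not part of ServiceTalk. */
final class CookiePairSplitter {
    private CookiePairSplitter() {
    }

    /**
     * Splits "name=value" at the first '=' only.
     * Returns a two-element array: {name, value}.
     */
    static String[] split(String cookiePair) {
        int eq = cookiePair.indexOf('=');
        if (eq < 0) {
            throw new IllegalArgumentException("cookie-pair has no '=': " + cookiePair);
        }
        // Everything after the first '=' belongs to the value,
        // including any further '=' characters.
        return new String[] {cookiePair.substring(0, eq), cookiePair.substring(eq + 1)};
    }
}
```

With this sketch, `split("token=abc==")` yields the name `token` and the value `abc==`, and `split("a=")` yields an empty, non-null value.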

The parse rule for the '%' character has been removed.
The associated validation code has also been removed, as it is no longer used.
This means the '%' character no longer has a special meaning to the parser.

When the parser validates a non-special character (a character that falls into the default case in the parser), the cookie-octet validation function is now only used if the parser is in the ParsingValue state.
Otherwise, we use a new cookie-attribute validation.
Most cookie-attributes need only follow the simple rule of "any CHAR except CTLs or ';'".
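
The two character rules can be sketched as follows, based on the grammar in RFC 6265 section 4.1.1. This is a minimal sketch and not the validation code from this PR; the class and method names are hypothetical, and individual attributes may have stricter rules than the general one shown here.

```java
/** Hypothetical character checks derived from RFC 6265 section 4.1.1. */
final class CookieCharRules {
    private CookieCharRules() {
    }

    /**
     * cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
     * (US-ASCII without CTLs, whitespace, DQUOTE, comma, semicolon, backslash).
     */
    static boolean isCookieOctet(char c) {
        return c == 0x21
                || (c >= 0x23 && c <= 0x2B)
                || (c >= 0x2D && c <= 0x3A)
                || (c >= 0x3C && c <= 0x5B)
                || (c >= 0x5D && c <= 0x7E);
    }

    /**
     * "any CHAR except CTLs or ';'": CHAR is %x01-7F and the CTLs are
     * %x00-1F and %x7F, which leaves %x20-7E without ';'.
     */
    static boolean isAttributeChar(char c) {
        return c >= 0x20 && c <= 0x7E && c != ';';
    }
}
```

Under these rules a ',' or a space is rejected in a cookie value but accepted in an attribute value such as an Expires date, while '=' and '%' are ordinary cookie-octets.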

The parser already has a final-processing step for non-empty tails.
We add a similar step for when the tail is the empty string.
This way, cookie values, paths, and domains that are defined but have no value will be decoded to have the empty string as a value.
As a result, we can tell whether those attributes were defined but empty, or absent.

The parser for Set-Cookie HTTP headers now checks that every ';' is followed by a space byte.
When cookie validation is enabled, an exception will be thrown when this is not the case.
When cookie validation is disabled, the space byte will no longer be blindly assumed to be there, but instead checked for, and the parser position will be adjusted accordingly.
This allows, e.g. "Set-Cookie: a=b;Expires=Mon, 22 Aug 2022 20:12:35 GMT" to be parsed correctly in non-validating mode.
Previously, the above would have thrown an exception for using a ',' character in an attribute value that does not allow it, because the attribute name was read as "xpires" instead of "Expires".
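
The check after a ';' might look roughly like the sketch below. It is an assumption-laden illustration rather than the code in this PR: the class name, method name, and exception type are all hypothetical.

```java
/** Hypothetical sketch of the "space after ';'" handling. */
final class SemicolonSkipper {
    private SemicolonSkipper() {
    }

    /**
     * Returns the index where the next cookie-av starts, given the index of a ';'.
     * In validating mode a missing space is an error; otherwise the position is
     * only advanced past the space when one is actually present.
     */
    static int nextAttributeStart(CharSequence header, int semiIndex, boolean validate) {
        int next = semiIndex + 1;
        if (next < header.length() && header.charAt(next) == ' ') {
            return next + 1; // skip the ';' and the space that follows it
        }
        if (validate) {
            throw new IllegalArgumentException(
                    "a space is required after ';' at index " + semiIndex);
        }
        return next; // lenient mode: skip only the ';'
    }
}
```

For "a=b;Expires=Mon, 22 Aug 2022 20:12:35 GMT" with validation disabled, the sketch returns the index of the 'E' in "Expires", so the attribute name is read in full.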

The parse-state-switching algorithm has been changed.
It no longer relies on a map from attribute name to a parse-state-as-a-char-sequence.
Instead, we do a hard-coded binary search based on the attribute length (which we can read without paying for a bounds-check), and then we compare the field name with what would be expected for that length.
This gives us a significant boost to parsing speed, because we do no hashing, fewer string-compares, fewer memory indirections, and we let the CPU branch predictor do the heavy lifting.
It also takes up less memory as we no longer need the map of AV field names.
We can also remove the awkward ParseStateCharSequence class.
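
The idea of dispatching on attribute length can be sketched as below. Note that this sketch uses a plain `switch` on the length rather than the hard-coded binary search described above, and the class, enum, and `matches` names are hypothetical; only the `equalsIgnoreCaseLower` comparison mirrors a helper that exists in the changed file.

```java
/** Hypothetical sketch: pick the attribute by length first, then compare names. */
final class AttributeNameMatcher {
    enum Attribute { PATH, DOMAIN, SECURE, EXPIRES, MAX_AGE, HTTP_ONLY, SAME_SITE, UNKNOWN }

    private AttributeNameMatcher() {
    }

    /** Matches the attribute name found in header[begin, end), ignoring case. */
    static Attribute match(CharSequence header, int begin, int end) {
        switch (end - begin) {
            case 4:
                return matches(header, begin, "path") ? Attribute.PATH : Attribute.UNKNOWN;
            case 6:
                if (matches(header, begin, "domain")) {
                    return Attribute.DOMAIN;
                }
                return matches(header, begin, "secure") ? Attribute.SECURE : Attribute.UNKNOWN;
            case 7:
                if (matches(header, begin, "expires")) {
                    return Attribute.EXPIRES;
                }
                return matches(header, begin, "max-age") ? Attribute.MAX_AGE : Attribute.UNKNOWN;
            case 8:
                if (matches(header, begin, "httponly")) {
                    return Attribute.HTTP_ONLY;
                }
                return matches(header, begin, "samesite") ? Attribute.SAME_SITE : Attribute.UNKNOWN;
            default:
                return Attribute.UNKNOWN;
        }
    }

    // The caller has already checked that the lengths are equal.
    private static boolean matches(CharSequence header, int begin, String lowerCaseName) {
        for (int i = 0; i < lowerCaseName.length(); i++) {
            if (!equalsIgnoreCaseLower(header.charAt(begin + i), lowerCaseName.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // k must be lower case (or a non-letter); c may be in either case.
    static boolean equalsIgnoreCaseLower(char c, char k) {
        return c == k || c >= 'A' && c <= 'Z' && c == k - 32;
    }
}
```

No hashing or map lookup is involved: the length narrows the candidates to at most two names, and each candidate needs a single character-by-character comparison.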

Improve the error message for when a quoted cookie value contains the ';' character, which is not allowed.
This also means we won't run into later errors about unbalanced quotation or the like.

Result:
The parsing of Set-Cookie headers is now more correct and faster, and has better error reporting.

idelpivnitskiy

@chrisvest Thank you for finding and contributing!

servicetalk-http-api/src/main/java/io/servicetalk/http/api/DefaultHttpSetCookie.java

servicetalk-http-api/src/test/java/io/servicetalk/http/api/DefaultHttpSetCookiesTest.java

Scottmitch · 2022-08-30T01:28:40Z

> Task :servicetalk-http-api:pmdMain FAILED

you can run ./gradlew pmd locally to find issues.

chrisvest · 2022-08-30T19:48:39Z

@idelpivnitskiy Build passed now.

idelpivnitskiy

Looks great, thanks for the nice improvements!

servicetalk-http-api/src/main/java/io/servicetalk/http/api/HeaderUtils.java

servicetalk-http-api/src/main/java/io/servicetalk/http/api/DefaultHttpSetCookie.java

idelpivnitskiy · 2022-09-09T22:35:12Z

servicetalk-http-api/src/test/java/io/servicetalk/http/api/DefaultHttpSetCookiesTest.java

@@ -91,18 +91,18 @@ private static void decodeSecureCookieNames(final HttpHeaders headers) {

    @Test
    void decodeDifferentCookieNames() {
-        final HttpHeaders headers = new ReadOnlyHttpHeaders("set-cookie",


IIUC ReadOnlyHttpHeaders is a pkg-private class that is not used anywhere except in tests. Can you please remove it along with the corresponding tests?

Let's consider removal of ReadOnlyHeaders as a followup PR if necessary. The intention was to potentially use it for some gRPC cases, but headers now being exposed in filters makes that difficult to do.

Oh, didn't see @Scottmitch's comment in time. I already added a commit that removes it. Which way do you prefer?

Let's move the last commit into a separate PR for visibility and easier revert if necessary.

chrisvest · 2022-09-09T23:59:27Z

@idelpivnitskiy comments addressed

idelpivnitskiy · 2022-09-10T00:02:35Z

servicetalk-http-api/src/main/java/io/servicetalk/http/api/DefaultHttpSetCookie.java

@@ -545,7 +547,7 @@ private static SameSite fromSequence(CharSequence cs, int begin, int end) {
    }

    private static boolean equalsIgnoreCaseLower(char c, char k) {
-        return c == k || c >= 'A' && c <= 'Z' && c == k - 32;
+        return CharSequences.equalsIgnoreCaseLower(c, k);


Consider removing the whole private method and just adding a static import for CharSequences.equalsIgnoreCaseLower. It has the same name and the same params.

idelpivnitskiy · 2022-09-10T00:05:44Z

servicetalk-http-api/src/test/java/io/servicetalk/http/api/DefaultHttpSetCookiesTest.java

@@ -91,18 +91,18 @@ private static void decodeSecureCookieNames(final HttpHeaders headers) {

    @Test
    void decodeDifferentCookieNames() {
-        final HttpHeaders headers = new ReadOnlyHttpHeaders("set-cookie",


Let's move the last commit into a separate PR for visibility and easier revert if necessary.

chrisvest · 2022-09-10T00:14:56Z

@idelpivnitskiy Done. Moved it to #2354

Scottmitch · 2022-09-10T00:44:16Z

build failure attributed to #2298

Motivation:
- apple#2329 made parsing of `set-cookie` header strict according to RFC6265. In practice, there are still many implementations that encode cookies according to the obsolete RFC2965 and/or RFC2109.
- Semicolon and space are not validated after a wrapped value.
- Without a cookie name in the exception message it's harder to find a problematic cookie.

Modifications:
- Allow no space after semicolon by default;
- Add a system property `io.servicetalk.http.api.headers.cookieParsingStrictRfc6265` to enforce strict parsing;
- Instead of blindly skipping `SEMI` and `SP` after `DQUOTE`, validate skipped characters;
- Include the cookie name (if already parsed) in all exception messages;
- Enhance test coverage for `DefaultHttpSetCookie#parseSetCookie`;

Result:
1. No space is required after semicolon by default.
2. Characters after a wrapped value are validated.
3. Exception messages include a cookie name when possible.
4. More test coverage.

idelpivnitskiy requested review from idelpivnitskiy, bondolo and tkountis August 24, 2022 05:06

idelpivnitskiy assigned chrisvest Aug 24, 2022

idelpivnitskiy reviewed Aug 24, 2022

servicetalk-http-api/src/main/java/io/servicetalk/http/api/DefaultHttpSetCookie.java

servicetalk-http-api/src/test/java/io/servicetalk/http/api/DefaultHttpSetCookiesTest.java

chrisvest force-pushed the cookies branch from 6a25ebc to ad3082f Compare August 29, 2022 22:25

chrisvest changed the title ~~Improve error message on missing space between cookie-av's~~ Improve parsing of Set-Cookie headers Aug 29, 2022

Fix checkstyle issues

1634da8

Fix PMD complaint

80afbbb

Scottmitch requested a review from idelpivnitskiy August 31, 2022 01:32

idelpivnitskiy reviewed Sep 9, 2022

Address small review comments

5b472f2

idelpivnitskiy approved these changes Sep 10, 2022

Remove superfluous method indirection

ea866f4

chrisvest force-pushed the cookies branch from 7c9fb1b to ea866f4 Compare September 10, 2022 00:10

chrisvest mentioned this pull request Sep 10, 2022

Remove read-only HTTP headers #2354

Merged

idelpivnitskiy approved these changes Sep 10, 2022

Scottmitch merged commit 49f5f8f into apple:main Sep 10, 2022

chrisvest deleted the cookies branch September 10, 2022 01:01

idelpivnitskiy mentioned this pull request Sep 23, 2022

Improvements for Set-Cookie parsing, allow lax parsing of older specs #2368

Merged
