Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve parsing of Set-Cookie headers #2329

Merged
merged 5 commits into from
Sep 10, 2022
Merged

Conversation

chrisvest
Copy link
Contributor

@chrisvest chrisvest commented Aug 22, 2022

Motivation:
The parsing of Set-Cookie headers is deviating from the standard in a couple of ways.

  • Cookie values are allowed to contain the equals sign character.
  • Percentage-prefixed hexadecimal literals are not a thing.
  • Only cookie-values need to adhere to the cookie-octet rule.
  • Defined but empty values should decode to the empty string rather than null, so we can distinguish between their empty and absent cases.
  • When parsing Set-Cookie values, every ; must be followed by a space.
    See the set-cookie-string rule in https://www.rfc-editor.org/rfc/rfc6265#section-4.1.1
    The parser should check for this, since it can lead to mis-parsing later data and produce confusing error messages, or the produce a cookie object with wrong data.

Modification:
When the parser decodes the '=' character, it now no longer updates the 'begin' index if the parse state is ParsingValue.
This means the parsed '=' character will be part of the cookie value.

The parse rule for the '%' character has been removed.
The associated validation code has also been removed, as it is no longer used.
This means the '%' character no longer has a special meaning to the parser.

When the parser validates a non-special character (a character that falls into the default case in the parser), the cookie-octet validation function is now only used if the parser is in the ParsingValue state.
Otherwise, we use a new cookie-attribute validation.
Most cookie-attributes need only follow the simple rule of "any CHAR except CTLs or ';'".

The parser already has a final-processing step for non-empty tails.
We add a similar step for when the tail is the empty string.
This way, cookie values, paths, and domains, that are defined but have no value, will be decoded to have the empty string as a value.
This way, we can tell if those attributes were defined but empty, or absent.

The parser for Set-Cookie HTTP headers now checks that ';' are followed by a space byte.
When cookie validation is enabled, an exception will be thrown when this is not the case.
When cookie validation is disabled, the space byte will no longer be blindly assumed to be there, but instead checked for, and the parser position will be adjusted accordingly.
This allows, e.g. "Set-Cookie: a=b;Expires=Mon, 22 Aug 2022 20:12:35 GMT" to be parsed correctly in non-validating mode.
Previously, the above would have thrown an exception for using a ',' character in an attribute value that do not allow them ("xpires" instead of "Expires").

The parse-state-switching algorithm has been changed.
It no longer relies on a map from attribute name to a parse-state-as-a-char-sequence.
Instead, we do a hard-coded binary search based on the attribute length (which we can read without paying for a bounds-check), and then we compare the field name with what would be expected for that length.
This gives us a significant boost to parsing speed, because we do no hashing, fewer string-compares, fewer memory indirections, and we let the CPU branch predictor do the heavy lifting.
It also takes up less memory as we no longer need the map of AV field names.
We can also remove the awkward ParseStateCharSequence class.

Improve the error message for when a quoted cookie value contains the ';' character, which is not allowed.
This also means we won't run into later errors about unbalanced quotation or the like.

Result:
The parsing of Set-Cookie headers is now more correct, and faster, and has better error reporting.

Copy link
Member

@idelpivnitskiy idelpivnitskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrisvest Thank you for finding and contributing!

Motivation:
The parsing of Set-Cookie headers is deviating from the standard in a couple of ways.
- Cookie values are allowed to contain the equals sign character.
- Percentage-prefixed hexadecimal literals are not a thing.
- Only cookie-values need to adhere to the cookie-octet rule.
- Defined but empty values should decode to the empty string rather than null, so we can distinguish between their empty and absent cases.
- When parsing Set-Cookie values, every ; must be followed by a space.
  See the `set-cookie-string` rule in https://www.rfc-editor.org/rfc/rfc6265#section-4.1.1
  The parser should check for this, since it can lead to mis-parsing later data and produce confusing error messages, or the produce a cookie object with wrong data.

Modification:
When the parser decodes the '=' character, it now no longer updates the 'begin' index if the parse state is ParsingValue.
This means the parsed '=' character will be part of the cookie value.

The parse rule for the '%' character has been removed.
The associated validation code has also been removed, as it is no longer used.
This means the '%' character no longer has a special meaning to the parser.

When the parser validates a non-special character (a character that falls into the default case in the parser), the cookie-octet validation function is now only used if the parser is in the ParsingValue state.
Otherwise, we use a new cookie-attribute validation.
Most cookie-attributes need only follow the simple rule of "any CHAR except CTLs or ';'".

The parser already has a final-processing step for non-empty tails.
We add a similar step for when the tail is the empty string.
This way, cookie values, paths, and domains, that are defined but have no value, will be decoded to have the empty string as a value.
This way, we can tell if those attributes were defined but empty, or absent.

The parser for Set-Cookie HTTP headers now checks that ';' are followed by a space byte.
When cookie validation is enabled, an exception will be thrown when this is not the case.
When cookie validation is disabled, the space byte will no longer be blindly assumed to be there, but instead checked for, and the parser position will be adjusted accordingly.
This allows, e.g. "Set-Cookie: a=b;Expires=Mon, 22 Aug 2022 20:12:35 GMT" to be parsed correctly in non-validating mode.
Previously, the above would have thrown an exception for using a ',' character in an attribute value that do not allow them ("xpires" instead of "Expires").

The parse-state-switching algorithm has been changed.
It no longer relies on a map from attribute name to a parse-state-as-a-char-sequence.
Instead, we do a hard-coded binary search based on the attribute length (which we can read without paying for a bounds-check), and then we compare the field name with what would be expected for that length.
This gives us a significant boost to parsing speed, because we do no hashing, fewer string-compares, fewer memory indirections, and we let the CPU branch predictor do the heavy lifting.
It also takes up less memory as we no longer need the map of AV field names.
We can also remove the awkward ParseStateCharSequence class.

Improve the error message for when a quoted cookie value contains the ';' character, which is not allowed.
This also means we won't run into later errors about unbalanced quotation or the like.

Result:
The parsing of Set-Cookie headers is now more correct, and faster, and has better error reporting.
@chrisvest chrisvest changed the title Improve error message on missing space between cookie-av's Improve parsing of Set-Cookie headers Aug 29, 2022
@Scottmitch
Copy link
Member

> Task :servicetalk-http-api:pmdMain FAILED

you can run ./gradlew pmd locally to find issues.

@chrisvest
Copy link
Contributor Author

@idelpivnitskiy Build passed now.

Copy link
Member

@idelpivnitskiy idelpivnitskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks for the nice improvements!

@@ -91,18 +91,18 @@ private static void decodeSecureCookieNames(final HttpHeaders headers) {

@Test
void decodeDifferentCookieNames() {
final HttpHeaders headers = new ReadOnlyHttpHeaders("set-cookie",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC ReadOnlyHttpHeaders is a pkg-private class that is not used for anywhere except tests. Can you please remove it with corresponding tests?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets consider removal of ReadOnlyHeaders as a followup PR if necessary. The intention was to potentially use it for some gRPC cases but headers now being exposed in filters makes that difficult todo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, didn't see @Scottmitch's comment in time. I already added a commit that removes it. Which way do you prefer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move the last commit into a separate PR for visibility and easier revert if necessary.

@chrisvest
Copy link
Contributor Author

@idelpivnitskiy comments addressed

@@ -545,7 +547,7 @@ private static SameSite fromSequence(CharSequence cs, int begin, int end) {
}

private static boolean equalsIgnoreCaseLower(char c, char k) {
return c == k || c >= 'A' && c <= 'Z' && c == k - 32;
return CharSequences.equalsIgnoreCaseLower(c, k);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider removing the whole private method and just add a static import for CharSequences.equalsIgnoreCaseLower. It has the same name, same params.

@@ -91,18 +91,18 @@ private static void decodeSecureCookieNames(final HttpHeaders headers) {

@Test
void decodeDifferentCookieNames() {
final HttpHeaders headers = new ReadOnlyHttpHeaders("set-cookie",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move the last commit into a separate PR for visibility and easier revert if necessary.

@chrisvest
Copy link
Contributor Author

@idelpivnitskiy Done. Moved it to #2354

@Scottmitch
Copy link
Member

build failure attributed to #2298

@Scottmitch Scottmitch merged commit 49f5f8f into apple:main Sep 10, 2022
@chrisvest chrisvest deleted the cookies branch September 10, 2022 01:01
idelpivnitskiy added a commit to idelpivnitskiy/servicetalk that referenced this pull request Sep 23, 2022
Motivation:

- apple#2329 made parsing of `set-cookie` header strict according to RFC6265.
In practice, there are still many implementations that encode cookies
according to the obsolete RFC2965 and/or RFC2109.
- Semicolon and space are not validated after a wrapped value.
- Without a cookie name in the exception message it's harder to find a
problematic cookie.

Modifications:
- Allow no space after semicolon by default;
- Add a system property
`io.servicetalk.http.api.headers.cookieParsingStrictRfc6265` to enforce
strict parsing;
- Instead of blindly skipping `SEMI` and `SP` after `DQUOTE`, validate
skipped characters;
- Include the cookie name (if already parsed) in all exception messages;
- Enhance test coverage for `DefaultHttpSetCookie#parseSetCookie`;

Result:

1. No space is required after semicolon by default.
2. Characters after a wrapped value are validated.
3. Exception messages include a cookie name when possible.
4. More test coverage.
idelpivnitskiy added a commit to idelpivnitskiy/servicetalk that referenced this pull request Sep 23, 2022
Motivation:

- apple#2329 made parsing of `set-cookie` header strict according to RFC6265.
In practice, there are still many implementations that encode cookies
according to the obsolete RFC2965 and/or RFC2109.
- Semicolon and space are not validated after a wrapped value.
- Without a cookie name in the exception message it's harder to find a
problematic cookie.

Modifications:
- Allow no space after semicolon by default;
- Add a system property
`io.servicetalk.http.api.headers.cookieParsingStrictRfc6265` to enforce
strict parsing;
- Instead of blindly skipping `SEMI` and `SP` after `DQUOTE`, validate
skipped characters;
- Include the cookie name (if already parsed) in all exception messages;
- Enhance test coverage for `DefaultHttpSetCookie#parseSetCookie`;

Result:

1. No space is required after semicolon by default.
2. Characters after a wrapped value are validated.
3. Exception messages include a cookie name when possible.
4. More test coverage.
idelpivnitskiy added a commit that referenced this pull request Sep 23, 2022
…cs (#2368)

Motivation:

- #2329 made parsing of `set-cookie` header strict according to RFC6265.
In practice, there are still many implementations that encode cookies
according to the obsolete RFC2965 and/or RFC2109.
- Semicolon and space are not validated after a wrapped value.
- Without a cookie name in the exception message it's harder to find a
problematic cookie.

Modifications:
- Allow no space after semicolon by default;
- Add a system property
`io.servicetalk.http.api.headers.cookieParsingStrictRfc6265` to enforce
strict parsing;
- Instead of blindly skipping `SEMI` and `SP` after `DQUOTE`, validate
skipped characters;
- Include the cookie name (if already parsed) in all exception messages;
- Enhance test coverage for `DefaultHttpSetCookie#parseSetCookie`;

Result:

1. No space is required after semicolon by default.
2. Characters after a wrapped value are validated.
3. Exception messages include a cookie name when possible.
4. More test coverage.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants