-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve parsing of Set-Cookie headers #2329
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chrisvest Thank you for finding and contributing!
servicetalk-http-api/src/main/java/io/servicetalk/http/api/DefaultHttpSetCookie.java
Show resolved
Hide resolved
servicetalk-http-api/src/test/java/io/servicetalk/http/api/DefaultHttpSetCookiesTest.java
Show resolved
Hide resolved
Motivation: The parsing of Set-Cookie headers is deviating from the standard in a couple of ways. - Cookie values are allowed to contain the equals sign character. - Percentage-prefixed hexadecimal literals are not a thing. - Only cookie-values need to adhere to the cookie-octet rule. - Defined but empty values should decode to the empty string rather than null, so we can distinguish between their empty and absent cases. - When parsing Set-Cookie values, every ; must be followed by a space. See the `set-cookie-string` rule in https://www.rfc-editor.org/rfc/rfc6265#section-4.1.1 The parser should check for this, since it can lead to mis-parsing later data and produce confusing error messages, or the produce a cookie object with wrong data. Modification: When the parser decodes the '=' character, it now no longer updates the 'begin' index if the parse state is ParsingValue. This means the parsed '=' character will be part of the cookie value. The parse rule for the '%' character has been removed. The associated validation code has also been removed, as it is no longer used. This means the '%' character no longer has a special meaning to the parser. When the parser validates a non-special character (a character that falls into the default case in the parser), the cookie-octet validation function is now only used if the parser is in the ParsingValue state. Otherwise, we use a new cookie-attribute validation. Most cookie-attributes need only follow the simple rule of "any CHAR except CTLs or ';'". The parser already has a final-processing step for non-empty tails. We add a similar step for when the tail is the empty string. This way, cookie values, paths, and domains, that are defined but have no value, will be decoded to have the empty string as a value. This way, we can tell if those attributes were defined but empty, or absent. The parser for Set-Cookie HTTP headers now checks that ';' are followed by a space byte. When cookie validation is enabled, an exception will be thrown when this is not the case. When cookie validation is disabled, the space byte will no longer be blindly assumed to be there, but instead checked for, and the parser position will be adjusted accordingly. This allows, e.g. "Set-Cookie: a=b;Expires=Mon, 22 Aug 2022 20:12:35 GMT" to be parsed correctly in non-validating mode. Previously, the above would have thrown an exception for using a ',' character in an attribute value that do not allow them ("xpires" instead of "Expires"). The parse-state-switching algorithm has been changed. It no longer relies on a map from attribute name to a parse-state-as-a-char-sequence. Instead, we do a hard-coded binary search based on the attribute length (which we can read without paying for a bounds-check), and then we compare the field name with what would be expected for that length. This gives us a significant boost to parsing speed, because we do no hashing, fewer string-compares, fewer memory indirections, and we let the CPU branch predictor do the heavy lifting. It also takes up less memory as we no longer need the map of AV field names. We can also remove the awkward ParseStateCharSequence class. Improve the error message for when a quoted cookie value contains the ';' character, which is not allowed. This also means we won't run into later errors about unbalanced quotation or the like. Result: The parsing of Set-Cookie headers is now more correct, and faster, and has better error reporting.
you can run |
@idelpivnitskiy Build passed now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks for the nice improvements!
servicetalk-http-api/src/main/java/io/servicetalk/http/api/HeaderUtils.java
Outdated
Show resolved
Hide resolved
servicetalk-http-api/src/main/java/io/servicetalk/http/api/DefaultHttpSetCookie.java
Outdated
Show resolved
Hide resolved
servicetalk-http-api/src/main/java/io/servicetalk/http/api/DefaultHttpSetCookie.java
Outdated
Show resolved
Hide resolved
servicetalk-http-api/src/main/java/io/servicetalk/http/api/DefaultHttpSetCookie.java
Show resolved
Hide resolved
@@ -91,18 +91,18 @@ private static void decodeSecureCookieNames(final HttpHeaders headers) { | |||
|
|||
@Test | |||
void decodeDifferentCookieNames() { | |||
final HttpHeaders headers = new ReadOnlyHttpHeaders("set-cookie", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC ReadOnlyHttpHeaders
is a pkg-private class that is not used for anywhere except tests. Can you please remove it with corresponding tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets consider removal of ReadOnlyHeaders as a followup PR if necessary. The intention was to potentially use it for some gRPC cases but headers now being exposed in filters makes that difficult todo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, didn't see @Scottmitch's comment in time. I already added a commit that removes it. Which way do you prefer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move the last commit into a separate PR for visibility and easier revert if necessary.
@idelpivnitskiy comments addressed |
@@ -545,7 +547,7 @@ private static SameSite fromSequence(CharSequence cs, int begin, int end) { | |||
} | |||
|
|||
private static boolean equalsIgnoreCaseLower(char c, char k) { | |||
return c == k || c >= 'A' && c <= 'Z' && c == k - 32; | |||
return CharSequences.equalsIgnoreCaseLower(c, k); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider removing the whole private method and just add a static import for CharSequences.equalsIgnoreCaseLower
. It has the same name, same params.
@@ -91,18 +91,18 @@ private static void decodeSecureCookieNames(final HttpHeaders headers) { | |||
|
|||
@Test | |||
void decodeDifferentCookieNames() { | |||
final HttpHeaders headers = new ReadOnlyHttpHeaders("set-cookie", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move the last commit into a separate PR for visibility and easier revert if necessary.
@idelpivnitskiy Done. Moved it to #2354 |
build failure attributed to #2298 |
Motivation: - apple#2329 made parsing of `set-cookie` header strict according to RFC6265. In practice, there are still many implementations that encode cookies according to the obsolete RFC2965 and/or RFC2109. - Semicolon and space are not validated after a wrapped value. - Without a cookie name in the exception message it's harder to find a problematic cookie. Modifications: - Allow no space after semicolon by default; - Add a system property `io.servicetalk.http.api.headers.cookieParsingStrictRfc6265` to enforce strict parsing; - Instead of blindly skipping `SEMI` and `SP` after `DQUOTE`, validate skipped characters; - Include the cookie name (if already parsed) in all exception messages; - Enhance test coverage for `DefaultHttpSetCookie#parseSetCookie`; Result: 1. No space is required after semicolon by default. 2. Characters after a wrapped value are validated. 3. Exception messages include a cookie name when possible. 4. More test coverage.
Motivation: - apple#2329 made parsing of `set-cookie` header strict according to RFC6265. In practice, there are still many implementations that encode cookies according to the obsolete RFC2965 and/or RFC2109. - Semicolon and space are not validated after a wrapped value. - Without a cookie name in the exception message it's harder to find a problematic cookie. Modifications: - Allow no space after semicolon by default; - Add a system property `io.servicetalk.http.api.headers.cookieParsingStrictRfc6265` to enforce strict parsing; - Instead of blindly skipping `SEMI` and `SP` after `DQUOTE`, validate skipped characters; - Include the cookie name (if already parsed) in all exception messages; - Enhance test coverage for `DefaultHttpSetCookie#parseSetCookie`; Result: 1. No space is required after semicolon by default. 2. Characters after a wrapped value are validated. 3. Exception messages include a cookie name when possible. 4. More test coverage.
…cs (#2368) Motivation: - #2329 made parsing of `set-cookie` header strict according to RFC6265. In practice, there are still many implementations that encode cookies according to the obsolete RFC2965 and/or RFC2109. - Semicolon and space are not validated after a wrapped value. - Without a cookie name in the exception message it's harder to find a problematic cookie. Modifications: - Allow no space after semicolon by default; - Add a system property `io.servicetalk.http.api.headers.cookieParsingStrictRfc6265` to enforce strict parsing; - Instead of blindly skipping `SEMI` and `SP` after `DQUOTE`, validate skipped characters; - Include the cookie name (if already parsed) in all exception messages; - Enhance test coverage for `DefaultHttpSetCookie#parseSetCookie`; Result: 1. No space is required after semicolon by default. 2. Characters after a wrapped value are validated. 3. Exception messages include a cookie name when possible. 4. More test coverage.
Motivation:
The parsing of Set-Cookie headers is deviating from the standard in a couple of ways.
See the
set-cookie-string
rule in https://www.rfc-editor.org/rfc/rfc6265#section-4.1.1The parser should check for this, since it can lead to mis-parsing later data and produce confusing error messages, or the produce a cookie object with wrong data.
Modification:
When the parser decodes the '=' character, it now no longer updates the 'begin' index if the parse state is ParsingValue.
This means the parsed '=' character will be part of the cookie value.
The parse rule for the '%' character has been removed.
The associated validation code has also been removed, as it is no longer used.
This means the '%' character no longer has a special meaning to the parser.
When the parser validates a non-special character (a character that falls into the default case in the parser), the cookie-octet validation function is now only used if the parser is in the ParsingValue state.
Otherwise, we use a new cookie-attribute validation.
Most cookie-attributes need only follow the simple rule of "any CHAR except CTLs or ';'".
The parser already has a final-processing step for non-empty tails.
We add a similar step for when the tail is the empty string.
This way, cookie values, paths, and domains, that are defined but have no value, will be decoded to have the empty string as a value.
This way, we can tell if those attributes were defined but empty, or absent.
The parser for Set-Cookie HTTP headers now checks that ';' are followed by a space byte.
When cookie validation is enabled, an exception will be thrown when this is not the case.
When cookie validation is disabled, the space byte will no longer be blindly assumed to be there, but instead checked for, and the parser position will be adjusted accordingly.
This allows, e.g. "Set-Cookie: a=b;Expires=Mon, 22 Aug 2022 20:12:35 GMT" to be parsed correctly in non-validating mode.
Previously, the above would have thrown an exception for using a ',' character in an attribute value that do not allow them ("xpires" instead of "Expires").
The parse-state-switching algorithm has been changed.
It no longer relies on a map from attribute name to a parse-state-as-a-char-sequence.
Instead, we do a hard-coded binary search based on the attribute length (which we can read without paying for a bounds-check), and then we compare the field name with what would be expected for that length.
This gives us a significant boost to parsing speed, because we do no hashing, fewer string-compares, fewer memory indirections, and we let the CPU branch predictor do the heavy lifting.
It also takes up less memory as we no longer need the map of AV field names.
We can also remove the awkward ParseStateCharSequence class.
Improve the error message for when a quoted cookie value contains the ';' character, which is not allowed.
This also means we won't run into later errors about unbalanced quotation or the like.
Result:
The parsing of Set-Cookie headers is now more correct, and faster, and has better error reporting.