Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP::Headers gets downcase #8060

Closed
bararchy opened this issue Aug 8, 2019 · 16 comments · Fixed by #8061
Closed

HTTP::Headers gets downcase #8060

bararchy opened this issue Aug 8, 2019 · 16 comments · Fixed by #8061
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:networking

Comments

@bararchy
Copy link
Contributor

bararchy commented Aug 8, 2019

This shows it perfectly:

require "http"

headers = <<-EOF
Content-Length: 0
User-Agent: Me!12
\n
EOF

io_mem = IO::Memory.new
HTTP.parse_headers_and_body(IO::Memory.new(headers)) do |headers, body|
  HTTP.serialize_headers(io_mem, headers)
  pp body
end

io_mem.to_s # => "content-length: 0\r\nuser-agent: Me!12\r\n\r\n"

The expected result is to get back the same "case" for the headers, but instead it gets downcased

@straight-shoota
Copy link
Member

This is due to recent optimizations in #8002. It only affects common header names which are cached downcased for easier comparability (see https://github.com/crystal-lang/crystal/pull/8002/files#diff-520d933e69a34d18025788f47787c6f2R173).
If you use an uncommon header name (for example Foo: 0), it gets copied verbatim.

Maybe we should consider a different solution to caching common names (for example StringPool as suggested in #8002 (comment)).

Because header names are case-insensitive this is not a very serious issue, though. But it can still be annoying.

@straight-shoota straight-shoota added kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:networking labels Aug 8, 2019
@bararchy
Copy link
Contributor Author

bararchy commented Aug 8, 2019

We are doing a lot of HTTP::Headers manipulation, and we digest and parse them raw.
This introduced a need to manually upcase to verify using specs.

@asterite
Copy link
Member

asterite commented Aug 8, 2019

This introduced a need to manually upcase to verify using specs

But upcase shouldn't work here because you'd get CONTENT-TYPE instead of Content-Type?

In any case, I'll "fix" this by making it always return "Content-Type", "User-Agent", etc. Headers are case insensitive and that's a good default, I think.

@bararchy
Copy link
Contributor Author

bararchy commented Aug 8, 2019

@asterite I meant Capitalize :) my mistake.
we do something along the lines of .split("-").map {|h| h.capitalize}

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Aug 8, 2019

Headers in HTTP/2 are now always transmitted lowercase (followed in HTTP/3) and it's becoming common to now receive headers lowercased, even in HTTP/1 and it's clunky case-insensitiveness (with an HTTP/2 reverse proxy for example, which is now default in Google GKE for example).

I strongly advise to join the bandwagon. HTTP headers are now lowercase by default, and that's good.

@asterite there is nothing to fix, in fact all headers should always be lowercased.

@straight-shoota
Copy link
Member

@bararchy That doesn't work for ETag for example.

@ysbaddaden

there is nothing to fix, in fact all headers should always be lowercased.

Who says that?

Since case doesn't matter, IMHO the best behaviour is to not alter any names.

I could accept a general doctrine to consider all header names downcased though, if that's a future proof path. As long as we consequently downcase them everywhere (Headers.add for example), this is probably okay (sometimes it might be surprising though).

@bararchy
Copy link
Contributor Author

bararchy commented Aug 8, 2019

I'm good as long as what goes in == what comes out, otherwise, there are suprises

@straight-shoota
Copy link
Member

@bararchy Well, that's the question here: Should we keep everything as it comes in (which would imply possibly mixed styles in one place), or should we apply some kind of normalization to headers in order to have uniform representation.
I don't think that either way necessarily causes more issues than the other. But the status quo after #8002 is the least appreciated, but also not intended (common headers are normalized, others not).

@straight-shoota
Copy link
Member

oops

@Blacksmoke16
Copy link
Member

Source for the lowercasing headers:

https://tools.ietf.org/html/rfc7540#section-8.1.2

Just as in HTTP/1.x, header field names are strings of ASCII
characters that are compared in a case-insensitive fashion. However,
header field names MUST be converted to lowercase prior to their
encoding in HTTP/2. A request or response containing uppercase
header field names MUST be treated as malformed (Section 8.1.2.6).

@straight-shoota
Copy link
Member

Yeah for HTTP/2. There is no such rule for older protocols. But I guess we essentially want to use the same data structures for any HTTP protocol, so it's probably better to normalize downcased right away.

@asterite
Copy link
Member

asterite commented Aug 8, 2019

Maybe I can remove the optimization and leave the code as it was before.

@straight-shoota
Copy link
Member

A conservative change including optimization would be to compare common names case-sensitive and use capitalized strings (or both capitalized and downcased). That's strictly better than just reverting the optimization.

Alternatively we could just proceed forward and downcase everything as @ysbaddaden suggested.

@asterite
Copy link
Member

asterite commented Aug 9, 2019

@straight-shoota You mean making the lookup hash be:

COMMON_HEADERS = {
  "content-length" => "content-length",
  "Content-Length" => "Content-Length",
  # etc.
}

?

@straight-shoota
Copy link
Member

Yes. But you wouldn't need a hash then, a list would suffice when there is no normalization.

@asterite
Copy link
Member

asterite commented Aug 9, 2019

Makes sense. I like that approach. It wouldn't cover other cases like "Content-length", though we might also want to consider that... I'll later update this PR with all these variants. It's probably the best of all worlds: optimized and keeps the original casing in most common cases. Great idea!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:networking
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants