-
Notifications
You must be signed in to change notification settings - Fork 251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix source dedup breaking with port wildcards #490
Conversation
Thanks for the pull request! We'll be looking at this shortly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing out the issue, and also sending PR!
It would certainly be valuable not to run into an error with wildcard ports, but I have a few questions about this particular fix.
@@ -212,5 +212,9 @@ def strip_source_schemes(source_list) | |||
def symbol_to_hyphen_case(sym) | |||
sym.to_s.tr("_", "-") | |||
end | |||
|
|||
def source_scheme(source) | |||
URI(source.sub(PORT_WILDCARD_REGEX, '')).scheme |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, this calculates the source scheme of example.org:443
as example.org
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might work well enough for us.
URI(source.sub(PORT_WILDCARD_REGEX, '')).scheme | |
source.split("://", 2)[-2] || source.split(":", 2)[-2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel safer still using URI parsing. I've amended source_scheme
so that it considers this special case where the actual scheme is missing.
@@ -154,10 +154,10 @@ def reject_all_values_if_none(source_list) | |||
# e.g. *.github.com asdf.github.com becomes *.github.com | |||
def dedup_source_list(sources) | |||
sources = sources.uniq | |||
wild_sources = sources.select { |source| source =~ STAR_REGEXP } | |||
wild_sources = sources.select { |source| source =~ DOMAIN_WILDCARD_REGEX } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain the reasoning behind this?
As far as I can tell, it should be fine to remove example.org:443
if example.org:*
is present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. I didn't consider that. I thought this method only deduplicates based on schemes, not based on ports as well. I've amended it so it works as expected again and added test cases covering this behaviour which it wasn't before.
Now we could just leave it be STAR_REGEXP
here but I still think it's a good idea to have two separate regexes and properly named for the 2 cases (subdomains and ports).
@machisuji Would you be able to revise this PR? |
Oh boy, sorry! I completely forgot about that. I will do that first thing tomorrow morning! Thanks for your patience. |
9d1c9e2
to
777d19a
Compare
@lgarron I've pushed some changes which should address your remarks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, and apologies for the delay! We had some internal complications around testing secure_headers
upgrades.
I think it would be nicer to fully parse URL expressions, but I believe this workaround should be safe enough to avoid the current bug. It may fail to deduplicate some patterns, but this is only a matter of optimization (not correctness) and we can address it in the future.
Thanks for the contribution!
777d19a
to
362aae2
Compare
Just pushed a tiny correction to use double quotes where single quotes were used before. |
uri = URI(source.sub(PORT_WILDCARD_REGEX, "")) | ||
# If host is nil the given source doesn't contain a scheme | ||
# e.g. for `example.org:443` it would return `example.org` as the scheme | ||
# which is of course incorrect | ||
uri.scheme if uri.host |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out that this also fails for URLs like https://*.example.org:*
What do you think of hardcoding just HTTP and HTTPS to avoid further wildcard issues?
uri = URI(source.sub(PORT_WILDCARD_REGEX, "")) | |
# If host is nil the given source doesn't contain a scheme | |
# e.g. for `example.org:443` it would return `example.org` as the scheme | |
# which is of course incorrect | |
uri.scheme if uri.host | |
return "https" if source.start_with?("https://") | |
return "http" if source.start_with?("http://") | |
nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, that will result in incorrect deduplication because of the code higher up. So I'd lean more towards just matching ^([A-Za-z0-9\-\+.]+):\/\/
, which should return the correct scheme if a valid URL pattern is passed in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree. I'll add a test case for that today and change it to the regex you suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually having looked into it again I don't quite get what the issue is. Parsing https://*.example.org:*
would indeed fail but this is why in line 217 uri = URI(source.sub(PORT_WILDCARD_REGEX, ""))
it removes the port wildcard.
irb(main):003:0> PORT_WILDCARD_REGEX = /:\*/
=> /:\*/
irb(main):004:0> source = "https://*.example.org:*"
=> "https://*.example.org:*"
irb(main):005:0> URI(source.sub(PORT_WILDCARD_REGEX, ""))
=> #<URI::HTTPS https://*.example.org>
irb(main):006:0> _.scheme
=> "https"
So the source_scheme
method does return the correct result for "https://*.example.org:*"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, I've tried to give this whole code a refactor at #498
This PR removes `dedup_source_list` and replaces it with a simple `.uniq` call. This resolves #491, which is only the latest in a series of ongoing issues with source expression deduplication. `secure_headers` has had this feature [since 2015](32bb3f5) that [deduplicates redundant URL source expressions](https://github.com/github/secure_headers/blob/494b75ff927464ed8d1c43e98e41fe4d15ce2bdf/lib/secure_headers/headers/content_security_policy.rb#L157-L170). For example, if `*.github.com` is listed as a source expression for a given [directive](https://w3c.github.io/webappsec-csp/#framework-directives), then the addition of `example.github.com` would have no effect, and so the latter can be safely removed by `secure_headers` to save bytes. Unfortunately, this implementation has had various bugs due to the use of "impedance mismatched" APIs like [`URI`](https://docs.ruby-lang.org/en/2.1.0/URI.html)[^1] and [`File.fnmatch`](https://apidock.com/ruby/v2_5_5/File/fnmatch/class)[^2]. For example, it made incorrect assumptions about source expression schemes, leading to the following series of events: [^1]: Which allows wildcards in domains but not for ports, as it is not designed to parse URL source expressions. [^2]: Which has general glob matching that is not designed for URL source expressions either. - 2017-03: A [bug was reported and confirmed](#317) - 2022-04: The bug was finally [fixed by `@keithamus` (a Hubber) in 2022](#478) due to our use of web sockets. - 2022-06: This fix in turn triggered a [new bug](#491) with source expressions like `data:`. - 2022-06: An external contributor [submitted a fix for the bew bug](#490), but this still doesn't address some of the "fast and loose" semantic issues of the underlying implementation. - 2022-08: `@lgarron` [drafted a new implementation](#498) that semantically parses and compares source expressions based on the specification for source expressions. - This implementation already proved to have some value in early testing, as its stricter validation caught an issue in `github.com`'s CSP. However, it would take additional work to make this implementation fully aware of CSP syntax (e.g. not allowing URL source expressions in a source directive when only special keywords are allowed, and vice-versa), and it relies on a new regex-based implementation of source expression parsing that may very well lead to more subtle bugs. In effect, this is a half feature whose maintenance cost has outweighed its functionality: - The relevant code has suffered from continued bugs, described as above. - Deduplication is purely a "nice-to-have" — it is not necessary for the security or correct functionality of `secure_headers`. - It was [introduced by `@oreoshake` (the then-maintainer) without explanation in 2015](32bb3f5), never "officially" documented. We have no concrete data on whether it has any performance impact on any real apps — for all we know, uncached deduplication calculations might even cost more than the saved header bytes. - Further, in response to the first relevant bug, `@oreoshake` himself [said](#317 (comment)): > I've never been a fan of the deduplication based on `*` anyways. Maybe we should just rip that out. > Like people trying to save a few bytes can optimize elsewhere. So this PR completely removes the functionality. If we learn of a use case where this was very important (and the app somehow can't preprocess the list before passing it to `secure_headers`), we can always resume consideration of one of: - #490 - #498
We've removed the feature altogether in #499 — see the PR description for more details. Thanks for the PR; I'm sorry it didn't work out. |
@lgarron Thanks for the update. I think it's the right call. If people want to save precious bytes, they can make sure not to pass in duplicate sources themselves, really. I'm happy either way because removing the feature removes the bug that was hindering us. |
Fixes the bug introduced with release v6.3.4 in #478 that causes errors like the following when using port wildcards such as
ws://localhost:*
in one's sources: