Semantically parse and deduplicate source expressions #498
Conversation
expect(Kernel).to receive(:warn).with(%(frame_ancestors contains a ; in "google.com;script-src *;.;" which will raise an error in future versions. It has been replaced with a blank space.))
expect(ContentSecurityPolicy.new(frame_ancestors: %w(https://google.com;script-src https://*;.;)).value).to eq("frame-ancestors google.com script-src * .")
expect(Kernel).to receive(:warn).with(%(frame_ancestors contains a ; in "https://google.com;script-src https://localhost;example.com;" which will raise an error in future versions. It has been replaced with a blank space.))
expect(ContentSecurityPolicy.new(frame_ancestors: %w(https://google.com;script-src https://localhost;example.com;)).value).to eq("frame-ancestors google.com script-src localhost example.com")
Based on the CSP spec/MDN docs, it looks like `.` is not a valid expression? https://www.w3.org/TR/CSP3/#framework-directive-source-list

This test seems to come from #418.
Does the original test fail on this PR? I hesitate to change the test if functionality under it didn't change.
module SecureHeaders
  class ContentSecurityPolicy
    class SourceExpression
I wonder if we should just call this an "expression" or "entry", or even something more general, rather than a "source expression".
My goal was to support just "source lists", as defined in the spec and implied in `content_security_policy.rb`. But for now we still need to parse things that don't technically belong in source lists, such as report paths and unquoted directives like `script-src` (see code comments in other files).
@machisuji if you have time, your review would also be appreciated here, having worked in this area very recently 🙇
Meta-concern: is this change a breaking one? I don’t think we have an easy way to differentiate between the quirks of this implementation and the other one. Those quirks weren’t documented, but they likely are now relied upon.
Didn’t make it through the entire PR, but some initial feedback. Will attempt to remember to come back and do more, but please ping me if I forget.
source_list.map do |expression|
  if expression =~ /(\n|;)/
    if !semicolon_warned_yet
      Kernel.warn("#{directive} contains a #{$1} in #{source_list.join(" ").inspect} which will raise an error in future versions. It has been replaced with a blank space.")
Likely the `$1` in the warning here will be very confusing if it’s an actual newline. Could we do `$1.inspect`? I think that will show `"\n"`.
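For illustration, a quick plain-Ruby comparison of the two interpolations (not taken from this PR):

```ruby
# When the captured character is a newline, interpolating $1 directly splits the
# warning across two lines, while $1.inspect shows it as "\n".
"a\nb" =~ /(\n|;)/
puts "directive contains a #{$1} in ..."
# directive contains a
#  in ...
puts "directive contains a #{$1.inspect} in ..."
# directive contains a "\n" in ...
```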
def clean_malformatted_sources(directive, source_list)
  cleaned_source_list = []
  semicolon_warned_yet = false
  source_list.map do |expression|
This isn’t a map operation since it doesn’t have a meaningful array result. However, we could do this using a reduce pattern instead — what do you think? `reduce` is a weird name for the operation in this context, but Ruby also offers `inject`, which is the same thing, just aliased.
cleaned_source_list = source_list.inject([]) do |arr, source|
# … add elements to `arr` as needed.
end
If we like the code as-is, we could just change this to `each`.
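For concreteness, here is a hedged sketch of what the inject-based rewrite might look like; the block body is a guess at the existing cleanup behavior (warn once, then split on newlines/semicolons), not this PR's actual code.

```ruby
# Hypothetical inject-based shape for clean_malformatted_sources; the accumulator `arr`
# is returned from every branch, so the result is the cleaned list.
cleaned_source_list = source_list.inject([]) do |arr, expression|
  if expression =~ /(\n|;)/
    unless semicolon_warned_yet
      Kernel.warn("#{directive} contains a #{$1.inspect} in #{source_list.join(" ").inspect} " \
                  "which will raise an error in future versions. It has been replaced with a blank space.")
      semicolon_warned_yet = true
    end
    arr.concat(expression.split(/[\n;]/))
  else
    arr << expression
  end
end
```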
  end
  cleaned_source_list.select { |value| value != "" }
I can’t remember if this library only works in Rails, but if it does, you can use `compact_blank` here.
- cleaned_source_list.select { |value| value != "" }
+ cleaned_source_list.compact_blank
Otherwise, I would do something like:
- cleaned_source_list.select { |value| value != "" }
+ cleaned_source_list.reject(&:blank?)
Additionally, with the suggestion above returning the array, you could call this on the return value and remove the need for `cleaned_source_list`.
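For reference, the three forms side by side; `reject(&:blank?)` and `compact_blank` assume ActiveSupport is loaded (`compact_blank` since Rails 6.1), while the `select` form is plain Ruby:

```ruby
# Equivalent ways to drop empty strings from a source list.
list = ["github.com", "", "*.example.com", ""]
list.select { |value| value != "" }  # => ["github.com", "*.example.com"]
list.reject(&:blank?)                # => ["github.com", "*.example.com"]
list.compact_blank                   # => ["github.com", "*.example.com"]
```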
@@ -156,17 +174,14 @@ def reject_all_values_if_none(source_list)
  # e.g. *.github.com asdf.github.com becomes *.github.com
  def dedup_source_list(sources)
    sources = sources.uniq
    wild_sources = sources.select { |source| source =~ STAR_REGEXP }
    host_source_expressions = sources.map { |source| parse_source_expression(source) }
    # TODO: Split by source expression type.
Is this TODO meant to be resolved before the PR merges? If not, we should likely document it in a ticket and remove this comment.
@@ -156,17 +174,14 @@ def reject_all_values_if_none(source_list)
  # e.g. *.github.com asdf.github.com becomes *.github.com
  def dedup_source_list(sources)
    sources = sources.uniq
    wild_sources = sources.select { |source| source =~ STAR_REGEXP }
    host_source_expressions = sources.map { |source| parse_source_expression(source) }
If you want to avoid copying the array several times, you can use the in-place variants of these methods. Generally they end with a `!`.
- host_source_expressions = sources.map { |source| parse_source_expression(source) }
+ sources.map! { |source| parse_source_expression(source) }
Same for `uniq`/`uniq!` and `select`/`select!`.
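A quick illustration of the in-place variants being suggested; note that the bang methods return `nil` when nothing changed, so they should not be chained:

```ruby
# Each bang method mutates `sources` instead of allocating a new array.
sources = ["HTTPS://A.example", "https://a.example", "https://a.example"]
sources.map!(&:downcase)                        # rewrites each element in place
sources.uniq!                                   # drops duplicates in place
sources.select! { |s| s.start_with?("https") }  # keeps only matching elements in place
sources # => ["https://a.example"]
```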
@@ -214,5 +229,9 @@ def strip_source_schemes(source_list)
  def symbol_to_hyphen_case(sym)
    sym.to_s.tr("_", "-")
  end

  def source_scheme(source)
    source.match(/^([A-Za-z0-9\-\+.]+):\/\//)&.values_at(1)
Is doing this using a regex the most reliable way? I imagine we had a good reason for not using a URI parser.
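For context, a rough plain-Ruby check of what this regex extracts for a few CSP source forms; whether this is more reliable than a URI parser is exactly the open question:

```ruby
# What the source_scheme regex returns for different kinds of source expressions.
regex = /^([A-Za-z0-9\-\+.]+):\/\//
"https://example.com:*".match(regex)&.values_at(1) # => ["https"] (a wildcard port is valid CSP but not a valid URI)
"*.example.com".match(regex)&.values_at(1)         # => nil (no scheme present)
"data:".match(regex)&.values_at(1)                 # => nil (scheme-source, no "//")
"'self'".match(regex)&.values_at(1)                # => nil (keyword-source)
```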
source_list.map do |expression|
  if expression =~ /(\n|;)/
    if !semicolon_warned_yet
      Kernel.warn("#{directive} contains a #{$1} in #{source_list.join(" ").inspect} which will raise an error in future versions. It has been replaced with a blank space.")
This may be out of scope for this PR, but this kernel warning message and cleaning of `;` has been present since Jan 2020. Maybe the version in which this change is released is the version that raises an error? If this method raised an error, it could reduce the logic we need to verify in tests.
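As a hedged sketch of that idea (using `ArgumentError` as a placeholder; the gem may prefer a more specific error class), the cleanup branch could become:

```ruby
# Hypothetical replacement for the warn-and-substitute behavior once the
# deprecation period ends: reject the input outright.
if expression =~ /(\n|;)/
  raise ArgumentError,
        "#{directive} contains #{$1.inspect} in #{source_list.join(" ").inspect}; " \
        "semicolons and newlines are not allowed in source lists"
end
```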
This PR removes `dedup_source_list` and replaces it with a simple `.uniq` call. This resolves #491, which is only the latest in a series of ongoing issues with source expression deduplication.

`secure_headers` has had this feature [since 2015](32bb3f5) that [deduplicates redundant URL source expressions](https://github.com/github/secure_headers/blob/494b75ff927464ed8d1c43e98e41fe4d15ce2bdf/lib/secure_headers/headers/content_security_policy.rb#L157-L170). For example, if `*.github.com` is listed as a source expression for a given [directive](https://w3c.github.io/webappsec-csp/#framework-directives), then the addition of `example.github.com` would have no effect, and so the latter can be safely removed by `secure_headers` to save bytes.

Unfortunately, this implementation has had various bugs due to the use of "impedance mismatched" APIs like [`URI`](https://docs.ruby-lang.org/en/2.1.0/URI.html)[^1] and [`File.fnmatch`](https://apidock.com/ruby/v2_5_5/File/fnmatch/class)[^2]. For example, it made incorrect assumptions about source expression schemes, leading to the following series of events:

[^1]: Which allows wildcards in domains but not for ports, as it is not designed to parse URL source expressions.
[^2]: Which has general glob matching that is not designed for URL source expressions either.

- 2017-03: A [bug was reported and confirmed](#317).
- 2022-04: The bug was finally [fixed by `@keithamus` (a Hubber) in 2022](#478) due to our use of web sockets.
- 2022-06: This fix in turn triggered a [new bug](#491) with source expressions like `data:`.
- 2022-06: An external contributor [submitted a fix for the new bug](#490), but this still doesn't address some of the "fast and loose" semantic issues of the underlying implementation.
- 2022-08: `@lgarron` [drafted a new implementation](#498) that semantically parses and compares source expressions based on the specification for source expressions.
  - This implementation already proved to have some value in early testing, as its stricter validation caught an issue in `github.com`'s CSP. However, it would take additional work to make this implementation fully aware of CSP syntax (e.g. not allowing URL source expressions in a source directive when only special keywords are allowed, and vice-versa), and it relies on a new regex-based implementation of source expression parsing that may very well lead to more subtle bugs.

In effect, this is a half feature whose maintenance cost has outweighed its functionality:

- The relevant code has suffered from continued bugs, as described above.
- Deduplication is purely a "nice-to-have" — it is not necessary for the security or correct functionality of `secure_headers`.
- It was [introduced by `@oreoshake` (the then-maintainer) without explanation in 2015](32bb3f5) and never "officially" documented. We have no concrete data on whether it has any performance impact on any real apps — for all we know, uncached deduplication calculations might even cost more than the saved header bytes.
- Further, in response to the first relevant bug, `@oreoshake` himself [said](#317 (comment)):

  > I've never been a fan of the deduplication based on `*` anyways. Maybe we should just rip that out.
  > Like people trying to save a few bytes can optimize elsewhere.

So this PR completely removes the functionality. If we learn of a use case where this was very important (and the app somehow can't preprocess the list before passing it to `secure_headers`), we can always resume consideration of one of:

- #490
- #498
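For illustration, a minimal sketch of the wildcard deduplication idea described above (a hypothetical helper, not the library's actual `dedup_source_list` implementation):

```ruby
# Remove host sources that are already covered by a "*." wildcard in the same list.
def dedup_hosts(sources)
  wildcards = sources.select { |s| s.start_with?("*.") }
  sources.reject do |s|
    wildcards.any? { |w| s != w && s.end_with?(w.delete_prefix("*")) }
  end
end

dedup_hosts(%w(*.github.com example.github.com example.com))
# => ["*.github.com", "example.com"]
```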
Abandoned in favor of #499.
Recently, we've had a spate of fixes for parsing directives and source expressions, stemming from the fact that the code doesn't understand the format of valid expressions and makes local assumptions about what they look like — in particular, assuming a resemblance to URLs during deduplication, even though many of the possible values are not URLs:

- #490
- #478
This PR is an attempt to "bite the bullet" and parse source expressions so we can semantically deduplicate matching URLs. In the future, we could use this to add more validation.
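As a rough illustration of what semantic parsing and deduplication could look like (the struct and matching rules below are hypothetical, not the classes this PR actually introduces):

```ruby
# Hypothetical parsed representation of a source expression and a coverage check.
SourceExpression = Struct.new(:scheme, :host, :port, :path, keyword_init: true)

# Does expression `a` already cover expression `b`?
def covers?(a, b)
  scheme_match = a.scheme.nil? || a.scheme == b.scheme
  host_match = a.host == b.host ||
    (a.host.to_s.start_with?("*.") && b.host.to_s.end_with?(a.host.delete_prefix("*")))
  scheme_match && host_match
end

covers?(
  SourceExpression.new(host: "*.github.com"),
  SourceExpression.new(host: "example.github.com")
) # => true, so "example.github.com" could be deduplicated away
```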
All PRs:

- Documentation updated (N/A)

Adding a new header