Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix source dedup breaking with port wildcards #490

Closed
wants to merge 1 commit into from

Conversation

machisuji
Copy link

@machisuji machisuji commented Jun 29, 2022

Fixes the bug introduced with release v6.3.4 in #478 that causes errors like the following when using port wildcards such as ws://localhost:* in one's sources:

2022-06-29 08:56:52 +0100 Rack app ("GET /" - (::1)): #<URI::InvalidURIError: bad URI(is not URI?): "ws://localhost:*">
2022-06-29 08:56:52 +0100 Rack app ("GET /favicon.ico" - (::1)): #<URI::InvalidURIError: bad URI(is not URI?): "ws://localhost:*">

@vcsjones vcsjones requested a review from a team June 29, 2022 18:18
@vcsjones
Copy link
Member

Thanks for the pull request! We'll be looking at this shortly.

Copy link
Contributor

@lgarron lgarron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out the issue, and also sending PR!

It would certainly be valuable not to run into an error with wildcard ports, but I have a few questions about this particular fix.

@@ -212,5 +212,9 @@ def strip_source_schemes(source_list)
def symbol_to_hyphen_case(sym)
sym.to_s.tr("_", "-")
end

def source_scheme(source)
URI(source.sub(PORT_WILDCARD_REGEX, '')).scheme
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this calculates the source scheme of example.org:443 as example.org

Copy link
Contributor

@lgarron lgarron Jul 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might work well enough for us.

Suggested change
URI(source.sub(PORT_WILDCARD_REGEX, '')).scheme
source.split("://", 2)[-2] || source.split(":", 2)[-2]

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel safer still using URI parsing. I've amended source_scheme so that it considers this special case where the actual scheme is missing.

@@ -154,10 +154,10 @@ def reject_all_values_if_none(source_list)
# e.g. *.github.com asdf.github.com becomes *.github.com
def dedup_source_list(sources)
sources = sources.uniq
wild_sources = sources.select { |source| source =~ STAR_REGEXP }
wild_sources = sources.select { |source| source =~ DOMAIN_WILDCARD_REGEX }
Copy link
Contributor

@lgarron lgarron Jul 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain the reasoning behind this?

As far as I can tell, it should be fine to remove example.org:443 if example.org:* is present.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I didn't consider that. I thought this method only deduplicates based on schemes, not based on ports as well. I've amended it so it works as expected again and added test cases covering this behaviour which it wasn't before.

Now we could just leave it be STAR_REGEXP here but I still think it's a good idea to have two separate regexes and properly named for the 2 cases (subdomains and ports).

@lgarron
Copy link
Contributor

lgarron commented Jul 19, 2022

@machisuji Would you be able to revise this PR?

@machisuji
Copy link
Author

@machisuji Would you be able to revise this PR?

Oh boy, sorry! I completely forgot about that. I will do that first thing tomorrow morning! Thanks for your patience.

@machisuji machisuji force-pushed the fix/port-wildcards branch from 9d1c9e2 to 777d19a Compare July 20, 2022 13:39
@machisuji
Copy link
Author

@lgarron I've pushed some changes which should address your remarks.

Copy link
Contributor

@lgarron lgarron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, and apologies for the delay! We had some internal complications around testing secure_headers upgrades.

I think it would be nicer to fully parse URL expressions, but I believe this workaround should be safe enough to avoid the current bug. It may fail to deduplicate some patterns, but this is only a matter of optimization (not correctness) and we can address it in the future.

Thanks for the contribution!

@machisuji
Copy link
Author

Just pushed a tiny correction to use double quotes where single quotes were used before.

Comment on lines +217 to +221
uri = URI(source.sub(PORT_WILDCARD_REGEX, ""))
# If host is nil the given source doesn't contain a scheme
# e.g. for `example.org:443` it would return `example.org` as the scheme
# which is of course incorrect
uri.scheme if uri.host
Copy link
Contributor

@lgarron lgarron Aug 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that this also fails for URLs like https://*.example.org:*

What do you think of hardcoding just HTTP and HTTPS to avoid further wildcard issues?

Suggested change
uri = URI(source.sub(PORT_WILDCARD_REGEX, ""))
# If host is nil the given source doesn't contain a scheme
# e.g. for `example.org:443` it would return `example.org` as the scheme
# which is of course incorrect
uri.scheme if uri.host
return "https" if source.start_with?("https://")
return "http" if source.start_with?("http://")
nil

Copy link
Contributor

@lgarron lgarron Aug 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, that will result in incorrect deduplication because of the code higher up. So I'd lean more towards just matching ^([A-Za-z0-9\-\+.]+):\/\/, which should return the correct scheme if a valid URL pattern is passed in.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree. I'll add a test case for that today and change it to the regex you suggested.

Copy link
Author

@machisuji machisuji Aug 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually having looked into it again I don't quite get what the issue is. Parsing https://*.example.org:* would indeed fail but this is why in line 217 uri = URI(source.sub(PORT_WILDCARD_REGEX, "")) it removes the port wildcard.

irb(main):003:0> PORT_WILDCARD_REGEX = /:\*/
=> /:\*/
irb(main):004:0> source = "https://*.example.org:*"
=> "https://*.example.org:*"
irb(main):005:0> URI(source.sub(PORT_WILDCARD_REGEX, ""))
=> #<URI::HTTPS https://*.example.org>
irb(main):006:0> _.scheme
=> "https"

So the source_scheme method does return the correct result for "https://*.example.org:*".

Copy link
Contributor

@lgarron lgarron Aug 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I've tried to give this whole code a refactor at #498

lgarron added a commit that referenced this pull request Oct 24, 2022
This PR removes `dedup_source_list` and replaces it with a simple
`.uniq` call. This resolves
#491, which is only the
latest in a series of ongoing issues with source expression
deduplication.

`secure_headers` has had this feature [since
2015](32bb3f5)
that [deduplicates redundant URL source
expressions](https://github.com/github/secure_headers/blob/494b75ff927464ed8d1c43e98e41fe4d15ce2bdf/lib/secure_headers/headers/content_security_policy.rb#L157-L170).
For example, if `*.github.com` is listed as a source expression for a
given
[directive](https://w3c.github.io/webappsec-csp/#framework-directives),
then the addition of `example.github.com` would have no effect, and so
the latter can be safely removed by `secure_headers` to save bytes.

Unfortunately, this implementation has had various bugs due to the use
of "impedance mismatched" APIs like
[`URI`](https://docs.ruby-lang.org/en/2.1.0/URI.html)[^1] and
[`File.fnmatch`](https://apidock.com/ruby/v2_5_5/File/fnmatch/class)[^2].
For example, it made incorrect assumptions about source expression
schemes, leading to the following series of events:

[^1]: Which allows wildcards in domains but not for ports, as it is not
designed to parse URL source expressions.
[^2]: Which has general glob matching that is not designed for URL
source expressions either.

- 2017-03: A [bug was reported and
confirmed](#317)
- 2022-04: The bug was finally [fixed by `@keithamus` (a Hubber) in
2022](#478) due to our use
of web sockets.
- 2022-06: This fix in turn triggered a [new
bug](#491) with source
expressions like `data:`.
- 2022-06: An external contributor [submitted a fix for the bew
bug](#490), but this still
doesn't address some of the "fast and loose" semantic issues of the
underlying implementation.
- 2022-08: `@lgarron` [drafted a new
implementation](#498) that
semantically parses and compares source expressions based on the
specification for source expressions.
- This implementation already proved to have some value in early
testing, as its stricter validation caught an issue in `github.com`'s
CSP. However, it would take additional work to make this implementation
fully aware of CSP syntax (e.g. not allowing URL source expressions in a
source directive when only special keywords are allowed, and
vice-versa), and it relies on a new regex-based implementation of source
expression parsing that may very well lead to more subtle bugs.

In effect, this is a half feature whose maintenance cost has outweighed
its functionality:

- The relevant code has suffered from continued bugs, described as
above.
- Deduplication is purely a "nice-to-have" — it is not necessary for the
security or correct functionality of `secure_headers`.
- It was [introduced by `@oreoshake` (the then-maintainer) without
explanation in
2015](32bb3f5),
never "officially" documented. We have no concrete data on whether it
has any performance impact on any real apps — for all we know, uncached
deduplication calculations might even cost more than the saved header
bytes.
- Further, in response to the first relevant bug, `@oreoshake` himself
[said](#317 (comment)):

> I've never been a fan of the deduplication based on `*` anyways. Maybe
we should just rip that out.

> Like people trying to save a few bytes can optimize elsewhere.

So this PR completely removes the functionality. If we learn of a use
case where this was very important (and the app somehow can't preprocess
the list before passing it to `secure_headers`), we can always resume
consideration of one of:

- #490
- #498
@lgarron
Copy link
Contributor

lgarron commented Oct 25, 2022

We've removed the feature altogether in #499 — see the PR description for more details.

Thanks for the PR; I'm sorry it didn't work out.

@lgarron lgarron closed this Oct 25, 2022
@machisuji
Copy link
Author

@lgarron Thanks for the update. I think it's the right call. If people want to save precious bytes, they can make sure not to pass in duplicate sources themselves, really. I'm happy either way because removing the feature removes the bug that was hindering us.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants