Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix "#valid? doesn't work for protocol-less urls" #58

Merged
merged 7 commits into from
Nov 2, 2015
Merged

Conversation

walro
Copy link
Contributor

@walro walro commented Oct 30, 2015

There were lots of options to go with there:

  • nil check in #scheme or #normalized_scheme
  • .to_s in #scheme or #normalized_scheme
  • use Addressable’s #normalized_scheme

I don’t think nil checks are that pretty, so that one was quickly discarded. I also don’t think we should #to_s “randomly” for some properties it’s either all or nothing in my opinion. Using Addressable’s own #normalized_scheme was the top candidate, but it did additional things to the url (not just downcasing it) and I think it’s a good thing that we try to control what we can when it comes to normalization.

So ultimately I went with this, I changed so that we check against #scheme instead of #normalized_scheme as that makes a whole lot of sense to me at least.

Fix #55.

Additionally, I did some refactoring which I felt was related to fixing this issue.

walro added 3 commits October 30, 2015 10:08
Makes for easier future refactoring
Add a guard for PublicSuffix makes #valid? simpler. Introduce a helper
method to keep things tidy.
There were lots of options to go with there:

* nil check in #scheme or #normalized_scheme
* .to_s in #scheme or #normalized_scheme
* use Addressable’s #normalized_scheme

I don’t think nil checks are that pretty, so that one was quickly
discarded. I also don’t think we should #to_s “randomly” for some
properties it’s either all or nothing in my opinion. Using
Addressable’s own #normalized_scheme was the top candidate, but it did
additional things to the url (not just downcasing it) and I think it’s
a good thing that we try to control what we can when it comes to
normalization.

So ultimately I went with this, I changed so that we check against
#scheme instead of #normalized_scheme as that makes a whole lot of
sense to me at least.

Fix #55.
@dentarg
Copy link
Collaborator

dentarg commented Oct 30, 2015

I did this change

$ git d
diff --git a/spec/lib/twingly/url_spec.rb b/spec/lib/twingly/url_spec.rb
index e8cccdc..7eab2b2 100644
--- a/spec/lib/twingly/url_spec.rb
+++ b/spec/lib/twingly/url_spec.rb
@@ -37,7 +37,7 @@ end

 describe Twingly::URL do
   let(:test_url) do
-    "http://www.blog.twingly.co.uk/2015/07/01/language-detection-changes/"
+    "//www.blog.twingly.co.uk/2015/07/01/language-detection-changes/"
   end
   let(:url) { described_class.parse(test_url) }

And I got this result:

Failures:

  1) Twingly::URL#scheme should eq "http"
     Failure/Error: it { is_expected.to eq("http") }

       expected: "http"
            got: nil

       (compared using ==)
     # ./spec/lib/twingly/url_spec.rb:132:in `block (3 levels) in <top (required)>'

  2) Twingly::URL#origin should eq "http://www.blog.twingly.co.uk"
     Failure/Error: it { is_expected.to eq("http://www.blog.twingly.co.uk") }

       expected: "http://www.blog.twingly.co.uk"
            got: "null"

       (compared using ==)
     # ./spec/lib/twingly/url_spec.rb:162:in `block (3 levels) in <top (required)>'

  3) Twingly::URL#normalized_scheme
     Failure/Error: subject { url.normalized_scheme }
     NoMethodError:
       undefined method `downcase' for nil:NilClass
     # ./lib/twingly/url.rb:107:in `normalized_scheme'
     # ./spec/lib/twingly/url_spec.rb:176:in `block (3 levels) in <top (required)>'
     # ./spec/lib/twingly/url_spec.rb:177:in `block (3 levels) in <top (required)>'

Yes, failure is expected with this change, but not in this way I think.

As you said, this may not be the place to fix (not return nil) 1), maybe that's business for #52.

I have no idea what's going on with 2) 😄

  1. I think we need to address now.

@@ -123,7 +127,7 @@ def normalized_path
end

def valid?
addressable_uri && public_suffix_domain && SCHEMES.include?(normalized_scheme)
!!(scheme =~ ACCEPTED_SCHEMES)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks funny, seems like we only require a scheme to be valid, but I know that's not the case (this is the same as before). But I just wanted to write something about it.

I think making what we consider being an valid URL even more clear would be great. That might be future tasks for @twingly-mob.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks funny, seems like we only require a scheme to be valid, but I know that's not the case (this is the same as before). But I just wanted to write something about it.

My first attempt was actually this method returning true and I took care of this in .internal_parse (ensuring NullURL return), but then I realized that calling #new could lead to weird stuff (at least how it's currently implemented) so I went with the least amount of changes.

@walro
Copy link
Contributor Author

walro commented Oct 30, 2015

1), maybe that's business for #52.

Yes, I was thinking of that one.

I have no idea what's going on with 2)

Haha, me neither. Probably somethnig we should fix.

  1. I think we need to address now.

Hm, maybe here or maybe not. You really found a new problem (you effectively try to normalize an invalid url, not sure how it should work) :). The intended scope here was to ensure that "#valid?" does not lie to us. I think that #52 will solve this issue for us. Alternatively add another issue for "normalizing invalid urls behavior".

@dentarg
Copy link
Collaborator

dentarg commented Oct 30, 2015

Maybe all invalid URLs should return NullURL? But yeah, sounds like we need to have a new discussion elsewhere (not this issue/PR).

On 30 okt. 2015, at 20:01, Robin Wallin [email protected] wrote:

1), maybe that's business for #52.

Yes, I was thinking of that one.

I have no idea what's going on with 2)

Haha, me neither. Probably somethnig we should fix.

  1. I think we need to address now.

Hm, maybe here or maybe not. You really found a new problem (you effectively try to normalize an invalid url, not sure how it should work) :). The intended scope here was to ensure that "#valid?" does not lie to us. I think that #52 will solve this issue for us. Alternatively add another issue for "normalizing invalid urls behavior".


Reply to this email directly or view it on GitHub.

@walro
Copy link
Contributor Author

walro commented Oct 30, 2015

Maybe all invalid URLs should return NullURL? But yeah, sounds like we need to have a new discussion elsewhere (not this issue/PR).

Hehe, my thoughts exactly, just added that in #58 (comment) :)

We can’t test it now, but we think it’s worth it.
Will return NullURL.

Closes #61.
The current implementation of .parse can't pass nil to
.internal_parse, but just to be sure we want to test this, because
we didn't notice this until now.
@twingly-mob
Copy link
Member

Just a note about the "diff test" above

$ git d
diff --git a/spec/lib/twingly/url_spec.rb b/spec/lib/twingly/url_spec.rb
index 8fd4c7c..3cad381 100644
--- a/spec/lib/twingly/url_spec.rb
+++ b/spec/lib/twingly/url_spec.rb
@@ -37,7 +37,7 @@ end

 describe Twingly::URL do
   let(:test_url) do
-    "http://www.blog.twingly.co.uk/2015/07/01/language-detection-changes/"
+    "//www.blog.twingly.co.uk/2015/07/01/language-detection-changes/"
   end
   let(:url) { described_class.parse(test_url) }

Now the failures are as expected:

Failures:

  1) Twingly::URL#domain should eq "twingly.co.uk"
     Failure/Error: it { is_expected.to eq("twingly.co.uk") }

       expected: "twingly.co.uk"
            got: ""

       (compared using ==)
     # ./spec/lib/twingly/url_spec.rb:146:in `block (3 levels) in <top (required)>'

  2) Twingly::URL#normalized_path should eq "/2015/07/01/language-detection-changes"
     Failure/Error: it { is_expected.to eq("/2015/07/01/language-detection-changes") }

       expected: "/2015/07/01/language-detection-changes"
            got: ""

       (compared using ==)
     # ./spec/lib/twingly/url_spec.rb:166:in `block (3 levels) in <top (required)>'

  3) Twingly::URL#host should eq "www.blog.twingly.co.uk"
     Failure/Error: it { is_expected.to eq("www.blog.twingly.co.uk") }

       expected: "www.blog.twingly.co.uk"
            got: ""

       (compared using ==)
     # ./spec/lib/twingly/url_spec.rb:151:in `block (3 levels) in <top (required)>'

  4) Twingly::URL#origin should eq "http://www.blog.twingly.co.uk"
     Failure/Error: it { is_expected.to eq("http://www.blog.twingly.co.uk") }

       expected: "http://www.blog.twingly.co.uk"
            got: ""

       (compared using ==)
     # ./spec/lib/twingly/url_spec.rb:156:in `block (3 levels) in <top (required)>'

  5) Twingly::URL#normalized_scheme should eq "http"
     Failure/Error: it { is_expected.to eq("http") }

       expected: "http"
            got: ""

       (compared using ==)
     # ./spec/lib/twingly/url_spec.rb:171:in `block (3 levels) in <top (required)>'

  6) Twingly::URL#trd should eq "www.blog"
     Failure/Error: it { is_expected.to eq("www.blog") }

       expected: "www.blog"
            got: ""

       (compared using ==)
     # ./spec/lib/twingly/url_spec.rb:131:in `block (3 levels) in <top (required)>'

  7) Twingly::URL#sld should eq "twingly"
     Failure/Error: it { is_expected.to eq("twingly") }

       expected: "twingly"
            got: ""

       (compared using ==)
     # ./spec/lib/twingly/url_spec.rb:136:in `block (3 levels) in <top (required)>'

  8) Twingly::URL#normalized_host should eq "www.blog.twingly.co.uk"
     Failure/Error: it { is_expected.to eq("www.blog.twingly.co.uk") }

       expected: "www.blog.twingly.co.uk"
            got: ""

       (compared using ==)
     # ./spec/lib/twingly/url_spec.rb:176:in `block (3 levels) in <top (required)>'

  9) Twingly::URL#to_s should eq "//www.blog.twingly.co.uk/2015/07/01/language-detection-changes/"
     Failure/Error: it { is_expected.to eq(test_url) }

       expected: "//www.blog.twingly.co.uk/2015/07/01/language-detection-changes/"
            got: ""

       (compared using ==)
     # ./spec/lib/twingly/url_spec.rb:450:in `block (3 levels) in <top (required)>'

  10) Twingly::URL#scheme should eq "http"
      Failure/Error: it { is_expected.to eq("http") }

        expected: "http"
             got: ""

        (compared using ==)
      # ./spec/lib/twingly/url_spec.rb:126:in `block (3 levels) in <top (required)>'

  11) Twingly::URL.parse should be a kind of Twingly::URL
      Failure/Error: it { is_expected.to be_a(Twingly::URL) }
        expected #<Twingly::URL::NullURL:0x007fba4b56eb10> to be a kind of Twingly::URL
      # ./spec/lib/twingly/url_spec.rb:47:in `block (3 levels) in <top (required)>'

  12) Twingly::URL#tld should eq "co.uk"
      Failure/Error: it { is_expected.to eq("co.uk") }

        expected: "co.uk"
             got: ""

        (compared using ==)
      # ./spec/lib/twingly/url_spec.rb:141:in `block (3 levels) in <top (required)>'

  13) Twingly::URL#path should eq "/2015/07/01/language-detection-changes/"
      Failure/Error: it { is_expected.to eq("/2015/07/01/language-detection-changes/") }

        expected: "/2015/07/01/language-detection-changes/"
             got: ""

        (compared using ==)
      # ./spec/lib/twingly/url_spec.rb:161:in `block (3 levels) in <top (required)>'

Finished in 0.26022 seconds (files took 0.27812 seconds to load)
136 examples, 13 failures

twingly-mob added a commit that referenced this pull request Nov 2, 2015
Fix "#valid? doesn't work for protocol-less urls"
@twingly-mob twingly-mob merged commit fc5875f into master Nov 2, 2015
@twingly-mob twingly-mob deleted the fix/55 branch November 2, 2015 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

#valid? doesn't work for protocol-less urls
3 participants