Fix "#valid? doesn't work for protocol-less urls" #58

walro · 2015-10-30T09:43:00Z

There were lots of options to go with there:

nil check in #scheme or #normalized_scheme
.to_s in #scheme or #normalized_scheme
use Addressable’s #normalized_scheme

I don’t think nil checks are that pretty, so that one was quickly discarded. I also don’t think we should #to_s “randomly” for some properties it’s either all or nothing in my opinion. Using Addressable’s own #normalized_scheme was the top candidate, but it did additional things to the url (not just downcasing it) and I think it’s a good thing that we try to control what we can when it comes to normalization.

So ultimately I went with this, I changed so that we check against #scheme instead of #normalized_scheme as that makes a whole lot of sense to me at least.

Fix #55.

Additionally, I did some refactoring which I felt was related to fixing this issue.

Makes for easier future refactoring

Add a guard for PublicSuffix makes #valid? simpler. Introduce a helper method to keep things tidy.

There were lots of options to go with there: * nil check in #scheme or #normalized_scheme * .to_s in #scheme or #normalized_scheme * use Addressable’s #normalized_scheme I don’t think nil checks are that pretty, so that one was quickly discarded. I also don’t think we should #to_s “randomly” for some properties it’s either all or nothing in my opinion. Using Addressable’s own #normalized_scheme was the top candidate, but it did additional things to the url (not just downcasing it) and I think it’s a good thing that we try to control what we can when it comes to normalization. So ultimately I went with this, I changed so that we check against #scheme instead of #normalized_scheme as that makes a whole lot of sense to me at least. Fix #55.

dentarg · 2015-10-30T18:50:43Z

I did this change

$ git d
diff --git a/spec/lib/twingly/url_spec.rb b/spec/lib/twingly/url_spec.rb
index e8cccdc..7eab2b2 100644
--- a/spec/lib/twingly/url_spec.rb
+++ b/spec/lib/twingly/url_spec.rb
@@ -37,7 +37,7 @@ end

 describe Twingly::URL do
   let(:test_url) do
-    "http://www.blog.twingly.co.uk/2015/07/01/language-detection-changes/"
+    "//www.blog.twingly.co.uk/2015/07/01/language-detection-changes/"
   end
   let(:url) { described_class.parse(test_url) }

And I got this result:

Failures:

  1) Twingly::URL#scheme should eq "http"
     Failure/Error: it { is_expected.to eq("http") }

       expected: "http"
            got: nil

       (compared using ==)
     # ./spec/lib/twingly/url_spec.rb:132:in `block (3 levels) in <top (required)>'

  2) Twingly::URL#origin should eq "http://www.blog.twingly.co.uk"
     Failure/Error: it { is_expected.to eq("http://www.blog.twingly.co.uk") }

       expected: "http://www.blog.twingly.co.uk"
            got: "null"

       (compared using ==)
     # ./spec/lib/twingly/url_spec.rb:162:in `block (3 levels) in <top (required)>'

  3) Twingly::URL#normalized_scheme
     Failure/Error: subject { url.normalized_scheme }
     NoMethodError:
       undefined method `downcase' for nil:NilClass
     # ./lib/twingly/url.rb:107:in `normalized_scheme'
     # ./spec/lib/twingly/url_spec.rb:176:in `block (3 levels) in <top (required)>'
     # ./spec/lib/twingly/url_spec.rb:177:in `block (3 levels) in <top (required)>'

Yes, failure is expected with this change, but not in this way I think.

As you said, this may not be the place to fix (not return nil) 1), maybe that's business for #52.

I have no idea what's going on with 2) 😄

I think we need to address now.

dentarg · 2015-10-30T18:55:17Z

lib/twingly/url.rb

@@ -123,7 +127,7 @@ def normalized_path
    end

    def valid?
-      addressable_uri && public_suffix_domain && SCHEMES.include?(normalized_scheme)
+      !!(scheme =~ ACCEPTED_SCHEMES)


It looks funny, seems like we only require a scheme to be valid, but I know that's not the case (this is the same as before). But I just wanted to write something about it.

I think making what we consider being an valid URL even more clear would be great. That might be future tasks for @twingly-mob.

It looks funny, seems like we only require a scheme to be valid, but I know that's not the case (this is the same as before). But I just wanted to write something about it.

My first attempt was actually this method returning true and I took care of this in .internal_parse (ensuring NullURL return), but then I realized that calling #new could lead to weird stuff (at least how it's currently implemented) so I went with the least amount of changes.

walro · 2015-10-30T19:01:11Z

1), maybe that's business for #52.

Yes, I was thinking of that one.

I have no idea what's going on with 2)

Haha, me neither. Probably somethnig we should fix.

I think we need to address now.

Hm, maybe here or maybe not. You really found a new problem (you effectively try to normalize an invalid url, not sure how it should work) :). The intended scope here was to ensure that "#valid?" does not lie to us. I think that #52 will solve this issue for us. Alternatively add another issue for "normalizing invalid urls behavior".

dentarg · 2015-10-30T19:03:39Z

Maybe all invalid URLs should return NullURL? But yeah, sounds like we need to have a new discussion elsewhere (not this issue/PR).

On 30 okt. 2015, at 20:01, Robin Wallin [email protected] wrote:

1), maybe that's business for #52.

Yes, I was thinking of that one.

I have no idea what's going on with 2)

Haha, me neither. Probably somethnig we should fix.

I think we need to address now.

Hm, maybe here or maybe not. You really found a new problem (you effectively try to normalize an invalid url, not sure how it should work) :). The intended scope here was to ensure that "#valid?" does not lie to us. I think that #52 will solve this issue for us. Alternatively add another issue for "normalizing invalid urls behavior".

—
Reply to this email directly or view it on GitHub.

walro · 2015-10-30T19:04:46Z

Maybe all invalid URLs should return NullURL? But yeah, sounds like we need to have a new discussion elsewhere (not this issue/PR).

Hehe, my thoughts exactly, just added that in #58 (comment) :)

We can’t test it now, but we think it’s worth it.

Will return NullURL. Closes #61.

The current implementation of .parse can't pass nil to .internal_parse, but just to be sure we want to test this, because we didn't notice this until now.

twingly-mob · 2015-11-02T15:37:04Z

Just a note about the "diff test" above

$ git d
diff --git a/spec/lib/twingly/url_spec.rb b/spec/lib/twingly/url_spec.rb
index 8fd4c7c..3cad381 100644
--- a/spec/lib/twingly/url_spec.rb
+++ b/spec/lib/twingly/url_spec.rb
@@ -37,7 +37,7 @@ end

 describe Twingly::URL do
   let(:test_url) do
-    "http://www.blog.twingly.co.uk/2015/07/01/language-detection-changes/"
+    "//www.blog.twingly.co.uk/2015/07/01/language-detection-changes/"
   end
   let(:url) { described_class.parse(test_url) }

Now the failures are as expected:

Failures:

  1) Twingly::URL#domain should eq "twingly.co.uk"
     Failure/Error: it { is_expected.to eq("twingly.co.uk") }

       expected: "twingly.co.uk"
            got: ""

       (compared using ==)
     # ./spec/lib/twingly/url_spec.rb:146:in `block (3 levels) in <top (required)>'

  2) Twingly::URL#normalized_path should eq "/2015/07/01/language-detection-changes"
     Failure/Error: it { is_expected.to eq("/2015/07/01/language-detection-changes") }

       expected: "/2015/07/01/language-detection-changes"
            got: ""

       (compared using ==)
     # ./spec/lib/twingly/url_spec.rb:166:in `block (3 levels) in <top (required)>'

  3) Twingly::URL#host should eq "www.blog.twingly.co.uk"
     Failure/Error: it { is_expected.to eq("www.blog.twingly.co.uk") }

       expected: "www.blog.twingly.co.uk"
            got: ""

       (compared using ==)
     # ./spec/lib/twingly/url_spec.rb:151:in `block (3 levels) in <top (required)>'

  4) Twingly::URL#origin should eq "http://www.blog.twingly.co.uk"
     Failure/Error: it { is_expected.to eq("http://www.blog.twingly.co.uk") }

       expected: "http://www.blog.twingly.co.uk"
            got: ""

       (compared using ==)
     # ./spec/lib/twingly/url_spec.rb:156:in `block (3 levels) in <top (required)>'

  5) Twingly::URL#normalized_scheme should eq "http"
     Failure/Error: it { is_expected.to eq("http") }

       expected: "http"
            got: ""

       (compared using ==)
     # ./spec/lib/twingly/url_spec.rb:171:in `block (3 levels) in <top (required)>'

  6) Twingly::URL#trd should eq "www.blog"
     Failure/Error: it { is_expected.to eq("www.blog") }

       expected: "www.blog"
            got: ""

       (compared using ==)
     # ./spec/lib/twingly/url_spec.rb:131:in `block (3 levels) in <top (required)>'

  7) Twingly::URL#sld should eq "twingly"
     Failure/Error: it { is_expected.to eq("twingly") }

       expected: "twingly"
            got: ""

       (compared using ==)
     # ./spec/lib/twingly/url_spec.rb:136:in `block (3 levels) in <top (required)>'

  8) Twingly::URL#normalized_host should eq "www.blog.twingly.co.uk"
     Failure/Error: it { is_expected.to eq("www.blog.twingly.co.uk") }

       expected: "www.blog.twingly.co.uk"
            got: ""

       (compared using ==)
     # ./spec/lib/twingly/url_spec.rb:176:in `block (3 levels) in <top (required)>'

  9) Twingly::URL#to_s should eq "//www.blog.twingly.co.uk/2015/07/01/language-detection-changes/"
     Failure/Error: it { is_expected.to eq(test_url) }

       expected: "//www.blog.twingly.co.uk/2015/07/01/language-detection-changes/"
            got: ""

       (compared using ==)
     # ./spec/lib/twingly/url_spec.rb:450:in `block (3 levels) in <top (required)>'

  10) Twingly::URL#scheme should eq "http"
      Failure/Error: it { is_expected.to eq("http") }

        expected: "http"
             got: ""

        (compared using ==)
      # ./spec/lib/twingly/url_spec.rb:126:in `block (3 levels) in <top (required)>'

  11) Twingly::URL.parse should be a kind of Twingly::URL
      Failure/Error: it { is_expected.to be_a(Twingly::URL) }
        expected #<Twingly::URL::NullURL:0x007fba4b56eb10> to be a kind of Twingly::URL
      # ./spec/lib/twingly/url_spec.rb:47:in `block (3 levels) in <top (required)>'

  12) Twingly::URL#tld should eq "co.uk"
      Failure/Error: it { is_expected.to eq("co.uk") }

        expected: "co.uk"
             got: ""

        (compared using ==)
      # ./spec/lib/twingly/url_spec.rb:141:in `block (3 levels) in <top (required)>'

  13) Twingly::URL#path should eq "/2015/07/01/language-detection-changes/"
      Failure/Error: it { is_expected.to eq("/2015/07/01/language-detection-changes/") }

        expected: "/2015/07/01/language-detection-changes/"
             got: ""

        (compared using ==)
      # ./spec/lib/twingly/url_spec.rb:161:in `block (3 levels) in <top (required)>'

Finished in 0.26022 seconds (files took 0.27812 seconds to load)
136 examples, 13 failures

Fix "#valid? doesn't work for protocol-less urls"

walro added 3 commits October 30, 2015 10:08

Reuse existing method

2660ca7

Makes for easier future refactoring

Refactor #internal_parse

25460f2

Add a guard for PublicSuffix makes #valid? simpler. Introduce a helper method to keep things tidy.

dentarg reviewed Oct 30, 2015
View reviewed changes

twingly-mob added 4 commits November 2, 2015 15:58

Make .to_addressable_uri private

265890a

We can’t test it now, but we think it’s worth it.

Now considering non http schemes invalid

e54938f

Will return NullURL. Closes #61.

Fix namespace bug

761ec56

The current implementation of .parse can't pass nil to .internal_parse, but just to be sure we want to test this, because we didn't notice this until now.

Make it clear that invalid URLs return NullURL

7c7d98b

twingly-mob mentioned this pull request Nov 2, 2015

Make more methods private #60

Closed

twingly-mob added a commit that referenced this pull request Nov 2, 2015

Merge pull request #58 from twingly/fix/55

fc5875f

Fix "#valid? doesn't work for protocol-less urls"

twingly-mob merged commit fc5875f into master Nov 2, 2015

twingly-mob deleted the fix/55 branch November 2, 2015 15:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix "#valid? doesn't work for protocol-less urls" #58

Fix "#valid? doesn't work for protocol-less urls" #58

walro commented Oct 30, 2015

dentarg commented Oct 30, 2015

dentarg Oct 30, 2015

walro Oct 30, 2015

walro commented Oct 30, 2015

dentarg commented Oct 30, 2015

walro commented Oct 30, 2015

twingly-mob commented Nov 2, 2015

Fix "#valid? doesn't work for protocol-less urls" #58

Fix "#valid? doesn't work for protocol-less urls" #58

Conversation

walro commented Oct 30, 2015

dentarg commented Oct 30, 2015

dentarg Oct 30, 2015

Choose a reason for hiding this comment

walro Oct 30, 2015

Choose a reason for hiding this comment

walro commented Oct 30, 2015

dentarg commented Oct 30, 2015

walro commented Oct 30, 2015

twingly-mob commented Nov 2, 2015