Tests #3

jage · 2014-02-20T13:39:21Z

Using minitest with some extras:
* Turn for more informative run output
* Shoulda for context and matchers

Turn: https://github.com/turn-project/turn
Shoulda: https://github.com/thoughtbot/shoulda

Broken URLs found during work with Zambezi

jage · 2014-02-20T14:13:23Z

test/unit/normalization_test.rb

+    end
+
+    should "handle URL with reference to another URL in it" do
+      url = "http://news.google.com/news/url?sa=t&fd=R&usg=AFQjCNGc4A_sfGS6fMMqggiK_8h6yk2miw&url=http:%20%20%20//fansided.com/2013/08/02/nike-decides-to-drop-milwaukee-brewers-ryan-braun"


Should this URL be supported?

Yes, why not?

Because some library didn't handle it (it works now).
I don't know the RFC off the top of my head; some of these strings might not be real URLs.

I agree this should be supported. If Chrome supports it, it should work (we should snatch their tests!)

Works in curl too.

handle URL with reference to another URL in it

Hmm, is the "another URL" valid?

$ curl -v http:%20%20%20//fansided.com/2013/08/02/nike-decides-to-drop-milwaukee-brewers-ryan-braun
* Adding handle: conn: 0x7fbd82007a00
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7fbd82007a00) send_pipe: 1, recv_pipe: 0
* Could not resolve host: http
* Closing connection 0
curl: (6) Could not resolve host: http

But I don't think it matters that it's a URL with another URL in it; parameters can contain anything, can't they?
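To make the point above concrete, here is a small sketch using Ruby's standard-library `URI` (not Addressable or PostRank::URI, which the gem actually uses). It parses the test URL, pulls out the `url` query parameter, and then tries to parse that value as a URL too. The outer URL is well-formed; the decoded inner value (each `%20` becomes a space) is not a valid URI, which is consistent with curl failing on it.

```ruby
require "uri"

# The URL from the test case: a Google News redirect whose "url" query
# parameter carries another (percent-encoded) URL.
outer = "http://news.google.com/news/url?sa=t&fd=R" \
        "&usg=AFQjCNGc4A_sfGS6fMMqggiK_8h6yk2miw" \
        "&url=http:%20%20%20//fansided.com/2013/08/02/nike-decides-to-drop-milwaukee-brewers-ryan-braun"

outer_uri = URI.parse(outer)                           # the outer URL is well-formed
params    = URI.decode_www_form(outer_uri.query).to_h  # query string -> Hash
inner     = params["url"]                              # "%20" decodes to spaces

# The embedded value is just parameter data; parsing it as a URL fails.
inner_valid =
  begin
    URI.parse(inner)
    true
  rescue URI::InvalidURIError
    false
  end

puts outer_uri.host
puts inner.inspect
puts inner_valid
```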

Probably. I think the original site changed the "internal URL", replaced some stuff.
Chrome rewrites the URL to: http:+++//..

I'm not sure the test name is that good, but I couldn't come up with a better one.


Or not, Google says "The previous page is sending you to an invalid url".

shoulda includes both shoulda-context and shoulda-matchers. We're not using the matchers at the moment, so there's no need to pull them in (they introduce lots of development dependencies).
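A hypothetical gemspec excerpt showing the dependency change described above: depend on `shoulda-context` directly instead of the `shoulda` meta-gem. The gem name, version and summary are placeholders, and no version constraints are given because the thread doesn't mention any.

```ruby
require "rubygems"

# Hypothetical gemspec excerpt: list shoulda-context on its own instead of
# the "shoulda" meta-gem, which would also pull in shoulda-matchers.
spec = Gem::Specification.new do |s|
  s.name    = "twingly-url" # assumed gem name
  s.version = "0.0.1"       # placeholder version
  s.summary = "URL normalization"

  s.add_development_dependency "minitest"
  s.add_development_dependency "turn"            # more informative run output
  s.add_development_dependency "shoulda-context" # `context`/`should` blocks only
end

puts spec.development_dependencies.map(&:name).inspect
```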

From #2

PostRank::URI couldn't handle umlauts. We lose the ability to detect URLs without a protocol (e.g. "twingly.com"), but we don't see a need for that feature. On the plus side, lots of runtime dependencies are removed (nokogiri!).
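Why detecting "twingly.com" takes extra work: a plain URI parser (Ruby's stdlib `URI` here, used only for illustration) reads a string without a scheme as a relative reference, so there is no host to normalize.

```ruby
require "uri"

# With a scheme, the host is recognised.
with_scheme = URI.parse("http://twingly.com")

# Without one, the parser sees a relative reference: no scheme, no host,
# and the whole string ends up in the path.
without_scheme = URI.parse("twingly.com")

puts with_scheme.host.inspect
puts without_scheme.scheme.inspect
puts without_scheme.host.inspect
puts without_scheme.path.inspect
```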

jage · 2014-02-20T14:55:29Z

test/unit/normalization_test.rb

+      assert_equal "http://www.twingly.com/", result
+    end
+
+    should "not be able to normalize url without protocol" do


Added this so we don't add this feature by mistake in the future.

jage · 2014-02-20T15:26:12Z

With PostRank::URI

Loaded Suite test,test/profile,test/unit

Started at 2014-02-20 16:24:15 +0100 w/ seed 36130.

NormalizerPerformanceTest
Thread ID: 70317498558180
Fiber ID: 70317500203180
Total: 30.794471
Sort by: self_time

 %self      total      self      wait     child     calls  name
  5.93      2.230     1.827     0.000     0.403   410000   PublicSuffix::Rule::Base#odiff 
  4.88      6.988     1.501     0.000     5.487   410000   PublicSuffix::Rule::Base#match? 
  4.50      3.358     1.384     0.000     1.974   425991   <Class::PublicSuffix::Domain>#domain_to_labels 
  3.70      1.138     1.138     0.000     0.000   451983   String#split 
  3.62      1.115     1.115     0.000     0.000    60000   String#=~ 
  3.19      0.981     0.981     0.000     0.000   160000   String#gsub 
  2.73      0.841     0.841     0.000     0.000  1170000   Kernel#instance_variable_defined? 
  2.67      5.164     0.824     0.000     4.341    30000   <Class::Addressable::URI>#normalize_component 
  2.63      6.655     0.810     0.000     5.845    30000   <Class::Addressable::URI>#parse 
  2.38      3.363     0.734     0.000     2.629   170000   Addressable::URI#validate 
  2.20      0.940     0.678     0.000     0.262   370000   Addressable::URI#host 
  2.06      0.635     0.635     0.000     0.000   455991   Array#reverse 
  1.94      7.592     0.596     0.000     6.996    20000   Array#select 
  1.81      0.779     0.557     0.000     0.222   300000   Addressable::URI#scheme 
  1.79      0.794     0.551     0.000     0.243   390000   String#== 
  1.66      0.849     0.510     0.000     0.339    40000   Addressable::URI#host= 
  1.40      1.034     0.433     0.000     0.601    30000   <Class::Addressable::URI>#encode_component 
  1.40      0.677     0.431     0.000     0.245    70001   Array#each 
  1.37      1.769     0.420     0.000     1.349    40000   Addressable::URI#scheme= 
  1.36     24.423     0.418     0.000    24.004    40000  *String#scan 
  1.32      0.562     0.407     0.000     0.155   220000   Addressable::URI#path 
  1.31      0.403     0.403     0.000     0.000   410000   Array#[] 
  1.23      0.380     0.380     0.000     0.000   538055   String#to_s 
  1.22      1.129     0.376     0.000     0.752    40000   <Class::Addressable::URI>#unencode 
  1.17      1.482     0.360     0.000     1.123    20000   Addressable::URI#to_s 
  1.15      0.354     0.354     0.000     0.000   246073   String#[] 
  1.13      0.487     0.347     0.000     0.140    50000   Addressable::URI#path= 
  0.98      0.777     0.303     0.000     0.474   210000   BasicObject#!= 
  0.98      0.558     0.300     0.000     0.258   110000   Kernel#dup 
  0.92      0.351     0.283     0.000     0.067    10000   PublicSuffix::Rule::Normal#decompose 
  0.89      0.838     0.275     0.000     0.563    50000   Addressable::URI#authority 
  0.89      1.084     0.275     0.000     0.809    50000   Addressable::URI#ip_based? 
  0.87      7.264     0.268     0.000     6.996    40000   Addressable::URI#initialize 
  0.86      0.542     0.264     0.000     0.278    30000   <Module::Addressable::IDNA>#unicode_sort_canonical 
  0.81      0.250     0.250     0.000     0.000   270000   Kernel#respond_to? 
  0.79      0.243     0.243     0.000     0.000   340000   Kernel#respond_to_missing? 
  0.79      0.242     0.242     0.000     0.000   100000   <Module::Addressable::IDNA>#lookup_unicode_combining_class 
  0.75      2.342     0.231     0.000     2.111    30000   <Module::Addressable::IDNA>#unicode_normalize_kc 
  0.68      6.852     0.208     0.000     6.644    40000   Addressable::URI#defer_validation 
  0.67      0.657     0.206     0.000     0.451    30000   <Module::Addressable::IDNA>#unicode_compose_pair 
  0.59      0.912     0.181     0.000     0.731    10000   Range#each 
  0.58      0.857     0.178     0.000     0.680    10000   Addressable::URI#authority= 
  0.56      0.173     0.173     0.000     0.000   140000   String#strip 
  0.56      0.298     0.172     0.000     0.125   120000   Kernel#initialize_dup 
  0.56      0.235     0.171     0.000     0.064   150000   Array#include? 
  0.55      0.374     0.169     0.000     0.205    40000   Addressable::URI#userinfo 
  0.49      1.087     0.151     0.000     0.936    30000   <Module::Addressable::IDNA>#unicode_compose 
  0.47      0.619     0.143     0.000     0.475    10000   Addressable::URI#replace_self 
  0.46      3.588     0.141     0.000     3.446    10000   Domainatrix::DomainParser#parse 
  0.45      0.408     0.139     0.000     0.269    10000   Domainatrix::DomainParser#parse_domains_from_host 
  0.44      0.136     0.136     0.000     0.000   180000   Kernel#is_a? 
  0.44      0.135     0.135     0.000     0.000    60000   Hash#keys 
  0.43      7.747     0.133     0.000     7.615    55992  *Class#new 
  0.43      0.310     0.131     0.000     0.179    50000   <Class::Addressable::URI>#ip_based_schemes 
  0.42      0.131     0.131     0.000     0.000    40000   NilClass#to_s 
  0.42      0.181     0.130     0.000     0.051    70000   Addressable::URI#query 
  0.42      0.129     0.129     0.000     0.000   130000   String#force_encoding 
  0.42      0.128     0.128     0.000     0.000    30000   Kernel#lambda 
  0.41      3.089     0.127     0.000     2.962    30000   Addressable::URI#normalized_scheme 
  0.40      3.014     0.124     0.000     2.890    10000   Addressable::URI#normalized_path 
  0.40      0.301     0.123     0.000     0.178    10000   <Class::Addressable::URI>#normalize_path 
  0.37      0.156     0.112     0.000     0.044    60000   Addressable::URI#password 
  0.36      0.153     0.110     0.000     0.042    60000   Addressable::URI#port 
  0.35     13.479     0.108     0.000    13.371    10000   PostRank::URI#c18n 
  0.34      0.976     0.105     0.000     0.870    10000   Addressable::URI#normalized_host 
  0.34      8.874     0.103     0.000     8.771    20001   Array#map 
  0.33      0.102     0.102     0.000     0.000   130000   Kernel#nil? 
  0.33     13.056     0.101     0.000    12.955    20000   PostRank::URI#parse 
  0.33      0.115     0.101     0.000     0.014    10000   Addressable::URI#password= 
  0.33      0.100     0.100     0.000     0.000   140000   String#to_str 
  0.32      0.099     0.099     0.000     0.000    75991   String#downcase 
  0.31      0.096     0.096     0.000     0.000    40000   Array#join 
  0.31      1.625     0.094     0.000     1.531    10000   Addressable::URI#normalized_authority 
  0.30      0.330     0.094     0.000     0.237    30000   <Module::Addressable::IDNA>#unicode_decompose 
  0.30      0.129     0.093     0.000     0.037    50000   Addressable::URI#user 
  0.30     10.220     0.093     0.000    10.127    10000   Addressable::URI#normalize 
  0.30      0.988     0.092     0.000     0.895    10000   PostRank::URI#normalize 
  0.29      0.164     0.090     0.000     0.074    10000   PostRank::URI#embedded 
  0.28      0.129     0.087     0.000     0.042    30000   Array#hash 
  0.28      0.086     0.086     0.000     0.000   105991   Hash#has_key? 
  0.27      0.085     0.085     0.000     0.000   120000   Kernel#instance_variable_set 
  0.27      0.083     0.083     0.000     0.000    40000   <Module::Addressable::IDNA>#lookup_unicode_compatibility 
  0.26      0.079     0.079     0.000     0.000    30000   Array#pack 
  0.26      0.125     0.079     0.000     0.046    10000   Addressable::URI#port= 
  0.25      0.078     0.078     0.000     0.000   105991   Kernel#class 
  0.25      0.118     0.078     0.000     0.041    10000   Addressable::URI#user= 
  0.24      0.075     0.075     0.000     0.000    90000   Kernel#kind_of? 
  0.24      0.074     0.074     0.000     0.000    60000   <Class::Addressable::URI>#port_mapping 
  0.24      0.101     0.073     0.000     0.027    40000   Addressable::URI#fragment 
  0.24      7.757     0.072     0.000     7.684    10000   PublicSuffix::List#select 
  0.24      0.072     0.072     0.000     0.000    30000   String#gsub! 
  0.23      0.071     0.071     0.000     0.000    80000   String#initialize_copy 
  0.23      1.136     0.071     0.000     1.065    40000   Kernel#!~ 
  0.23      0.177     0.070     0.000     0.107    10000   <Module::Addressable::IDNA>#to_ascii 
  0.21      0.194     0.065     0.000     0.129    30000   <Module::Addressable::IDNA>#lookup_unicode_composition 
  0.20      0.062     0.062     0.000     0.000    10000   Domainatrix::Url#initialize 
  0.19      0.192     0.057     0.000     0.135    10000   Addressable::URI#normalized_port 
  0.19     15.349     0.057     0.000    15.291    10000   PostRank::URI#clean 
  0.18      0.056     0.056     0.000     0.000    10000   Array#slice 
  0.18     24.276     0.055     0.000    24.221    10000   PostRank::URI#extract 
  0.17      6.400     0.053     0.000     6.347    10000   <Class::Twingly::URL::Normalizer>#normalize_url 
  0.16      0.051     0.051     0.000     0.000    30000   String#unpack 
  0.16      0.129     0.050     0.000     0.080     5991   PublicSuffix::Rule::Base#initialize 
  0.16      0.089     0.049     0.000     0.040    10000   Hash#merge 
  0.16      8.639     0.048     0.000     8.591    10000   <Module::PublicSuffix>#valid? 
  0.16      0.670     0.048     0.000     0.622    10000   Addressable::URI#fragment= 
  0.14      7.874     0.044     0.000     7.829    10000   PublicSuffix::List#find 
  0.14      0.042     0.042     0.000     0.000    60000   Kernel#hash 
  0.13      0.039     0.039     0.000     0.000    10000   Array#values_at 
  0.12      0.038     0.038     0.000     0.000    35991   Module#name 
  0.12      0.404     0.037     0.000     0.367    10000   PublicSuffix::Rule::Base#allow? 
  0.12      0.062     0.036     0.000     0.026    10000   Addressable::URI#query_values 
  0.12      0.036     0.036     0.000     0.000    10000   Addressable::URI#query= 
  0.11      0.035     0.035     0.000     0.000    10000   Array#& 
  0.11      0.084     0.035     0.000     0.049    10000   PostRank::URI#unescape 
  0.11      0.062     0.034     0.000     0.028    10000   Addressable::URI#normalized_query 
  0.11      0.261     0.034     0.000     0.227        1   IO#each_line 
  0.11      0.207     0.034     0.000     0.173     5991   <Class::PublicSuffix::Rule>#factory 
  0.10      0.068     0.032     0.000     0.036    10000   Addressable::URI#query_values= 
  0.10      0.030     0.030     0.000     0.000    30000   Array#initialize_copy 
  0.09     30.751     0.027     0.000    30.724    10000   <Class::Twingly::URL::Normalizer>#normalize 
  0.09     30.778     0.027     0.000    30.751   110000  *Proc#call 
  0.09     10.865     0.027     0.000    10.838    10000   Addressable::URI#normalize! 
  0.08      3.696     0.026     0.000     3.670    10000   <Module::Domainatrix>#parse 
  0.08      0.025     0.025     0.000     0.000    10000   Kernel#instance_variables 
  0.08      0.024     0.024     0.000     0.000    10000   String#chomp 
  0.08      0.024     0.024     0.000     0.000    10000   Hash#initialize_copy 
  0.08      0.024     0.024     0.000     0.000    30000   Module#== 
  0.07      0.022     0.022     0.000     0.000    10000   Regexp#match 
  0.07      0.032     0.022     0.000     0.010    10000   Enumerable#inject 
  0.07      0.021     0.021     0.000     0.000    10000   String#squeeze 
  0.07     24.296     0.020     0.000    24.276    10000   <Class::Twingly::URL::Normalizer>#extract_urls 
  0.07      0.046     0.020     0.000     0.026    10000   Addressable::URI#normalized_fragment 
  0.06      0.027     0.019     0.000     0.008    10000   Enumerable#any? 
  0.06      0.019     0.019     0.000     0.000    10000   String#tr 
  0.06      0.110     0.019     0.000     0.090    10000   Addressable::URI#normalized_userinfo 
  0.06      0.018     0.018     0.000     0.000    25991   Array#first 
  0.06      0.195     0.017     0.000     0.177    10001   Enumerable#each_with_index 
  0.05      0.039     0.017     0.000     0.022    10000   String#match 
  0.05     30.794     0.017     0.000    30.778        1   Integer#times 
  0.05      0.016     0.016     0.000     0.000    10000   PublicSuffix::Rule::Normal#parts 
  0.05      0.305     0.016     0.000     0.289    10000   <Class::PublicSuffix::List>#default 
  0.05      0.023     0.016     0.000     0.007    10000   Fixnum#== 
  0.05      0.015     0.015     0.000     0.000     5991   PublicSuffix::List#add 
  0.04      0.013     0.013     0.000     0.000    15991   Array#last 
  0.04      0.140     0.012     0.000     0.128     5909   PublicSuffix::Rule::Normal#initialize 
  0.04      0.011     0.011     0.000     0.000    10000   String#include? 
  0.04      0.011     0.011     0.000     0.000    10000   String#to_i 
  0.03      0.011     0.011     0.000     0.000    10000   Array#compact 
  0.03      0.009     0.009     0.000     0.000    10000   Array#push 
  0.03      0.008     0.008     0.000     0.000    10000   Symbol#== 
  0.03      0.008     0.008     0.000     0.000    10000   NilClass#nil? 
  0.02      0.007     0.007     0.000     0.000    10000   BasicObject#== 
  0.02      0.007     0.007     0.000     0.000     5991   Module#const_get 
  0.02      0.006     0.006     0.000     0.000     6868   String#strip! 
  0.02      0.006     0.006     0.000     0.000     5991   String#capitalize 
  0.02      0.005     0.005     0.000     0.000     5991   String#to_sym 
  0.00      0.000     0.000     0.000     0.000      308   Hash#[]= 
  0.00      0.001     0.000     0.000     0.001       41   PublicSuffix::Rule::Wildcard#initialize 
  0.00      0.001     0.000     0.000     0.001       41   PublicSuffix::Rule::Exception#initialize 
  0.00     30.794     0.000     0.000    30.794        1   Object#measure 
  0.00      0.000     0.000     0.000     0.000        1   File#initialize 
  0.00      0.289     0.000     0.000     0.289        1   PublicSuffix::List#initialize 
  0.00      0.000     0.000     0.000     0.000        1   <Class::PublicSuffix::List>#default_definition 
  0.00      0.028     0.000     0.000     0.028        1   PublicSuffix::List#create_index! 
  0.00      0.289     0.000     0.000     0.289        1   <Class::PublicSuffix::List>#parse 
  0.00      0.000     0.000     0.000     0.000        1   <Class::File>#dirname 
  0.00      0.000     0.000     0.000     0.000        1   <Class::File>#join 
  0.00      0.000     0.000     0.000     0.000        1   <Class::IO>#new 
  0.00      0.000     0.000     0.000     0.000        1   Kernel#block_given? 

* indicates recursively called methods
              PASS (0:00:30.961) test: .normalize_url should normalizing a short URL (10000x). 

Finished in 30.961786 seconds.

1 tests, 1 passed, 0 failures, 0 errors, 0 skips, 0 assertions

Without PostRank::URI

Loaded Suite test,test/profile,test/unit

Started at 2014-02-20 16:23:30 +0100 w/ seed 21376.

NormalizerPerformanceTest
Thread ID: 70309785233120
Fiber ID: 70309800490320
Total: 5.622905
Sort by: self_time

 %self      total      self      wait     child     calls  name
  9.97      3.701     0.561     0.000     3.140    20000   <Class::Addressable::URI>#parse 
  4.99      1.319     0.281     0.000     1.038    60000   Addressable::URI#validate 
  4.73      0.266     0.266     0.000     0.000   360000   Kernel#instance_variable_defined? 
  4.32      0.403     0.243     0.000     0.160    20000   Addressable::URI#host= 
  4.17      0.339     0.235     0.000     0.105   170000   String#== 
  3.96      0.310     0.222     0.000     0.088   120000   Addressable::URI#host 
  3.70      0.390     0.208     0.000     0.182    20000   Addressable::URI#scheme= 
  3.35      0.265     0.188     0.000     0.077   100000   Addressable::URI#scheme 
  3.18      0.750     0.179     0.000     0.571    10000   Addressable::URI#to_s 
  2.74      0.154     0.154     0.000     0.000   110000   String#[] 
  2.64      0.206     0.148     0.000     0.057    80000   Addressable::URI#path 
  2.57      0.341     0.145     0.000     0.196    10000   Domainatrix::DomainParser#parse_domains_from_host 
  2.55      0.361     0.144     0.000     0.218   100000   BasicObject#!= 
  2.52      0.193     0.142     0.000     0.052    20000   Addressable::URI#path= 
  2.42      2.610     0.136     0.000     2.474    10000   Domainatrix::DomainParser#parse 
  2.23      2.718     0.125     0.000     2.593    20000   Addressable::URI#initialize 
  2.16      0.122     0.122     0.000     0.000    20000   String#scan 
  1.96      0.429     0.110     0.000     0.319    20000   Addressable::URI#ip_based? 
  1.92      2.564     0.108     0.000     2.456    20000   Addressable::URI#defer_validation 
  1.86      0.105     0.105     0.000     0.000   150000   Kernel#respond_to_missing? 
  1.75      0.297     0.099     0.000     0.199    20000   Addressable::URI#authority 
  1.55      2.867     0.087     0.000     2.779    30000   Class#new 
  1.47      0.109     0.083     0.000     0.026    10000   Array#each 
  1.23      0.069     0.069     0.000     0.000    40000   String#gsub 
  1.11      5.469     0.063     0.000     5.406    20000   Array#map 
  1.09      0.061     0.061     0.000     0.000    10000   Domainatrix::Url#initialize 
  1.09      0.061     0.061     0.000     0.000    70000   Kernel#respond_to? 
  1.07      0.060     0.060     0.000     0.000    50000   String#strip 
  1.00      0.056     0.056     0.000     0.000    20000   Hash#keys 
  0.93      5.393     0.052     0.000     5.341    10000   <Class::Twingly::URL::Normalizer>#normalize_url 
  0.92      0.052     0.052     0.000     0.000    20000   String#=~ 
  0.92      0.133     0.051     0.000     0.081    20000   <Class::Addressable::URI>#ip_based_schemes 
  0.81      0.045     0.045     0.000     0.000    60000   Hash#has_key? 
  0.76      0.094     0.043     0.000     0.052    10000   Addressable::URI#userinfo 
  0.75      0.042     0.042     0.000     0.000    60000   String#to_str 
  0.74      0.119     0.041     0.000     0.078    10000   <Class::Twingly::URL::Normalizer>#extract_urls 
  0.67      0.052     0.037     0.000     0.015    20000   Addressable::URI#query 
  0.58      0.033     0.033     0.000     0.000    20000   String#split 
  0.55      0.083     0.031     0.000     0.052    20000   Kernel#!~ 
  0.53      0.065     0.030     0.000     0.035    10000   Hash#merge 
  0.51      0.044     0.029     0.000     0.015    20000   Array#include? 
  0.49      0.028     0.028     0.000     0.000    20000   Array#join 
  0.47      0.026     0.026     0.000     0.000    10000   Array#flatten 
  0.46      5.583     0.026     0.000     5.557    10000   <Class::Twingly::URL::Normalizer>#normalize 
  0.45      0.025     0.025     0.000     0.000    20000   <Class::Addressable::URI>#port_mapping 
  0.45      2.743     0.025     0.000     2.717    10000   <Module::Domainatrix>#parse 
  0.45      5.608     0.025     0.000     5.583    30000  *Proc#call 
  0.42      0.023     0.023     0.000     0.000    30000   Array#reverse 
  0.39      0.022     0.022     0.000     0.000    30000   String#to_s 
  0.36      0.020     0.020     0.000     0.000    10000   Hash#initialize_copy 
  0.34      0.019     0.019     0.000     0.000    20000   Module#name 
  0.34      0.026     0.019     0.000     0.007    10000   Addressable::URI#user 
  0.33      0.026     0.019     0.000     0.007    10000   Addressable::URI#fragment 
  0.33      0.026     0.019     0.000     0.007    10000   Addressable::URI#password 
  0.33      0.026     0.019     0.000     0.008    10000   Addressable::URI#port 
  0.33      0.019     0.019     0.000     0.000    20000   String#downcase 
  0.31      0.126     0.017     0.000     0.109    10000   Enumerable#each_with_index 
  0.31      0.017     0.017     0.000     0.000    20000   Kernel#kind_of? 
  0.28      0.016     0.016     0.000     0.000    20000   Kernel#is_a? 
  0.27      0.035     0.015     0.000     0.020    10000   Kernel#initialize_dup 
  0.27      5.623     0.015     0.000     5.608        1   Integer#times 
  0.25      0.014     0.014     0.000     0.000    20000   Kernel#class 
  0.23      0.013     0.013     0.000     0.000    10000   Kernel#Array 
  0.17      0.010     0.010     0.000     0.000    10000   Array#slice 
  0.16      0.009     0.009     0.000     0.000    10000   String#force_encoding 
  0.13      0.007     0.007     0.000     0.000    10000   Symbol#to_proc 
  0.00      5.623     0.000     0.000     5.623        1   Object#measure 

* indicates recursively called methods
              PASS (0:00:05.639) test: .normalize_url should normalizing a short URL (10000x). 

Finished in 5.639741 seconds.

1 tests, 1 passed, 0 failures, 0 errors, 0 skips, 0 assertions

walro · 2014-02-20T15:29:37Z

Much improve, so amaze!

Inspiration from elasticsearch-transport tests: https://github.com/elasticsearch/elasticsearch-ruby/blob/6f83143b8e6409a2eaf451a4dabf2c64f25ade31/elasticsearch-transport/test/profile/client_benchmark_test.rb
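The profiles above appear to be ruby-prof flat-printer output. As a lighter-weight sketch of the same kind of "normalize a short URL 10000x" test, the snippet below times the loop with Ruby's stdlib `Benchmark` instead of a profiler. `normalize_stub` is a hypothetical stand-in, not the gem's `Twingly::URL::Normalizer`, so its timings won't match the ones above.

```ruby
require "benchmark"
require "uri"

ITERATIONS = 10_000

# Hypothetical stand-in for Twingly::URL::Normalizer.normalize_url; replace it
# with the real call when measuring the gem.
def normalize_stub(url)
  uri = URI.parse(url)
  uri.path = "/" if uri.path.empty? # "http://www.twingly.com" -> ".../"
  uri.to_s
end

# Wall-clock time for the whole loop.
elapsed = Benchmark.realtime do
  ITERATIONS.times { normalize_stub("http://www.twingly.com") }
end

puts format("%d calls in %.3f s (%.2f us per call)",
            ITERATIONS, elapsed, elapsed / ITERATIONS * 1_000_000)
```

A profiler adds per-method breakdowns like the tables above, but also adds overhead; a plain timer is enough to spot regressions between commits.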

jage · 2014-02-20T15:34:11Z

I say I'm done!

walro · 2014-02-20T16:17:36Z

Should not exist in gems

When I removed PostRank::URI in 19d28c6, I also removed the feature that detected URLs without a protocol. This commit re-enables the tests for it.

Enabled the behavior removed in 19d28c6. This uses PublicSuffix and Addressable instead of PostRank::URI, though. Why? PostRank::URI was very slow; this is also slow, but not quite as slow.

jage · 2014-02-20T18:30:05Z

Ok, we had some discussion about the changed behavior in this gem. I've added the old features again, but with new code.

I'm using PublicSuffix instead of PostRank::URI.

Since I'm verifying the domains, this is pretty slow. Not quite as slow as PostRank::URI, though.
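A rough sketch of the approach described above, under heavy assumptions. The profile that follows lists `<Class::Addressable::URI>#heuristic_parse` and `<Module::PublicSuffix>#parse`. To keep this self-contained, the sketch swaps in Ruby's stdlib `URI` and a tiny hard-coded suffix list (`KNOWN_SUFFIXES`, hypothetical). It prepends `http://` when no scheme is present and accepts the result only if the host ends in a known public suffix with at least one label in front of it. In the real version that suffix check runs against the full Public Suffix List, and in the profile it accounts for most of the time.

```ruby
require "uri"

# Hypothetical, tiny stand-in for the Public Suffix List; the real list
# (used through the PublicSuffix gem) has thousands of rules.
KNOWN_SUFFIXES = %w[com net org se co.uk].freeze

# True if the host ends in a known suffix and has at least one label
# in front of it ("twingly.com" yes; "com" or "foo.bar" no).
def valid_domain?(host)
  labels = host.downcase.split(".")
  (1...labels.size).any? { |i| KNOWN_SUFFIXES.include?(labels[i..-1].join(".")) }
end

# Heuristic parse: assume http:// when no scheme is given, then keep the
# result only if its host looks like a registrable domain.
def normalize_without_protocol(candidate)
  has_scheme = candidate.match?(%r{\A[a-z][a-z0-9+.\-]*://}i)
  uri = URI.parse(has_scheme ? candidate : "http://#{candidate}")
  return nil unless uri.host && valid_domain?(uri.host)

  uri.path = "/" if uri.path.empty?
  uri.to_s
rescue URI::InvalidURIError
  nil # e.g. strings containing spaces
end

puts normalize_without_protocol("twingly.com").inspect
puts normalize_without_protocol("foo.bar").inspect
```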

Loaded Suite test/lib,test,test/profile,test/unit

Started at 2014-02-20 19:29:04 +0100 w/ seed 4195.

NormalizerPerformanceTest
Thread ID: 70319318886120
Fiber ID: 70319324815320
Total: 12.994087
Sort by: self_time

 %self      total      self      wait     child     calls  name
 13.80      2.209     1.793     0.000     0.416   420000   PublicSuffix::Rule::Base#odiff 
 12.21      7.021     1.587     0.000     5.434   420000   PublicSuffix::Rule::Base#match? 
 11.79      3.333     1.533     0.000     1.801   436385   <Class::PublicSuffix::Domain>#domain_to_labels 
  8.45      1.098     1.098     0.000     0.000   462771   String#split 
  4.77      7.648     0.620     0.000     7.029    20000   Array#select 
  3.32      0.432     0.432     0.000     0.000   436385   Array#reverse 
  3.20      0.531     0.416     0.000     0.116    20000   PublicSuffix::Rule::Normal#decompose 
  3.20      0.416     0.416     0.000     0.000   420000   Array#[] 
  2.78      0.361     0.361     0.000     0.000   499208   String#to_s 
  2.30      1.976     0.299     0.000     1.676    10000   <Class::Addressable::URI>#parse 
  1.46      0.190     0.190     0.000     0.000   240000   Kernel#instance_variable_defined? 
  1.46      0.799     0.189     0.000     0.609    10000   Addressable::URI#to_s 
  1.35      2.478     0.175     0.000     2.303    10000   <Class::Addressable::URI>#heuristic_parse 
  1.15      0.713     0.150     0.000     0.564    30000   Addressable::URI#validate 
  1.15      0.218     0.150     0.000     0.069   100000   String#== 
  1.09      0.200     0.141     0.000     0.059    70000   Addressable::URI#scheme 
  1.06      0.192     0.138     0.000     0.054    70000   Addressable::URI#host 
  1.02      8.860     0.132     0.000     8.729    10000   <Module::PublicSuffix>#parse 
  0.99      0.214     0.129     0.000     0.085    10000   Addressable::URI#host= 
  0.83      0.208     0.108     0.000     0.100    10000   Addressable::URI#scheme= 
  0.79      0.312     0.103     0.000     0.209    20000   Addressable::URI#authority 
  0.78      0.140     0.101     0.000     0.039    50000   Addressable::URI#path 
  0.72      7.877     0.093     0.000     7.784    10000   PublicSuffix::List#select 
  0.68      0.226     0.088     0.000     0.138    60000   BasicObject#!= 
  0.61      0.080     0.080     0.000     0.000    56438   String#[] 
  0.59     12.736     0.077     0.000    12.659    10000   <Class::Twingly::URL::Normalizer>#normalize_url 
  0.57      0.102     0.074     0.000     0.028    10000   Addressable::URI#path= 
  0.53      0.069     0.069     0.000     0.000    90000   Kernel#respond_to_missing? 
  0.51      0.066     0.066     0.000     0.000    20000   String#=~ 
  0.50      1.452     0.065     0.000     1.386    10000   Addressable::URI#initialize 
  0.49      0.064     0.064     0.000     0.000    10000   String#scan 
  0.48      0.072     0.062     0.000     0.010    10000   Enumerable#inject 
  0.48      0.062     0.062     0.000     0.000    10000   String#gsub! 
  0.47      0.062     0.062     0.000     0.000    30000   Array#join 
  0.47      0.097     0.061     0.000     0.036    10000   Hash#merge 
  0.45      0.239     0.059     0.000     0.180    10000   Addressable::URI#ip_based? 
  0.45      0.119     0.058     0.000     0.061    10000   PublicSuffix::Domain#subdomain? 
  0.44      0.138     0.057     0.000     0.081     6385   PublicSuffix::Rule::Base#initialize 
  0.41      1.371     0.054     0.000     1.317    10000   Addressable::URI#defer_validation 
  0.39      0.324     0.051     0.000     0.274        1   IO#each_line 
  0.37      0.048     0.048     0.000     0.000    50000   Kernel#respond_to? 
  0.36      1.896     0.047     0.000     1.849    26386  *Class#new 
  0.36      8.035     0.046     0.000     7.989    10000   PublicSuffix::List#find 
  0.35     12.796     0.046     0.000    12.750    20001  *Array#map 
  0.34      0.099     0.045     0.000     0.055    10000   Addressable::URI#userinfo 
  0.34      0.145     0.045     0.000     0.100    10000   <Class::Twingly::URL::Normalizer>#extract_urls 
  0.34      0.044     0.044     0.000     0.000    10000   Array#flatten 
  0.30      0.236     0.038     0.000     0.197     6385   <Class::PublicSuffix::Rule>#factory 
  0.29      0.037     0.037     0.000     0.000    20000   String#gsub 
  0.29      0.037     0.037     0.000     0.000    10000   Hash#keys 
  0.28      0.036     0.036     0.000     0.000    50000   Kernel#nil? 
  0.27      0.338     0.035     0.000     0.303    10000   PublicSuffix::Rule::Base#allow? 
  0.26     12.951     0.033     0.000    12.917    10000   <Class::Twingly::URL::Normalizer>#normalize 
  0.25      0.098     0.032     0.000     0.066    20000   Kernel#!~ 
  0.24      0.032     0.032     0.000     0.000    20000   PublicSuffix::Rule::Normal#parts 
  0.24      0.031     0.031     0.000     0.000    10000   Regexp#=== 
  0.24      0.061     0.031     0.000     0.031    20000   Kernel#initialize_dup 
  0.24      0.039     0.031     0.000     0.008    10000   PublicSuffix::Domain#initialize 
  0.23      0.030     0.030     0.000     0.000    40000   String#to_str 
  0.21      0.078     0.028     0.000     0.050    10000   <Class::Addressable::URI>#ip_based_schemes 
  0.21     12.978     0.028     0.000    12.951    20000  *Proc#call 
  0.19      0.030     0.024     0.000     0.006    10001   Array#each 
  0.18      0.023     0.023     0.000     0.000    20000   String#strip 
  0.17      0.022     0.022     0.000     0.000    20000   String#chomp 
  0.16      0.021     0.021     0.000     0.000    10000   Array#values_at 
  0.16      0.021     0.021     0.000     0.000    10000   Hash#initialize_copy 
  0.16      0.021     0.021     0.000     0.000    26385   Hash#has_key? 
  0.16      0.020     0.020     0.000     0.000    20000   String#include? 
  0.15      0.027     0.020     0.000     0.007    10000   Addressable::URI#password 
  0.15      0.027     0.020     0.000     0.008    10000   Addressable::URI#user 
  0.15      0.020     0.020     0.000     0.000    26385   Array#first 
  0.15      0.027     0.020     0.000     0.008    10000   Addressable::URI#fragment 
  0.15      0.028     0.020     0.000     0.008    10000   Addressable::URI#query 
  0.15      0.027     0.020     0.000     0.008    10000   Addressable::URI#port 
  0.14      0.018     0.018     0.000     0.000    20000   Kernel#kind_of? 
  0.14      0.044     0.018     0.000     0.025    10000   Kernel#dup 
  0.13      0.017     0.017     0.000     0.000     6385   PublicSuffix::List#add 
  0.13      0.016     0.016     0.000     0.000    16385   Module#name 
  0.13      0.016     0.016     0.000     0.000    16385   String#downcase 
  0.12     12.994     0.016     0.000    12.978        1   Integer#times 
  0.12      0.375     0.016     0.000     0.359    10000   <Class::PublicSuffix::List>#default 
  0.12      0.024     0.016     0.000     0.009    10000   Array#include? 
  0.11      0.014     0.014     0.000     0.000    10000   Kernel#Array 
  0.11      0.014     0.014     0.000     0.000     7820   <Class::PublicSuffix::List>#private_domains? 
  0.10      0.150     0.014     0.000     0.137     6332   PublicSuffix::Rule::Normal#initialize 
  0.10      0.013     0.013     0.000     0.000    10000   PublicSuffix::Domain#trd 
  0.10      0.013     0.013     0.000     0.000    10000   PublicSuffix::Domain#tld 
  0.10      0.013     0.013     0.000     0.000    10000   PublicSuffix::Domain#sld 
  0.10      0.013     0.013     0.000     0.000    10000   <Class::Addressable::URI>#port_mapping 
  0.10      0.013     0.013     0.000     0.000    16385   Array#last 
  0.10      0.013     0.013     0.000     0.000    16385   Kernel#class 
  0.08      0.010     0.010     0.000     0.000    10000   Array#compact 
  0.08      0.010     0.010     0.000     0.000    10000   String#initialize_copy 
  0.07      0.009     0.009     0.000     0.000    10000   String#force_encoding 
  0.07      0.009     0.009     0.000     0.000    10000   Symbol#to_proc 
  0.07      0.009     0.009     0.000     0.000    10000   Array#pop 
  0.06      0.008     0.008     0.000     0.000    10000   Kernel#is_a? 
  0.06      0.008     0.008     0.000     0.000    10001   Kernel#block_given? 
  0.06      0.007     0.007     0.000     0.000    10000   Symbol#== 
  0.06      0.007     0.007     0.000     0.000     6385   Module#const_get 
  0.05      0.007     0.007     0.000     0.000     7820   String#strip! 
  0.05      0.006     0.006     0.000     0.000     6385   String#capitalize 
  0.04      0.006     0.006     0.000     0.000     6385   String#to_sym 
  0.01      0.001     0.001     0.000     0.000      560   Hash#[]= 
  0.00      0.001     0.000     0.000     0.001       34   PublicSuffix::Rule::Wildcard#initialize 
  0.00      0.000     0.000     0.000     0.000       19   PublicSuffix::Rule::Exception#initialize 
  0.00     12.994     0.000     0.000    12.994        1   Object#measure 
  0.00      0.000     0.000     0.000     0.000        1   File#initialize 
  0.00      0.359     0.000     0.000     0.359        1   PublicSuffix::List#initialize 
  0.00      0.020     0.000     0.000     0.020        1   Enumerable#each_with_index 
  0.00      0.034     0.000     0.000     0.034        1   PublicSuffix::List#create_index! 
  0.00      0.000     0.000     0.000     0.000        1   <Class::PublicSuffix::List>#default_definition 
  0.00      0.000     0.000     0.000     0.000        1   <Class::File>#join 
  0.00      0.000     0.000     0.000     0.000        1   <Class::IO>#new 
  0.00      0.359     0.000     0.000     0.359        1   <Class::PublicSuffix::List>#parse 
  0.00      0.000     0.000     0.000     0.000        1   <Class::File>#dirname 

* indicates recursively called methods
              PASS (0:00:13.171) test: .normalize_url should normalizing a short URL (10000x). 

Finished in 13.171820 seconds.

1 tests, 1 passed, 0 failures, 0 errors, 0 skips, 0 assertions
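The profile above shows a `measure` method wrapping `Integer#times` for 10000 iterations of normalize_url. Below is a minimal, stdlib-only sketch of that kind of timing harness. It assumes a hypothetical `measure(iterations)` signature and uses Ruby's `Benchmark` module instead of ruby-prof, so it reports total wall-clock time rather than a per-method breakdown. `URI.parse` is only a placeholder for the normalizer, which is not reproduced here.

```ruby
require "benchmark"
require "uri"

# Hypothetical helper: runs the given block `iterations` times and returns
# the elapsed wall-clock time in seconds (a Float). Unlike ruby-prof, this
# does not record which methods the time was spent in.
def measure(iterations = 10_000)
  Benchmark.realtime do
    iterations.times { yield }
  end
end

# Placeholder workload: URI.parse stands in for normalize_url.
elapsed = measure(10_000) { URI.parse("http://www.twingly.com/") }
puts format("%d calls in %.3f s (%.2f us per call)", 10_000, elapsed, elapsed / 10_000 * 1_000_000)
```

Keeping the loop inside the timed block means the figure covers all iterations together, which is comparable to the `Integer#times` total in the profile but not to its per-method rows.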

dentarg · 2014-02-20T20:18:47Z

test/unit/normalization_test.rb

+      assert_equal [url], @normalizer.normalize(url)
+    end
+
+    should "should not blow up when there's no URL in the text" do


One "should" too much?

dentarg · 2014-02-21T08:25:00Z

From bundle exec rake in Stanley, using this branch of twingly-url-normalizer:

TestUrlHelper
     FAIL (0:00:00.066) test_site_url_without_scheme
          Expected: "//www.asos.com/"
            Actual: "//www.asos.com"
        @ /Users/dentarg/.rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/minitest/unit.rb:200:in `assert'
          /Users/dentarg/.rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/minitest/unit.rb:240:in `assert_equal'
          test/unit/url_helper_test.rb:10:in `test_site_url_without_scheme'
          /Users/dentarg/.rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/minitest/unit.rb:1301:in `run'
          /Users/dentarg/.rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/minitest/unit.rb:867:in `_run_anything'
          /Users/dentarg/.rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/minitest/unit.rb:1060:in `run_tests'
          /Users/dentarg/.rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/minitest/unit.rb:1047:in `block in _run'
          /Users/dentarg/.rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/minitest/unit.rb:1046:in `each'
          /Users/dentarg/.rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/minitest/unit.rb:1046:in `_run'
          /Users/dentarg/.rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/minitest/unit.rb:1035:in `run'
          /Users/dentarg/.rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/minitest/unit.rb:789:in `block in autorun'

jage · 2014-02-21T09:46:39Z

Expected: "//www.asos.com/"
Actual: "//www.asos.com"

Ok, I'll look into it.

Insert / if no path exists.
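The fix for the failing Stanley test is to append "/" when a URL has an empty path. A minimal sketch of that rule is shown below. It assumes Ruby's standard-library `URI` in place of Addressable, which this branch actually uses, and `ensure_path` is a hypothetical name, not the gem's API.

```ruby
require "uri"

# Hypothetical helper: returns the URL with "/" as its path when the path
# component is empty. Scheme-relative URLs such as "//www.asos.com" are
# handled too. Opaque URIs (e.g. "mailto:...") have no path and are
# returned unchanged.
def ensure_path(url)
  uri = URI.parse(url)
  return uri.to_s if uri.opaque

  uri.path = "/" if uri.path.nil? || uri.path.empty?
  uri.to_s
end

puts ensure_path("//www.asos.com")
puts ensure_path("http://www.asos.com")
puts ensure_path("http://www.asos.com/a")
```

The three calls should print "//www.asos.com/", "http://www.asos.com/" and "http://www.asos.com/a". The first one matches the value the failing `test_site_url_without_scheme` expected.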

jage · 2014-02-21T15:15:21Z

So, what's next? Start using it and see where it breaks?

walro · 2014-02-21T15:16:26Z

I think so. Hopefully the tests in Zambezi and Stanley will pick any problems up :)

Tests

jage added 6 commits February 20, 2014 14:26

Add initial tests

3f1deac

Using minitest with some extras:
* Turn for more informative run output
* Shoulda for context and matchers

Turn: https://github.com/turn-project/turn
Shoulda: https://github.com/thoughtbot/shoulda

Test on Travis CI

f9a6072

Refactor .normalize

cdbab00

Remove TODO, not sure it should handle that

f6bd680

Add test note in README

39c6dca

Add failing tests

13411ef

Broken URLs found during work with Zambezi

jage reviewed Feb 20, 2014

jage added 4 commits February 20, 2014 15:15

Parse URIs with Addressable

d230e8b

Remove shoulda, just use shoulda-context

d2bbab6

shoulda includes both shoulda-context and shoulda-matchers. We’re not using the matchers at the moment, so there’s no need to pull them in (they introduce lots of development dependencies).
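The dependency change above could look roughly like the gemspec fragment below, which declares shoulda-context directly instead of the shoulda meta-gem. This is a sketch only. The name and version come from this PR (twingly-url-normalizer, bumped to 1.0.0), but the exact dependency list and the absence of version constraints are assumptions, and the project's real gemspec may differ.

```ruby
require "rubygems"

# Assumed development dependencies: turn for run output and shoulda-context
# for `context`/`should` blocks. The shoulda meta-gem is deliberately left
# out because it also pulls in shoulda-matchers.
spec = Gem::Specification.new do |s|
  s.name    = "twingly-url-normalizer"
  s.version = "1.0.0"

  s.add_development_dependency "turn"
  s.add_development_dependency "shoulda-context"
end

puts spec.development_dependencies.map(&:name).sort
```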

Add failing umlaut tests

a01ac74

From #2

Remove Postrank::URI

19d28c6

PostRank::URI couldn’t handle umlauts. We will lose the feature that detects URLs without a protocol (e.g. “twingly.com”), but we don’t see the need for it. On the plus side, lots of runtime dependencies are removed (nokogiri!).

jage self-assigned this Feb 20, 2014

jage reviewed Feb 20, 2014

Add profiling

0225448

Inspiration from elasticsearch-transport tests: https://github.com/elasticsearch/elasticsearch-ruby/blob/6f83143b8e6409a2eaf451a4dabf2c64f25ade31/elasticsearch-transport/test/profile/client_benchmark_test.rb

jage added 5 commits February 20, 2014 18:00

Update README with a working example

6d210a9

Add test that fails when given text without any URLs in it

dfe29e1

Remove Gemfile.lock

0e8a3cd

Should not exist in gems

Add failing test for old URL behavior

826da7d

In 19d28c6, when I removed Postrank::URI, I removed the feature that detected URLs without a protocol. This commit enables the tests for it again.

Detect URLs without protocol

e276dd4

Re-enables the behavior removed in 19d28c6. This uses PublicSuffix and Addressable instead of Postrank::URI, though. Why? Postrank::URI was very slow; this is also slow, but not quite as slow.
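The idea behind detecting URLs without a protocol is to treat a bare string like "twingly.com" as a URL only when it ends in a known public suffix, and then add a scheme. Below is a minimal sketch of that idea under loud assumptions. A tiny hard-coded suffix list stands in for the PublicSuffix gem's full Public Suffix List. Ruby's standard-library `URI` stands in for Addressable. The `with_scheme` name is hypothetical and is not the gem's API.

```ruby
require "uri"

# Hypothetical stand-in for the Public Suffix List. The real change relies on
# the PublicSuffix gem, which also knows multi-label suffixes such as
# "co.uk" that this sketch cannot handle.
KNOWN_SUFFIXES = %w[com net org se].freeze

# Hypothetical helper:
# - returns the candidate unchanged when it already starts with http(s)://
# - returns "http://" + candidate when its host part ends in a known suffix
# - returns nil otherwise, including when the result is not a parseable URI
def with_scheme(candidate)
  return candidate if candidate =~ %r{\Ahttps?://}i

  host   = candidate.split("/", 2).first.to_s.downcase
  labels = host.split(".")
  return nil if labels.size < 2 || labels.any?(&:empty?)
  return nil unless KNOWN_SUFFIXES.include?(labels.last)

  URI.parse("http://#{candidate}").to_s
rescue URI::InvalidURIError
  nil
end

puts with_scheme("twingly.com")
puts with_scheme("blog.twingly.se/about")
p with_scheme("hello")
```

The suffix check is what keeps ordinary words from becoming URLs. Looking up a real suffix list is also where most of the cost lies, which is consistent with the PublicSuffix::List entries in the profile.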

jage added 3 commits February 20, 2014 19:35

Refactor

36ded27

Fix performance test, should be testing normalize_url

0e0137e

Use RubyProf::MultiPrinter, create profile files in tmp

077cc17

dentarg reviewed Feb 20, 2014

Make sure we always have a path

3682905

Insert / if no path exists.

Bump to 1.0.0 since behavior has been changed

72d0521

jage added a commit that referenced this pull request Feb 21, 2014

Merge pull request #3 from twingly/tests

d345763

Tests

jage merged commit d345763 into master Feb 21, 2014

jage deleted the tests branch February 21, 2014 16:42

jage mentioned this pull request Feb 23, 2014

Add tests #1

Closed




jage commented Feb 20, 2014

With Postrank::URI

Without Postrank::URI
