Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance: memomize LinesClassifier::no_cov_line #662

Merged
merged 1 commit into from
Mar 10, 2018

Conversation

BMorearty
Copy link
Contributor

In my measurements the majority of the time in SimpleCov was being
spent in this one-line function.

In my tests on a large project at Airbnb, this change makes SimpleCov run
from 2.5x to 3.75x faster when processing results.

(From 154.2 seconds down to 60.6 seconds in one test.
From 16.6 seconds down to 4.4 seconds in another.)

Admittedly, memoizing it does change its behavior but only in the edge
case where someone runs SimpleCov twice in a single test run, changing
the +nocov_token+ between the two runs. This is such an odd edge case I
decided not to worry about it.

In my measurements the majority of the time in SimpleCov was being
spent in this one-line function.

In my tests on a large project, this change makes SimpleCov run
from 2.5x to 3.75x faster.

Admittedly, memoizing it does change its behavior but only in the edge
case where someone runs SimpleCov twice in a single test run, changing
the +nocov_token+ between the two runs. This is such an odd edge case I
decided not to worry about it.
@BMorearty BMorearty force-pushed the performance-improvement branch from 2d31d4b to 93b82b2 Compare March 9, 2018 03:35
Copy link
Collaborator

@PragTob PragTob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi there,

thanks for the contribution!

Could you share a benchmark or parts of it for this? I guess you can't share the airbnb code base ;P

I'm just curious (plus like to keep benchmarks around and repeatable) it seems odd for such a little line to cause so much overhead, but as it's evaluated a lot and a new regex is instantiated every time I guess it makes sense...

I agree, the edge case is negligible / people shouldn't change there nocov tokens. I'd even debate the ability to change this tokens as not really necessary imo (and then this could even be in a constant...). But that's nod a change for now.

Thanks a bunch!

@BMorearty
Copy link
Contributor Author

Haha I'd be happy to share the airbnb code base with you if you come work here! 😛

But hopefully this will help: our single largest Ruby project has tens of thousands of files and millions of lines of code. Our CI runs it with a test splitter that splits it into 720 parallel runs on 24 machines. (!)

If we use track_files "**/*.rb", each worker calls no_cov_line 3,607,312 times.

If we don't use track_files, each worker calls no_cov_line 456,353 times. (I'm removing the call to track_files and will run that in a parallel step instead so each build worker doesn't have to do all that redundant work.)

Either way, all those constructions of RegExp objects take a heck of a long time:

Before memoizing:

[4] pry(main)> Benchmark.realtime { 3_607_312.times { SimpleCov::LinesClassifier.no_cov_line } }
=> 50.070164845
[5] pry(main)> Benchmark.realtime { 456_353.times { SimpleCov::LinesClassifier.no_cov_line } }
=> 6.306209134

After memoizing:

[4] pry(main)> Benchmark.realtime { 3_607_312.times { SimpleCov::LinesClassifier.no_cov_line } }
=> 0.446957834
[5] pry(main)> Benchmark.realtime { 456_353.times { SimpleCov::LinesClassifier.no_cov_line } }
=> 0.052228564

@BMorearty
Copy link
Contributor Author

P.S. Those benchmarks were in Ruby 2.1.8

@BMorearty
Copy link
Contributor Author

Please hold off on merging this until I resolve an issue I'm getting in our CI, although I can't repro it locally:

.../vendor/bundle/ruby/2.1.0/gems/simplecov-0.15.1.abnb/lib/simplecov/lines_classifier.rb:23:in `=~': invalid byte sequence in UTF-8 (ArgumentError)

@BMorearty
Copy link
Contributor Author

The error is happening because if line =~ self.class.no_cov_line is executing on this string from the builder gem: https://github.com/jimweirich/builder/blob/master/test/performance.rb#L17

The string is "text = "This is a test of the new xml markup. I\xF1t\xEBrn\xE2ti\xF4n\xE0liz\xE6ti\xF8n\n" * 10000\n"

I can verify this in pry:

[1] pry(main)> str = "text = \"This is a test of the new xml markup. I\xF1t\xEBrn\xE2ti\xF4n\xE0liz\xE6ti\xF8n\\n\" * 10000\n"
=> "text = \"This is a test of the new xml markup. I\xF1t\xEBrn\xE2ti\xF4n\xE0liz\xE6ti\xF8n\\n\" * 10000\n"
[2] pry(main)> str =~ /SimpleCov::LinesClassifier.no_cov_line/
ArgumentError: invalid byte sequence in UTF-8

What I'm curious about is why, on SimpleCov head, this file is being processed at all. It seems like a bug. I was previously using 0.14.1 and it did not process this file because our filters are:

    add_filter [
      "/config/",
      "/db/",
      "/script/",
      "/spec/",
      "/test/",
      "/vendor/",
    ]

and the filename is /tmp/d20180309-12273-5g2awg/monorail-solano-ghe/vendor/bundle/ruby/2.1.0/gems/builder-2.1.2/test/performance.rb. Since this filename includes the string /vendor/, simplecov should reject it before calling SimpleCov::LinesClassifier#classify.

I am familiar with the change in 0.15.1 to add support for regexp filters but it does not seem like that would affect it.

Is this a bug that was introduced after 0.14.1? Is it calling SimpleCov::LinesClassifier#classify on files that it shouldn't?

@BMorearty
Copy link
Contributor Author

BMorearty commented Mar 10, 2018

Update: feel free to merge this. The issue I'm seeing is unrelated to this change. I get the same behavior when I upgrade from 0.14.1 to 0.15.1 without this change. Haven't yet figured out why.

@PragTob
Copy link
Collaborator

PragTob commented Mar 10, 2018

@BMorearty thanks for the input. It does seem like we might have had some form of regression regarding filters but I'm unsure and honestly I don't have the most time to spend on simplecov but I might be able to investigate and repro. Similarly, maybe I get to wip up a quick benchmark for our line classification.

@PragTob
Copy link
Collaborator

PragTob commented Mar 10, 2018

Ok wipped up a quick benchmark and it looks good:

# this is master
tobi@speedy ~/github/simplecov $ ruby benchmarks/result.rb 
Warming up --------------------------------------
generating a simplecov result
                         7.000  i/100ms
Calculating -------------------------------------
generating a simplecov result
                         72.193  (± 2.8%) i/s -    364.000  in   5.048140s
tobi@speedy ~/github/simplecov $ git checkout -
M	Gemfile
Switched to branch 'BMorearty-performance-improvement'
tobi@speedy ~/github/simplecov $ ruby benchmarks/result.rb 
Warming up --------------------------------------
generating a simplecov result
                        26.000  i/100ms
Calculating -------------------------------------
generating a simplecov result
                        270.671  (± 1.8%) i/s -      1.378k in   5.092701s

Copy link
Collaborator

@PragTob PragTob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again 🎉

here, have some bunnies for your good contributions!

img_20180126_102207

This was referenced Mar 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants