Reduce string allocations in scalar_scanner #399

Merged
merged 1 commit into from
May 6, 2019

Conversation

casperisfine

While profiling the memory usage of our app during boot, I noticed that Psych allocates a lot of unnecessary strings, which wastes a lot of time in GC cycles:

allocated objects by file
-----------------------------------
   3978249  /tmp/bundle/ruby/2.5.0/gems/psych-3.1.0/lib/psych/scalar_scanner.rb
   ...
   1841174  /tmp/bundle/ruby/2.5.0/gems/psych-3.1.0/lib/psych.rb
   1813674  /tmp/bundle/ruby/2.5.0/gems/psych-3.1.0/lib/psych/tree_builder.rb
   ...
   1213996  /tmp/bundle/ruby/2.5.0/gems/psych-3.1.0/lib/psych/visitors/to_ruby.rb
   ...


retained objects by file
-----------------------------------
   812516  /tmp/bundle/ruby/2.5.0/gems/psych-3.1.0/lib/psych.rb
   ...
   248190  /tmp/bundle/ruby/2.5.0/gems/psych-3.1.0/lib/psych/scalar_scanner.rb
   ...
    39445  /tmp/bundle/ruby/2.5.0/gems/psych-3.1.0/lib/psych/visitors/to_ruby.rb

Out of almost 4M objects allocated in scalar_scanner.rb only 6% are retained.

After digging a bit more, it's because each string is matched against a dozen regexps using ===, which sets the various magic variables with substrings & MatchData objects even though they are never used.
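To illustrate the difference, here is a minimal sketch. The `INTEGER` pattern and the two helper methods are hypothetical and only loosely modeled on the kind of check a scalar scanner does; they are not Psych's actual code. It shows that `Regexp#===` populates `$~` with a MatchData, while `match?` leaves it untouched:

```ruby
# Hypothetical pattern for illustration only; not the regexp Psych uses.
INTEGER = /\A[-+]?\d+\z/

# $~ is local to the method frame, so each helper starts with $~ == nil.
def match_data_after_case_eq(string)
  INTEGER === string      # Regexp#=== performs a full match and sets $~
  $~                      # MatchData on success
end

def match_data_after_match_p(string)
  INTEGER.match?(string)  # match? only returns true/false and does not set $~
  $~                      # still nil
end

p match_data_after_case_eq("42")
p match_data_after_match_p("42")
```

Since the scanner only needs a boolean answer, the MatchData and the substrings backing the magic variables are pure overhead in the `===` version.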

Using the following simplified benchmark:

require 'memory_profiler'
require 'psych'

yaml_doc = <<-YAML
- foo
- bar
- baz
- plop
YAML

Psych.load(yaml_doc) # Preload caches etc

report = MemoryProfiler.report do
  Psych.load(yaml_doc)
end

report.pretty_print

Before:

allocated objects by file
-----------------------------------
        13  /opt/rubies/2.6.2/lib/ruby/2.6.0/psych/scalar_scanner.rb
         8  /opt/rubies/2.6.2/lib/ruby/2.6.0/psych.rb
         6  /opt/rubies/2.6.2/lib/ruby/2.6.0/psych/tree_builder.rb
         5  /opt/rubies/2.6.2/lib/ruby/2.6.0/psych/visitors/to_ruby.rb
         3  /opt/rubies/2.6.2/lib/ruby/2.6.0/psych/nodes/node.rb
         2  /opt/rubies/2.6.2/lib/ruby/2.6.0/psych/handlers/document_stream.rb
         1  (eval)
         1  /opt/rubies/2.6.2/lib/ruby/2.6.0/psych/class_loader.rb

After:

allocated objects by file
-----------------------------------
         8  /Users/byroot/src/github.com/Shopify/psych/lib/psych.rb
         6  /Users/byroot/src/github.com/Shopify/psych/lib/psych/tree_builder.rb
         5  /Users/byroot/src/github.com/Shopify/psych/lib/psych/visitors/to_ruby.rb
         3  /Users/byroot/src/github.com/Shopify/psych/lib/psych/nodes/node.rb
         2  /Users/byroot/src/github.com/Shopify/psych/lib/psych/handlers/document_stream.rb
         2  /Users/byroot/src/github.com/Shopify/psych/lib/psych/scalar_scanner.rb
         1  (eval)
         1  /Users/byroot/src/github.com/Shopify/psych/lib/psych/class_loader.rb

Downside

This relies on String#match?, which is an MRI 2.4 feature, which means it drops MRI 2.3 support.

If dropping 2.3 is a no-go, we could have two implementations of this method and select between them using String.method_defined?(:match?).
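A rough sketch of what that selection could look like, assuming a hypothetical `RegexpCheck.matches?` helper (the module and method names are made up for illustration and are not part of Psych):

```ruby
# Hypothetical helper; the real patch may structure this differently.
module RegexpCheck
  if String.method_defined?(:match?)
    # MRI >= 2.4: boolean match without allocating MatchData.
    def self.matches?(pattern, string)
      string.match?(pattern)
    end
  else
    # MRI 2.3 fallback: same result, but still sets $~ and allocates.
    def self.matches?(pattern, string)
      !(string =~ pattern).nil?
    end
  end
end

p RegexpCheck.matches?(/\A\d+\z/, "123")
p RegexpCheck.matches?(/\A\d+\z/, "abc")
```

The check runs once at load time, so there is no per-call cost for picking the implementation.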

@tenderlove any opinions on this?

I'll look for other places where allocations could be reduced.

cc @csfrancis @rafaelfranca @Edouard-chin

@casperisfine casperisfine force-pushed the reduce-parsing-allocations branch from de2365a to d60a807 Compare April 26, 2019 16:29
@casperisfine

Interesting followup, after running the profile again with this patch:

allocated objects by file
-----------------------------------
...
   1574360  /tmp/bundle/ruby/2.5.0/bundler/gems/psych-de2365a3def7/lib/psych/scalar_scanner.rb


allocated objects by location
-----------------------------------
...
   604134  /tmp/bundle/ruby/2.5.0/bundler/gems/psych-de2365a3def7/lib/psych/scalar_scanner.rb:41
   389244  /tmp/bundle/ruby/2.5.0/bundler/gems/psych-de2365a3def7/lib/psych/scalar_scanner.rb:46
...
   373994  /tmp/bundle/ruby/2.5.0/bundler/gems/psych-de2365a3def7/lib/psych/scalar_scanner.rb:103
...
   109412  /tmp/bundle/ruby/2.5.0/bundler/gems/psych-de2365a3def7/lib/psych/scalar_scanner.rb:106
...
    74520  /tmp/bundle/ruby/2.5.0/bundler/gems/psych-de2365a3def7/lib/psych/scalar_scanner.rb:55

So only ~2.4 million allocations have been saved. Looking at the remaining allocation locations, they're all @string_cache[string] = true.

I initially thought it was because of ruby/ruby@1e83e15 but I'm actually running that profile on MRI 2.5, so that can't be it.

I'll dig a bit further to try to understand why this causes allocations, but I'm starting to wonder whether that string cache isn't doing more harm than good.

@casperisfine

Actually that was it: https://github.com/ruby/ruby/blob/b33a168e65c64f2d852b3911e34bd4faab451ab8/hash.c#L1585-L1600

Even on 2.5.x, String hash keys are duped and frozen, which means foo['bar'] = true does allocate a string unless the key is already frozen.
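This behaviour can be observed from plain Ruby. The sketch below uses a hypothetical `stored_key_for` helper (not part of Psych) to return the key object a Hash actually keeps after an assignment:

```ruby
# Hypothetical helper: insert `key` into a fresh Hash and return the
# key object the Hash actually stored.
def stored_key_for(key)
  hash = {}
  hash[key] = true
  hash.keys.first
end

unfrozen = String.new("bar")
stored = stored_key_for(unfrozen)
p stored.frozen?            # the Hash keeps a frozen copy
p stored.equal?(unfrozen)   # a different object from the one passed in
p unfrozen.frozen?          # the caller's string is left unfrozen

frozen = String.new("baz").freeze
p stored_key_for(frozen).equal?(frozen)  # already frozen: stored as-is
```

So every insertion of an unfrozen String key costs an extra object, which matches the allocations reported on the @string_cache[string] = true lines.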

So I do think we should get rid of @string_cache. I can only see it giving a perf boost for documents containing a lot of duplicated strings.

Thoughts on this?
