Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize URLHelpers.gather_constants #1935

Merged
merged 2 commits into from
Jul 15, 2024

Conversation

amomchilov
Copy link
Contributor

@amomchilov amomchilov commented Jun 19, 2024

Motivation

Profiling tapioca DSL SomeConst identified this method as a hot spot, taking roughly 1 out of the 6 total second spent by Tapioca (not including application load time).

The changes in this PR speed up running tapioca dsl against Shopify core by 1 entire second!

Benchmark results (writeup WIP)

run gather_constants User time Syscall time Total
Master 2s 350ms 37s 383ms 6s 760ms 33s 10ms
Original improvements 1s 333ms 36s 383ms 6s 970ms 32s 216ms
Ufuk suggestion 1s 473ms 36s 593ms 6s 750ms 32s 313ms
Final improvement 1s 330ms 35s 533ms 6s 420ms 31s 27ms
Set (dumb) 1s 530ms 36s 710ms 6s 657ms 32s 308ms
Set (smart) 1s 523ms 37s 260ms 6s 770ms 32s 901ms
Benchmark run details

Master

run gather_constants User time Syscall time Total
run 1 2s 350ms 37s 420ms 6s 710ms 32s 889ms
run 2 2s 410ms 37s 210ms 6s 830ms 32s 992ms
run 3 2s 290ms 37s 520ms 6s 740ms 33s 149ms
avg 2s 350ms 37s 383ms 6s 760ms 33s 10ms

My original approach

Doesn't handle prepended modules correctly.

2344fdf

run gather_constants User time Syscall time Total
run 1 1s 470ms 36s 470ms 7s 70ms 32s 523ms
run 2 1s 500ms 36s 320ms 6s 870ms 32s 118ms
run 3 1s 30ms 36s 360ms 6s 970ms 32s 6ms
avg 1s 333ms 36s 383ms 6s 970ms 32s 216ms

Ufuk's improvement

Fixes logic for prepended modules.

70d7fb5

run gather_constants User time Syscall time Total
run 1 1s 380ms 36s 650ms 6s 770ms 32s 577ms
run 2 1s 520ms 36s 560ms 6s 750ms 32s 156ms
run 3 1s 520ms 36s 570ms 6s 730ms 32s 205ms
avg 1s 473ms 36s 593ms 6s 750ms 32s 313ms

Final improvement

See the commit message of ef5e6c9

run gather_constants User time Syscall time Total
run 1 1s 430ms 35s 510ms 6s 430ms 31s 74ms
run 2 1s 460ms 35s 520ms 6s 340ms 30s 805ms
run 3 1s 100ms 35s 570ms 6s 490ms 31s 201ms
avg 1s 330ms 35s 533ms 6s 420ms 31s 27ms

Using a Set (poorly)

Uses a Set to accelerate the .include? checks. However, since this Set isn't reused, the overhead of created it is larger than the benefit of the faster lookup.

https://github.com/Shopify/tapioca/compare/70d7fb5491fe655715da1e7f52dbd8ac1085418e..684c6d7a98a612f268f814d8f079f0c6756a216b

run gather_constants User time Syscall time Total
run 1 1s 570ms 36s 930ms 6s 760ms 32s 716ms
run 2 1s 520ms 36s 580ms 6s 650ms 32s 6ms
run 3 1s 500ms 36s 620ms 6s 560ms 32s 202ms
avg 1s 530ms 36s 710ms 6s 657ms 32s 308ms

Using a set (properly)

Ammortize the Set construction over two consecutive .include? checks

https://github.com/Shopify/tapioca/compare/70d7fb5491fe655715da1e7f52dbd8ac1085418e..c31a9ef73c59a8965ec0679440b10ee7c3e90379

run gather_constants User time Syscall time Total
run 1 1s 570ms 37s 600ms 6s 810ms 33s 341ms
run 2 1s 550ms 37s 230ms 6s 780ms 32s 985ms
run 3 1s 450ms 36s 950ms 6s 720ms 32s 378ms
avg 1s 523ms 37s 260ms 6s 770ms 32s 901ms
other toy micro-benchmarks
#!/usr/bin/ruby

#require "awesome_print"
#require "active_support/all"
require "benchmark/ips"

module HelloWorld
  def hello_world!
    puts "Hello, word!"
  end
end

class GrandParent; end

class Parent < GrandParent; end

class Child < Parent
  include HelloWorld
end

module Unrelated; end

def ancestors_of(mod) = mod.ancestors
def superclass_of(cls) = cls.superclass

def includes_helper?(mod, helper)
  superclass_ancestors = []
  
  if Class === mod
    superclass = superclass_of(mod)
    superclass_ancestors = ancestors_of(superclass) if superclass
  end
  
  ancestors = Set.new.compare_by_identity.merge(ancestors_of(mod)).subtract(superclass_ancestors)
  ancestors.any? { |ancestor| helper == ancestor }
end

def includes_helper_2?(mod, helper)
  if Class === mod
    superclass = superclass_of(mod)
    ancestors_of(mod).take_while { |a| !superclass.equal?(a) }.include?(helper)
  else
    ancestors_of(mod).include?(helper)
  end
end

def includes_helper_3?(mod, helper)
  return ancestors_of(mod).include?(helper) unless Class === mod

  superclass = superclass_of(mod)

  ancestors_of(mod).each do |a|
    return true if helper.equal?(a)
    return false if superclass.equal?(a)
  end

  false
end

def direct_ancestors_of(mod)
  if Class === mod
    superclass = superclass_of(mod)
    ancestors_of(mod).take_while { |a| !superclass.equal?(a) }
  else
    ancestors_of(mod)
  end
end

Benchmark.ips do |x|
  x.config(warmup: 1, time: 5)
  
  x.report("original") do |times|
    i = 0
    while (i += 1) < times
      includes_helper?(Child, HelloWorld)
      includes_helper?(Child, Unrelated)
    end
  end
  
  x.report("variant 2") do |times|
    i = 0
    while (i += 1) < times
      includes_helper_2?(Child, HelloWorld)
      includes_helper_2?(Child, Unrelated)
    end
  end
  
  x.report("variant 3") do |times|
    i = 0
    while (i += 1) < times
      includes_helper_3?(Child, HelloWorld)
      includes_helper_3?(Child, Unrelated)
    end
  end
  
  x.report("direct_ancestors_of") do |times|
    i = 0
    while (i += 1) < times
      direct_ancestors = direct_ancestors_of(Child)
      direct_ancestors.include?(HelloWorld)
      direct_ancestors.include?(Unrelated)
    end
  end
  
  x.compare!
end

Comment on lines +109 to +110
url_helpers_module = Rails.application.routes.named_routes.url_helpers_module
path_helpers_module = Rails.application.routes.named_routes.path_helpers_module
Copy link
Contributor Author

@amomchilov amomchilov Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracting out these local variables removes the need to do repeated constant look-ups in the hot path below.

Comment on lines 119 to 122
next false unless mod < url_helpers_module ||
mod < path_helpers_module ||
mod.singleton_class < url_helpers_module ||
mod.singleton_class < path_helpers_module
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most classes/modules don't include these modules, so it's worth adding a fast path, even if it slows the cases where the modules are included.

This roughly doubled the speed of this span, compared to not having it.

end

ancestors = Set.new.compare_by_identity.merge(ancestors_of(mod)).subtract(superclass_ancestors)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The allocations and set operations here where the majority of the hot path.

lib/tapioca/dsl/compilers/url_helpers.rb Show resolved Hide resolved
lib/tapioca/dsl/compilers/url_helpers.rb Outdated Show resolved Hide resolved
@amomchilov amomchilov added the enhancement New feature or request label Jun 19, 2024
@amomchilov amomchilov self-assigned this Jun 19, 2024
mod.singleton_class < url_helpers_module ||
mod.singleton_class < path_helpers_module

includes_helper?(mod, url_helpers_module) ||
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observation: this computes and traverses the ancestor chain for mod and mod.singleton_class twice each.

I tried doing it just once, like so:

mod_direct_ancestors = direct_ancestors_of(mod)

next true if mod_direct_ancestors.include?(url_helpers_module) ||
  mod_direct_ancestors.include?(path_helpers_module)
              
mod_meta_direct_ancestors = direct_ancestors_of(mod.singleton_class)
              
mod_meta_direct_ancestors.include?(url_helpers_module) ||
  mod_meta_direct_ancestors.include?(path_helpers_module)

using:

sig { params(mod: Module).returns(T::Array[Module]) }
private def direct_ancestors_of(mod)
  if Class === mod
    superclass = superclass_of(mod)
    ancestors_of(mod).take_while { |a| !superclass.equal?(a) }
  else
    ancestors_of(mod)
  end
end

But it turned out to be roughly 25% slower

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a bunch of methods returning arrays in https://github.com/Shopify/tapioca/blob/35a6a43b5469d5b03d292330449fd7b6db7593a6/lib/tapioca/runtime/reflection.rb that could benefit from returning sets instead.

@amomchilov amomchilov force-pushed the Alex/optimize-URLHelpers-gather_constants branch from 2344fdf to 95f3419 Compare July 10, 2024 21:03
@amomchilov amomchilov requested a review from paracycle July 10, 2024 21:08
@amomchilov amomchilov marked this pull request as ready for review July 10, 2024 21:10
@amomchilov amomchilov requested a review from a team as a code owner July 10, 2024 21:10
@amomchilov amomchilov requested a review from Morriar July 10, 2024 21:10
lib/tapioca/dsl/compilers/url_helpers.rb Outdated Show resolved Hide resolved
lib/tapioca/dsl/compilers/url_helpers.rb Outdated Show resolved Hide resolved
@amomchilov amomchilov requested review from paracycle and a team July 10, 2024 21:25
@amomchilov amomchilov force-pushed the Alex/optimize-URLHelpers-gather_constants branch from 95f3419 to 70d7fb5 Compare July 12, 2024 14:37
@amomchilov amomchilov requested review from a team and vinistock July 12, 2024 15:33
Copy link
Collaborator

@Morriar Morriar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for running these benchmarks and invalidating the Set idea! ❤️

`mod < url_helpers_module` gave false positives for weird classesl like `XPath`.

They could survive through that first layer of filtering, and then require special handling later, by forcing us to use `.any? { |a| helper == a}` instead of the simpler (and faster) `.include?(helper)` check.

By flipping this around, the receiver is always one of these known module types, which will always be hitting the usual `Module#>` implementation, which will filter out `XPath` early on. That frees us up to switch back to the faster `.include?(helper)` variant.
@amomchilov
Copy link
Contributor Author

I found one more trick which both simplifies the code (.any? { |a| helper == a } becomes just .include?(helper)), and speeds it up.

See the commit message of ef5e6c9 for details.

@amomchilov amomchilov merged commit f6028d6 into main Jul 15, 2024
30 checks passed
@amomchilov amomchilov deleted the Alex/optimize-URLHelpers-gather_constants branch July 15, 2024 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants