Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tap: fix performance regression in *_files_by_name #16791

Merged
merged 2 commits into from
Mar 2, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 12 additions & 10 deletions Library/Homebrew/tap.rb
Original file line number Diff line number Diff line change
Expand Up @@ -820,7 +820,7 @@

sig { returns(T::Hash[String, T::Array[String]]) }
def self.reverse_tap_migrations_renames
Tap.each_with_object({}) do |tap, hash|
cache[:reverse_tap_migrations_renames] ||= Tap.each_with_object({}) do |tap, hash|
Copy link
Member

@reitermarkus reitermarkus Mar 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apainintheneck, I remember specifically not caching this because it will never be cleared, except in tests.

Basically, every instance's #clear_cache would need to clear the global cache[:reverse_tap_migrations_renames], because it depends on all taps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that Tap.reverse_tap_migrations_renames loads each tap with Tap.each and then calls Tap#tap_migrations on each tap to build this list. Do we expect either Tap.each or Tap#tap_migrations to change over the life of the program?

Copy link
Contributor Author

@apainintheneck apainintheneck Mar 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you're worried about the case where a tap gets installed with Tap#ensure_installed! after Tap.reverse_tap_migrations_renames has been built, right? Maybe it'd just be a good idea to clear the class level cache when we do that since Tap#ensure_installed! should only very rarely need to install a new tap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we could just go back to how things are. Either way works for me.

Copy link
Member

@Bo98 Bo98 Mar 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could just have the cache inside the Tap.each_with_object instead and cache by cache[:reverse_tap_migrations_renames][tap.name], or have an instance method so it's tied to instance cache.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I look at it a bit more the problem might be Tap.each in this case.

Screen Shot 2024-03-02 at 10 51 57 AM

This is from running brew prof -- deps --casks --eval-all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A special cache of installed tap instances that gets reset on Tap#install and Tap#uninstall would make sense to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds doable. I'm not sure how that could be implemented in a simple, straightforward way though. This seemed like a no-brainer but now I know it wasn't.

Copy link
Member

@MikeMcQuaid MikeMcQuaid Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A special cache of installed tap instances that gets reset on Tap#install and Tap#uninstall would make sense to me.

@Bo98 @apainintheneck @reitermarkus I think we should/could just nuke all tap caches on those methods given how infrequently they are called.

tap.tap_migrations.each do |old_name, new_name|
new_tap_user, new_tap_repo, new_name = new_name.split("/", 3)
next unless new_name
Expand Down Expand Up @@ -1211,14 +1211,16 @@
def formula_files_by_name
return super if Homebrew::EnvConfig.no_install_from_api?

tap_path = path.to_s
Homebrew::API::Formula.all_formulae.each_with_object({}) do |item, hash|
name, formula_hash = item
# If there's more than one item with the same path: use the longer one to prioritise more specific results.
existing_path = hash[name]
# Pathname equivalent is slow in a tight loop
new_path = File.join(tap_path, formula_hash.fetch("ruby_source_path"))
hash[name] = Pathname(new_path) if existing_path.nil? || existing_path.to_s.length < new_path.length
@formula_files_by_name ||= begin
tap_path = path.to_s
Homebrew::API::Formula.all_formulae.each_with_object({}) do |item, hash|
name, formula_hash = item
# If there's more than one item with the same path: use the longer one to prioritise more specific results.
existing_path = hash[name]
# Pathname equivalent is slow in a tight loop
new_path = File.join(tap_path, formula_hash.fetch("ruby_source_path"))
hash[name] = Pathname(new_path) if existing_path.nil? || existing_path.to_s.length < new_path.length
end
end
end

Expand Down Expand Up @@ -1279,7 +1281,7 @@
def cask_files_by_name
return super if Homebrew::EnvConfig.no_install_from_api?

Homebrew::API::Cask.all_casks.each_with_object({}) do |item, hash|
@cask_files_by_name ||= Homebrew::API::Cask.all_casks.each_with_object({}) do |item, hash|

Check warning on line 1284 in Library/Homebrew/tap.rb

View check run for this annotation

Codecov / codecov/patch

Library/Homebrew/tap.rb#L1284

Added line #L1284 was not covered by tests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we probably should avoid Pathname here too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

name, cask_hash = item
# If there's more than one item with the same path: use the longer one to prioritise more specific results.
existing_path = hash[name]
Expand Down