Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the task preloading process configurable #867

Closed
wants to merge 1 commit into from

Conversation

davidstosik
Copy link
Contributor

@davidstosik davidstosik commented Aug 18, 2023

In ordered to be displayed in the UI, tasks have to be loaded by the time MaintenanceTasks::Task.descendants is called.
To achieve that, the Task.load_constants method was previously implemented, and called from Task.available_tasks.

A few inconvenients of that method are:

  • It could easily be abused and used in lower levels of the gem, where ideally it should not be necessary to load all tasks to achieve the gem's goal. See for example my previous PR Avoid loading all tasks to validate a Run#task_name.

  • It can introduce unnecessary overhead. For example, displaying the CLI's basic help message with maintenance_tasks help will necessitate all tasks to be loaded (even though the help message does not display anything related to the available tasks), and this loading time will delay the appearance of that help message.
    Similarly, a single test case that wants to test a single task's behaviour, while never displaying the whole list of available tasks, would be negatively impacted by a misuse of Task.available_tasks.

  • It does not allow an application to define an alternative way to load all constants. As a matter of fact, the Shopify Core monolith monkey-patched that method to allow multiple "task modules" instead of a single one. With a configurable task_loader, monkey-patching would not be necessary. (Which is good. 🙈)

This PR introduces MaintenanceTasks.task_loader, a configuration that allows applications to change the logic applied to ensure all task are loaded before being displayed.
As a bonus:

  • it changes how the CLI generates the long description containing all task names, so that tasks get loaded at the last minute, only when necessary
  • it gets rid of Task.available_tasks (its only use was for UI purposes, so I changed it to a direct call to Task.descendants)
  • it introduces some tests of the loading behaviour

Please note that this PR is built on top of #866 as it is a requirement.

In ordered to be displayed in the UI, tasks have to be loaded by the
time `MaintenanceTasks::Task.descendants` is called.
To achieve that, the `Task.load_constants` method was previously
implemented, and called from `Task.available_tasks`.

A few inconvenients of that method are:

- It could easily be abused and used in lower levels of the gem, where
  ideally it should ne be necessary to load all tasks to achieve the
  gem's goal. See for example my previous PR
  [Avoid loading all tasks to validate a
`Run#task_name`](#866).

- It can introduce unnecessary overhead. For example, displaying the
  CLI's basic help message with `maintenance_tasks help` will
  necessitate all tasks to be loaded (even though the help message does
  not display anything related to the available tasks), and this loading
  time will delay the appearance of that help message.
  Similarly, a single test case that want to test a single task's
  behaviour, while never displaying the whole list of available tasks,
  would be negatively impacted by a misuse of `Task.available_tasks`.

- It does not allow an application to define an alternative way to load
  all constants. As a matter of fact, the Shopify Core monolith
  monkey-patched that method to allow multiple "task modules" instead of
  a single one. With a configurable `task_loader`, monkey-patching would
  not be necessary. (Which is good. 🙈)

This PR introduces `MaintenanceTasks.task_loader`, a configuration that
allows applications to change the logic applied to ensure all task are
loaded before being displayed. As a bonus:

- it changes how the CLI generates the long description containing all
  task names, so that tasks get loaded at the last minute, only when
  necessary
- it gets rid of `Task.available_tasks` (its only use was for UI
  purposes, so I changed it to a direct call to `Task.descendants`)
- it introduces some tests of the loading behaviour
class DefaultTaskLoaderTest < ActiveSupport::TestCase
include ActiveSupport::Testing::Isolation

test ".load_all loads all autoloadable Task subclasses within the Maintenance namespace" do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new test fails in Rails 6.
Has it possibly identified an issue with the gem in older Rails versions? I haven't altered the recursive loading algorithm, just moved it from Task.load_constants to DefaultTaskLoader.load_constants. 🤔

Copy link
Contributor Author

@davidstosik davidstosik Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failure:
MaintenanceTasks::DefaultTaskLoaderTest#test_.load_all_loads_all_autoloadable_Task_subclasses_within_the_Maintenance_namespace [/home/runner/work/maintenance_tasks/maintenance_tasks/vendor/bundle/ruby/3.0.0/gems/activesupport-6.0.6.1/lib/active_support/inflector/methods.rb:284]:
unexpected invocation: Maintenance.const_get("Nested")
unsatisfied expectations:
- expected exactly once, invoked never: Maintenance.const_get(:Nested)
- expected exactly once, invoked never: Maintenance.const_get(:ImportPostsTask)
- expected exactly once, invoked never: Maintenance.const_get(:ErrorTask)
- expected exactly once, invoked never: Maintenance.const_get(:EnqueueErrorTask)
- expected exactly once, invoked never: Maintenance.const_get(:CancelledEnqueueTask)
- expected exactly once, invoked never: Maintenance.const_get(:CallbackTestTask)
- expected exactly once, invoked never: Maintenance.const_get(:BatchImportPostsTask)

Wonder if it's just because of a different order... 🤔

Update: nothing's broken, it's just that the order in which I define the call expects matters, and different versions of Rails produce different orders.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like String vs Symbol, "Nested" vs :Nested, no?

Copy link
Contributor Author

@davidstosik davidstosik Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the order returned by .constants seems to differ depending on the version of Rails.
This would work in Ruby 6.1 for example:

diff --git i/app/models/maintenance_tasks/default_task_loader.rb w/app/models/maintenance_tasks/default_task_loader.rb
index 14f1624..fed0ea5 100644
--- i/app/models/maintenance_tasks/default_task_loader.rb
+++ w/app/models/maintenance_tasks/default_task_loader.rb
@@ -15,7 +15,8 @@ module MaintenanceTasks
       private
 
       def load_constants(namespace)
-        namespace.constants.each do |name|
+        namespace.constants.sort.each do |name|
           object = namespace.const_get(name)
           load_constants(object) if object.instance_of?(Module)
         end
diff --git i/test/models/maintenance_tasks/default_task_loader_test.rb w/test/models/maintenance_tasks/default_task_loader_test.rb
index 8fd8ef9..b029518 100644
--- i/test/models/maintenance_tasks/default_task_loader_test.rb
+++ w/test/models/maintenance_tasks/default_task_loader_test.rb
@@ -8,6 +8,9 @@ module MaintenanceTasks
 
     test ".load_all loads all autoloadable Task subclasses within the Maintenance namespace" do
       [
+        Maintenance::Nested::NestedMore::NestedMoreTask,
+        Maintenance::Nested::NestedTask,
+        Maintenance::Nested::NestedMore,
         Maintenance::BatchImportPostsTask,
         Maintenance::CallbackTestTask,
         Maintenance::CancelledEnqueueTask,
@@ -15,9 +18,6 @@ module MaintenanceTasks
         Maintenance::ErrorTask,
         Maintenance::ImportPostsTask,
         Maintenance::Nested,
-        Maintenance::Nested::NestedMore,
-        Maintenance::Nested::NestedMore::NestedMoreTask,
-        Maintenance::Nested::NestedTask,
         Maintenance::NoCollectionTask,
         Maintenance::ParamsTask,
         Maintenance::TestTask,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is blowing up before we even attempt to load the tasks:

test ".load_all loads all autoloadable Task subclasses within the Maintenance namespace" do
  [
    ...
   ].each do |constant|
    module_parent = constant.module_parent
    module_parent
      .expects(:const_get)
      .with(constant.name.demodulize.to_sym)
      .returns(constant)
  end

  puts "load all tasks!!!"

  DefaultTaskLoader.load_all
end
➜  maintenance_tasks git:(sto/configurable-task-loader) ✗ BUNDLE_GEMFILE=gemfiles/rails_6_1.gemfile be rake test TEST=test/models/maintenance_tasks/default_task_loader_test.rb
Run options: --seed 29484

# Running:

F

Failure:
MaintenanceTasks::DefaultTaskLoaderTest#test_.load_all_loads_all_autoloadable_Task_subclasses_within_the_Maintenance_namespace [/Users/adriannachang/.gem/ruby/3.2.1/gems/activesupport-6.1.7.6/lib/active_support/inflector/methods.rb:287]:
unexpected invocation: Maintenance.const_get("Nested")
unsatisfied expectations:
- expected exactly once, invoked never: Maintenance.const_get(:Nested)
- expected exactly once, invoked never: Maintenance.const_get(:ImportPostsTask)
- expected exactly once, invoked never: Maintenance.const_get(:ErrorTask)
- expected exactly once, invoked never: Maintenance.const_get(:EnqueueErrorTask)
- expected exactly once, invoked never: Maintenance.const_get(:CancelledEnqueueTask)
- expected exactly once, invoked never: Maintenance.const_get(:CallbackTestTask)
- expected exactly once, invoked never: Maintenance.const_get(:BatchImportPostsTask)



rails test test/models/maintenance_tasks/default_task_loader_test.rb:9

I think the issue is that, in setting up the expectations for the nested task constants (Maintenance::Nested::NestedMore, Maintenance::Nested::NestedMore::NestedMoreTask, etc.), we end up performing a const_get on Maintenance::Nested when we do constant.module_parent, which triggers the expectation on Maintenance::Nested that's just been set.

Copy link
Contributor Author

@davidstosik davidstosik Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Note it only blows up on Rails 6.1. :)
I managed to fix it (for Rails 6.1) by changing the order, but obviously it's gonna start failing in Rails 7.
Ideally I'd want to write expectations that do not care about the order in which const_get calls happen, but I haven't figured it out...

I think what I really need is something Mocha doesn't provide: spies.
I want to spy on const_get calls without changing their original behaviours.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right -- sorry I thought the issue was more complex than it was, but it really is just whether the Maintenance::Nested expectation is set before or after the other Nested constant expectations...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually @davidstosik if I change the order to what you proposed in #867 (comment), it passes against all the Rails versions for me? Were you seeing failures locally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adrianna Huh, wow, this is weird, you seem to be right. Although both orders work with Rails 7, it seems only a specific order works with Rails 6.1?

For fun (and because I hadn't imagined the right order would work for all supported Rails versions), I've also implemented a spy. Here it is, just to show what it could look like:

# frozen_string_literal: true

require "test_helper"

module MaintenanceTasks
  class DefaultTaskLoaderTest < ActiveSupport::TestCase
    class ConstGetSpy < SimpleDelegator
      cattr_reader :calls, default: Hash.new { |h, k| h[k] = [] }

      def const_get(name)
        calls[spied_module] << name

        return_value = super

        if return_value.is_a?(Module)
          self.class.new(return_value)
        else
          return_value
        end
      end

      def instance_of?(klass)
        spied_module.instance_of?(klass)
      end

      private

      def spied_module
        __getobj__
      end
    end

    SPIED_MAINTENANCE_TASKS_MODULE = ConstGetSpy.new(Maintenance)

    test ".load_all loads all autoloadable Task subclasses within the Maintenance namespace" do
      original_tasks_module = MaintenanceTasks.tasks_module
      MaintenanceTasks.tasks_module = "MaintenanceTasks::DefaultTaskLoaderTest::SPIED_MAINTENANCE_TASKS_MODULE"

      expected_calls_to_const_get = {
        Maintenance => [
          :BatchImportPostsTask,
          :CallbackTestTask,
          :CancelledEnqueueTask,
          :EnqueueErrorTask,
          :ErrorTask,
          :ImportPostsTask,
          :Nested,
          :NoCollectionTask,
          :ParamsTask,
          :TestTask,
          :UpdatePostsInBatchesTask,
          :UpdatePostsModulePrependedTask,
          :UpdatePostsTask,
          :UpdatePostsThrottledTask,
        ],
        Maintenance::Nested => [
          :NestedMore,
          :NestedTask,
        ],
        Maintenance::Nested::NestedMore => [
          :NestedMoreTask,
        ]
      }

      DefaultTaskLoader.load_all

      assert_equal(expected_calls_to_const_get, ConstGetSpy.calls.transform_values(&:sort))

      MaintenanceTasks.tasks_module = original_tasks_module
    end
  end
end

(I saw your more general comment about the PR, will reply next. 🙇🏻‍♂️ )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is neat, thanks for sharing!! ❤️

Base automatically changed from sto/task-name-validate to main August 23, 2023 14:13
Copy link
Contributor

@adrianna-chang-shopify adrianna-chang-shopify left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes seem reasonable, but I'm wary of adding new configuration options that the majority of users won't need 😅 In the README we've added:

You could for example browse for tasks in a given directory:

I feel like this might encourage users to deviate from placing tasks under the tasks_module namespace, and I'm not sure that's a good thing..?

I'd like to understand a bit more why we think this config is necessary in the gem.
Playing devil's advocate for a sec! On the subject of misusing available_tasks at lower levels of the gem -- my hope would be that we would scrutinize any proposed changes involving available_tasks, given that the method clearly incurs the cost of having to load all of the constants. The task_name validation was not ideal, but introducing a custom loader seems like an overly defensive response to us finding a better solution to that validation.

For example, displaying the CLI's basic help message

We need to load all the tasks to display the help message anyways though, don't we? We could use the define_singleton_method trick you've applied in this PR while keeping#available_tasks, right?

I think the monkey patch argument is a fair one, and this would also allow apps using the classic autoloader to use the gem (though most apps should be on Zeitwerk by now).

Overall, I'm okay with these changes if we think they're justified by getting rid of the patch in Core. I'm not convinced that preventing misuse of #available_tasks in the gem is necessary, but I could have my mind changed. 😄

@etiennebarrie
Copy link
Member

this would also allow apps using the classic autoloader to use the gem (though most apps should be on Zeitwerk by now).

We dropped support for the classic autoloader in #620 released in 2.0.

Also let's keep in mind that Task.available_tasks is a public API so I'm not super motivated to bump the major to remove it.

@davidstosik Ultimately didn't #866 fix the main issue you had with available_tasks?

Similar point to Adrianna, but maybe we could keep the stuff around the Thor command to avoid loading all the tasks unless necessary to display the help for perform.

@davidstosik
Copy link
Contributor Author

davidstosik commented Aug 24, 2023

and I'm not sure that's a good thing..?

If Shopify Core has the need to define more than a single module to contain tasks, then I can imagine other apps having a similar need? They might even have needs around never trying to load all tasks in all modules at once, depending on how they've implemented modularity and how the application runs. I get that theoretical arguments hold very little weight when discussing the need for a change (YAGNI), I feel however that ourselves having that needs shows a practical use case (which I'll expand on a bit later).

We need to load all the tasks to display the help message anyways though, don't we?

Not when I do bundle exec maintenance_tasks or bundle exec maintenance_tasks perform my_task_name.

We could use the define_singleton_method trick you've applied in this PR while keeping#available_tasks, right?

Sure, that could totally work. To be honest, this change came after the fact, when I noticed that my earlier change would not actually prevent the CLI from loading all tasks when it doesn't need to, when I wrote tests. 😅

Yes, #866 fixed the main problem that I had, which was "the gem shouldn't zealously load all tasks from the application, possibly triggering heavy file requiring, unless it absolutely needs to, which is when it is about to show the whole list".

The remaining problem I was trying to "solve" with this PR, as I told @etiennebarrie and as @adrianna-chang-shopify guessed, was that Task.available_tasks is public API, and I feel the risk for it to become used again in the future, in a place where I'd rather have it not used, is not negligible.

When removing it, by extracting the loading logic to a TaskLoader (which wasn't named Default.. yet), and calling that directly in the two UI scenarios that need it, I realized it made it fairly simple to make the loading logic configurable, which in my opinion, was a good thing, as it would allow us to remove a monkey patch from our code base.

(We don't have to get into a debate on monkey patches here, but I'd like to mention that not noticing that monkey patch and reading the gem's code directly cost @shioyama and I some time when we were pairing on Maintenance Tasks related tests in Shopify Core, and I've been advised in the past against writing monkey patches for gems we own.)

Now if making the loading logic configurable is not wanted, and we're happy with keeping a monkey patch in Shopify Core, then I'll be happy myself with adjusting this PR so that:

  • it gets rid of Task.available_tasks
  • it outsources the task loading logic to TaskLoader
  • it calls the TaskLoader where it's necessary

If outsourcing the logic as described above is not acceptable, I'd also be content with renaming available_tasks to a name that carries meaning of its resource-intensiveness (something like load_all_tasks).

We don't need that change to happen on any timeline to continue our work on the Shopify Core side (thanks for merging #866 and cutting a new release!), so it'd be fine for this PR to go dormant or be merged into a future major version branch with no expected timeline. I just think it's a healthy change.

@adrianna-chang-shopify
Copy link
Contributor

@davidstosik thank you for expanding on your thoughts ❤️

Not when I do bundle exec maintenance_tasks or bundle exec maintenance_tasks perform my_task_name.

Yep, sorry, I meant that regardless of whether we extract a configurable loader, we still need to load all the tasks to show the long version of the perform command. Either way, I think redefining that command to delay loading the task names is a great improvement 👍

I've been advised in the past against writing monkey patches for gems we own.

I definitely understand your point here (and usually feel the same way). I think the challenge here is striking a balance between designing the gem in a way that avoids us having to write monkey patches in Core, while also ensuring that the gem remains simple and easy-to-use for the majority of our community, without a ton of knobs that can be configured solely because our monolith needs them. Core has a ton of custom stuff around maintenance tasks (the whole CPT system, custom controllers and views, etc.) so overriding #available_tasks seems minor in comparison.

I think it's also worth mentioning that in Core, we need that patch because we're allowing tasks to be nested in components (reusable tasks and all that), but that the goal is to go back to having all tasks in the top-level maintenance namespace (which should remove the need for the patch).

I'm not sure we want to remove .available_tasks, but I'd be okay with renaming it to something more meaningful, as you've suggested. Also in favour of that tweak to the CLI (which we can tackle separately) if you're up for it 😄

Let me know if this makes sense, or if you have other questions or concerns.

@davidstosik
Copy link
Contributor Author

davidstosik commented Aug 29, 2023

@adrianna-chang-shopify Thank you for your comment!

I have a few ideas:

  • I'll open a separate PR to improve the CLI (update: here's the PR)
  • we could rename Task.available_tasks to Task.load_all and keep the old method with a deprecation message
  • we could also have the method warn about what it does (like ".available_tasks method will load all tasks, which could be heavy"), and cancel that warning by calling Task.available_tasks(warn: false)

What do you think?

@adrianna-chang-shopify
Copy link
Contributor

@davidstosik thank you for being open! 🙇‍♀️ On board for 1 + 2 -- unsure about warning (I think changing the method name would be enough) but we can discuss in a follow up.

@davidstosik
Copy link
Contributor Author

Closing this in favor of #877 and #878.

@davidstosik davidstosik deleted the sto/configurable-task-loader branch August 29, 2023 19:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants