-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Explore module based refactor #310
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not use inheritance when what we want is to change strategy. Use the strategy pattern please.
Also, the system can know which strategy needs to be used in the system, why are we asking users to specify it?
Interesting point @rafaelfranca. One difficulty I see is that the interface for each enumeration strategy isn't always the same:
Is the strategy pattern still a good choice if each strategy has different requirements? It seems to me like this would lead to two possible implementations: Creating the strategy with all contextdef strategy
ActiveRecordStrategy.new(Product.all)
end Having the strategy delegate to the task for contextdef strategy
ActiveRecordStrategy.new(self)
end
def collection
Product.all
end Is one of those what you had in mind? To me, they both seem less simple for consumers than the mixin approach.
To automatically select the strategy, we need the collection (so we can check its type), which leads the the awkwardness discussed in #307, where we either need to return a special collection, or override the method that calls |
Based on @rafaelfranca 's feedback here and the followup on Slack about changing
This would involve a change to our API ( |
I think I'm leaning towards exposing our own The enumerator builder should be an object that follows some API and that return an enumerator when asked to based on the cursor that it receives as argument. Something like: class MyEnumeratorBuilder
def initialize(collection)
@collection = collecion
end
def enumerator(cursor:)
# do what needs to be done and return an object that responds to `each` and uses the cursor
end
end Then in the task: class Task
def enumerator_builder
MyEnumeratorBuilder.new(collection)
end
end In the Task superclass: class TaskSuperclass
def enumerator_builder
collection = self.collection
case collection
when ActiveRecord::Relation
SomethingThatWillKnowHowToReturnTheActiveRecordEnumerator.new(collection)
when Array
SomethingThatWillKnowHowToReturnTheArrayEnumerator.new(collection)
when CSV
CsvEnumeratorBuilder.new(collection)
else
raise ArgumentError, "#{self.class.name}#collection must be either "\
'an Active Record Relation, an Array, or a CSV.'
end
end It is similar to what you got in #307, but instead of passing any context to the task about the job-iteration enumerator builder and the curse we have our own API for enumerator builders. |
@rafaelfranca your idea is indeed pretty similar to #307, though I had chosen Do you foresee a case in which we would have to pass However, if we passed in an object which responds to EnumeratorContext = Struct.new(:cursor, keyword_init: true) |
I also have strong opinions that we should not pass the Job Iteration's The other argument that we may need in the future is the job so we can build the enumerators. I think passing the context object can make easy to add that argument later when we need to. |
9074b99
to
809c749
Compare
809c749
to
3e19856
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adrianna-chang-shopify, @etiennebarrie as discussed in our meeting earlier today, I have altered the implementation to use classes instead of modules. I have not fully fleshed it out, as the purpose of this PR is just to discuss alternatives.
I have added 3 commits, each making one of the discussed changes:
- Use sub-classes instead of modules:
This adopts the class hierarchy we discussed:I didn't bother implementing any of theclass Task; end class ActiveRecordTask < Task; end class ArrayTask < Task; end class CsvTask < Task; end
abstract_class
code, but it would obviously be needed. - Use EnumerationContext instead of cursor: keyword
This passescontext:
intoenumerator
instead ofcursor
directly, allowing for forwards compatibility without breaking the API. I used aStruct
for simplicity, but this could obviously change. - Add EnumeratorBuilder abstraction
This implements @rafaelfranca's suggestion from Explore module based refactor #310 (comment), changing the API slightly:The returned object is expected to respond to-def enumerator(context:) +def enumerator_builder
enumerator(context:)
, which allows us to more closely guard the concept of acursor
(unlike JobIteration, which requires consumers to know about it, as discussed in Add support for dynamic collections #307 (comment)).
Each commit also updates example.rb
so the changes to consumer usage are evident.
Let me know what you think!
case collection | ||
when ActiveRecord::Relation | ||
enumerator_builder.active_record_on_records(collection, cursor: cursor) | ||
when Array | ||
enumerator_builder.build_array_enumerator(collection, cursor: cursor) | ||
when CSV | ||
JobIteration::CsvEnumerator.new(collection).rows(cursor: cursor) | ||
else | ||
raise ArgumentError, "#{@task.class.name}#collection must be either "\ | ||
'an Active Record Relation, Array, or CSV.' | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the things I like about this split implementation is that it adheres to the "open-closed" principle: it is "open to extension" (by subclassing) but "closed to modification" (we don't need to add further when
s to add more collections types).
# frozen_string_literal: true | ||
|
||
module MaintenanceTasks | ||
class ActiveRecordTask < Task |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation makes the tradeoff that consumers need to know about the various subclasses, but we can make that fairly seamless with good generators and error messages.
We might even allow consumers to add their own generators (e.g. in an app with many custom tasks using the same custom enumerator builder). 🤷♂️
EnumeratorBuilder.new(collection) | ||
end | ||
|
||
def collection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With separate sub-classes, maybe it makes sense to rename the methods:
ActiveRecordTask#collection
->ActiveRecordTask#relation
ArrayTask#collection
->ArrayTask#array
# @return [JobIteration::EnumeratorBuilder] instance of an enumerator | ||
# builder available to tasks. | ||
def enumerator_builder | ||
JobIteration.enumerator_builder.new(nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now firmly an implementation detail. We could implement it ourselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple questions / comments around the API (I think there might be some old code that didn't get updated, and just want to make sure I have my head fully wrapped around the Task interfaces, especially with the EnumeratorBuilder
abstraction).
Other than that, this is definitely the approach I like the best! As we discussed in our meeting yesterday, it has a lot of benefits, most of which you've mentioned already in this PR, but I'll just recap:
- The open-closed principle for Task types is a huge benefit, and also provides a natural extension point for collection-specific features like batches (implementing something like batches with our current model would be awkward with a single generic
Task
class) - Our framework can do more for our users now that we have Task-specific implementations (for example, handling
#count
inside the appropriate Task subclasses) - These changes also make it easy for us to detach ourselves completely from JobIteration's
enumerator_builder
, which we could just build ourselves now if we wanted like Sam mentioned - We've also detangled our API completely from
cursor
As Sam noted, the main tradeoff here is forcing users to think about and specify which Task subclass they want to implement. However, I don't really think this is a problem, because:
- We already call out the specific types of "collections" a Task can define in the README and with our error messaging, so users already need to be aware of this detail
- We're in a weird situation right now where we say that a Task can implement a collection that is an Array, ActiveRecord::Relation, or CSV. But the reality is, a generic Task can only implement an Array or ActiveRecord::Relation, and a CsvTask is what needs to be used for implementing a CSV collection. So we're already highlighting different types of Tasks through inheritance, and IMO it makes a lot of sense to put Tasks that iterate over arrays and Tasks that iterate over ActiveRecords on the same level as Tasks that iterate over CSVs
We discussed yesterday implementing this solution, with the intent of shipping it to a v2.0 of the framework (since our changes result in a new API). We can work on developing a migration strategy for existing users to get to 2.0, but ultimately @etiennebarrie, @sambostock and I are convinced this is the route we want to go down.
@rafaelfranca can you let us know what your thoughts are?
require 'maintenance_tasks/active_record_task' | ||
require 'maintenance_tasks/array_task' | ||
require 'maintenance_tasks/csv_task' | ||
# TODO: require other adapters, or make this dynamic somehow | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we could just move these to app/models
😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think app/tasks
would make more sense if we're planning on keeping it around (otherwise, app/models
makes sense). I think I'd seen some discussion about not using app/tasks
internally at all, but maybe I'm wrong.
app/tasks/maintenance_tasks/task.rb
Outdated
@@ -71,9 +79,10 @@ def load_constants | |||
# | |||
# @raise [NotImplementedError] with a message advising subclasses to | |||
# implement an override for this method. | |||
def collection | |||
def enumerator_builder(context:) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also nice now that custom iteration Tasks that don't care about the cursor (ie. are just streaming data from a queue) don't have to care about the cursor at all now!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah! Though I've actually erroneously included context:
here 😅
def enumerator_builder(context:) | |
def enumerator_builder |
# @return the enumerator. | ||
def enumerator(cursor:) | ||
new.enumerator(cursor: cursor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might be old code, just making sure I understand - are we intending this to return new.enumerator_builder
now, or new.enumerator_builder.enumerator(context)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think this is a mistake and should just return new.enumerator_builder
. Lots of moving pieces across those commits 😅
This was basically just me trying to put in the equivalent of the Task.collection
method which used to be here.
Co-authored-by: Adrianna Chang <[email protected]>
Lots of moving pieces across the refactors. 😅 Thanks for catching the various things I missed @adrianna-chang-shopify! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The migration path for consumers should be fairly straightforward:
-class ProductTask < Task
+class ProductTask < ActiveRecordTask
def collection() Product.all end
def process(item) puts(item) end
end
-class NameTask < Task
+class NameTask < ArrayTask
def collection() Product.all end
def process(item) puts(item) end
end
class ShopIdTask < CsvTask
def process(item) puts(item) end
end
Tasks with ActiveRecord::Relation
or Array
collections would need to change their superclass, but CSV
collection tasks already inherit from CsvTask
. The support for custom collections would be a net new feature, so there's no migration path needed there.
I don't think there would be anything else, other than if someone had done heavy customization to their TaskJob
. As far as "breaking changes" go, it's fairly small, IMO.
"#{self.class.name} must implement `#{__method__}` or inherit from a class which does" | ||
# TODO: Could make error string list available adapters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do something fancy here where we detect a legacy task and provide a useful error message, since if this method is reached from a class implementing collection
, they're almost certainly using the legacy API.
if respond_to?(:collection)
case collection
when ActiveRecord::Relation
raise NotImplementedError, 'Inherit from new ActiveRecordTask instead'
when Array
raise NotImplementedError, 'Inherit from new ArrayTask instead'
else
raise NotImplementedError, 'Migrate to new Task API for your custom collection'
end
else
raise NotImplementedError, "#{self.class.name} must implement `#{__method__}` or inherit from a class which does"
end
My thoughts are still the same. We should not use inheritance to change behavior by default. We can already infer the behavior based on the collection. Why we do need to ask people to tell us something that we can know already? I'm ok with people that want custom enumerators using polymorphism to change the enumerator in their tasks, but the cases we support in this gem should not use any different superclass or include any new module (which is a different form of inheritance). What I'm saying that we should have only one "abstract" class (possibly two with CSVTask but I still think we can remove that class) and only concrete classes in the applications. |
Superseded by #944, so I am going to close this for now. |
@adrianna-chang-shopify and I paired on exploring the idea of a module based refactor to see if it would make it easier to provide a simple API to consumers for unsupported enumerator types. See discussion in #307.
This would be a breaking change, albeit one with a fairly straightforward migration path:
(In fact, if we just reused the existing CSV module, that change wouldn't be breaking)
The tradeoff this makes is that the consumer needs to choose the appropriate module to include, instead of us inferring the collection type.
If this approach makes sense, it can be fleshed out.