Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explore module based refactor #310

Closed
wants to merge 6 commits into from
Closed
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 3 additions & 14 deletions app/jobs/maintenance_tasks/task_job.rb
Original file line number Diff line number Diff line change
Expand Up @@ -26,20 +26,9 @@ def retry_on(*, **)
private

def build_enumerator(_run, cursor:)
cursor ||= @run.cursor
collection = @task.collection

case collection
when ActiveRecord::Relation
enumerator_builder.active_record_on_records(collection, cursor: cursor)
when Array
enumerator_builder.build_array_enumerator(collection, cursor: cursor)
when CSV
JobIteration::CsvEnumerator.new(collection).rows(cursor: cursor)
else
raise ArgumentError, "#{@task.class.name}#collection must be either "\
'an Active Record Relation, Array, or CSV.'
end
Comment on lines -32 to -42
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the things I like about this split implementation is that it adheres to the "open-closed" principle: it is "open to extension" (by subclassing) but "closed to modification" (we don't need to add further whens to add more collections types).

context = Task::EnumerationContext.new(cursor: cursor || @run.cursor)

@task.enumerator_builder.enumerator(context: context)
end

# Performs task iteration logic for the current input returned by the
Expand Down
56 changes: 56 additions & 0 deletions app/tasks/maintenance_tasks/active_record_task.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
# frozen_string_literal: true

module MaintenanceTasks
class ActiveRecordTask < Task
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation makes the tradeoff that consumers need to know about the various subclasses, but we can make that fairly seamless with good generators and error messages.

We might even allow consumers to add their own generators (e.g. in an app with many custom tasks using the same custom enumerator builder). 🤷‍♂️

# TODO: define replacement for Task.collection, delegating to instance
# Probably simplest to make this an ActiveSupport::Concern

# TODO: specify abstract_class

EnumeratorBuilder = Struct.new(:collection) do
def enumerator(context:)
enumerator_builder.active_record_on_records(collection, cursor: context.cursor)
end
end

def enumerator(context:)
sambostock marked this conversation as resolved.
Show resolved Hide resolved
collection = self.collection
assert_relation!(collection)

EnumeratorBuilder.new(collection)
end

def collection
raise NotImplementedError, 'consumers implement this'
end

def count
collection.count
end

# Easily lends itself to adding batching
# def batch_size
# 1
# end

private

def assert_relation!(collection)
return if collection.is_a?(ActiveRecord::Relation)

raise(
ArgumentError,
"#{self.class.name}#collection must be an ActiveRecord::Relation",
)
end

# Convenience method to allow tasks define enumerators with cursors for
# compatibility with Job Iteration.
#
# @return [JobIteration::EnumeratorBuilder] instance of an enumerator
# builder available to tasks.
def enumerator_builder
JobIteration.enumerator_builder.new(nil)
end
end
end
51 changes: 51 additions & 0 deletions app/tasks/maintenance_tasks/array_task.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
# frozen_string_literal: true

module MaintenanceTasks
class ActiveRecordTask < Task
sambostock marked this conversation as resolved.
Show resolved Hide resolved
# TODO: define replacement for Task.collection, delegating to instance
# Probably simplest to make this an ActiveSupport::Concern

# TODO: Specify abstract_class

EnumeratorBuilder = Struct.new(:collection) do
def enumerator(context:)
enumerator_builder.array(collection, cursor: context.cursor)
end
end

def enumerator(context:)
sambostock marked this conversation as resolved.
Show resolved Hide resolved
collection = self.collection
assert_array!(collection)

EnumeratorBuilder.new(collection)
end

def collection
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With separate sub-classes, maybe it makes sense to rename the methods:

  • ActiveRecordTask#collection -> ActiveRecordTask#relation
  • ArrayTask#collection -> ArrayTask#array

raise NotImplementedError, 'consumers implement this'
end

def count
collection.length
end

private

def assert_array!(collection)
return if collection.is_a?(Array)

raise(
ArgumentError,
"#{self.class.name}#collection must be an Array",
)
end

# Convenience method to allow tasks define enumerators with cursors for
# compatibility with Job Iteration.
#
# @return [JobIteration::EnumeratorBuilder] instance of an enumerator
# builder available to tasks.
def enumerator_builder
JobIteration.enumerator_builder.new(nil)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now firmly an implementation detail. We could implement it ourselves.

end
end
end
26 changes: 20 additions & 6 deletions app/tasks/maintenance_tasks/csv_task.rb
Original file line number Diff line number Diff line change
Expand Up @@ -6,17 +6,21 @@ module MaintenanceTasks
# Base class that is inherited by the host application's Task classes for
# processing CSV files.
class CsvTask < Task
# TODO: specify abstract_class

EnumeratorBuilder = Struct.new(:collection) do
def enumerator(context:)
JobIteration::CsvEnumerator.new(collection).rows(cursor: context.cursor)
end
end

# The contents of a CSV file to be processed by a Task.
#
# @return [String] the content of the CSV file to process.
attr_accessor :csv_content

# Defines the collection to be iterated over, based on the provided CSV.
#
# @return [CSV] the CSV object constructed from the specified CSV content,
# with headers.
def collection
CSV.new(csv_content, headers: true)
def enumerator_builder(context:)
sambostock marked this conversation as resolved.
Show resolved Hide resolved
EnumeratorBuilder.new(collection)
end

# The number of rows to be processed. Excludes the header row from the count
Expand All @@ -27,5 +31,15 @@ def collection
def count
csv_content.count("\n") - 1
end

private

# Defines the collection to be iterated over, based on the provided CSV.
#
# @return [CSV] the CSV object constructed from the specified CSV content,
# with headers.
def collection
CSV.new(csv_content, headers: true)
end
end
end
23 changes: 16 additions & 7 deletions app/tasks/maintenance_tasks/task.rb
Original file line number Diff line number Diff line change
@@ -1,11 +1,19 @@
# frozen_string_literal: true

require 'maintenance_tasks/active_record_task'
require 'maintenance_tasks/array_task'
require 'maintenance_tasks/csv_task'
# TODO: require other adapters, or make this dynamic somehow

Comment on lines +3 to +7
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we could just move these to app/models 😁

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think app/tasks would make more sense if we're planning on keeping it around (otherwise, app/models makes sense). I think I'd seen some discussion about not using app/tasks internally at all, but maybe I'm wrong.

module MaintenanceTasks
# Base class that is inherited by the host application's task classes.
class Task
extend ActiveSupport::DescendantsTracker

# TODO: Might need to be abstract_class, even if just for consistency

class NotFoundError < NameError; end
EnumerationContext = Struct.new(:cursor, keyword_init: true)

class << self
# Finds a Task with the given name.
Expand All @@ -27,7 +35,7 @@ def named(name)
# @return [Array<Class>] the list of classes.
def available_tasks
load_constants
descendants.without(CsvTask)
descendants.without(CsvTask) # TODO: Would need abstract_class check
end

# Processes one item.
Expand All @@ -39,13 +47,13 @@ def process(item)
new.process(item)
end

# Returns the collection for this Task.
# Returns the enumerator for this Task.
#
# Especially useful for tests.
#
# @return the collection.
def collection
new.collection
# @return the enumerator.
def enumerator(cursor:)
new.enumerator(cursor: cursor)
Comment on lines +54 to +56
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might be old code, just making sure I understand - are we intending this to return new.enumerator_builder now, or new.enumerator_builder.enumerator(context)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this is a mistake and should just return new.enumerator_builder. Lots of moving pieces across those commits 😅

This was basically just me trying to put in the equivalent of the Task.collection method which used to be here.

end

# Returns the count of items for this Task.
Expand All @@ -71,9 +79,10 @@ def load_constants
#
# @raise [NotImplementedError] with a message advising subclasses to
# implement an override for this method.
def collection
def enumerator_builder(context:)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also nice now that custom iteration Tasks that don't care about the cursor (ie. are just streaming data from a queue) don't have to care about the cursor at all now!

Copy link
Contributor Author

@sambostock sambostock Jan 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah! Though I've actually erroneously included context: here 😅

Suggested change
def enumerator_builder(context:)
def enumerator_builder

raise NotImplementedError,
"#{self.class.name} must implement `collection`."
"#{self.class.name} must implement `#{__method__}` or inherit from a class which does"
# TODO: Could make error string list available adapters
Comment on lines +84 to +85
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do something fancy here where we detect a legacy task and provide a useful error message, since if this method is reached from a class implementing collection, they're almost certainly using the legacy API.

if respond_to?(:collection)
  case collection
  when ActiveRecord::Relation
    raise NotImplementedError, 'Inherit from new ActiveRecordTask instead'
  when Array
    raise NotImplementedError, 'Inherit from new ArrayTask instead'
  else
    raise NotImplementedError, 'Migrate to new Task API for your custom collection'
  end
else
  raise NotImplementedError, "#{self.class.name} must implement `#{__method__}` or inherit from a class which does"
end

end

# Placeholder method to raise in case a subclass fails to implement the
Expand Down
54 changes: 54 additions & 0 deletions example.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
class ProductTask < MaintenanceTasks::ActiveRecordTask
# Consumers implement these method
def collection() Product.all end
def process(product) product.discontinue! end

# .count implementation provided, but could be overriden if consumer has some
# optimization or if the relation is too expensive to count

# Could easily add
# run_in_batches_of 50
# or
# def batch_size() 50 end
end

class SeedTask < MaintenanceTasks::ArrayTask
# Similar API to ActiveRecordTask
def collection() %w[Alice Bob Chantalle] end
def process(name) Person.create!(name: name) end
# Again, .count provided
end

class BetaFlagTask < MaintenanceTasks::CsvTask
# CSV infra handles everything except .process
def process(row) BetaFlag[:whatever].enable(row.shop_id) end
end

class PaymentTask < MaintenanceTasks::Task
# Consumers implement .build_enumerator instead of .collection, without
# having to know that other tasks actually also use it internally
class EnumeratorBuilder
def enumerator(context:)
Enumerator.new do |yielder|
cursor = context.cursor
loop do
page = cursor ? InvoiceAPI.fetch(after: cursor) : InvoiceAPI.fetch_all
break if page.empty?
page.each do |invoice|
cursor = invoice.id
yielder.yield(invoice, cursor)
end
end
end
end
end

def build_enumerator
sambostock marked this conversation as resolved.
Show resolved Hide resolved
Enumerator.new
sambostock marked this conversation as resolved.
Show resolved Hide resolved
end

# Again, can be implemented if appropriate
def count() InvoiceAPI.unpaid_count end

def process(invoice) invoice.pay! if invoice.unpaid? end
end