-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extract characterization service to allow ActiveFedora and Valkyrie filesets #4213
Conversation
4880788
to
0c68ba8
Compare
# @return [#characterize] | ||
def self.for(source:) | ||
case source | ||
when Hyrax::FileSetBehavior # ActiveFedora |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would this ever be the case? FileSetBehavior is a module not a class.
module FileSetBehavior |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is the case when passing a FileSet
and not a Hyrax::FileSet
. We see the same style here: https://github.com/samvera/hyrax/blob/master/app/services/hyrax/visibility_propagator.rb#L12. We verified this when stepping through this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jcoyne yeah. @aaron-collier and I verified this yesterday.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jcoyne see:
2.5.5 :001 > file_set = FileSet.new
=> #<FileSet id: nil, head: [], tail: [], depositor: nil, title: [], date_uploaded: nil, date_modified: nil, alternative_title: [], label: nil, relative_path: nil, import_url: nil, resource_type: [], creator: [], contributor: [], description: [], abstract: [], keyword: [], license: [], rights_notes: [], rights_statement: [], access_right: [], publisher: [], date_created: [], subject: [], language: [], identifier: [], based_near: [], related_url: [], bibliographic_citation: [], source: [], access_control_id: nil, embargo_id: nil, lease_id: nil>
2.5.5 :002 > Hyrax::FileSetBehavior === file_set
=> true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems really unclear and easy to violate this assumption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use ActiveFedora::Base
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm 👍 on using ActiveFedora::Base
, with the idea that as long as we're in a position where we need to do type checks we should do the least strict type checks that will work.
in theory, folks can implement a valid AF File Set class that doesn't include Hyrax::FileSetBehavior
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have some questions and comments for you to consider. Happy to pair!
app/jobs/characterize_job.rb
Outdated
@@ -1,34 +1,10 @@ | |||
# Characterizes the file at 'filepath' if available, otherwise, pulls a copy from the repository | |||
# and runs characterization on that file. | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A trivial thing: would you consider dropping the newline between the class documentation and the class
declaration?
app/jobs/characterize_job.rb
Outdated
def perform(work) | ||
Hyrax::Characterizer.for(source: work).characterize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you feel about renaming work
on lines 7 and 8 to file_set
?
@@ -7,6 +7,8 @@ module Hyrax | |||
# @see https://wiki.duraspace.org/display/samvera/Hydra%3A%3AWorks+Shared+Modeling | |||
class FileSet < Hyrax::Resource | |||
include Hyrax::Schema(:core_metadata) | |||
include Hyrax::FileSet::Characterization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both @aaron-collier and I believe this is the smallest possible set of includes to make sure resource-backed file sets respond to messages that are sent in the course of characterizing them.
Hyrax::FileSet::Characterization
is needed to set up the characterization proxy, and Hydra::Works::MimeTypes
is needed to ask a file set what sort of files it wraps (e.g., #image?
), based on mime types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on including MimeTypes
, the derivatives job also needs to know that for a resource-backed file set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if these are required FileSet
interface, let's get them into the 'a Hyrax::FileSet'
shared specs in lib/hyrax/specs/shared_specs/hydra_works.rb
?
i'm tempted to look for a way around making this the interface, though. so far we've been mostly successful in avoiding putting business logic in the models. while there's definitely space for compromise on that, it seems worth at least poking at other possibilities. any ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: MimeTypes
,
Hyrax::FileSet
doesn't (yet) have a #mime_type
property, so i think the include is injecting a bunch of methods that will currently raise errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah... noticing that ::FileSet
also doesn't have a mime_type
property and just delegates it to #characterization_proxy
. it seems like we should follow suit and leave #mime_type
at the file level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wondering about this as a possible solution to the #mime_type
issue #4215
app/services/hyrax/characterizer.rb
Outdated
@@ -0,0 +1,19 @@ | |||
module Hyrax | |||
## | |||
# @abstract Determins which characterizer to run based on the file_set type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: Determins
## | ||
# @param source the object to characterize | ||
def initialize(source:) | ||
self.source = source |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be @source = source
?
## | ||
# @return [void] | ||
# | ||
# @raise [RuntimeError] if visibility propogation fails |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the text here needs to change.
Hydra::Works::CharacterizationService.run(characterization_proxy, filepath) | ||
Rails.logger.debug "Ran characterization on #{characterization_proxy.id} (#{characterization_proxy.mime_type})" | ||
characterization_proxy.alpha_channels = channels(filepath) if source.image? && Hyrax.config.iiif_image_server? | ||
persist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the above comment, I wonder about the utility of moving the persistence code into this method.
def characterization_proxy | ||
raise "#{source.class.characterization_proxy} was not found for FileSet #{source.id}" unless source.characterization_proxy? | ||
source.characterization_proxy | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment on this method as above.
end | ||
|
||
def filepath | ||
Hyrax::WorkingDirectory.find_or_retrieve(source.original_file.id, source.id.id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what would you think about tossing a comment here calling out the double id call is intentional (and required)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have some test-related comments and questions as well. Thanks for considering!
spec/jobs/characterize_job_spec.rb
Outdated
FileSet.new(id: file_set_id).tap do |fs| | ||
allow(fs).to receive(:original_file).and_return(file) | ||
allow(fs).to receive(:update_index) | ||
context 'when the work is an ActiveFedora FileSet' do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a context we want to keep here? the job is not switching behavior anymore.
spec/jobs/characterize_job_spec.rb
Outdated
it "reindexes the collection" do | ||
expect(collection).to receive(:update_index) | ||
described_class.perform_now(file_set, file.id) | ||
before do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test should be removed, since the characterizejob no longer handles this behavior. I believe the test should only ensure that the characterizer factory returns a thing that #characterize
gets called on.
context 'when the characterization proxy content is absent' do | ||
before { allow(file_set).to receive(:characterization_proxy?).and_return(false) } | ||
it 'raises an error' do | ||
expect { characterizer.characterize }.to raise_error(StandardError, /original_file was not found/) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if this can/should change to RuntimeError
@@ -0,0 +1,32 @@ | |||
RSpec.describe Hyrax::ResourceCharacterizer do | |||
subject(:characterizer) { described_class.new(source: resource) } | |||
let(:resource) { FactoryBot.create(:file_set).valkyrie_resource } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is dead code since each context below has its own resource let block
before { allow(resource).to receive(:characterization_proxy?).and_return(false) } | ||
|
||
it 'runs Hydra::Works::CharacterizationService and creates a CreateDerivativesJob' do | ||
expect { characterizer.characterize }.to raise_error(StandardError, /original_file was not found/) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RuntimeError?
0c68ba8
to
df67e5e
Compare
df67e5e
to
2845fb5
Compare
2845fb5
to
328c6a9
Compare
subject(:characterizer) { described_class.new(source: resource) } | ||
|
||
before do | ||
allow(Hyrax::FileSet).to receive(:find).with(resource.id).and_return(resource) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can Hyrax::FileSet
do this? is this coming from one of the includes??
allow(Hyrax::FileSet).to receive(:find).with(resource.id).and_return(resource) | ||
allow(Hydra::Works::CharacterizationService).to receive(:run) | ||
allow(Hyrax.persister).to receive(:save) | ||
allow(CreateDerivativesJob).to receive(:perform_later) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this just to avoid running the job? if so, it shouldn't be needed since we use the test queue adapter by default in the test suite.
before do | ||
allow(Hyrax::FileSet).to receive(:find).with(resource.id).and_return(resource) | ||
allow(Hydra::Works::CharacterizationService).to receive(:run) | ||
allow(Hyrax.persister).to receive(:save) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay to just actually call save on the persister?
if we want to avoid round tripping wings, we could do allow(Hyrax).to receive(:metadata_adapter).and_return(Valkyrie::MetadataAdapter.find(:test_adapter))
to use the test/memory adapter for the duration.
i've considered rigging this into some RSpec metadata, but it hasn't made it's way into the codebase yet.
it 'runs Hydra::Works::CharacterizationService and creates a CreateDerivativesJob' do | ||
expect(Hydra::Works::CharacterizationService).to receive(:run).with(resource.original_file, anything) | ||
expect(CreateDerivativesJob).to receive(:perform_later).with(resource, resource.original_file.id, anything) | ||
expect(Hyrax.persister).to receive(:save).twice |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
twice with_args
? or, again, consider just actually saving the objects to memory and expecting them to be saved.
# | ||
# @raise [RuntimeError] if FileSet is missing the characterization_proxy | ||
def characterize | ||
Hydra::Works::CharacterizationService.run(characterization_proxy, filepath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you think about injecting this dependency?
##
# @param service [#run]
def characterize(service: Hydra::Works::CharacterizationService)
service.run(characterization_proxy, filepath)
...
end
in this case, we could inject a spy service in the test suite and check its arguments, instead of hard-stubbing the class method.
# @return [void] | ||
# | ||
# @raise [RuntimeError] if FileSet is missing the characterization_proxy | ||
def characterize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same method signature/dependency injection suggestion as https://github.com/samvera/hyrax/pull/4213/files#r370888617
Rails.logger.debug "Ran characterization on #{characterization_proxy.id} (#{characterization_proxy.mime_type})" | ||
characterization_proxy.alpha_channels = channels(filepath) if source.image? && Hyrax.config.iiif_image_server? | ||
Hyrax.persister.save(resource: characterization_proxy) | ||
Hyrax.persister.save(resource: source) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this trigger a reindex of source
in some way, or is it not needed?
This issue has been automatically marked as stale because it has not had activity for 30 days. It will be closed if no further activity occurs within 14 days. Thank you for your contributions. |
Fixes #4100
This extracts the majority of the current CharacterizerJob into a Characterizer that implements individual characterize methods per FileSet type.
The pattern followed here was based on suggestion by @no-reply
Note: the remainder of characterizer_job_spec can likely be removed after #4207
@samvera/hyrax-code-reviewers