Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract characterization service to allow ActiveFedora and Valkyrie filesets #4213

Closed
wants to merge 13 commits into from

Conversation

aaron-collier
Copy link
Contributor

@aaron-collier aaron-collier commented Jan 24, 2020

Fixes #4100

This extracts the majority of the current CharacterizerJob into a Characterizer that implements individual characterize methods per FileSet type.

The pattern followed here was based on suggestion by @no-reply

Note: the remainder of characterizer_job_spec can likely be removed after #4207

@samvera/hyrax-code-reviewers

@aaron-collier aaron-collier force-pushed the characterization_service branch from 4880788 to 0c68ba8 Compare January 24, 2020 19:13
# @return [#characterize]
def self.for(source:)
case source
when Hyrax::FileSetBehavior # ActiveFedora
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this ever be the case? FileSetBehavior is a module not a class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the case when passing a FileSet and not a Hyrax::FileSet. We see the same style here: https://github.com/samvera/hyrax/blob/master/app/services/hyrax/visibility_propagator.rb#L12. We verified this when stepping through this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jcoyne yeah. @aaron-collier and I verified this yesterday.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jcoyne see:

2.5.5 :001 > file_set = FileSet.new
 => #<FileSet id: nil, head: [], tail: [], depositor: nil, title: [], date_uploaded: nil, date_modified: nil, alternative_title: [], label: nil, relative_path: nil, import_url: nil, resource_type: [], creator: [], contributor: [], description: [], abstract: [], keyword: [], license: [], rights_notes: [], rights_statement: [], access_right: [], publisher: [], date_created: [], subject: [], language: [], identifier: [], based_near: [], related_url: [], bibliographic_citation: [], source: [], access_control_id: nil, embargo_id: nil, lease_id: nil> 

2.5.5 :002 > Hyrax::FileSetBehavior === file_set
 => true 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems really unclear and easy to violate this assumption.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use ActiveFedora::Base instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm 👍 on using ActiveFedora::Base, with the idea that as long as we're in a position where we need to do type checks we should do the least strict type checks that will work.

in theory, folks can implement a valid AF File Set class that doesn't include Hyrax::FileSetBehavior, right?

Copy link
Member

@mjgiarlo mjgiarlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have some questions and comments for you to consider. Happy to pair!

@@ -1,34 +1,10 @@
# Characterizes the file at 'filepath' if available, otherwise, pulls a copy from the repository
# and runs characterization on that file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A trivial thing: would you consider dropping the newline between the class documentation and the class declaration?

Comment on lines 7 to 8
def perform(work)
Hyrax::Characterizer.for(source: work).characterize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you feel about renaming work on lines 7 and 8 to file_set?

@@ -7,6 +7,8 @@ module Hyrax
# @see https://wiki.duraspace.org/display/samvera/Hydra%3A%3AWorks+Shared+Modeling
class FileSet < Hyrax::Resource
include Hyrax::Schema(:core_metadata)
include Hyrax::FileSet::Characterization
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both @aaron-collier and I believe this is the smallest possible set of includes to make sure resource-backed file sets respond to messages that are sent in the course of characterizing them.

Hyrax::FileSet::Characterization is needed to set up the characterization proxy, and Hydra::Works::MimeTypes is needed to ask a file set what sort of files it wraps (e.g., #image?), based on mime types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on including MimeTypes, the derivatives job also needs to know that for a resource-backed file set.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if these are required FileSet interface, let's get them into the 'a Hyrax::FileSet' shared specs in lib/hyrax/specs/shared_specs/hydra_works.rb?

i'm tempted to look for a way around making this the interface, though. so far we've been mostly successful in avoiding putting business logic in the models. while there's definitely space for compromise on that, it seems worth at least poking at other possibilities. any ideas?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: MimeTypes,

Hyrax::FileSet doesn't (yet) have a #mime_type property, so i think the include is injecting a bunch of methods that will currently raise errors?

Copy link
Contributor

@no-reply no-reply Jan 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah... noticing that ::FileSet also doesn't have a mime_type property and just delegates it to #characterization_proxy. it seems like we should follow suit and leave #mime_type at the file level?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering about this as a possible solution to the #mime_type issue #4215

@@ -0,0 +1,19 @@
module Hyrax
##
# @abstract Determins which characterizer to run based on the file_set type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: Determins

app/services/hyrax/characterizer.rb Outdated Show resolved Hide resolved
##
# @param source the object to characterize
def initialize(source:)
self.source = source
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be @source = source?

##
# @return [void]
#
# @raise [RuntimeError] if visibility propogation fails
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the text here needs to change.

Hydra::Works::CharacterizationService.run(characterization_proxy, filepath)
Rails.logger.debug "Ran characterization on #{characterization_proxy.id} (#{characterization_proxy.mime_type})"
characterization_proxy.alpha_channels = channels(filepath) if source.image? && Hyrax.config.iiif_image_server?
persist
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the above comment, I wonder about the utility of moving the persistence code into this method.

Comment on lines +37 to +36
def characterization_proxy
raise "#{source.class.characterization_proxy} was not found for FileSet #{source.id}" unless source.characterization_proxy?
source.characterization_proxy
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment on this method as above.

end

def filepath
Hyrax::WorkingDirectory.find_or_retrieve(source.original_file.id, source.id.id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what would you think about tossing a comment here calling out the double id call is intentional (and required)

Copy link
Member

@mjgiarlo mjgiarlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have some test-related comments and questions as well. Thanks for considering!

FileSet.new(id: file_set_id).tap do |fs|
allow(fs).to receive(:original_file).and_return(file)
allow(fs).to receive(:update_index)
context 'when the work is an ActiveFedora FileSet' do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a context we want to keep here? the job is not switching behavior anymore.

it "reindexes the collection" do
expect(collection).to receive(:update_index)
described_class.perform_now(file_set, file.id)
before do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test should be removed, since the characterizejob no longer handles this behavior. I believe the test should only ensure that the characterizer factory returns a thing that #characterize gets called on.

context 'when the characterization proxy content is absent' do
before { allow(file_set).to receive(:characterization_proxy?).and_return(false) }
it 'raises an error' do
expect { characterizer.characterize }.to raise_error(StandardError, /original_file was not found/)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if this can/should change to RuntimeError

@@ -0,0 +1,32 @@
RSpec.describe Hyrax::ResourceCharacterizer do
subject(:characterizer) { described_class.new(source: resource) }
let(:resource) { FactoryBot.create(:file_set).valkyrie_resource }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is dead code since each context below has its own resource let block

before { allow(resource).to receive(:characterization_proxy?).and_return(false) }

it 'runs Hydra::Works::CharacterizationService and creates a CreateDerivativesJob' do
expect { characterizer.characterize }.to raise_error(StandardError, /original_file was not found/)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RuntimeError?

@aaron-collier aaron-collier force-pushed the characterization_service branch from 0c68ba8 to df67e5e Compare January 24, 2020 20:22
@aaron-collier aaron-collier force-pushed the characterization_service branch from df67e5e to 2845fb5 Compare January 24, 2020 20:26
@aaron-collier aaron-collier force-pushed the characterization_service branch from 2845fb5 to 328c6a9 Compare January 24, 2020 20:48
subject(:characterizer) { described_class.new(source: resource) }

before do
allow(Hyrax::FileSet).to receive(:find).with(resource.id).and_return(resource)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can Hyrax::FileSet do this? is this coming from one of the includes??

allow(Hyrax::FileSet).to receive(:find).with(resource.id).and_return(resource)
allow(Hydra::Works::CharacterizationService).to receive(:run)
allow(Hyrax.persister).to receive(:save)
allow(CreateDerivativesJob).to receive(:perform_later)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this just to avoid running the job? if so, it shouldn't be needed since we use the test queue adapter by default in the test suite.

before do
allow(Hyrax::FileSet).to receive(:find).with(resource.id).and_return(resource)
allow(Hydra::Works::CharacterizationService).to receive(:run)
allow(Hyrax.persister).to receive(:save)
Copy link
Contributor

@no-reply no-reply Jan 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay to just actually call save on the persister?

if we want to avoid round tripping wings, we could do allow(Hyrax).to receive(:metadata_adapter).and_return(Valkyrie::MetadataAdapter.find(:test_adapter)) to use the test/memory adapter for the duration.

i've considered rigging this into some RSpec metadata, but it hasn't made it's way into the codebase yet.

it 'runs Hydra::Works::CharacterizationService and creates a CreateDerivativesJob' do
expect(Hydra::Works::CharacterizationService).to receive(:run).with(resource.original_file, anything)
expect(CreateDerivativesJob).to receive(:perform_later).with(resource, resource.original_file.id, anything)
expect(Hyrax.persister).to receive(:save).twice
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

twice with_args? or, again, consider just actually saving the objects to memory and expecting them to be saved.

#
# @raise [RuntimeError] if FileSet is missing the characterization_proxy
def characterize
Hydra::Works::CharacterizationService.run(characterization_proxy, filepath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think about injecting this dependency?

##
# @param service [#run]
def characterize(service: Hydra::Works::CharacterizationService)
  service.run(characterization_proxy, filepath)
  ...
end

in this case, we could inject a spy service in the test suite and check its arguments, instead of hard-stubbing the class method.

# @return [void]
#
# @raise [RuntimeError] if FileSet is missing the characterization_proxy
def characterize
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same method signature/dependency injection suggestion as https://github.com/samvera/hyrax/pull/4213/files#r370888617

Rails.logger.debug "Ran characterization on #{characterization_proxy.id} (#{characterization_proxy.mime_type})"
characterization_proxy.alpha_channels = channels(filepath) if source.image? && Hyrax.config.iiif_image_server?
Hyrax.persister.save(resource: characterization_proxy)
Hyrax.persister.save(resource: source)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this trigger a reindex of source in some way, or is it not needed?

@stale
Copy link

stale bot commented Feb 24, 2020

This issue has been automatically marked as stale because it has not had activity for 30 days. It will be closed if no further activity occurs within 14 days. Thank you for your contributions.

@stale stale bot added the stale label Feb 24, 2020
@stale stale bot closed this Mar 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow CharacterizeJob to receive either an AF object or a Valkyrie resource as the FileSet
6 participants