Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract characterization service to allow ActiveFedora and Valkyrie filesets #4213

Closed
wants to merge 13 commits into from
32 changes: 4 additions & 28 deletions app/jobs/characterize_job.rb
Original file line number Diff line number Diff line change
@@ -1,33 +1,9 @@
# Characterizes the file at 'filepath' if available, otherwise, pulls a copy from the repository
# and runs characterization on that file.
class CharacterizeJob < Hyrax::ApplicationJob
queue_as Hyrax.config.ingest_queue_name

# Characterizes the file at 'filepath' if available, otherwise, pulls a copy from the repository
# and runs characterization on that file.
# @param [FileSet] file_set
# @param [String] file_id identifier for a Hydra::PCDM::File
# @param [String, NilClass] filepath the cached file within the Hyrax.config.working_path
def perform(file_set, file_id, filepath = nil)
raise "#{file_set.class.characterization_proxy} was not found for FileSet #{file_set.id}" unless file_set.characterization_proxy?
filepath = Hyrax::WorkingDirectory.find_or_retrieve(file_id, file_set.id) unless filepath && File.exist?(filepath)
characterize(file_set, file_id, filepath)
CreateDerivativesJob.perform_later(file_set, file_id, filepath)
def perform(file_set)
Hyrax::Characterizer.for(source: file_set).characterize
end

private

def characterize(file_set, _file_id, filepath)
Hydra::Works::CharacterizationService.run(file_set.characterization_proxy, filepath)
Rails.logger.debug "Ran characterization on #{file_set.characterization_proxy.id} (#{file_set.characterization_proxy.mime_type})"
file_set.characterization_proxy.alpha_channels = channels(filepath) if file_set.image? && Hyrax.config.iiif_image_server?
file_set.characterization_proxy.save!
file_set.update_index
end

def channels(filepath)
ch = MiniMagick::Tool::Identify.new do |cmd|
cmd.format '%[channels]'
cmd << filepath
end
[ch]
end
end
2 changes: 2 additions & 0 deletions app/models/hyrax/file_set.rb
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,8 @@ module Hyrax
# @see https://wiki.duraspace.org/display/samvera/Hydra%3A%3AWorks+Shared+Modeling
class FileSet < Hyrax::Resource
include Hyrax::Schema(:core_metadata)
include Hyrax::FileSet::Characterization
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both @aaron-collier and I believe this is the smallest possible set of includes to make sure resource-backed file sets respond to messages that are sent in the course of characterizing them.

Hyrax::FileSet::Characterization is needed to set up the characterization proxy, and Hydra::Works::MimeTypes is needed to ask a file set what sort of files it wraps (e.g., #image?), based on mime types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on including MimeTypes, the derivatives job also needs to know that for a resource-backed file set.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if these are required FileSet interface, let's get them into the 'a Hyrax::FileSet' shared specs in lib/hyrax/specs/shared_specs/hydra_works.rb?

i'm tempted to look for a way around making this the interface, though. so far we've been mostly successful in avoiding putting business logic in the models. while there's definitely space for compromise on that, it seems worth at least poking at other possibilities. any ideas?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: MimeTypes,

Hyrax::FileSet doesn't (yet) have a #mime_type property, so i think the include is injecting a bunch of methods that will currently raise errors?

Copy link
Contributor

@no-reply no-reply Jan 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah... noticing that ::FileSet also doesn't have a mime_type property and just delegates it to #characterization_proxy. it seems like we should follow suit and leave #mime_type at the file level?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering about this as a possible solution to the #mime_type issue #4215

include Hydra::Works::MimeTypes

attribute :file_ids, Valkyrie::Types::Array.of(Valkyrie::Types::ID) # id for FileMetadata resources
attribute :original_file_id, Valkyrie::Types::ID # id for FileMetadata resource
Expand Down
21 changes: 21 additions & 0 deletions app/services/hyrax/characterizer.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
# frozen_string_literal: true

module Hyrax
no-reply marked this conversation as resolved.
Show resolved Hide resolved
##
# Determines which characterizer to run based on the file_set type
# allowing implementation of Valkyrie file_sets
class Characterizer
##
# @param source: the object to run a characterizer on
#
# @return [#characterize]
def self.for(source:)
case source
when Hyrax::FileSetBehavior # ActiveFedora
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this ever be the case? FileSetBehavior is a module not a class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the case when passing a FileSet and not a Hyrax::FileSet. We see the same style here: https://github.com/samvera/hyrax/blob/master/app/services/hyrax/visibility_propagator.rb#L12. We verified this when stepping through this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jcoyne yeah. @aaron-collier and I verified this yesterday.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jcoyne see:

2.5.5 :001 > file_set = FileSet.new
 => #<FileSet id: nil, head: [], tail: [], depositor: nil, title: [], date_uploaded: nil, date_modified: nil, alternative_title: [], label: nil, relative_path: nil, import_url: nil, resource_type: [], creator: [], contributor: [], description: [], abstract: [], keyword: [], license: [], rights_notes: [], rights_statement: [], access_right: [], publisher: [], date_created: [], subject: [], language: [], identifier: [], based_near: [], related_url: [], bibliographic_citation: [], source: [], access_control_id: nil, embargo_id: nil, lease_id: nil> 

2.5.5 :002 > Hyrax::FileSetBehavior === file_set
 => true 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems really unclear and easy to violate this assumption.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use ActiveFedora::Base instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm 👍 on using ActiveFedora::Base, with the idea that as long as we're in a position where we need to do type checks we should do the least strict type checks that will work.

in theory, folks can implement a valid AF File Set class that doesn't include Hyrax::FileSetBehavior, right?

FileSetCharacterizer.new(source: source)
when Hyrax::FileSet # Valkyrie
ResourceCharacterizer.new(source: source)
end
end
end
end
50 changes: 50 additions & 0 deletions app/services/hyrax/file_set_characterizer.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
# frozen_string_literal: true

module Hyrax
##
# Characterizes an ActiveFedora based FileSet
class FileSetCharacterizer
##
# @!attribute [rw] source
# @return [#characterize]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit fuzzy on this. Is this saying that the thing in the source attr responds to #characterize. Does it? I would expect source to return an instance of an AF-based file set.

attr_accessor :source

##
# @param source the object to characterize
def initialize(source:)
@source = source
end

##
# @return [void]
#
# @raise [RuntimeError] if FileSet is missing the characterization_proxy
def characterize
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same method signature/dependency injection suggestion as https://github.com/samvera/hyrax/pull/4213/files#r370888617

Hydra::Works::CharacterizationService.run(characterization_proxy, filepath)
Rails.logger.debug "Ran characterization on #{characterization_proxy.id} (#{characterization_proxy.mime_type})"
characterization_proxy.alpha_channels = channels(filepath) if source.image? && Hyrax.config.iiif_image_server?
characterization_proxy.save!
source.update_index
CreateDerivativesJob.perform_later(source, source.original_file.id, filepath)
end

private

def characterization_proxy
raise "#{source.class.characterization_proxy} was not found for FileSet #{source.id}" unless source.characterization_proxy?
source.characterization_proxy
end
Comment on lines +33 to +36
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is called five times, I think, which means a bunch of unnecessary checks to see if the proxy exists. Would you be open to inlining this back up in #characterize?


def filepath
Hyrax::WorkingDirectory.find_or_retrieve(source.original_file.id, source.id)
end

def channels(path)
ch = MiniMagick::Tool::Identify.new do |cmd|
cmd.format '%[channels]'
cmd << path
end
[ch]
end
end
end
52 changes: 52 additions & 0 deletions app/services/hyrax/resource_characterizer.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
# frozen_string_literal: true

module Hyrax
##
# Characterizes a Valkyrie based FileSet
class ResourceCharacterizer
##
# @!attribute [rw] source
# @return [#characterize]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. I would expect source to return an instance of a Valkyrie-based file set.

attr_accessor :source

##
# @param source the object to characterize
def initialize(source:)
@source = source
end

##
# @return [void]
#
# @raise [RuntimeError] if FileSet is missing the characterization_proxy
def characterize
Hydra::Works::CharacterizationService.run(characterization_proxy, filepath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think about injecting this dependency?

##
# @param service [#run]
def characterize(service: Hydra::Works::CharacterizationService)
  service.run(characterization_proxy, filepath)
  ...
end

in this case, we could inject a spy service in the test suite and check its arguments, instead of hard-stubbing the class method.

Rails.logger.debug "Ran characterization on #{characterization_proxy.id} (#{characterization_proxy.mime_type})"
characterization_proxy.alpha_channels = channels(filepath) if source.image? && Hyrax.config.iiif_image_server?
Hyrax.persister.save(resource: characterization_proxy)
Hyrax.persister.save(resource: source)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this trigger a reindex of source in some way, or is it not needed?

CreateDerivativesJob.perform_later(source, source.original_file.id, filepath)
end

private

def characterization_proxy
raise "#{source.class.characterization_proxy} was not found for FileSet #{source.id}" unless source.characterization_proxy?
source.characterization_proxy
end
Comment on lines +33 to +36
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment on this method as above.


def filepath
# The current version of Valkyrie id returns a Valkyrie::ID and requires a .id to actually retrieve the id.
# This should be updated to source.id after a Valkyrie update
Hyrax::WorkingDirectory.find_or_retrieve(source.original_file.id, source.id.id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what would you think about tossing a comment here calling out the double id call is intentional (and required)

end

def channels(path)
ch = MiniMagick::Tool::Identify.new do |cmd|
cmd.format '%[channels]'
cmd << path
end
[ch]
end
end
end
38 changes: 9 additions & 29 deletions spec/jobs/characterize_job_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,6 @@
allow(fs).to receive(:update_index)
end
end
# let(:io) { JobIoWrapper.new(file_set_id: file_set.id, user: create(:user), path: filename) }
let(:file) do
Hydra::PCDM::File.new.tap do |f|
f.content = 'foo'
Expand All @@ -17,36 +16,17 @@
end
end

before do
allow(FileSet).to receive(:find).with(file_set_id).and_return(file_set)
allow(Hydra::Works::CharacterizationService).to receive(:run).with(file, filename)
allow(CreateDerivativesJob).to receive(:perform_later).with(file_set, file.id, filename)
end

context 'with valid filepath param' do
let(:filename) { File.join(fixture_path, 'world.png') }
context "when the file set's work is in a collection" do
let(:work) { build(:generic_work) }
let(:collection) { build(:collection_lw) }

it 'skips Hyrax::WorkingDirectory' do
expect(Hyrax::WorkingDirectory).not_to receive(:find_or_retrieve)
expect(Hydra::Works::CharacterizationService).to receive(:run).with(file, filename)
described_class.perform_now(file_set, file.id, filename)
before do
allow(file_set).to receive(:parent).and_return(work)
allow(work).to receive(:in_collections).and_return([collection])
end
end

context 'when the characterization proxy content is present' do
it 'runs Hydra::Works::CharacterizationService and creates a CreateDerivativesJob' do
expect(Hydra::Works::CharacterizationService).to receive(:run).with(file, filename)
expect(file).to receive(:save!)
expect(file_set).to receive(:update_index)
expect(CreateDerivativesJob).to receive(:perform_later).with(file_set, file.id, filename)
described_class.perform_now(file_set, file.id)
end
end

context 'when the characterization proxy content is absent' do
before { allow(file_set).to receive(:characterization_proxy?).and_return(false) }
it 'raises an error' do
expect { described_class.perform_now(file_set, file.id) }.to raise_error(StandardError, /original_file was not found/)
it "reindexes the collection" do
expect(collection).to receive(:update_index)
described_class.perform_now(file_set)
end
end
end
43 changes: 43 additions & 0 deletions spec/services/hyrax/file_set_characterizer_spec.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
RSpec.describe Hyrax::FileSetCharacterizer do
subject(:characterizer) { described_class.new(source: file_set) }
no-reply marked this conversation as resolved.
Show resolved Hide resolved

let(:file_set_id) { 'abc12345' }
let(:filename) { Rails.root.join('tmp', 'uploads', 'ab', 'c1', '23', '45', 'abc12345', 'picture.png').to_s }
let(:file_set) do
FileSet.new(id: file_set_id).tap do |fs|
allow(fs).to receive(:original_file).and_return(file)
allow(fs).to receive(:update_index)
end
end
let(:file) do
Hydra::PCDM::File.new.tap do |f|
f.content = 'foo'
f.original_name = 'picture.png'
f.save!
allow(f).to receive(:save!)
end
end

before do
allow(FileSet).to receive(:find).with(file_set_id).and_return(file_set)
allow(Hydra::Works::CharacterizationService).to receive(:run).with(file, filename)
allow(CreateDerivativesJob).to receive(:perform_later).with(file_set, file.id, filename)
end

context 'when the characterization proxy content is present' do
it 'runs Hydra::Works::CharacterizationService and creates a CreateDerivativesJob' do
expect(Hydra::Works::CharacterizationService).to receive(:run).with(file, filename)
expect(file).to receive(:save!)
expect(file_set).to receive(:update_index)
expect(CreateDerivativesJob).to receive(:perform_later).with(file_set, file.id, filename)
characterizer.characterize
end
end

context 'when the characterization proxy content is absent' do
before { allow(file_set).to receive(:characterization_proxy?).and_return(false) }
it 'raises an error' do
expect { characterizer.characterize }.to raise_error(StandardError, /original_file was not found/)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if this can/should change to RuntimeError

end
end
end
31 changes: 31 additions & 0 deletions spec/services/hyrax/resource_characterizer_spec.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
RSpec.describe Hyrax::ResourceCharacterizer do
subject(:characterizer) { described_class.new(source: resource) }

before do
allow(Hyrax::FileSet).to receive(:find).with(resource.id).and_return(resource)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can Hyrax::FileSet do this? is this coming from one of the includes??

allow(Hydra::Works::CharacterizationService).to receive(:run)
allow(Hyrax.persister).to receive(:save)
Copy link
Contributor

@no-reply no-reply Jan 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay to just actually call save on the persister?

if we want to avoid round tripping wings, we could do allow(Hyrax).to receive(:metadata_adapter).and_return(Valkyrie::MetadataAdapter.find(:test_adapter)) to use the test/memory adapter for the duration.

i've considered rigging this into some RSpec metadata, but it hasn't made it's way into the codebase yet.

allow(CreateDerivativesJob).to receive(:perform_later)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this just to avoid running the job? if so, it shouldn't be needed since we use the test queue adapter by default in the test suite.

end

context 'with a complete Valkyrie FileSet' do
let(:resource) { FactoryBot.create(:file_set, content: File.open(fixture_path + '/world.png')).valkyrie_resource }

it 'runs Hydra::Works::CharacterizationService and creates a CreateDerivativesJob' do
expect(Hydra::Works::CharacterizationService).to receive(:run).with(resource.original_file, anything)
expect(CreateDerivativesJob).to receive(:perform_later).with(resource, resource.original_file.id, anything)
expect(Hyrax.persister).to receive(:save).twice
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

twice with_args? or, again, consider just actually saving the objects to memory and expecting them to be saved.

characterizer.characterize
end
end

context 'without a complete Valkyrie FileSet' do
let(:resource) { FactoryBot.create(:file_set).valkyrie_resource }

before { allow(resource).to receive(:characterization_proxy?).and_return(false) }

it 'runs Hydra::Works::CharacterizationService and creates a CreateDerivativesJob' do
expect { characterizer.characterize }.to raise_error(StandardError, /original_file was not found/)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RuntimeError?

end
end
end