-
Notifications
You must be signed in to change notification settings - Fork 125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extract characterization service to allow ActiveFedora and Valkyrie filesets #4213
Changes from all commits
e80bfa0
dbbd474
10076f8
a5c7b0e
3282092
5ed9a6f
0dcf42b
d4ca105
0eef6ec
2d0d3f5
872ba30
88aab47
328c6a9
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,33 +1,9 @@ | ||
# Characterizes the file at 'filepath' if available, otherwise, pulls a copy from the repository | ||
# and runs characterization on that file. | ||
class CharacterizeJob < Hyrax::ApplicationJob | ||
queue_as Hyrax.config.ingest_queue_name | ||
|
||
# Characterizes the file at 'filepath' if available, otherwise, pulls a copy from the repository | ||
# and runs characterization on that file. | ||
# @param [FileSet] file_set | ||
# @param [String] file_id identifier for a Hydra::PCDM::File | ||
# @param [String, NilClass] filepath the cached file within the Hyrax.config.working_path | ||
def perform(file_set, file_id, filepath = nil) | ||
raise "#{file_set.class.characterization_proxy} was not found for FileSet #{file_set.id}" unless file_set.characterization_proxy? | ||
filepath = Hyrax::WorkingDirectory.find_or_retrieve(file_id, file_set.id) unless filepath && File.exist?(filepath) | ||
characterize(file_set, file_id, filepath) | ||
CreateDerivativesJob.perform_later(file_set, file_id, filepath) | ||
def perform(file_set) | ||
Hyrax::Characterizer.for(source: file_set).characterize | ||
end | ||
|
||
private | ||
|
||
def characterize(file_set, _file_id, filepath) | ||
Hydra::Works::CharacterizationService.run(file_set.characterization_proxy, filepath) | ||
Rails.logger.debug "Ran characterization on #{file_set.characterization_proxy.id} (#{file_set.characterization_proxy.mime_type})" | ||
file_set.characterization_proxy.alpha_channels = channels(filepath) if file_set.image? && Hyrax.config.iiif_image_server? | ||
file_set.characterization_proxy.save! | ||
file_set.update_index | ||
end | ||
|
||
def channels(filepath) | ||
ch = MiniMagick::Tool::Identify.new do |cmd| | ||
cmd.format '%[channels]' | ||
cmd << filepath | ||
end | ||
[ch] | ||
end | ||
end |
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
@@ -0,0 +1,21 @@ | ||||
# frozen_string_literal: true | ||||
|
||||
module Hyrax | ||||
no-reply marked this conversation as resolved.
Show resolved
Hide resolved
|
||||
## | ||||
# Determines which characterizer to run based on the file_set type | ||||
# allowing implementation of Valkyrie file_sets | ||||
class Characterizer | ||||
## | ||||
# @param source: the object to run a characterizer on | ||||
# | ||||
# @return [#characterize] | ||||
def self.for(source:) | ||||
case source | ||||
when Hyrax::FileSetBehavior # ActiveFedora | ||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. would this ever be the case? FileSetBehavior is a module not a class.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That is the case when passing a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jcoyne yeah. @aaron-collier and I verified this yesterday. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jcoyne see: 2.5.5 :001 > file_set = FileSet.new
=> #<FileSet id: nil, head: [], tail: [], depositor: nil, title: [], date_uploaded: nil, date_modified: nil, alternative_title: [], label: nil, relative_path: nil, import_url: nil, resource_type: [], creator: [], contributor: [], description: [], abstract: [], keyword: [], license: [], rights_notes: [], rights_statement: [], access_right: [], publisher: [], date_created: [], subject: [], language: [], identifier: [], based_near: [], related_url: [], bibliographic_citation: [], source: [], access_control_id: nil, embargo_id: nil, lease_id: nil>
2.5.5 :002 > Hyrax::FileSetBehavior === file_set
=> true There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This seems really unclear and easy to violate this assumption. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i'm 👍 on using in theory, folks can implement a valid AF File Set class that doesn't include |
||||
FileSetCharacterizer.new(source: source) | ||||
when Hyrax::FileSet # Valkyrie | ||||
ResourceCharacterizer.new(source: source) | ||||
end | ||||
end | ||||
end | ||||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
# frozen_string_literal: true | ||
|
||
module Hyrax | ||
## | ||
# Characterizes an ActiveFedora based FileSet | ||
class FileSetCharacterizer | ||
## | ||
# @!attribute [rw] source | ||
# @return [#characterize] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm a bit fuzzy on this. Is this saying that the thing in the |
||
attr_accessor :source | ||
|
||
## | ||
# @param source the object to characterize | ||
def initialize(source:) | ||
@source = source | ||
end | ||
|
||
## | ||
# @return [void] | ||
# | ||
# @raise [RuntimeError] if FileSet is missing the characterization_proxy | ||
def characterize | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same method signature/dependency injection suggestion as https://github.com/samvera/hyrax/pull/4213/files#r370888617 |
||
Hydra::Works::CharacterizationService.run(characterization_proxy, filepath) | ||
Rails.logger.debug "Ran characterization on #{characterization_proxy.id} (#{characterization_proxy.mime_type})" | ||
characterization_proxy.alpha_channels = channels(filepath) if source.image? && Hyrax.config.iiif_image_server? | ||
characterization_proxy.save! | ||
source.update_index | ||
CreateDerivativesJob.perform_later(source, source.original_file.id, filepath) | ||
end | ||
|
||
private | ||
|
||
def characterization_proxy | ||
raise "#{source.class.characterization_proxy} was not found for FileSet #{source.id}" unless source.characterization_proxy? | ||
source.characterization_proxy | ||
end | ||
Comment on lines
+33
to
+36
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This method is called five times, I think, which means a bunch of unnecessary checks to see if the proxy exists. Would you be open to inlining this back up in |
||
|
||
def filepath | ||
Hyrax::WorkingDirectory.find_or_retrieve(source.original_file.id, source.id) | ||
end | ||
|
||
def channels(path) | ||
ch = MiniMagick::Tool::Identify.new do |cmd| | ||
cmd.format '%[channels]' | ||
cmd << path | ||
end | ||
[ch] | ||
end | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
# frozen_string_literal: true | ||
|
||
module Hyrax | ||
## | ||
# Characterizes a Valkyrie based FileSet | ||
class ResourceCharacterizer | ||
## | ||
# @!attribute [rw] source | ||
# @return [#characterize] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same as above. I would expect |
||
attr_accessor :source | ||
|
||
## | ||
# @param source the object to characterize | ||
def initialize(source:) | ||
@source = source | ||
end | ||
|
||
## | ||
# @return [void] | ||
# | ||
# @raise [RuntimeError] if FileSet is missing the characterization_proxy | ||
def characterize | ||
Hydra::Works::CharacterizationService.run(characterization_proxy, filepath) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what do you think about injecting this dependency? ##
# @param service [#run]
def characterize(service: Hydra::Works::CharacterizationService)
service.run(characterization_proxy, filepath)
...
end in this case, we could inject a spy service in the test suite and check its arguments, instead of hard-stubbing the class method. |
||
Rails.logger.debug "Ran characterization on #{characterization_proxy.id} (#{characterization_proxy.mime_type})" | ||
characterization_proxy.alpha_channels = channels(filepath) if source.image? && Hyrax.config.iiif_image_server? | ||
Hyrax.persister.save(resource: characterization_proxy) | ||
Hyrax.persister.save(resource: source) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. should this trigger a reindex of |
||
CreateDerivativesJob.perform_later(source, source.original_file.id, filepath) | ||
end | ||
|
||
private | ||
|
||
def characterization_proxy | ||
raise "#{source.class.characterization_proxy} was not found for FileSet #{source.id}" unless source.characterization_proxy? | ||
source.characterization_proxy | ||
end | ||
Comment on lines
+33
to
+36
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same comment on this method as above. |
||
|
||
def filepath | ||
# The current version of Valkyrie id returns a Valkyrie::ID and requires a .id to actually retrieve the id. | ||
# This should be updated to source.id after a Valkyrie update | ||
Hyrax::WorkingDirectory.find_or_retrieve(source.original_file.id, source.id.id) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what would you think about tossing a comment here calling out the double id call is intentional (and required) |
||
end | ||
|
||
def channels(path) | ||
ch = MiniMagick::Tool::Identify.new do |cmd| | ||
cmd.format '%[channels]' | ||
cmd << path | ||
end | ||
[ch] | ||
end | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
RSpec.describe Hyrax::FileSetCharacterizer do | ||
subject(:characterizer) { described_class.new(source: file_set) } | ||
no-reply marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
let(:file_set_id) { 'abc12345' } | ||
let(:filename) { Rails.root.join('tmp', 'uploads', 'ab', 'c1', '23', '45', 'abc12345', 'picture.png').to_s } | ||
let(:file_set) do | ||
FileSet.new(id: file_set_id).tap do |fs| | ||
allow(fs).to receive(:original_file).and_return(file) | ||
allow(fs).to receive(:update_index) | ||
end | ||
end | ||
let(:file) do | ||
Hydra::PCDM::File.new.tap do |f| | ||
f.content = 'foo' | ||
f.original_name = 'picture.png' | ||
f.save! | ||
allow(f).to receive(:save!) | ||
end | ||
end | ||
|
||
before do | ||
allow(FileSet).to receive(:find).with(file_set_id).and_return(file_set) | ||
allow(Hydra::Works::CharacterizationService).to receive(:run).with(file, filename) | ||
allow(CreateDerivativesJob).to receive(:perform_later).with(file_set, file.id, filename) | ||
end | ||
|
||
context 'when the characterization proxy content is present' do | ||
it 'runs Hydra::Works::CharacterizationService and creates a CreateDerivativesJob' do | ||
expect(Hydra::Works::CharacterizationService).to receive(:run).with(file, filename) | ||
expect(file).to receive(:save!) | ||
expect(file_set).to receive(:update_index) | ||
expect(CreateDerivativesJob).to receive(:perform_later).with(file_set, file.id, filename) | ||
characterizer.characterize | ||
end | ||
end | ||
|
||
context 'when the characterization proxy content is absent' do | ||
before { allow(file_set).to receive(:characterization_proxy?).and_return(false) } | ||
it 'raises an error' do | ||
expect { characterizer.characterize }.to raise_error(StandardError, /original_file was not found/) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am wondering if this can/should change to |
||
end | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
RSpec.describe Hyrax::ResourceCharacterizer do | ||
subject(:characterizer) { described_class.new(source: resource) } | ||
|
||
before do | ||
allow(Hyrax::FileSet).to receive(:find).with(resource.id).and_return(resource) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can |
||
allow(Hydra::Works::CharacterizationService).to receive(:run) | ||
allow(Hyrax.persister).to receive(:save) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. okay to just actually call save on the persister? if we want to avoid round tripping wings, we could do i've considered rigging this into some RSpec metadata, but it hasn't made it's way into the codebase yet. |
||
allow(CreateDerivativesJob).to receive(:perform_later) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is this just to avoid running the job? if so, it shouldn't be needed since we use the test queue adapter by default in the test suite. |
||
end | ||
|
||
context 'with a complete Valkyrie FileSet' do | ||
let(:resource) { FactoryBot.create(:file_set, content: File.open(fixture_path + '/world.png')).valkyrie_resource } | ||
|
||
it 'runs Hydra::Works::CharacterizationService and creates a CreateDerivativesJob' do | ||
expect(Hydra::Works::CharacterizationService).to receive(:run).with(resource.original_file, anything) | ||
expect(CreateDerivativesJob).to receive(:perform_later).with(resource, resource.original_file.id, anything) | ||
expect(Hyrax.persister).to receive(:save).twice | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. twice |
||
characterizer.characterize | ||
end | ||
end | ||
|
||
context 'without a complete Valkyrie FileSet' do | ||
let(:resource) { FactoryBot.create(:file_set).valkyrie_resource } | ||
|
||
before { allow(resource).to receive(:characterization_proxy?).and_return(false) } | ||
|
||
it 'runs Hydra::Works::CharacterizationService and creates a CreateDerivativesJob' do | ||
expect { characterizer.characterize }.to raise_error(StandardError, /original_file was not found/) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. RuntimeError? |
||
end | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both @aaron-collier and I believe this is the smallest possible set of includes to make sure resource-backed file sets respond to messages that are sent in the course of characterizing them.
Hyrax::FileSet::Characterization
is needed to set up the characterization proxy, andHydra::Works::MimeTypes
is needed to ask a file set what sort of files it wraps (e.g.,#image?
), based on mime types.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on including
MimeTypes
, the derivatives job also needs to know that for a resource-backed file set.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if these are required
FileSet
interface, let's get them into the'a Hyrax::FileSet'
shared specs inlib/hyrax/specs/shared_specs/hydra_works.rb
?i'm tempted to look for a way around making this the interface, though. so far we've been mostly successful in avoiding putting business logic in the models. while there's definitely space for compromise on that, it seems worth at least poking at other possibilities. any ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re:
MimeTypes
,Hyrax::FileSet
doesn't (yet) have a#mime_type
property, so i think the include is injecting a bunch of methods that will currently raise errors?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah... noticing that
::FileSet
also doesn't have amime_type
property and just delegates it to#characterization_proxy
. it seems like we should follow suit and leave#mime_type
at the file level?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wondering about this as a possible solution to the
#mime_type
issue #4215